JP2009282824A

JP2009282824A - Emotion estimation system and program

Info

Publication number: JP2009282824A
Application number: JP2008135290A
Authority: JP
Inventors: Ryoko Hotta; 良子堀田
Original assignee: Toyota Central R&D Labs Inc
Current assignee: Toyota Central R&D Labs Inc
Priority date: 2008-05-23
Filing date: 2008-05-23
Publication date: 2009-12-03
Anticipated expiration: 2028-05-23
Also published as: JP5083033B2

Abstract

PROBLEM TO BE SOLVED: To provide an emotion estimation system and program that estimate emotions efficiently and accurately by generating emotion models while taking into account the number of learning data items that fall under each emotion model, where each emotion model corresponds to one emotion.

SOLUTION: The features of the input data consist of image features extracted from image data obtained by imaging a user, and text features and prosodic features extracted from voice data input by the user's speech. A plurality of emotion models, one for each emotion, are generated starting with the emotion that has the greatest number of corresponding learning data items, and learning data corresponding to an emotion model that has already been generated is not included when the other emotion models are generated. The user's emotion is estimated by judging whether the features of the input data fall under each emotion model, starting with the emotion model whose emotion has the greatest number of corresponding learning data items.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to an emotion estimation apparatus and program, and in particular to an emotion estimation apparatus and program for estimating a user's emotion using at least one of image data obtained by imaging the user, voice data from the user's utterances, and text data input by the user.

Conventionally, a user's emotion has been estimated based on input information from the user.

The dialogue processing apparatus of Patent Document 1 estimates a user's emotion using prosodic information extracted from a speech signal input by the user, concept information of phrases included in the speech recognition result for that signal, face image information obtained by imaging the user's face, and physiological information such as the user's pulse. For the face image information, it proposes estimating the emotion by matching the feature amounts of the face image information against models obtained in advance by learning from face images for each emotion, such as a happy state, an angry state, and a sad state.

In addition, the emotion estimation apparatus of Patent Document 2 proposes estimating a user's emotion by comparing feature amounts, extracted from image data obtained by imaging the user and from voice data based on speech uttered by the user, with a learning model that indicates a degree of interest.

Patent Document 1: JP 2001-215993 A
Patent Document 2: JP 2007-34664 A

However, the dialogue processing apparatus of Patent Document 1 and the emotion estimation apparatus of Patent Document 2 do not take into account the amount of learning data when generating a learning model, so accurate learning may not be achieved when the number of learning data items is unevenly distributed. Moreover, because neither apparatus determines the emotion polarity, that is, whether an emotion is positive or negative, a fatal misjudgment can occur: for example, a user who feels the positive emotion “fun” may be estimated to feel the negative emotion “angry”.

The present invention has been made to solve the problems described above. Its object is to provide an emotion estimation apparatus and program that can estimate emotions efficiently and accurately by generating emotion models in consideration of the number of learning data items that fall under each emotion model, where each emotion model corresponds to one emotion.

To achieve the above object, an emotion estimation apparatus according to a first invention includes: extraction means for extracting features of at least one item of input data from among image data obtained by imaging a user, voice data input by the user's utterance, and text data input by the user other than by utterance; emotion model generation means for generating a plurality of emotion models, each corresponding to a different single emotion and each indicating, for each of a plurality of learning data items extracted in advance from a plurality of sample data with features associated with emotions, whether the emotion of that learning data item corresponds to the single emotion; and estimation means for estimating the user's emotion by judging whether the features of the input data extracted by the extraction means correspond to any of the single emotions of the plurality of emotion models, in order starting from the emotion model with the largest number of learning data items corresponding to its single emotion.

An emotion estimation program according to the first invention causes a computer to function as: extraction means for extracting features of at least one item of input data from among image data obtained by imaging a user, voice data input by the user's utterance, and text data input by the user other than by utterance; emotion model generation means for generating a plurality of emotion models, each corresponding to a different single emotion and each indicating, for each of a plurality of learning data items extracted in advance from a plurality of sample data with features associated with emotions, whether the emotion of that learning data item corresponds to the single emotion; and estimation means for estimating the user's emotion by judging whether the features of the input data extracted by the extraction means correspond to any of the single emotions of the plurality of emotion models, in order starting from the emotion model with the largest number of learning data items corresponding to its single emotion.

According to the emotion estimation apparatus and program of the first invention, the extraction means extracts features of at least one item of input data from among image data obtained by imaging a user, voice data input by the user's utterance, and text data input by the user other than by utterance. The emotion model generation means generates a plurality of emotion models, each corresponding to a different single emotion and each indicating, for each of a plurality of learning data items extracted in advance from a plurality of sample data with features associated with emotions, whether the emotion of that learning data item corresponds to the single emotion. The estimation means then estimates the user's emotion by judging whether the features of the input data extracted by the extraction means correspond to any of the single emotions of the plurality of emotion models, in order starting from the emotion model with the largest number of learning data items corresponding to its single emotion.

Because whether the features of the input data fall under each emotion model is judged in order starting from the emotion model with the largest number of corresponding learning data items, judgment starts from the emotions with the highest rate of occurrence, and emotions can be estimated efficiently and accurately.
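The ordered judgment described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: each emotion model is assumed to be a callable that returns True when the input features fall under its emotion, and the names `estimate_emotion`, `models`, and `counts` are hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence

Features = Sequence[float]
# Assumption: a model is any callable that says whether features match its emotion.
Model = Callable[[Features], bool]


def estimate_emotion(
    features: Features,
    models: Dict[str, Model],
    counts: Dict[str, int],
) -> Optional[str]:
    """Query models in descending order of learning-data count; stop at the first match."""
    ordered = sorted(models, key=lambda emotion: counts[emotion], reverse=True)
    for emotion in ordered:
        if models[emotion](features):
            return emotion
    return None  # no model accepted the features


# Toy stand-ins for trained models (illustrative thresholds only).
toy_models = {
    "fun": lambda f: f[0] > 0.5,
    "dislike": lambda f: f[0] > 0.2,
}
toy_counts = {"fun": 10, "dislike": 30}
result = estimate_emotion([0.9], toy_models, toy_counts)
```

With these toy values the "dislike" model is consulted first because it has more learning data, so it decides the outcome before the "fun" model is ever evaluated.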

In the emotion estimation apparatus and program according to the first invention, the emotion model generation means can generate the plurality of emotion models in descending order of the number of corresponding learning data items and, when generating each of the other emotion models, exclude from its learning data the learning data items that correspond to the emotion of an emotion model whose generation has already finished.

Because the emotion models are generated in descending order of the number of corresponding learning data items, and the learning data corresponding to an already generated emotion model is excluded from the learning data used to generate the other emotion models, the imbalance between the number of corresponding and non-corresponding learning data items in each emotion model is reduced. More accurate emotion models are therefore generated, which further improves the accuracy of emotion estimation.
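To make the effect on class balance concrete, the following sketch computes how many positive and negative examples each emotion model would see under this scheme. The function name `training_set_sizes` and the toy label list are illustrative assumptions; no actual model is trained here.

```python
from collections import Counter
from typing import List, Tuple


def training_set_sizes(emotions: List[str]) -> List[Tuple[str, int, int]]:
    """For each emotion, in generation order, return (emotion, positives, negatives)."""
    remaining = list(emotions)
    plan = []
    while remaining:
        counts = Counter(remaining)
        # Pick the emotion with the most learning data among what is left.
        emotion, positives = counts.most_common(1)[0]
        negatives = len(remaining) - positives
        plan.append((emotion, positives, negatives))
        # Exclude this emotion's data before the next model is generated.
        remaining = [e for e in remaining if e != emotion]
    return plan


labels = ["dislike"] * 5 + ["fun"] * 3 + ["sad"] * 2
plan = training_set_sizes(labels)
```

In this toy case the "fun" model sees 3 positives against 2 negatives, whereas without exclusion it would see 3 against 7. The last emotion in the sketch ends up with no negative examples; how that case is handled is not covered by the passage above.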

An emotion estimation apparatus according to a second invention includes: extraction means for extracting features of at least one item of input data from among image data obtained by imaging a user, voice data input by the user's utterance, and text data input by the user other than by utterance; polarity determination means for determining whether the features of the input data extracted by the extraction means indicate an emotion of a first polarity or an emotion of a second polarity that is opposite to the first polarity; emotion model generation means for generating a plurality of first emotion models and a plurality of second emotion models, each first emotion model corresponding to a different single emotion of the first polarity and indicating, for each of a plurality of learning data items extracted in advance from a plurality of sample data with features associated with emotions of the first polarity, whether the emotion of that learning data item corresponds to the single emotion of the first polarity, and each second emotion model corresponding to a different single emotion of the second polarity and indicating, for each of a plurality of learning data items extracted in advance from a plurality of sample data with features associated with emotions of the second polarity, whether the emotion of that learning data item corresponds to the single emotion of the second polarity; and estimation means for estimating the user's emotion as follows: when the polarity determination means determines that the features of the input data indicate an emotion of the first polarity, the estimation means judges whether the features of the input data extracted by the extraction means correspond to any of the single first-polarity emotions of the plurality of first emotion models, in order starting from the first emotion model with the largest number of learning data items corresponding to its emotion; and when the polarity determination means determines that the features of the input data indicate an emotion of the second polarity, the estimation means judges whether the features correspond to any of the single second-polarity emotions of the plurality of second emotion models, in order starting from the second emotion model with the largest number of learning data items corresponding to its emotion.

An emotion estimation program according to the second invention causes a computer to function as: extraction means for extracting features of at least one item of input data from among image data obtained by imaging a user, voice data input by the user's utterance, and text data input by the user other than by utterance; polarity determination means for determining whether the features of the input data extracted by the extraction means indicate an emotion of a first polarity or an emotion of a second polarity that is opposite to the first polarity; emotion model generation means for generating a plurality of first emotion models and a plurality of second emotion models, each first emotion model corresponding to a different single emotion of the first polarity and indicating, for each of a plurality of learning data items extracted in advance from a plurality of sample data with features associated with emotions of the first polarity, whether the emotion of that learning data item corresponds to the single emotion of the first polarity, and each second emotion model corresponding to a different single emotion of the second polarity and indicating, for each of a plurality of learning data items extracted in advance from a plurality of sample data with features associated with emotions of the second polarity, whether the emotion of that learning data item corresponds to the single emotion of the second polarity; and estimation means for estimating the user's emotion as follows: when the polarity determination means determines that the features of the input data indicate an emotion of the first polarity, the estimation means judges whether the features of the input data extracted by the extraction means correspond to any of the single first-polarity emotions of the plurality of first emotion models, in order starting from the first emotion model with the largest number of learning data items corresponding to its emotion; and when the polarity determination means determines that the features of the input data indicate an emotion of the second polarity, the estimation means judges whether the features correspond to any of the single second-polarity emotions of the plurality of second emotion models, in order starting from the second emotion model with the largest number of learning data items corresponding to its emotion.

According to the emotion estimation apparatus and program of the second invention, the polarity determination means determines whether the features of the input data extracted by the extraction means indicate an emotion of the first polarity or an emotion of the second polarity, which is opposite to the first polarity. The emotion model generation means generates a plurality of first emotion models, each corresponding to a different single emotion of the first polarity and each indicating, for each of a plurality of learning data items extracted in advance from a plurality of sample data with features associated with emotions of the first polarity, whether the emotion of that learning data item corresponds to the single emotion of the first polarity. It also generates a plurality of second emotion models, each corresponding to a different single emotion of the second polarity and each indicating, for each of a plurality of learning data items extracted in advance from a plurality of sample data with features associated with emotions of the second polarity, whether the emotion of that learning data item corresponds to the single emotion of the second polarity.

Then, when the polarity determination means determines that the features of the input data indicate an emotion of the first polarity, the estimation means judges whether the features of the input data extracted by the extraction means correspond to any of the single first-polarity emotions of the plurality of first emotion models, in order starting from the first emotion model with the largest number of learning data items corresponding to its emotion. When the polarity determination means determines that the features of the input data indicate an emotion of the second polarity, the estimation means judges whether the features correspond to any of the single second-polarity emotions of the plurality of second emotion models, in order starting from the second emotion model with the largest number of learning data items corresponding to its emotion. The user's emotion is estimated in this way.

In this way, a plurality of first emotion models are generated for emotions of the first polarity, and a plurality of second emotion models are generated for emotions of the second polarity, which is opposite to the first. The polarity of the input data features is determined first, and only then is it judged whether the features fall under the emotion models for emotions of that polarity. This prevents fatal misjudgments such as estimating an emotion of the first polarity to be an emotion of the second polarity, or an emotion of the second polarity to be an emotion of the first polarity.
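The two-stage judgment described above can be sketched as follows. This is an illustration under stated assumptions: the polarity determination is a callable `is_first_polarity`, each emotion model is a callable returning True on a match, and each model list is assumed to be already sorted in descending order of learning-data count. All names and toy thresholds are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Features = Sequence[float]
Model = Callable[[Features], bool]
# Assumption: each list is pre-sorted by descending learning-data count.
OrderedModels = List[Tuple[str, Model]]


def estimate_with_polarity(
    features: Features,
    is_first_polarity: Callable[[Features], bool],
    first_models: OrderedModels,
    second_models: OrderedModels,
) -> Optional[str]:
    """Determine the polarity first, then consult only that polarity's models."""
    models = first_models if is_first_polarity(features) else second_models
    for emotion, accepts in models:
        if accepts(features):
            return emotion
    return None


# Toy stand-ins: f[0] carries polarity, f[1] carries intensity.
toy_polarity = lambda f: f[0] >= 0
toy_first = [("fun", lambda f: f[1] > 0.5), ("happy", lambda f: True)]
toy_second = [("dislike", lambda f: f[1] > 0.5), ("angry", lambda f: True)]
```

Because the models of the other polarity are never consulted, an input judged to be of the first polarity cannot be labeled with a second-polarity emotion, and vice versa.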

The emotion estimation apparatus and program according to the second invention may further include emotion polarity model generation means for generating an emotion polarity model that indicates, for each of the plurality of learning data items, whether the emotion of that learning data item is an emotion of the first polarity or an emotion of the second polarity. The polarity determination means may then determine, based on the emotion polarity model, whether the features of the input data indicate an emotion of the first polarity or an emotion of the second polarity.

In the emotion estimation apparatus and program according to the second invention, the emotion model generation means may generate the plurality of first emotion models in descending order of the number of corresponding learning data items and, when generating each of the other first emotion models, exclude from its learning data the learning data items that correspond to the first-polarity emotion of a first emotion model whose generation has already finished. Likewise, it may generate the plurality of second emotion models in descending order of the number of corresponding learning data items and, when generating each of the other second emotion models, exclude from its learning data the learning data items that correspond to the second-polarity emotion of a second emotion model whose generation has already finished.

Because the emotion models for each polarity are generated in descending order of the number of corresponding learning data items, and the learning data corresponding to an already generated emotion model is excluded when the other emotion models are generated, the imbalance between the number of corresponding and non-corresponding learning data items in each emotion model is reduced. More accurate emotion models are therefore generated, which further improves the accuracy of emotion estimation.

As described above, the emotion estimation apparatus and program of the present invention generate learning models in consideration of the number of learning data items that fall under each emotion model, where each emotion model corresponds to one emotion, and thereby provide the effect that emotions can be estimated efficiently and accurately.

Hereinafter, embodiments of the present invention are described in detail with reference to the drawings. The following describes a case in which the emotion estimation apparatus of the present invention is applied to an emotion estimation dialogue apparatus that generates a response to input from a user and carries on a dialogue with the user.

As shown in FIG. 1, the emotion estimation dialogue apparatus 10 according to the first embodiment includes a microphone 12 for inputting the user's voice, an imaging device 14 for imaging the user's face, a speaker 16 for outputting responses as speech, and a computer 18 that executes control of emotion estimation and response generation.

The computer 18 includes a CPU 24 that controls the emotion estimation dialogue apparatus 10 as a whole, a ROM 26 serving as a storage medium that stores various programs such as the programs for the emotion estimation model generation processing and the dialogue processing described later, a RAM 28 that temporarily stores data as a work area, an HDD (hard disk) 30 serving as storage means in which various information is stored, a network I/F (interface) unit 32 for connecting to a network, an I/O (input/output) port 34, and a bus connecting these components. The microphone 12, the imaging device 14, and the speaker 16 are connected to the I/O port 34.

First, the learning source database used in the emotion estimation model generation processing described later is explained. The learning source database is, for example, as shown in FIG. 2. To obtain such a learning source database 40, image data obtained by photographing the face of a person engaged in dialogue and voice data based on that person's utterances are first acquired. The voice data and the image data are acquired at substantially the same time. The person's emotion at the time the voice data and image data were acquired is also obtained, for example by interviewing the person. The emotions are limited to n predetermined types (n is a natural number). Here, for example, there are nine types (n = 9): “dislike”, “happy”, “disappointed”, “fun”, “afraid”, “anxious”, “lonely”, “angry”, and “sad”. The number of emotion types may be ten or more, or eight or fewer.

For the image data, an image feature I is extracted by, for example, applying image processing such as edge processing and recognizing the facial expression. The voice data is converted into text data by speech recognition processing, and an emotion word indicating an emotional state is extracted from the converted text data as a text feature T, for example by a method that uses clue words such as 「ので」 (“because”) and 「ため」 (“due to”). For the same voice data, a prosodic feature R is extracted by, for example, analyzing the prosody.

The image feature I, the text feature T, and the prosodic feature R are combined into one feature 42. The feature 42 is associated with the emotion 44 that the person had when the image data and voice data from which these features were extracted were acquired, and the pair forms one learning data item 46. The learning source database 40 is built by acquiring a large amount of such voice data and image data, together with the emotions at the time of acquisition, and generating a large number of learning data items 46.
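One way to picture a learning data item 46 is as a record that pairs a combined feature with an emotion label. The sketch below is only an assumed representation: the source does not specify how I, T, and R are encoded, so the field types (numeric tuples for I and R, a string for T), the class names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class Feature:
    """Combined feature 42 (encodings are assumptions for illustration)."""
    image: Tuple[float, ...]    # image feature I
    text: str                   # text feature T (an emotion word)
    prosody: Tuple[float, ...]  # prosodic feature R


@dataclass(frozen=True)
class LearningData:
    """One learning data item 46: a feature 42 paired with an emotion 44."""
    feature: Feature
    emotion: str


def build_database(
    samples: Iterable[Tuple[Sequence[float], str, Sequence[float], str]]
) -> List[LearningData]:
    """Turn (I, T, R, emotion) tuples into a list of learning data items."""
    return [
        LearningData(Feature(tuple(i), t, tuple(r)), emotion)
        for i, t, r, emotion in samples
    ]


# Made-up sample values, not taken from FIG. 2.
database = build_database([
    ([0.8, 0.2], "great", [210.0, 0.9], "fun"),
    ([0.1, 0.7], "boring", [120.0, 0.3], "dislike"),
])
```

Keeping the emotion label next to the feature makes the later per-emotion counting and positive/negative labeling a simple pass over the list.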

The learning source database 40 only needs to consist of learning data in which features extracted from information acquired from a person are associated with emotions. The types of data acquired, the types of features extracted, the feature extraction methods, and the types of emotions are not limited to those described above. The learning source database 40 may be built by the emotion estimation dialogue apparatus 10 of the present embodiment or by a separate external device. When it is built by the emotion estimation dialogue apparatus 10, the learning source database 40 is stored in the HDD 30. When it is built by an external device, it can be acquired via the network. The present embodiment describes the case in which the learning source database 40 is built by, and stored in, an external device.

Next, the processing routine for emotion estimation model generation in the first embodiment is described with reference to FIG. 3.

In step 100, the learning source database 40 is acquired via the network I/F 32 from an external device connected to the network.

Next, in step 102, the number of learning data items 46 is counted for each emotion 44. Then, in step 104, the emotion with the largest count is set as the parameter X. The present embodiment takes as an example the case in which the emotion 44 “dislike” has the largest number of learning data items 46, so X = “dislike” is set. Next, the learning process of step 106 is executed.
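Steps 102 and 104 amount to a frequency count followed by selecting the most frequent label. A minimal sketch, assuming the emotions 44 are available as a sequence of strings (the function name `select_target_emotion` is hypothetical):

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def select_target_emotion(emotions: Iterable[str]) -> Tuple[str, Dict[str, int]]:
    """Count learning data per emotion (step 102) and pick the largest as X (step 104)."""
    counts = Counter(emotions)
    if not counts:
        raise ValueError("no learning data")
    x, _ = counts.most_common(1)[0]
    return x, dict(counts)


x, counts = select_target_emotion(["fun", "dislike", "sad", "dislike"])
```

How ties between equally frequent emotions are broken is not stated in the source; this sketch simply takes whichever `Counter.most_common` returns first.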

The processing routine of the learning process is described here with reference to FIG. 4.

In step 200, the learning data 46 in the learning source database 40 are learned one item at a time. Here, an SVM (Support Vector Machine) is used as the learning method. First, it is determined whether the emotion 44 of the first learning data item 46 is X. If it is X, that is, if the emotion 44 of the learning data item 46 corresponds to the emotion X, the process proceeds to step 202 and the item is learned as a positive example. If it is not X, that is, if the emotion 44 of the learning data item 46 does not correspond to the emotion X, the process proceeds to step 204 and the item is learned as a negative example. In the learning source database 40 of FIG. 2, the emotion 44 of the first learning data item 46 is “fun”, so the determination in step 200 is negative and the item is learned as a negative example in step 204.

Next, in step 206, it is determined whether learning has been completed for all the learning data 46 in the learning source database 40. If unlearned learning data 46 remain, the process returns to step 200 and learning is repeated for the next learning data 46. In the learning source database 40 of FIG. 2, the emotion 44 of the second learning data 46 is "dislike", so the determination in step 200 is affirmative and the data is learned as a positive example in step 202.

When learning has been completed for all the learning data 46 in the learning source database 40, the determination in step 206 is affirmative and the routine returns. This learning process generates the emotion model M(1) for the first emotion X (= 1). Here, the emotion model M(1) is the emotion model corresponding to the single emotion "dislike" (the "dislike" model).

Next, in step 108 of the emotion estimation model generation routine (FIG. 3), all learning data 46 corresponding to the emotion X, that is, all positive-example learning data 46, are deleted from the learning source database 40. Although the present embodiment deletes the positive-example learning data 46 for the emotion X, the data may instead be left in the learning source database 40. In that case, a step is provided that determines, when the next emotion model is generated, whether each learning data 46 belongs to an emotion whose emotion model has already been generated, and only the learning data 46 for which this determination is negative are used to generate the emotion model.

Next, in step 110, it is determined whether only one type of emotion 44 remains among the learning data 46 left in the learning source database 40. If two or more types remain, the process returns to step 106, the emotion with the largest number of learning data among the remaining learning data 46 is set as the parameter X, and the subsequent processing is repeated to generate an emotion model M(i) for each emotion (i is a serial number assigned in the order in which the emotion models are generated).

If it is determined in step 110 that only one type of emotion remains, the process proceeds to step 112, where an emotion estimation model is constructed by arranging the generated emotion models M(i) in the order emotion model M(1), emotion model M(2), ..., emotion model M(n-1). When there are n types of emotion, no emotion model is generated for the last emotion, so the array ends with the emotion model M(n-1).

For example, suppose the number of learning data decreases in the order "dislike", "happy", "sorry", "fun", "scary", "anxiety", "lonely", "angry", "sad". Then M(1) = "dislike" model, M(2) = "happy" model, ..., M(8) = "angry" model, and, as shown in FIG. 5, an emotion estimation model is constructed in which the emotion models 50 are arranged in the order "dislike" model, "happy" model, "sorry" model, "fun" model, "scary" model, "anxiety" model, "lonely" model, and "angry" model. The emotion estimation model is stored in the HDD 30 and the process ends.
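The generation routine of steps 102 to 112 can be sketched roughly as follows. This is only an illustrative sketch, not the patent's implementation: the names `build_cascade` and `train_binary` are hypothetical, and `train_binary` stands in for whatever binary learner (an SVM in this embodiment) is actually used.

```python
from collections import Counter


def build_cascade(samples, train_binary):
    """Build an ordered list of per-emotion models.

    samples: list of (feature, emotion) pairs (the learning data).
    train_binary: hypothetical callable (positives, negatives) -> model.
    Returns (cascade, last_emotion), where cascade is [(emotion, model), ...]
    in generation order M(1), M(2), ... and last_emotion has no model.
    """
    remaining = list(samples)
    cascade = []
    # Step 110: stop when only one type of emotion is left.
    while len({emotion for _, emotion in remaining}) > 1:
        # Steps 102-104: count per emotion and pick the most frequent as X.
        counts = Counter(emotion for _, emotion in remaining)
        x = counts.most_common(1)[0][0]
        # Step 106: learn X as positive examples, everything else as negative.
        positives = [f for f, e in remaining if e == x]
        negatives = [f for f, e in remaining if e != x]
        cascade.append((x, train_binary(positives, negatives)))
        # Step 108: drop the positive examples before the next model.
        remaining = [(f, e) for f, e in remaining if e != x]
    last_emotion = remaining[0][1] if remaining else None
    return cascade, last_emotion
```

Because the positives of each finished model are removed, later models are trained against a progressively smaller pool of negatives, which is the rebalancing effect the description relies on.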

Next, the processing routine of the dialogue process, which includes emotion estimation, will be described with reference to FIG. 6.

In step 300, voice data input by the user through the microphone 12 and image data of the user's face captured by the imaging device 14 are acquired. Next, in step 302, the image feature I_0 is extracted from the acquired image data. In step 304, the voice data is converted to text data by voice recognition, and the text feature T_0 is extracted from the converted text data. In step 306, the prosodic feature R_0 is extracted from the same voice data. The image feature I_0, text feature T_0, and prosodic feature R_0 are extracted by the same methods that were used to extract the image feature I, text feature T, and prosodic feature R when the learning source database 40 was constructed.

Next, in step 308, the image feature I_0, text feature T_0, and prosodic feature R_0 are combined to obtain the feature of the input data. In step 310, the emotion estimation process described later is executed, and in step 312, the response generation and output process is executed to generate and output a response that matches the estimated emotion. Conventional techniques can be used for the response generation and output process, so its description is omitted.

The processing routine of the emotion estimation process will now be described with reference to FIG. 7.

In step 400, the counter value i is set to "1". The counter value i corresponds to the serial number assigned to each emotion model in the emotion estimation model. Starting with i = 1 and comparing against the emotion models in serial-number order in the following steps means that the determination proceeds in order from the emotion model whose emotion had the most learning data in the learning source database 40 used to construct the emotion estimation model.

Next, in step 402, it is determined whether the emotion indicated by the feature of the input data obtained in step 308 of the dialogue process (FIG. 6) corresponds to the emotion of the first emotion model M(1) of the emotion estimation model, using a method that matches the method used to generate the emotion model. If it corresponds, the process proceeds to step 404 and the emotion F(1) corresponding to the emotion model M(1) is output as the estimation result.

If it is determined in step 402 that the emotion does not correspond, the process proceeds to step 406, where it is determined whether the emotion model M(1) is the last emotion model of the emotion estimation model. If it is not the last emotion model, the process proceeds to step 408, the counter value i is incremented, and the process returns to step 402 to compare against the next emotion model.

The above steps are repeated. If the input does not correspond even to the emotion of the last emotion model M(n-1), the determination in step 406 is affirmative and the process proceeds to step 410, where the emotion for which no emotion model was generated, that is, the emotion with the fewest corresponding learning data, is output as the estimation result F(n), and the routine returns.

Taking the emotion estimation model of FIG. 5 as an example, it is first determined whether the emotion indicated by the feature of the input data corresponds to the "dislike" model. If it does, the estimation result "dislike" is output. If it does not, it is determined whether the emotion corresponds to the next model, the "happy" model, and if so the estimation result "happy" is output. If not, the determination continues with each subsequent emotion model 50 in turn. Finally, it is determined whether the emotion corresponds to the last model, the "angry" model. If it does, the estimation result "angry" is output, and if it does not, the estimation result "sad" is output and the process ends.
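The estimation loop of steps 400 to 410 amounts to a first-match scan with a fallback. The sketch below is an illustration under stated assumptions: `estimate_emotion` is a hypothetical name, and each model is represented by a hypothetical predicate that returns True when the input feature falls under that emotion.

```python
def estimate_emotion(feature, cascade, last_emotion):
    """Return the first emotion whose model accepts the feature.

    cascade: [(emotion, matches), ...] in the order M(1), M(2), ...,
             where matches(feature) -> bool is a stand-in for the model.
    last_emotion: the emotion with no model of its own (result F(n)).
    """
    for emotion, matches in cascade:   # steps 400, 406, 408: walk i = 1, 2, ...
        if matches(feature):           # step 402: does the model apply?
            return emotion             # step 404: output F(i)
    return last_emotion                # step 410: nothing matched
```

For instance, with a cascade of "dislike" and "happy" predicates and "sad" as the last emotion, an input rejected by both predicates is reported as "sad".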

In this way, whether the feature of the input data corresponds is determined in order from the emotion model with the largest number of corresponding learning data, so the determination starts with the emotions that appear most often and emotion estimation can be performed efficiently. In addition, the emotion models are generated in descending order of the number of corresponding learning data, and the learning data corresponding to each completed emotion model are deleted before the next emotion model is generated. The emotion estimation model is therefore built from accurate emotion models in which the imbalance between the numbers of positive-example and negative-example learning data has been resolved, which improves the accuracy of emotion estimation.

Next, an emotion estimation dialogue apparatus according to a second embodiment will be described. The second embodiment differs from the first embodiment in that the emotion polarity is determined. Configurations and processes that are the same as in the first embodiment are given the same reference numerals, and their description is omitted.

First, the processing routine for emotion estimation model generation in the second embodiment will be described with reference to FIG. 8.

In step 100, the learning source database 40 is acquired. Next, in step 500, the learning data 46 in the learning source database 40 are classified according to the polarity of their emotion. For example, if the learning source database 40 contains the emotion types "dislike", "happy", "sorry", "fun", "scary", "anxiety", "lonely", "angry", "sad", and "relieved", then "happy", "fun", and "relieved" are given positive polarity, and "dislike", "sorry", "scary", "anxiety", "lonely", "angry", and "sad" are given negative polarity. Either of the positive and negative polarities may be treated as the first polarity and the other as the second polarity, so that the first and second polarities represent opposite emotions. The learning data 46 of the learning source database 40 are classified by this polarity to construct a positive-polarity learning source database and a negative-polarity learning source database.
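The classification of step 500 is a simple partition of the learning data. A minimal sketch follows, assuming the learning data are (feature, emotion) pairs; the function name `split_by_polarity` and the English emotion labels are illustrative renderings, not identifiers from the patent.

```python
# Emotions treated as positive polarity in the example above; every other
# emotion is treated as negative polarity.
POSITIVE_EMOTIONS = frozenset({"happy", "fun", "relieved"})


def split_by_polarity(samples, positive=POSITIVE_EMOTIONS):
    """Partition (feature, emotion) pairs into positive and negative sets."""
    positive_db = [s for s in samples if s[1] in positive]
    negative_db = [s for s in samples if s[1] not in positive]
    return positive_db, negative_db
```

Each of the two resulting lists can then be processed separately in the same way as the single learning source database of the first embodiment.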

Next, in steps 502 to 510, the positive-polarity emotion models are generated from the learning data of the positive-polarity learning source database by the same processing as steps 102 to 110 of the emotion estimation model generation process of the first embodiment (FIG. 3).

Next, in steps 512 to 520, the negative-polarity emotion models are generated in the same way from the learning data of the negative-polarity learning source database.

When the determination in step 520 is affirmative, the process proceeds to step 522, where an emotion estimation model is constructed in which the generated positive-polarity emotion models MP(i) are arranged in the order emotion model MP(1), emotion model MP(2), ..., emotion model MP(n-1), and the generated negative-polarity emotion models MN(i) are arranged in the order emotion model MN(1), emotion model MN(2), ..., emotion model MN(n-1).

For example, suppose the number of positive-polarity learning data decreases in the order "happy", "fun", "relieved". Then MP(1) = "happy" model and MP(2) = "fun" model. Suppose also that the number of negative-polarity learning data decreases in the order "dislike", "sorry", "scary", "anxiety", "lonely", "angry", "sad". Then MN(1) = "dislike" model, MN(2) = "sorry" model, ..., MN(6) = "angry" model. As shown in FIG. 9, an emotion estimation model is constructed in which, for the positive polarity, the emotion models 50 are arranged in the order "happy" model and "fun" model, and, for the negative polarity, in the order "dislike" model, "sorry" model, "scary" model, "anxiety" model, "lonely" model, and "angry" model. The emotion estimation model is stored in the HDD 30 and the process ends.

Next, the processing routine of the dialogue process, including emotion estimation, in the second embodiment will be described with reference to FIG. 10.

In steps 300 to 306, the image feature I_0 is extracted from the image data and the text feature T_0 and prosodic feature R_0 are extracted from the voice data, and these features are combined to obtain the feature of the input data. Next, in step 600, it is determined whether the emotion indicated by the feature of the input data is positive. A well-known technique can be used for this determination, for example judging from the concept carried by the text feature T_0, which is one of the features of the input data.

If the determination in step 600 is affirmative, the process proceeds to step 602, where the user's emotion is estimated by determining, in the order in which the emotion models are arranged, whether the emotion indicated by the feature of the input data corresponds to each emotion model arranged under the positive polarity of the emotion estimation model. For example, when the polarity indicated by the feature of the input data is positive, the emotion estimation model of FIG. 9 is checked in the order "happy" model, then "fun" model.

If the determination in step 600 is negative, the process proceeds to step 604, where the emotion is estimated by determining, in the order in which the emotion models are arranged, whether the emotion indicated by the feature of the input data corresponds to each emotion model arranged under the negative polarity of the emotion estimation model. For example, when the polarity indicated by the feature of the input data is not positive (that is, negative), the emotion estimation model of FIG. 9 is checked in the order "dislike" model, "sorry" model, "scary" model, "anxiety" model, "lonely" model, and "angry" model. The details of the emotion estimation process are the same as in the first embodiment.

Next, in step 312, the response generation and output process is executed to generate and output a response that matches the estimated emotion.

In this way, it is first determined which of the two opposite polarities, positive or negative, the emotion indicated by the feature of the input data has. If positive, the positive-polarity emotion models are checked, and if negative, the negative-polarity emotion models are checked, in each case in descending order of the number of corresponding learning data. In addition to the effects of the first embodiment, this prevents the fatal misjudgment of estimating a positive emotion as a negative one or a negative emotion as a positive one.
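The two-stage estimation of steps 600 to 604 can be sketched as a polarity check followed by a scan of the matching list only. This is a hedged illustration: `estimate_with_polarity`, `is_positive`, and the predicate representation of each model are assumptions introduced here, not names from the patent.

```python
def estimate_with_polarity(feature, is_positive, positive_side, negative_side):
    """Pick a polarity first, then scan only that polarity's models.

    is_positive: hypothetical callable feature -> bool (step 600).
    positive_side / negative_side: (cascade, fallback) pairs, where cascade
        is [(emotion, matches), ...] in descending order of learning data
        and fallback is the emotion that has no model of its own.
    """
    cascade, fallback = positive_side if is_positive(feature) else negative_side
    for emotion, matches in cascade:   # steps 602 / 604
        if matches(feature):
            return emotion
    return fallback
```

Because the scan never crosses into the other polarity's list, a mistake inside a list can at worst confuse two emotions of the same polarity, provided the polarity check itself is correct.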

Step 600 of the above dialogue routine was described as using a well-known technique, such as judging the polarity of the emotion from the concept carried by the text. Alternatively, an emotion polarity model may be generated by the same method used to generate the emotion models 50 and used to determine whether the feature of the input data has positive or negative polarity. The emotion polarity model can be generated by learning each learning data 46 in the learning source database 40 as a positive example when its emotion 44 has positive polarity and as a negative example when it has negative polarity. Because the polarity is binary, the emotion polarity model can equally be generated by learning negative-polarity data as positive examples and positive-polarity data as negative examples.

Next, an emotion estimation dialogue apparatus according to a third embodiment will be described. The third embodiment differs from the first and second embodiments in that a score is used to determine whether the input corresponds to each emotion model. Configurations and processes that are the same as in the first and second embodiments are given the same reference numerals, and their description is omitted.

The processing routine for emotion estimation model generation in the third embodiment will be described with reference to FIG. 11.

In step 100, the learning source database 40 is acquired. Next, in step 700, the flags (described later) that are set on learning data 46 corresponding to already-generated emotion models are checked, and the number of unflagged learning data 46 is counted for each emotion 44. In step 104, the emotion with the largest count is set as the parameter X, and the learning process of step 106 is then executed.

Next, in step 702, a flag indicating that the corresponding emotion model has been generated is set on the learning data 46 in the learning source database 40 that correspond to the emotion X, that is, on the positive-example learning data 46. In step 704, it is determined whether only one type of emotion 44 remains among the unflagged learning data 46. If two or more types remain, the process returns to step 700 and the learning process is repeated on the unflagged learning data 46 to generate an emotion model M(i) for each emotion (i is a serial number assigned in the order in which the emotion models are generated).

If it is determined in step 704 that only one type of emotion 44 remains among the unflagged learning data 46, the process proceeds to step 706 and that last emotion is set as the parameter X. Next, in step 708, a predetermined number of learning data 46 are selected at random from the flagged learning data 46 and their flags are cleared. The predetermined number is chosen to give a suitable number of negative examples for generating the emotion model of the last emotion X, for example the same number as the learning data 46 that correspond to the last emotion X.
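Step 708 can be sketched as follows, under the example choice that the number of negatives equals the number of learning data for the last emotion. The parallel-list representation of flags and emotions and the name `unflag_random_negatives` are assumptions made for this illustration.

```python
import random


def unflag_random_negatives(flags, emotions, last_emotion, rng=None):
    """Clear the flag on a random subset of flagged learning data.

    flags: list of bool, True when that datum's emotion model already exists.
    emotions: list of emotion labels, parallel to flags.
    The number cleared equals the number of data labeled last_emotion
    (capped at the number of flagged data). Mutates and returns flags.
    """
    if rng is None:
        rng = random.Random()
    flagged = [i for i, is_flagged in enumerate(flags) if is_flagged]
    wanted = sum(1 for e in emotions if e == last_emotion)
    for i in rng.sample(flagged, min(wanted, len(flagged))):
        flags[i] = False   # this datum becomes a negative example again
    return flags
```

After this call, the unflagged data consist of the positives for the last emotion plus a similarly sized random set of negatives, which is what the learning process of step 710 then consumes.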

Next, the learning process is executed in step 710, and then, in step 112, an emotion estimation model is constructed by arranging the generated emotion models M(i) in the order emotion model M(1), emotion model M(2), ..., emotion model M(n). In the third embodiment, when there are n types of emotion, an emotion model is also generated for the last emotion, so the array ends with the emotion model M(n).

For example, suppose the number of learning data decreases in the order "dislike", "happy", "sorry", "fun", "scary", "anxiety", "lonely", "angry", "sad". Then M(1) = "dislike" model, M(2) = "happy" model, ..., M(9) = "sad" model, and, as shown in FIG. 12, an emotion estimation model is constructed in which the emotion models 50 are arranged in the order "dislike" model, "happy" model, "sorry" model, "fun" model, "scary" model, "anxiety" model, "lonely" model, "angry" model, and "sad" model. The emotion estimation model is stored in the HDD 30 and the process ends.

Next, the emotion estimation routine of step 308 of the dialogue process (FIG. 6) in the third embodiment will be described with reference to FIG. 13. The other steps of the dialogue process are the same as in the first embodiment, so their description is omitted.

In step 400, the counter value i is set to "1". Next, in step 800, a score is calculated that indicates how well the emotion indicated by the feature of the input data fits the emotion model M(i). The score is calculated by a method that matches the method used to generate the emotion model. With an SVM, for example, the score can be defined as 0 on the separating hyperplane between the positive-example and negative-example learning data, with an absolute value that grows as the feature of the input data moves away from the separating hyperplane, positive on the positive-example side and negative on the negative-example side.
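One score with the properties described for step 800 is the signed distance to the hyperplane. The sketch below assumes a linear SVM whose hyperplane is given by a weight vector `w` and bias `b`; the patent does not specify the kernel or the exact scoring formula, so this is one possible choice rather than the method actually used.

```python
import math


def svm_score(x, w, b):
    """Signed distance from feature vector x to the hyperplane w.x + b = 0.

    Zero on the hyperplane, positive on the positive-example side,
    negative on the negative-example side, and larger in absolute value
    the farther x lies from the hyperplane.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
```

With w = (3, 4) and b = -5, the point (3, -1) lies on the hyperplane and scores 0, while (3, 4) lies on the positive side.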

Next, in step 802, it is determined whether the calculated score is at or above a predetermined value. The predetermined value can be set greater than 0 so that correspondence to each emotion model is determined with high accuracy. If the score is at or above the predetermined value, the process proceeds to step 404 and the emotion F(1) corresponding to the emotion model M(1) is output as the estimation result.

If it is determined in step 802 that the score is below the predetermined value, the process proceeds to step 406, where it is determined whether the emotion model M(1) is the last emotion model of the emotion estimation model. If it is not the last emotion model, the process proceeds to step 408, the counter value i is incremented, and the process returns to step 800 to compare against the next emotion model.

The above steps are repeated. If the score does not reach the predetermined value even for the last emotion model M(n), the determination in step 406 is affirmative and the process proceeds to step 804. Because the input did not correspond to any of the predetermined types of emotion, the estimation result "unknown" is output and the routine returns.

When the estimation result is "unknown", the response generation and output process should generate and output a noncommittal response, for example a brief backchannel acknowledgment.
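The score-based scan of steps 800 to 804 can be sketched as below. The name `estimate_with_threshold` and the representation of each model as a hypothetical scoring callable are assumptions for illustration only.

```python
def estimate_with_threshold(feature, cascade, threshold):
    """Return the first emotion whose score reaches the threshold.

    cascade: [(emotion, score), ...] in the order M(1), ..., M(n), where
             score(feature) -> float stands in for the model's score.
    threshold: the predetermined value, typically greater than 0.
    Returns "unknown" when no model reaches the threshold (step 804).
    """
    for emotion, score in cascade:
        if score(feature) >= threshold:   # step 802
            return emotion                # step 404
    return "unknown"
```

Raising the threshold trades coverage for precision: more inputs end up as "unknown", but an emotion that is reported has cleared a wider margin.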

The processing of the third embodiment can also be applied to the processing of the second embodiment.

In the above embodiments, the feature of the input data was described as a combination of the image feature extracted from image data of the user's face and the text feature and prosodic feature extracted from voice data input by the user's speech. However, the feature of the input data may be any one of the image feature, text feature, and prosodic feature, or a combination of any two of them. Other information, such as physiological information of the human body (pulse, perspiration, and the like), may also be acquired and features extracted from it may be used.

In the above embodiments, the text feature was described as being extracted after the acquired voice data has been converted to text data by voice recognition. However, text data entered through input means such as a keyboard may be acquired instead and the text feature extracted from it.

The above embodiments were described as using an SVM as the learning method, but the learning method is not limited to this.

FIG. 1 is a block diagram showing the configuration of the emotion estimation dialogue apparatus according to the first embodiment.
FIG. 2 is a diagram showing an example of the learning source database.
FIG. 3 is a flowchart showing the processing routine of the emotion estimation model generation process in the first embodiment.
FIG. 4 is a flowchart showing the processing routine of the learning process in the first embodiment.
FIG. 5 is a diagram showing an example of the emotion estimation model in the first embodiment.
FIG. 6 is a flowchart showing the processing routine of the dialogue process in the first embodiment.
FIG. 7 is a flowchart showing the processing routine of the emotion estimation process in the first embodiment.
FIG. 8 is a flowchart showing the processing routine of the emotion estimation model generation process in the second embodiment.
FIG. 9 is a diagram showing an example of the emotion estimation model in the second embodiment.
FIG. 10 is a flowchart showing the processing routine of the dialogue process in the second embodiment.
FIG. 11 is a flowchart showing the processing routine of the emotion estimation model generation process in the third embodiment.
FIG. 12 is a diagram showing an example of the emotion estimation model in the third embodiment.
FIG. 13 is a flowchart showing the processing routine of the emotion estimation process in the third embodiment.

Explanation of symbols

10 Emotion estimation dialogue apparatus
12 Microphone
14 Imaging device
16 Speaker
18 Computer
50 Emotion model

Claims

An emotion estimation apparatus comprising:
extraction means for extracting features of at least one kind of input data from among image data obtained by imaging a user, voice data input by the user's utterance, and text data input by the user by means other than utterance;
emotion model generation means for generating a plurality of emotion models, each of which corresponds to one emotion different from those of the other emotion models and represents, for each of a plurality of learning data extracted in advance from a plurality of sample data with features and emotions associated with each other, whether or not the emotion of that learning data corresponds to the one emotion; and
estimation means for estimating the user's emotion by determining which of the emotions corresponding to the respective emotion models the features of the input data extracted by the extraction means correspond to, the determination being made in order starting from the emotion model having the largest number of learning data corresponding to its one emotion.

The emotion estimation apparatus according to claim 1, wherein the emotion model generation means generates the plurality of emotion models in descending order of the number of corresponding learning data, and generates each other emotion model such that the learning data corresponding to the emotion of an emotion model whose generation has been completed is not included in the learning data used to generate that other emotion model.
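
The generation order and the ordered determination described in the two claims above can be illustrated with a minimal Python sketch. It is not the implementation of the embodiments: the nearest-centroid test used as each binary "emotion model", the function names, and the treatment of the last model as a catch-all (because no other learning data remains for it) are all assumptions made for illustration.

```python
from collections import Counter

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def generate_emotion_models(learning_data):
    """Build one binary model per emotion, largest emotion first.

    learning_data: list of (feature_vector, emotion_label) pairs.
    Each model is (emotion, positive_centroid, negative_centroid); the
    nearest-centroid form is a hypothetical stand-in for the real model.
    """
    counts = Counter(label for _, label in learning_data)
    remaining = list(learning_data)
    models = []
    # most_common() yields emotions in descending order of learning-data count.
    for emotion, _ in counts.most_common():
        pos = [f for f, lab in remaining if lab == emotion]
        neg = [f for f, lab in remaining if lab != emotion]
        # Assumption: the last emotion has no negatives left, so it is a catch-all.
        models.append((emotion, centroid(pos), centroid(neg) if neg else None))
        # Learning data of the completed model is excluded from later models.
        remaining = [(f, lab) for f, lab in remaining if lab != emotion]
    return models

def estimate_emotion(models, features):
    """Test the models in generation order and return the first emotion that matches."""
    for emotion, pos_c, neg_c in models:
        if neg_c is None or sq_dist(features, pos_c) <= sq_dist(features, neg_c):
            return emotion
    return None

# Hypothetical two-dimensional features, for illustration only.
sample_learning_data = [
    ([0.0, 0.0], "neutral"), ([0.1, 0.0], "neutral"), ([0.0, 0.1], "neutral"),
    ([1.0, 1.0], "joy"), ([1.1, 0.9], "joy"),
    ([-1.0, 1.0], "anger"),
]
sample_models = generate_emotion_models(sample_learning_data)
```

Because the most frequent emotion is tested first, an input belonging to that emotion is resolved after a single determination, which is the efficiency argument made in the abstract.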

An emotion estimation apparatus comprising:
extraction means for extracting features of at least one kind of input data from among image data obtained by imaging a user, voice data input by the user's utterance, and text data input by the user by means other than utterance;
polarity determination means for determining whether the features of the input data extracted by the extraction means indicate an emotion of a first polarity or an emotion of a second polarity opposite to the first polarity;
emotion model generation means for generating a plurality of first emotion models, each of which corresponds to one emotion of the first polarity different from those of the other first emotion models and represents, for each of a plurality of learning data extracted in advance from a plurality of sample data with features and emotions of the first polarity associated with each other, whether or not the emotion of that learning data corresponds to the one emotion of the first polarity, and a plurality of second emotion models, each of which corresponds to one emotion of the second polarity different from those of the other second emotion models and represents, for each of a plurality of learning data extracted in advance from the plurality of sample data with features and emotions of the second polarity associated with each other, whether or not the emotion of that learning data corresponds to the one emotion of the second polarity; and
estimation means for estimating the user's emotion by, when the polarity determination means determines that the features of the input data indicate an emotion of the first polarity, determining which of the first-polarity emotions corresponding to the respective first emotion models the features of the input data extracted by the extraction means correspond to, in order starting from the first emotion model having the largest number of learning data corresponding to its one emotion of the first polarity, and, when the polarity determination means determines that the features of the input data indicate an emotion of the second polarity, determining which of the second-polarity emotions corresponding to the respective second emotion models the features of the input data correspond to, in order starting from the second emotion model having the largest number of learning data corresponding to its one emotion of the second polarity.

The emotion estimation apparatus according to claim 3, further comprising emotion polarity model generation means for generating an emotion polarity model representing, for each of the plurality of learning data, whether the emotion of that learning data is an emotion of the first polarity or an emotion of the second polarity, wherein the polarity determination means determines, based on the emotion polarity model, whether the features of the input data indicate an emotion of the first polarity or an emotion of the second polarity.

The emotion estimation apparatus according to claim 3 or 4, wherein the emotion model generation means generates the plurality of first emotion models in descending order of the number of corresponding learning data, generating each other first emotion model such that the learning data corresponding to the first-polarity emotion of a first emotion model whose generation has been completed is not included in the learning data used to generate that other first emotion model, and generates the plurality of second emotion models in descending order of the number of corresponding learning data, generating each other second emotion model such that the learning data corresponding to the second-polarity emotion of a second emotion model whose generation has been completed is not included in the learning data used to generate that other second emotion model.
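
The polarity-based variant in the three claims above can be sketched in the same hedged way: an emotion polarity model first routes the input to the first-polarity or second-polarity group, and only that group's emotion models are then tested in descending order of learning-data count. The `polarity_of` mapping, the nearest-centroid models, the catch-all last model, and all function names are assumptions of this sketch, not details taken from the embodiments.

```python
from collections import Counter

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_cascade(data):
    """Binary models for one polarity group, largest emotion first, with
    learning data of completed models excluded from later models."""
    counts = Counter(label for _, label in data)
    remaining, models = list(data), []
    for emotion, _ in counts.most_common():
        pos = [f for f, lab in remaining if lab == emotion]
        neg = [f for f, lab in remaining if lab != emotion]
        # Assumption: the last model in a group acts as a catch-all.
        models.append((emotion, centroid(pos), centroid(neg) if neg else None))
        remaining = [(f, lab) for f, lab in remaining if lab != emotion]
    return models

def run_cascade(models, features):
    """Return the first emotion whose model accepts the features."""
    for emotion, pos_c, neg_c in models:
        if neg_c is None or sq_dist(features, pos_c) <= sq_dist(features, neg_c):
            return emotion
    return None

def build_polarity_system(data, polarity_of):
    """polarity_of maps each emotion label to "first" or "second" (hypothetical)."""
    first = [(f, lab) for f, lab in data if polarity_of[lab] == "first"]
    second = [(f, lab) for f, lab in data if polarity_of[lab] == "second"]
    # Stand-in emotion polarity model: one centroid per polarity.
    polarity_model = (centroid([f for f, _ in first]),
                      centroid([f for f, _ in second]))
    return polarity_model, build_cascade(first), build_cascade(second)

def estimate_with_polarity(system, features):
    """Determine the polarity first, then test only that polarity's models."""
    (first_c, second_c), first_models, second_models = system
    if sq_dist(features, first_c) <= sq_dist(features, second_c):
        return run_cascade(first_models, features)
    return run_cascade(second_models, features)
```

Routing by polarity first means an input is never tested against emotion models of the opposite polarity, so each determination sequence is shorter than a single sequence over all emotions.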

An emotion estimation program for causing a computer to function as:
extraction means for extracting features of at least one kind of input data from among image data obtained by imaging a user, voice data input by the user's utterance, and text data input by the user by means other than utterance;
emotion model generation means for generating a plurality of emotion models, each of which corresponds to one emotion different from those of the other emotion models and represents, for each of a plurality of learning data extracted in advance from a plurality of sample data with features and emotions associated with each other, whether or not the emotion of that learning data corresponds to the one emotion; and
estimation means for estimating the user's emotion by determining which of the emotions corresponding to the respective emotion models the features of the input data extracted by the extraction means correspond to, the determination being made in order starting from the emotion model having the largest number of learning data corresponding to its one emotion.

An emotion estimation program for causing a computer to function as:
extraction means for extracting features of at least one kind of input data from among image data obtained by imaging a user, voice data input by the user's utterance, and text data input by the user by means other than utterance;
polarity determination means for determining whether the features of the input data extracted by the extraction means indicate an emotion of a first polarity or an emotion of a second polarity opposite to the first polarity;
emotion model generation means for generating a plurality of first emotion models, each of which corresponds to one emotion of the first polarity different from those of the other first emotion models and represents, for each of a plurality of learning data extracted in advance from a plurality of sample data with features and emotions of the first polarity associated with each other, whether or not the emotion of that learning data corresponds to the one emotion of the first polarity, and a plurality of second emotion models, each of which corresponds to one emotion of the second polarity different from those of the other second emotion models and represents, for each of a plurality of learning data extracted in advance from the plurality of sample data with features and emotions of the second polarity associated with each other, whether or not the emotion of that learning data corresponds to the one emotion of the second polarity; and
estimation means for estimating the user's emotion by, when the polarity determination means determines that the features of the input data indicate an emotion of the first polarity, determining which of the first-polarity emotions corresponding to the respective first emotion models the features of the input data extracted by the extraction means correspond to, in order starting from the first emotion model having the largest number of learning data corresponding to its one emotion of the first polarity, and, when the polarity determination means determines that the features of the input data indicate an emotion of the second polarity, determining which of the second-polarity emotions corresponding to the respective second emotion models the features of the input data correspond to, in order starting from the second emotion model having the largest number of learning data corresponding to its one emotion of the second polarity.