JP2020187262A - Emotion estimation device, emotion estimation system, and emotion estimation method

Info

Publication number: JP2020187262A
Application number: JP2019091864A
Authority: JP
Inventors: Hiroko Shindo (進藤 博子); Hideyuki Kubota (窪田 秀行); Tomoki Ito (伊藤 友基); Shoji Ueda (上田 昌治); Sachiko Miyagi (宮城 幸子); Kazuya Kawaguchi (川口 和也)
Original assignee: Omron Corp; NTT Docomo Inc; Omron Tateisi Electronics Co
Current assignee: Omron Corp; NTT Docomo Inc
Priority date: 2019-05-15
Filing date: 2019-05-15
Publication date: 2020-11-19
Anticipated expiration: 2039-05-15
Also published as: JP6782329B1

Abstract

To accurately estimate the emotion of a speaker. SOLUTION: A user device 1 includes: an acquisition unit 21 which acquires voice data VD indicating a sound including the voice of a speaker; a first estimation unit 252 which estimates, based on a recognition character string SD obtained by performing voice recognition processing on the voice data VD, whether the emotion of the speaker belongs to a positive group GE1 to which positive emotions belong or a negative group GE2 to which negative emotions belong; a second estimation unit 253 which estimates, based on a sound feature quantity of the voice data VD, whether the emotion of the speaker belongs to an excitement group GE3 to which emotions during excitement belong or a non-excitement group GE4 to which emotions not during excitement belong; and an emotion estimation unit 254 which estimates the speaker's emotion based on the estimation result of the first estimation unit 252 and the estimation result of the second estimation unit 253. SELECTED DRAWING: Figure 5

Description

The present invention relates to an emotion estimation device, an emotion estimation system, and an emotion estimation method.

In recent years, services that estimate emotions such as joy, anger, and sadness have become widespread. For example, Patent Document 1 discloses a technique that performs voice recognition processing on voice data indicating a sound including the voice of a speaker; calculates, for each emotion, a score indicating the likelihood that the emotion is the speaker's emotion, based on the recognition character string obtained from the voice recognition processing; calculates a score for each emotion based on the feature amount of the sound indicated by the voice data; and estimates, as the speaker's emotion, the emotion whose average of the score obtained from the character string and the score obtained from the sound feature amount is the largest.

Patent Document 1: Japanese Unexamined Patent Publication No. 2012-73941

However, in the conventional technique described above, when the emotion is estimated based on the recognition character string obtained from the voice recognition processing, the intonation of the voice has been lost from the recognition character string, so the emotion may be estimated incorrectly. On the other hand, when the emotion is estimated based on the sound feature amount, the sound feature amount does not indicate whether the content of the speech is positive or negative, so the emotion may again be estimated incorrectly. Accordingly, both the per-emotion scores based on the recognition character string and the per-emotion scores based on the sound feature amount may contain misestimated scores. The average of the score obtained from the character string and the score obtained from the sound feature amount may therefore contain an error. As a result, the conventional emotion estimation method based on the average value may estimate the emotion incorrectly.

An emotion estimation device according to a preferred aspect of the present invention includes: an acquisition unit that acquires voice data indicating a sound including the voice of a speaker; a first estimation unit that estimates, based on a recognition character string obtained by performing voice recognition processing on the voice data, whether the speaker's emotion belongs to a first group to which positive emotions belong or a second group to which negative emotions belong; a second estimation unit that estimates, based on a feature amount of the sound indicated by the voice data, whether the speaker's emotion belongs to a third group to which emotions during excitement belong or a fourth group to which emotions not during excitement belong; and an emotion estimation unit that estimates the speaker's emotion based on the estimation result of the first estimation unit and the estimation result of the second estimation unit.

An emotion estimation system according to a preferred aspect of the present invention includes the emotion estimation device described above and a terminal device capable of communicating with the emotion estimation device. The terminal device includes: a sound collecting unit that collects a sound including the voice of the speaker; a transmitting unit that transmits, to the emotion estimation device, the voice data indicating the sound including the voice of the speaker; a receiving unit that receives, from the emotion estimation device, the recognition character string and emotion data indicating the speaker's emotion estimated by the emotion estimation unit; and an output unit that outputs data obtained by applying, to the recognition character string, processing corresponding to the emotion indicated by the emotion data.

In an emotion estimation method according to a preferred aspect of the present invention, a computer executes processing of: acquiring voice data indicating a sound including the voice of a speaker; estimating, based on a recognition character string obtained by performing voice recognition processing on the voice data, whether the speaker's emotion belongs to a first group to which positive emotions belong or a second group to which negative emotions belong; estimating, based on a feature amount of the sound indicated by the voice data, whether the speaker's emotion belongs to a third group to which emotions during excitement belong or a fourth group to which emotions not during excitement belong; and estimating the speaker's emotion based on the estimation result indicating which of the first group and the second group the speaker's emotion belongs to and the estimation result indicating which of the third group and the fourth group the speaker's emotion belongs to.

According to the present invention, the emotion of the speaker can be estimated with high accuracy.

A block diagram showing the user device 1 according to the first embodiment of the present invention.
A diagram showing the grouping of emotions.
A diagram showing an example of the stored contents of the analysis dictionary data 31.
A diagram showing an example of the stored contents of the emotion classification data 33.
A diagram showing an outline of the functions of the user device 1.
A flowchart showing the processing of the estimation unit 25.
A block diagram showing the user device 1a according to the second embodiment.
A diagram showing an outline of the functions of the user device 1a according to the second embodiment.
A flowchart showing the processing of the estimation unit 25a according to the second embodiment.
A block diagram showing the emotion estimation system 100.

1. First Embodiment
FIG. 1 is a block diagram showing the user device 1 according to the first embodiment of the present invention. The user device 1 is assumed to be a smartphone. The user device 1 is an example of an "emotion estimation device". However, any information processing device can be adopted as the user device 1. For example, it may be a terminal-type information device such as a personal computer, or a portable information terminal such as a notebook computer, a wearable terminal, or a tablet terminal.

The user device 1 is realized by a computer system including a processing device 2, a storage device 3, a display device 4, an operation device 5, a communication device 6, a sound emitting device 7, and a sound collecting device 8. The elements of the user device 1 are connected to one another by one or more buses 9 for communicating information. The term "device" in this specification may be read as another term such as circuit, device, or unit. Each element of the user device 1 may be composed of one or more pieces of equipment, and some elements of the user device 1 may be omitted. The sound collecting device 8 is an example of a "sound collecting unit".

The user device 1 has a function of transmitting, to a device used by another person, a recognition character string obtained by performing voice recognition processing on voice data indicating a sound including the voice of a speaker who is the user of the user device 1, or a function of reading the recognition character string aloud so that another person can hear it. Further, the user device 1 estimates the speaker's emotion based on the speaker's voice, and either adds a pictogram corresponding to the estimated emotion to the recognition character string or reads the recognition character string aloud with an intonation corresponding to the estimated emotion. This makes it possible to add the emotional expression needed for communication. To make communication smoother, it is preferable to improve the accuracy with which the speaker's emotion is estimated.
In the first embodiment, the user device 1 estimates the speaker's emotion based on the result of grouping the emotions a person can have according to whether they are positive or negative, and the result of grouping them according to whether or not they occur during excitement.

FIG. 2 is a diagram showing the grouping of emotions. When the emotions a person can have are classified into a positive group GE1 to which positive emotions belong and a negative group GE2 to which negative emotions belong, joy belongs to the positive group GE1, and anger and sadness belong to the negative group GE2. The positive group GE1 is an example of the "first group". The negative group GE2 is an example of the "second group".

Likewise, when the emotions a person can have are classified into an excitement group GE3 to which emotions during excitement belong and a non-excitement group GE4 to which emotions not during excitement belong, joy and anger belong to the excitement group GE3, and sadness belongs to the non-excitement group GE4. The excitement group GE3 is an example of the "third group". The non-excitement group GE4 is an example of the "fourth group".
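The two groupings of FIG. 2 can be written down as plain sets. The following is a minimal sketch only; the constant names, the function names, and the string labels such as "GE1" are illustrative assumptions and do not appear in the specification.

```python
# Grouping of the three emotions as described for FIG. 2.
# All identifiers here are illustrative, not taken from the specification.
POSITIVE_GE1 = frozenset({"joy"})
NEGATIVE_GE2 = frozenset({"anger", "sadness"})
EXCITEMENT_GE3 = frozenset({"joy", "anger"})
NON_EXCITEMENT_GE4 = frozenset({"sadness"})


def valence_group(emotion: str) -> str:
    """Return "GE1" for a positive emotion and "GE2" for a negative one."""
    if emotion in POSITIVE_GE1:
        return "GE1"
    if emotion in NEGATIVE_GE2:
        return "GE2"
    raise ValueError(f"unknown emotion: {emotion}")


def arousal_group(emotion: str) -> str:
    """Return "GE3" for an emotion during excitement and "GE4" otherwise."""
    if emotion in EXCITEMENT_GE3:
        return "GE3"
    if emotion in NON_EXCITEMENT_GE4:
        return "GE4"
    raise ValueError(f"unknown emotion: {emotion}")
```

With this layout, anger is the only emotion that is both negative and in the excitement group, which is what allows the two coarse estimates to narrow down a single emotion.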

Returning to FIG. 1, the processing device 2 is a processor that controls the entire user device 1 and is composed of, for example, one or more chips. The processing device 2 is composed of, for example, a central processing unit (CPU) including an interface with peripheral devices, an arithmetic unit, registers, and the like. Some or all of the functions of the processing device 2 may be realized by hardware such as a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The processing device 2 executes various processes in parallel or sequentially.

The storage device 3 is a recording medium readable by the processing device 2, and stores a plurality of programs including the control program PR executed by the processing device 2, the analysis dictionary data 31, and the emotion classification data 33. The storage device 3 is composed of, for example, one or more types of storage circuits such as a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), and a RAM (Random Access Memory).

FIG. 3 is a diagram showing an example of the stored contents of the analysis dictionary data 31. The analysis dictionary data 31 associates, for each morpheme, a part of speech, a part-of-speech subclassification, and base form data with one another. A morpheme is a character string that is the smallest unit of expression that carries meaning. A part of speech is a type of word classified by grammatical properties, such as noun, verb, or adjective. The part-of-speech subclassification is a finer classification of the part of speech. When the morpheme is a word that inflects, the base form data is a character string indicating the base form of the word; when the morpheme is a word that does not inflect, the base form data is the same character string as the morpheme.
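One way to picture an entry of the analysis dictionary data 31 is as a record with the four associated fields. The sketch below is an assumption about layout only: the class name, field names, and the two sample entries (including their subclassification labels) are hypothetical and are not taken from FIG. 3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DictionaryEntry:
    morpheme: str        # surface string of the morpheme
    part_of_speech: str  # e.g. noun, verb, adjective
    pos_subclass: str    # finer classification of the part of speech
    base_form: str       # base form; equals `morpheme` if it does not inflect


# Hypothetical sample entries, for illustration only.
ENTRIES = [
    DictionaryEntry("勝っ", "verb", "independent", "勝つ"),  # inflected form
    DictionaryEntry("試合", "noun", "general", "試合"),      # does not inflect
]
ANALYSIS_DICTIONARY = {entry.morpheme: entry for entry in ENTRIES}


def base_form(morpheme: str) -> Optional[str]:
    """Look up the base form of a morpheme, or None if it is not registered."""
    entry = ANALYSIS_DICTIONARY.get(morpheme)
    return entry.base_form if entry else None
```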

FIG. 4 is a diagram showing an example of the stored contents of the emotion classification data 33. The emotion classification data 33 is data in which character strings are classified as joy, anger, or sadness. In the example of FIG. 4, the character string group 331 classified as joy includes 「嬉しい」 (happy), 「合格」 (pass), 「勝つ」 (win), 「勝っ」 (an inflected form of 「勝つ」), and the like. Similarly, the character string group 332 classified as anger includes 「イライラ」 (irritated), 「むかっ腹」 (a fit of anger), and the like, and the character string group 333 classified as sadness includes 「悲しい」 (sad), 「負ける」 (lose), and the like.

Returning to FIG. 1, the display device 4 displays various images under the control of the processing device 2. Various display panels, such as a liquid crystal display panel or an organic EL (Electro Luminescence) display panel, are suitable for use as the display device 4.

The operation device 5 is a device for inputting information used by the user device 1. The operation device 5 accepts operations by the user. Specifically, the operation device 5 accepts operations for inputting symbols such as numbers and characters, and operations for selecting an icon displayed by the display device 4. For example, a touch panel that detects contact with the display surface of the display device 4 is suitable as the operation device 5. The operation device 5 may also include an operating element that the user can operate, for example, a stylus.

The communication device 6 is hardware (a transmission/reception device) for communicating with other devices via a network. The communication device 6 is also called, for example, a network device, a network controller, a network card, or a communication module.

The sound emitting device 7 is composed of, for example, a speaker, and emits sound under the control of the processing device 2. The sound collecting device 8 is composed of, for example, a microphone and an AD converter, and collects sound including the voice of the speaker under the control of the processing device 2. The microphone converts the collected sound into an electric signal. The AD converter performs AD conversion on the electric signal produced by the microphone to obtain the voice data VD shown in FIG. 5. The sound indicated by the voice data VD may include, in addition to the voice of the speaker, noise emitted from the speaker's surroundings.

1.1. Functions of the First Embodiment
The processing device 2 functions as an acquisition unit 21, an estimation unit 25, and an output unit 26 by reading the control program PR from the storage device 3 and executing it.
The functions realized by the processing device 2 are described with reference to FIG. 5.

FIG. 5 is a diagram showing an outline of the functions of the user device 1. The acquisition unit 21 acquires the voice data VD indicating the voice of the speaker collected by the sound collecting device 8. The estimation unit 25 estimates the emotion of the speaker based on the voice data VD. Specifically, the estimation unit 25 includes a voice recognition processing unit 251, a first estimation unit 252, a second estimation unit 253, and an emotion estimation unit 254.

The voice recognition processing unit 251 performs voice recognition processing on the voice data VD and outputs the recognition character string SD. The voice recognition processing unit 251 may use any of various methods to output the recognition character string SD, including, for example, a method of recognizing a character string from speech using an acoustic model and a language model prepared in advance.

The first estimation unit 252 executes character string emotion estimation processing. The character string emotion estimation processing estimates, based on the recognition character string SD, whether the speaker's emotion belongs to the positive group GE1 or the negative group GE2.

More specifically, the first estimation unit 252 includes a morphological analysis processing unit 2521 and an emotion score calculation processing unit 2522. The morphological analysis processing unit 2521 refers to the analysis dictionary data 31, performs morphological analysis processing on the recognition character string SD, and outputs a corrected recognition character string CSD. The morphological analysis processing decomposes the recognition character string SD into morphemes, using the part of speech and the part-of-speech subclassification in the analysis dictionary data 31. The corrected recognition character string CSD is the character string that remains after removing character strings that are unnecessary for estimating the speaker's emotion, such as interjections.
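The step from the recognition character string SD to the corrected recognition character string CSD can be sketched as follows. This is a toy illustration under stated assumptions: the specification does not say how the segmentation is performed, so greedy longest-match against a small hypothetical dictionary is swapped in here, and the dictionary contents, function names, and part-of-speech labels are all illustrative.

```python
# Hypothetical miniature dictionary: morpheme -> part of speech.
DICTIONARY = {
    "ああ": "interjection",
    "今日": "noun",
    "試合": "noun",
    "に": "particle",
    "勝っ": "verb",
    "た": "auxiliary verb",
}


def segment(text, dictionary):
    """Greedy longest-match segmentation (an assumed stand-in technique).

    Characters not covered by the dictionary become one-character
    morphemes tagged "unknown".
    """
    max_len = max(len(surface) for surface in dictionary)
    morphemes = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                morphemes.append((candidate, dictionary[candidate]))
                i += length
                break
        else:
            morphemes.append((text[i], "unknown"))
            i += 1
    return morphemes


def corrected_string(sd, dictionary, drop=("interjection",)):
    """Build CSD by removing morphemes whose part of speech is in `drop`."""
    return "".join(s for s, pos in segment(sd, dictionary) if pos not in drop)


print(corrected_string("ああ今日試合に勝った", DICTIONARY))
```

A production system would more plausibly rely on an established morphological analyzer; the point of the sketch is only that the part-of-speech tag decides which morphemes are dropped.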

The emotion score calculation processing unit 2522 compares the character strings included in the emotion classification data 33 with the corrected recognition character string CSD, and thereby calculates, for each emotion, a score indicating the likelihood that the emotion is the speaker's emotion. More specifically, when the corrected recognition character string CSD contains a character string included in the emotion classification data 33, the emotion score calculation processing unit 2522 increases the score of the emotion corresponding to that character string.
For example, if the corrected recognition character string CSD is 「今日試合に勝った」 ("I won the game today"), the emotion score calculation processing unit 2522 outputs the following score for each emotion.

Joy: 1
Anger: 0
Sadness: 0

In the above example, the corrected recognition character string CSD contains 「勝っ」, which is included in the emotion classification data 33, so the emotion score calculation processing unit 2522 increases the score of joy, the emotion corresponding to 「勝っ」, by 1. The amount by which a score is increased is not limited to 1 and may differ for each character string included in the emotion classification data 33. For example, the increase for a character string that expresses joy more strongly may be set to 2. Further, when the corrected recognition character string CSD contains both a character string included in the emotion classification data 33 and a character string that emphasizes the content, the emotion score calculation processing unit 2522 may make the increase larger. For example, if the corrected recognition character string CSD is 「今日試合に勝ててとても嬉しい」 ("I am very happy that I won the game today"), it contains 「嬉しい」, which is included in the emotion classification data 33, and also contains the emphasizing character string 「とても」 ("very"), so the emotion score calculation processing unit 2522 increases the score of joy by, for example, 2. Which character strings in the corrected recognition character string CSD emphasize the content can be determined from the morphemes obtained by the morphological analysis processing. In the following examples, for ease of explanation, the amount of increase is assumed to be 1.
Further, when the corrected recognition character string CSD contains both a character string included in the emotion classification data 33 and a character string that negates the content, the emotion score calculation processing unit 2522 may execute processing other than increasing the score of the emotion corresponding to the contained character string. For example, if the corrected recognition character string CSD is 「今日試合に勝つことができなかった」 ("I could not win the game today"), it contains 「勝つ」, which is included in the emotion classification data 33, but it also contains the negating character string 「なかっ」, so the emotion score calculation processing unit 2522 increases the score of sadness by, for example, 1. Which character strings in the corrected recognition character string CSD negate the content can be determined from the morphemes obtained by the morphological analysis processing. In this way, the morphological analysis processing makes it possible to estimate whether the content of the corrected recognition character string CSD is affirmative or negative. In the following examples, for ease of explanation, it is assumed that whenever the corrected recognition character string CSD contains a character string included in the emotion classification data 33, the score of the emotion corresponding to that character string is increased.
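The three worked examples above can be reproduced with a small scoring sketch. Several simplifications are assumptions rather than part of the specification: matching is done by plain substring search instead of on morphemes, the emphasis and negation lists contain only the strings named in the text, an emphasized match adds 2, and a negated joy word adds 1 to sadness while negated anger or sadness words are simply left unscored (the text does not say how they should be handled).

```python
# Character strings from the FIG. 4 example, grouped by emotion.
LEXICON = {
    "joy": ["嬉しい", "合格", "勝つ", "勝っ"],
    "anger": ["イライラ", "むかっ腹"],
    "sadness": ["悲しい", "負ける"],
}
EMPHASIS = ["とても"]  # emphasizing strings (illustrative list)
NEGATION = ["なかっ"]  # negating strings (illustrative list)


def emotion_scores(csd):
    """Score each emotion for a corrected recognition character string."""
    scores = {emotion: 0 for emotion in LEXICON}
    emphasized = any(word in csd for word in EMPHASIS)
    negated = any(word in csd for word in NEGATION)
    for emotion, words in LEXICON.items():
        for word in words:
            if word not in csd:
                continue
            if negated:
                # Assumption: only negated joy is redirected to sadness.
                if emotion == "joy":
                    scores["sadness"] += 1
                continue
            scores[emotion] += 2 if emphasized else 1
    return scores


for text in ("今日試合に勝った",
             "今日試合に勝ててとても嬉しい",
             "今日試合に勝つことができなかった"):
    print(text, emotion_scores(text))
```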

Based on the score for each emotion, the first estimation unit 252 outputs first emotion group data GD1 indicating whether the speaker's emotion belongs to the positive group GE1 or the negative group GE2. The first emotion group data GD1 can take, for example, the following two forms.

In its first form, the first emotion group data GD1 is either an identifier indicating the positive group GE1 or an identifier indicating the negative group GE2. For example, when the following expression (1) is satisfied, the first estimation unit 252 outputs the identifier indicating the positive group GE1 as the first emotion group data GD1. When expression (1) is not satisfied, the first estimation unit 252 outputs the identifier indicating the negative group GE2 as the first emotion group data GD1.

joy score > α × (anger score + sadness score) / 2    (1)

α is a value set by, for example, the developer of the user device 1 or the speaker.
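Expression (1) translates directly into a comparison. In the sketch below, the function name, the dictionary keys, the string identifiers "GE1" and "GE2", and the default value α = 1.0 are assumptions chosen for illustration; the specification only says that α is set by the developer or the speaker.

```python
def first_emotion_group(scores, alpha=1.0):
    """Apply expression (1) to per-emotion scores from the character string.

    Returns "GE1" (positive group) if
        joy > alpha * (anger + sadness) / 2
    and "GE2" (negative group) otherwise.
    """
    joy = scores["joy"]
    anger = scores["anger"]
    sadness = scores["sadness"]
    if joy > alpha * (anger + sadness) / 2:
        return "GE1"
    return "GE2"


print(first_emotion_group({"joy": 1, "anger": 0, "sadness": 0}))
```

Because the inequality is strict, a string with no matches at all (every score 0) does not satisfy expression (1) and falls into the negative group under this reading.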

In its second form, the first emotion group data GD1 includes either the identifier indicating the positive group GE1 or the identifier indicating the negative group GE2, together with the score of each emotion.

The second estimation unit 253 executes voice emotion estimation processing. The voice emotion estimation processing estimates, based on the feature amount of the sound indicated by the voice data VD, whether the speaker's emotion belongs to the excitement group GE3, to which emotions during excitement belong, or the non-excitement group GE4, to which emotions not during excitement belong.

More specifically, the second estimation unit 253 includes a sound feature amount extraction processing unit 2531 and a learning model execution processing unit 2532. The sound feature amount extraction processing unit 2531 extracts sound feature amounts from the voice data VD. A sound feature amount is a feature amount that characterizes the sound indicated by the voice data VD. The sound feature amounts are, for example, 12-dimensional MFCC (Mel-Frequency Cepstrum Coefficients), loudness, fundamental frequency (F0), voice probability, zero-crossing rate, and HNR (Harmonics-to-Noise-Ratio), the first derivatives of these, and the second derivatives of the MFCC and the loudness, for a total of 47 features. Loudness is the magnitude of a sound and indicates the intensity of the sound as perceived by human hearing. The voice probability indicates the probability that the sound indicated by the voice data VD contains a voice. The zero-crossing rate is the number of times the sound pressure becomes zero. During excitement, for example, the fundamental frequency tends to be higher and the loudness tends to be greater than when not excited. The sound feature amount extraction processing unit 2531 may also execute correction processing on the voice data VD and extract the sound feature amounts from the corrected voice data obtained by the correction processing. The correction processing is, for example, one or both of processing that removes silent portions from the voice data VD and processing that removes noise contained in the sound indicated by the voice data VD.
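The total of 47 sound feature amounts can be checked by simple counting: 12 MFCC dimensions plus loudness, F0, voice probability, zero-crossing rate, and HNR give 17 base features; their first derivatives add 17 more; and the second derivatives of the MFCC and the loudness add 13. The sketch below performs that count and also shows one possible reading of the zero-crossing feature. Counting sign changes between consecutive samples is a common approximation and an assumption here; the dictionary keys and function name are illustrative.

```python
# Dimensions of the base features listed in the text.
BASE_FEATURES = {
    "mfcc": 12,
    "loudness": 1,
    "f0": 1,
    "voice_probability": 1,
    "zero_crossing_rate": 1,
    "hnr": 1,
}

n_base = sum(BASE_FEATURES.values())                               # 17
n_first_derivative = n_base                                        # 17
n_second_derivative = BASE_FEATURES["mfcc"] + BASE_FEATURES["loudness"]  # 13
n_total = n_base + n_first_derivative + n_second_derivative        # 47


def zero_crossings(samples):
    """Count sign changes between consecutive samples.

    Assumption: a sample equal to zero is treated as non-negative.
    """
    return sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )


print(n_total)
print(zero_crossings([1, -1, 1, 1, -2]))
```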
The learning model execution processing unit 2532 inputs the feature amount of the extracted sound into the learning model learned in advance, and outputs the second emotion group data GD2 based on the output result obtained from the learning model. The second emotion group data GD2 indicates whether the speaker's emotion belongs to the excited group GE3 or the non-excited group GE4.
The learning model learned in advance is, for example, a model that outputs a score for each emotion when a sound feature amount is input. The second estimation unit 253 outputs the second emotion group data GD2 indicating that the speaker's emotion belongs to the excitement group GE3 when the following equation (2) is satisfied. On the other hand, when the equation (2) is not satisfied, the second estimation unit 253 outputs the second emotion group data GD2 indicating that the speaker's emotion belongs to the non-excited group GE4.
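As a quick check of the feature list given above, the following Python sketch tallies the 47 dimensions and shows one common way to count zero crossings. The dictionary keys and function names are hypothetical labels introduced only for this illustration, and counting sign changes between consecutive samples is an assumption about how "the number of times the sound pressure becomes zero" might be measured; the description does not specify an implementation.

```python
# Base features named in the description (MFCC is 12-dimensional).
# The key names are hypothetical labels chosen for this sketch.
BASE_DIMS = {
    "mfcc": 12,
    "loudness": 1,
    "f0": 1,
    "voice_probability": 1,
    "zero_crossing_rate": 1,
    "hnr": 1,
}

# Only MFCC and loudness also contribute second derivatives.
SECOND_DERIVATIVE_OF = ("mfcc", "loudness")


def feature_count(base_dims=BASE_DIMS, second_of=SECOND_DERIVATIVE_OF):
    """Total dimensions: base + first derivatives + selected second derivatives."""
    base = sum(base_dims.values())                   # 17 base dimensions
    first = base                                     # one first derivative per base dimension
    second = sum(base_dims[n] for n in second_of)    # 12 + 1 = 13
    return base + first + second


def zero_crossings(samples):
    """Count sign changes between consecutive samples (assumed definition)."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
```

With the defaults, `feature_count()` evaluates to 17 + 17 + 13 = 47, which agrees with the total stated in the description.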

（喜びのスコア＋怒りのスコア）／２＞β×悲しみのスコア　…（２） (joy score + anger score) / 2 > β × sadness score … (2)

βは、例えば、ユーザ装置１の開発者又は発話者などによって設定される値である。 β is, for example, a value set by the developer or speaker of the user device 1.
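The grouping rule of equation (2) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names, the dictionary layout of the scores, and the default β = 1.0 are all assumptions made for the example.

```python
def is_excited(scores, beta=1.0):
    """Return True when equation (2) holds: (joy + anger) / 2 > beta * sadness."""
    return (scores["joy"] + scores["anger"]) / 2 > beta * scores["sadness"]


def second_emotion_group(scores, beta=1.0, include_scores=False):
    """Build a GD2-like record (hypothetical layout).

    First aspect: group identifier only.
    Second aspect: group identifier plus the per-emotion scores.
    """
    group = "GE3" if is_excited(scores, beta) else "GE4"
    if include_scores:
        return {"group": group, "scores": dict(scores)}
    return {"group": group}
```

A larger β makes the rule more reluctant to report the excitement group GE3, which is one way a developer or speaker could tune the decision.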

第２感情グループデータＧＤ２は、例えば、下記に示す２つの態様がある。第２感情グループデータＧＤ２の第１の態様は、興奮グループＧＥ３を示す識別子及び非興奮グループＧＥ４を示す識別子の何れか一方である。第２感情グループデータＧＤ２の第２の態様は、興奮グループＧＥ３を示す識別子及び非興奮グループＧＥ４を示す識別子の何れか一方と、予め学習した学習モデルが出力した各感情のスコアである。 The second emotion group data GD2 has, for example, the following two aspects. The first aspect of the second emotion group data GD2 is either an identifier indicating the excitement group GE3 or an identifier indicating the non-excitement group GE4. The second aspect of the second emotion group data GD2 is either one of the identifier indicating the excitement group GE3 and the identifier indicating the non-excitement group GE4, and the score of each emotion output by the learning model learned in advance.

感情推定部２５４は、第１感情グループデータＧＤ１が示す推定結果と、第２感情グループデータＧＤ２が示す推定結果とに基づいて、発話者の感情を推定する。 The emotion estimation unit 254 estimates the emotion of the speaker based on the estimation result indicated by the first emotion group data GD1 and the estimation result indicated by the second emotion group data GD2.

より詳細には、発話者の感情がポジティブグループＧＥ１に属することを第１感情グループデータＧＤ１が示す場合、感情推定部２５４は、発話者の感情が喜びであると推定する。
また、発話者の感情がネガティブグループＧＥ２に属することを第１感情グループデータＧＤ１が示し、且つ、発話者の感情が興奮グループＧＥ３に属することを第２感情グループデータＧＤ２が示す場合、感情推定部２５４は、発話者の感情が怒りであると推定する。
発話者の感情がネガティブグループＧＥ２に属することを第１感情グループデータＧＤ１が示し、且つ、発話者の感情が非興奮グループＧＥ４に属することを第２感情グループデータＧＤ２が示す場合、感情推定部２５４は、発話者の感情が悲しみであると推定する。
感情推定部２５４は、推定した発話者の感情を示す感情データＥＤを出力する。感情データＥＤは、例えば、以下に示す２つの態様がある。感情データＥＤの第１の態様は、推定した発話者の感情を示す識別子である。感情を示す識別子には、喜びを示す識別子、怒りを示す識別子、及び、悲しみを示す識別子がある。感情データＥＤの第２の態様は、推定した発話者の感情を示す識別子と、推定した発話者の感情のスコアとである。推定した発話者の感情のスコアは、例えば、第１感情グループデータＧＤ１の第２の態様に含まれる、推定した発話者の感情のスコアと、第２感情グループデータＧＤ２の第２の態様に含まれる、推定した発話者の感情のスコアとの合計値、又は、平均値である。 More specifically, when the first emotion group data GD1 indicates that the speaker's emotion belongs to the positive group GE1, the emotion estimation unit 254 estimates that the speaker's emotion is joy.
Further, when the first emotion group data GD1 indicates that the speaker's emotion belongs to the negative group GE2 and the second emotion group data GD2 indicates that the speaker's emotion belongs to the excitement group GE3, the emotion estimation unit 254 estimates that the speaker's emotion is anger.
When the first emotion group data GD1 indicates that the speaker's emotion belongs to the negative group GE2 and the second emotion group data GD2 indicates that the speaker's emotion belongs to the non-excited group GE4, the emotion estimation unit 254 estimates that the speaker's emotion is sadness.
The emotion estimation unit 254 outputs emotion data ED indicating the estimated emotion of the speaker. The emotion data ED has, for example, the following two aspects. The first aspect of the emotion data ED is an identifier indicating the estimated emotion of the speaker. The identifiers indicating emotions include an identifier indicating joy, an identifier indicating anger, and an identifier indicating sadness. The second aspect of the emotion data ED is an identifier indicating the estimated emotion of the speaker together with a score of the estimated emotion. The score of the estimated emotion is, for example, the sum or the average of the score of the estimated emotion included in the second aspect of the first emotion group data GD1 and the score of the estimated emotion included in the second aspect of the second emotion group data GD2.
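The combination rule and the two aspects of the emotion data ED described above can be sketched as follows. The record layout (`"group"`, `"scores"`, `"emotion"`, `"score"` keys) and the function names are hypothetical and chosen only for this illustration; the sketch also assumes that both GD1 and GD2 are available in their second aspect when a score is requested.

```python
def estimate_emotion(gd1, gd2):
    """Map the two group estimates to joy, anger, or sadness.

    gd1["group"] is "GE1" (positive) or "GE2" (negative);
    gd2["group"] is "GE3" (excitement) or "GE4" (non-excitement).
    """
    if gd1["group"] == "GE1":
        return "joy"
    return "anger" if gd2["group"] == "GE3" else "sadness"


def emotion_data(gd1, gd2, combine="average"):
    """Build an ED-like record: identifier only, or identifier plus score."""
    emotion = estimate_emotion(gd1, gd2)
    ed = {"emotion": emotion}
    # The score is attached only when both inputs carry per-emotion scores.
    if "scores" in gd1 and "scores" in gd2:
        total = gd1["scores"][emotion] + gd2["scores"][emotion]
        ed["score"] = total if combine == "sum" else total / 2
    return ed
```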

出力部２６は、音声認識処理部２５１によって得られた認識文字列ＳＤに対して、感情データＥＤが示す感情に応じた処理を施して得られたデータを出力する。感情に応じた処理は、例えば、下記に示す２つの態様がある。
感情に応じた処理の第１の態様は、認識文字列ＳＤに対して、感情を具象化した図形を付加する処理である。感情を具象化した図形は、例えば、感情を具象化した絵文字、及び、感情を具象化した顔文字である。絵文字は、文字コードに対応付けられた画像である。文字コードは、例えば、Unicodeである。顔文字は、記号及び文字を組み合わせて顔を表現した文字列である。以下の説明では、感情を具象化した図形は、感情を具象化した絵文字であるとして説明する。喜びを具象化した絵文字は、例えば、笑顔を示す絵文字である。怒りを具象化した絵文字は、例えば、怒りの顔を示す絵文字である。悲しみを具象化した絵文字は、例えば、泣き顔を示す絵文字である。さらに、感情データＥＤが第２の態様である場合、出力部２６は、感情データＥＤが示す感情であって、感情データＥＤに含まれるスコアに応じた深さを有する感情を具象化した絵文字を、認識文字列ＳＤに付加する絵文字として決定してもよい。例えば、感情データＥＤが示す感情が悲しみであり、かつ、感情データＥＤに含まれるスコアが所定の閾値以下である場合、出力部２６は、涙をこぼす顔を示す絵文字を認識文字列ＳＤに付加する絵文字として決定する。一方、感情データＥＤが示す感情が悲しみであり、かつ、感情データＥＤに含まれるスコアが所定の閾値より大きい場合、出力部２６は、号泣した顔を示す絵文字を認識文字列ＳＤに付加する絵文字として決定する。号泣した顔を示す絵文字は、涙をこぼす顔を示す絵文字と比較して、より深い悲しみを具象化している。
出力部２６は、認識文字列ＳＤに絵文字を付加して得られた絵文字付き文字列を出力する。絵文字を付加する位置は、例えば、以下に示す２つがある。第１の位置は、認識文字列ＳＤの末尾である。第２の位置は、認識文字列ＳＤ内における、感情分類データ３３に含まれる文字列の次である。表示装置４は、出力部２６が出力した絵文字付き文字列に基づく画像を表示する。 The output unit 26 outputs the data obtained by performing processing according to the emotion indicated by the emotion data ED on the recognition character string SD obtained by the voice recognition processing unit 251. There are two modes of processing according to emotions, for example, as shown below.
The first aspect of the processing according to the emotion is processing that adds a figure embodying the emotion to the recognition character string SD. Figures that embody emotions are, for example, pictograms (emoji) that embody emotions and emoticons that embody emotions. A pictogram is an image associated with a character code. The character code is, for example, Unicode. An emoticon is a character string that expresses a face by combining symbols and characters. In the following description, the figure that embodies an emotion is assumed to be a pictogram. The pictogram that embodies joy is, for example, a pictogram showing a smiling face. The pictogram that embodies anger is, for example, a pictogram showing an angry face. The pictogram that embodies sadness is, for example, a pictogram showing a crying face. Further, when the emotion data ED is in the second aspect, the output unit 26 may select, as the pictogram to be added to the recognition character string SD, a pictogram that embodies the emotion indicated by the emotion data ED with a depth corresponding to the score included in the emotion data ED. For example, when the emotion indicated by the emotion data ED is sadness and the score included in the emotion data ED is equal to or less than a predetermined threshold value, the output unit 26 selects a pictogram showing a face shedding tears as the pictogram to be added to the recognition character string SD. On the other hand, when the emotion indicated by the emotion data ED is sadness and the score included in the emotion data ED is larger than the predetermined threshold value, the output unit 26 selects a pictogram showing a sobbing face as the pictogram to be added to the recognition character string SD. The pictogram showing a sobbing face embodies deeper sadness than the pictogram showing a face shedding tears.
The output unit 26 outputs a pictogram-attached character string obtained by adding the pictogram to the recognition character string SD. The pictogram can be added at, for example, the following two positions. The first position is the end of the recognition character string SD. The second position is immediately after the character string, within the recognition character string SD, that is included in the emotion classification data 33. The display device 4 displays an image based on the pictogram-attached character string output by the output unit 26.
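The pictogram selection and placement described above might look like the following Python sketch. Several details are assumptions made for illustration: the specific Unicode code points, the mild/deep pairs for joy and anger (the description gives a two-level example only for sadness), the threshold of 0.5, the use of the milder pictogram when no score is present, and the `keywords` argument standing in for character strings taken from the emotion classification data 33.

```python
# (mild, deep) pictogram pairs per emotion; the code points are illustrative.
EMOJI = {
    "joy": ("\U0001F642", "\U0001F604"),      # slightly smiling / grinning
    "anger": ("\U0001F620", "\U0001F621"),    # angry / pouting
    "sadness": ("\U0001F622", "\U0001F62D"),  # shedding a tear / sobbing
}


def pick_emoji(ed, threshold=0.5):
    """Choose the deeper pictogram only when the score exceeds the threshold."""
    mild, deep = EMOJI[ed["emotion"]]
    score = ed.get("score")  # absent when ED is in its first aspect
    if score is None:
        return mild
    return deep if score > threshold else mild


def attach_emoji(sd, ed, keywords=(), threshold=0.5):
    """Insert the pictogram right after the first matching keyword, else at the end."""
    emoji = pick_emoji(ed, threshold)
    for word in keywords:
        idx = sd.find(word)
        if idx != -1:
            end = idx + len(word)
            return sd[:end] + emoji + sd[end:]
    return sd + emoji
```

For example, `attach_emoji("I lost my wallet today", {"emotion": "sadness", "score": 0.2}, keywords=("lost",))` places the tearful face directly after "lost", while omitting `keywords` appends it to the end of the string.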

感情に応じた処理の第２の態様は、感情に基づく抑揚を付加して読み上げた合成音声を生成する処理である。抑揚は、読み上げ速度を速くするもしくは遅くする、又は、音量を大きくするもしくは小さくすることである。喜びに基づく抑揚は、例えば、読み上げ速度を上げることである。怒りに基づく抑揚は、例えば、音量を大きくすることである。悲しみに基づく抑揚は、例えば、音量を小さくすることである。出力部２６は、感情に基づく抑揚を付加して読み上げた合成音声を示すデータを出力する。そして、出力部２６は、生成したデータが示す合成音声に、感情に基づく抑揚を付加して、感情に基づく抑揚を付加して読み上げた合成音声を示すデータを出力する。放音装置７は、出力部２６が出力したデータが示す合成音声を放音する。 The second aspect of the processing according to the emotion is processing that generates a synthetic voice read aloud with emotion-based intonation added. Intonation here means increasing or decreasing the reading speed, or increasing or decreasing the volume. Joy-based intonation is, for example, increasing the reading speed. Anger-based intonation is, for example, increasing the volume. Sadness-based intonation is, for example, reducing the volume. The output unit 26 outputs data indicating a synthetic voice read aloud with emotion-based intonation added. Specifically, the output unit 26 adds emotion-based intonation to the synthetic voice indicated by the generated data, and outputs data indicating the synthetic voice read aloud with that intonation. The sound emitting device 7 emits the synthetic voice indicated by the data output by the output unit 26.

１．２．第１実施形態の動作
次に、推定部２５が実行する処理について、図６を用いて説明する。 1.2. Operation of First Embodiment Next, the process executed by the estimation unit 25 will be described with reference to FIG.

図６は、推定部２５の処理を示すフローチャートである。図６に示すステップＳ３、ステップＳ４、ステップＳ６、ステップＳ７、及び、ステップＳ８の処理が、感情推定部２５４に相当する。音声認識処理部２５１は、音声データＶＤに対して音声認識処理を施して、認識文字列ＳＤを得る（ステップＳ１）。次に、第１推定部２５２は、認識文字列ＳＤに対して文字列感情推定処理を実行し、第１感情グループデータＧＤ１を出力する（ステップＳ２）。 FIG. 6 is a flowchart showing the processing of the estimation unit 25. The processes of step S3, step S4, step S6, step S7, and step S8 shown in FIG. 6 correspond to the emotion estimation unit 254. The voice recognition processing unit 251 performs voice recognition processing on the voice data VD to obtain the recognition character string SD (step S1). Next, the first estimation unit 252 executes the character string emotion estimation process on the recognition character string SD and outputs the first emotion group data GD1 (step S2).

感情推定部２５４は、第１感情グループデータＧＤ１がポジティブグループＧＥ１を示すか否かを判定する（ステップＳ３）。換言すればステップＳ３において、感情推定部２５４は、第１感情グループデータＧＤ１がポジティブグループＧＥ１とネガティブグループＧＥ２との何れを示すか判定する。第１感情グループデータＧＤ１がポジティブグループＧＥ１を示し、ステップＳ３の判定結果が肯定となる場合、感情推定部２５４は、発話者の感情が喜びであると推定する（ステップＳ４）。 The emotion estimation unit 254 determines whether or not the first emotion group data GD1 indicates the positive group GE1 (step S3). In other words, in step S3, the emotion estimation unit 254 determines whether the first emotion group data GD1 indicates a positive group GE1 or a negative group GE2. When the first emotion group data GD1 indicates the positive group GE1 and the determination result in step S3 is affirmative, the emotion estimation unit 254 estimates that the speaker's emotion is joy (step S4).

第１感情グループデータＧＤ１がネガティブグループＧＥ２を示し、ステップＳ３の判定結果が否定となる場合、第２推定部２５３は、音声データＶＤに対して音声感情推定処理を実行し、第２感情グループデータＧＤ２を出力する（ステップＳ５）。感情推定部２５４は、第２感情グループデータＧＤ２が興奮グループＧＥ３を示すか否かを判定する（ステップＳ６）。換言すればステップＳ６において、感情推定部２５４は、第２感情グループデータＧＤ２が興奮グループＧＥ３と非興奮グループＧＥ４との何れを示すか判定する。 When the first emotion group data GD1 indicates the negative group GE2 and the determination result in step S3 is negative, the second estimation unit 253 executes the voice emotion estimation process on the voice data VD and outputs the second emotion group data GD2 (step S5). The emotion estimation unit 254 determines whether or not the second emotion group data GD2 indicates the excitement group GE3 (step S6). In other words, in step S6, the emotion estimation unit 254 determines whether the second emotion group data GD2 indicates the excited group GE3 or the non-excited group GE4.

第１感情グループデータＧＤ１がネガティブグループＧＥ２を示し、且つ、第２感情グループデータＧＤ２が興奮グループＧＥ３を示す場合、感情推定部２５４は、発話者の感情が怒りであると推定する（ステップＳ７）。第１感情グループデータＧＤ１がネガティブグループＧＥ２を示し、且つ、第２感情グループデータＧＤ２が非興奮グループＧＥ４を示す場合、感情推定部２５４は、発話者の感情が悲しみであると推定する（ステップＳ８）。 When the first emotion group data GD1 indicates the negative group GE2 and the second emotion group data GD2 indicates the excitement group GE3, the emotion estimation unit 254 estimates that the speaker's emotion is anger (step S7). When the first emotion group data GD1 indicates the negative group GE2 and the second emotion group data GD2 indicates the non-excited group GE4, the emotion estimation unit 254 estimates that the speaker's emotion is sadness (step S8).

ステップＳ４、ステップＳ７、又は、ステップＳ８の処理終了後、推定部２５は、図６に示す一連の処理を終了する。 After the processing of step S4, step S7, or step S8 is completed, the estimation unit 25 ends a series of processing shown in FIG.
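The flow of FIG. 6 can be summarized by the following Python sketch. The three callables passed in (`recognize`, `estimate_text_group`, `estimate_voice_group`) are hypothetical stand-ins for the voice recognition processing unit 251, the first estimation unit 252, and the second estimation unit 253; their signatures and the string identifiers are assumptions made for this illustration.

```python
def run_first_embodiment(vd, recognize, estimate_text_group, estimate_voice_group):
    """Sketch of steps S1 to S8: text first, voice only when the text is negative."""
    sd = recognize(vd)                 # S1: voice recognition yields SD
    gd1 = estimate_text_group(sd)      # S2: "GE1" (positive) or "GE2" (negative)
    if gd1 == "GE1":                   # S3: positive group?
        return "joy"                   # S4
    gd2 = estimate_voice_group(vd)     # S5: runs only on the negative branch
    if gd2 == "GE3":                   # S6: excitement group?
        return "anger"                 # S7
    return "sadness"                   # S8
```

Note that the voice emotion estimation (S5) is skipped entirely when the text already indicates the positive group, which mirrors the order of steps in the flowchart description.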

１．３．第１実施形態の効果
以上説明したように、第１実施形態によれば、ユーザ装置１は、認識文字列ＳＤに対する文字列感情推定処理の推定結果と、音声データＶＤに対する音声感情推定処理の推定結果とに基づいて、発話者の感情を推定する。文字列感情推定処理では、認識文字列ＳＤの意味内容に着目するので、発話者の音声に基づく認識文字列ＳＤが肯定的な内容なのか否定的な内容なのかを高い精度で判定できる。一方、音声の抑揚には、発話者が興奮しているか否かが顕著に表れる。認識文字列ＳＤは、単なる文字列に過ぎないので、音声の抑揚が失われている。発話者の感情には、興奮時に表れる喜びと怒りと、非興奮時に表れる悲しみがある。従って、仮に、認識文字列ＳＤから、発話者の感情が、興奮時の感情であるか非興奮時の感情であるかを推定しようとすると、誤推定が発生する場合がある。例えば、感情スコア算出処理部２５２２において、発話者の真の感情が怒りであるのに、認識文字列ＳＤに基づいて悲しみのスコアが最も高く算出される場合がある。また、感情スコア算出処理部２５２２において、発話者の真の感情が悲しみであるのに、認識文字列ＳＤに基づいて怒りのスコアが最も高く算出される場合がある。即ち、認識文字列ＳＤに基づいて、興奮時の感情と非興奮時の感情とを区別しようとすると、怒りと悲しみとを混同する可能性がある。しかしながら、認識文字列ＳＤに基づいて感情を推定する第１推定部２５２は、混同することがある怒りと悲しみとを１つのグループとして推定するため、怒りと悲しみとの混同による誤推定を無くすことができる。
音声感情推定処理について、音の特徴量の中には、基本周波数及びラウドネスのように、興奮時と非興奮時とで値が大きく異なる傾向を有する特徴量がある。従って、音声感情推定処理では、発話者の感情が、興奮時の感情であるか非興奮時の感情であるかを精度良く推定することができる。一方、音の特徴量には、発話者の発話の意味内容が含まれていない。発話者の感情には、肯定的な時に現れる喜びと、否定的な時に現れる怒りと悲しみとがある。従って、仮に、音の特徴量から、発話者の感情が、肯定的な感情であるか否定的な感情であるかを推定しようとすると、誤推定が発生する場合がある。例えば、学習モデル実行処理部２５３２における学習モデルにおいて、発話者の真の感情が喜びであるのに、怒りのスコアが最も高く算出される場合がある。また、学習モデル実行処理部２５３２における学習モデルにおいて、発話者の真の感情が怒りであるのに、喜びのスコアが最も高く算出される場合がある。即ち、音の特徴量に基づいて、肯定的な感情と否定的な感情とを区別しようとすると、喜びと怒りとを混同する可能性がある。しかしながら、音の特徴量に基づいて感情を推定する第２推定部２５３は、混同することがある喜びと怒りとを１つのグループとして推定するため、喜びと怒りとの混同による誤推定を無くすことができる。
以上により、第１実施形態によれば、混同することがある感情同士を１つのグループとして推定するため、誤推定を抑制することができる。例えば、認識文字列ＳＤに基づく各感情のスコアと音の特徴量に基づく各感情のスコアとの平均値によって発話者の感情を推定する場合と比較すると、発話者の感情を精度良く推定することが可能になる。 1.3. Effect of First Embodiment As described above, according to the first embodiment, the user device 1 estimates the speaker's emotion based on the estimation result of the character string emotion estimation process for the recognition character string SD and the estimation result of the voice emotion estimation process for the voice data VD. Since the character string emotion estimation process focuses on the semantic content of the recognition character string SD, it can determine with high accuracy whether the recognition character string SD based on the speaker's voice has positive or negative content. On the other hand, whether or not the speaker is excited appears clearly in the intonation of the voice. Since the recognition character string SD is merely a character string, the intonation of the voice is lost. The speaker's emotions include joy and anger, which appear during excitement, and sadness, which appears during non-excitement. Therefore, if one attempts to estimate from the recognition character string SD whether the speaker's emotion is an emotion of excitement or an emotion of non-excitement, erroneous estimation may occur. For example, the emotion score calculation processing unit 2522 may calculate the sadness score as the highest based on the recognition character string SD even though the speaker's true emotion is anger. Likewise, the emotion score calculation processing unit 2522 may calculate the anger score as the highest based on the recognition character string SD even though the speaker's true emotion is sadness. That is, an attempt to distinguish emotions of excitement from emotions of non-excitement based on the recognition character string SD may confuse anger with sadness.
However, since the first estimation unit 252, which estimates emotions based on the recognition character string SD, treats anger and sadness, which may be confused with each other, as one group, erroneous estimation due to confusion between anger and sadness can be eliminated.
Regarding the voice emotion estimation process, the sound features include features, such as the fundamental frequency and loudness, whose values tend to differ greatly between excitement and non-excitement. Therefore, the voice emotion estimation process can accurately estimate whether the speaker's emotion is an emotion of excitement or an emotion of non-excitement. On the other hand, the sound features do not include the semantic content of the speaker's utterance. The speaker's emotions include joy, which appears in positive situations, and anger and sadness, which appear in negative situations. Therefore, if one attempts to estimate from the sound features whether the speaker's emotion is positive or negative, erroneous estimation may occur. For example, the learning model in the learning model execution processing unit 2532 may calculate the anger score as the highest even though the speaker's true emotion is joy. Likewise, the learning model in the learning model execution processing unit 2532 may calculate the joy score as the highest even though the speaker's true emotion is anger. That is, an attempt to distinguish positive emotions from negative emotions based on the sound features may confuse joy with anger. However, since the second estimation unit 253, which estimates emotions based on the sound features, treats joy and anger, which may be confused with each other, as one group, erroneous estimation due to confusion between joy and anger can be eliminated.
As described above, according to the first embodiment, emotions that may be confused with each other are estimated as one group, so erroneous estimation can be suppressed. For example, the speaker's emotion can be estimated more accurately than when it is estimated from the average of the score of each emotion based on the recognition character string SD and the score of each emotion based on the sound features.

また、第１実施形態によれば、発話者の感情がポジティブグループＧＥ１に属することを第１感情グループデータＧＤ１が示す場合、感情推定部２５４は、発話者の感情が喜びであると推定する。発話者の感情がネガティブグループＧＥ２に属することを第１感情グループデータＧＤ１が示し、且つ、発話者の感情が興奮グループＧＥ３に属することを第２感情グループデータＧＤ２が示す場合、感情推定部２５４は、発話者の感情が怒りであると推定する。発話者の感情がネガティブグループＧＥ２に属することを第１感情グループデータＧＤ１が示し、且つ、発話者の感情が非興奮グループＧＥ４に属することを第２感情グループデータＧＤ２が示す場合、感情推定部２５４は、発話者の感情が悲しみであると推定する。
以上により、感情推定部２５４は、発話者の感情がポジティブグループＧＥ１に属することを第１感情グループデータＧＤ１が示す場合、第２感情グループデータＧＤ２を参照することなく、発話者の感情を推定することが可能になる。また、発話者の感情がネガティブグループＧＥ２に属することを第１感情グループデータＧＤ１が示す場合であっても、感情推定部２５４は、第２感情グループデータＧＤ２を参照することにより、発話者の感情を精度良く推定することが可能になる。 Further, according to the first embodiment, when the first emotion group data GD1 indicates that the speaker's emotion belongs to the positive group GE1, the emotion estimation unit 254 estimates that the speaker's emotion is joy. When the first emotion group data GD1 indicates that the speaker's emotion belongs to the negative group GE2 and the second emotion group data GD2 indicates that the speaker's emotion belongs to the excitement group GE3, the emotion estimation unit 254 estimates that the speaker's emotion is anger. When the first emotion group data GD1 indicates that the speaker's emotion belongs to the negative group GE2 and the second emotion group data GD2 indicates that the speaker's emotion belongs to the non-excited group GE4, the emotion estimation unit 254 estimates that the speaker's emotion is sadness.
As described above, when the first emotion group data GD1 indicates that the speaker's emotion belongs to the positive group GE1, the emotion estimation unit 254 can estimate the speaker's emotion without referring to the second emotion group data GD2. Further, even when the first emotion group data GD1 indicates that the speaker's emotion belongs to the negative group GE2, the emotion estimation unit 254 can accurately estimate the speaker's emotion by referring to the second emotion group data GD2.

２．第２実施形態
第１実施形態では、推定部２５は、第１推定部２５２によって文字列感情推定処理を実行し、第１感情グループデータＧＤ１がネガティブグループＧＥ２を示す場合、第２推定部２５３によって音声感情推定処理を実行する。一方、第２実施形態では、推定部２５ａは、第２推定部２５３によって音声感情推定処理を実行し、第２感情グループデータＧＤ２が興奮グループＧＥ３を示す場合、第１推定部２５２によって文字列感情推定処理を実行する。以下、第２実施形態にかかるユーザ装置１ａを説明する。なお、以下に例示する第２実施形態において作用又は機能が第１実施形態と同等である要素については、以上の説明で参照の符号を流用して各々の詳細な説明を適宜に省略する。 2. Second Embodiment In the first embodiment, the estimation unit 25 executes the character string emotion estimation process with the first estimation unit 252 and, when the first emotion group data GD1 indicates the negative group GE2, executes the voice emotion estimation process with the second estimation unit 253. In the second embodiment, by contrast, the estimation unit 25a executes the voice emotion estimation process with the second estimation unit 253 and, when the second emotion group data GD2 indicates the excitement group GE3, executes the character string emotion estimation process with the first estimation unit 252. The user device 1a according to the second embodiment is described below. For elements of the second embodiment illustrated below whose actions or functions are equivalent to those of the first embodiment, the reference numerals used in the above description are reused, and detailed descriptions are omitted as appropriate.

２．１．第２実施形態の機能
図７は、第２実施形態にかかるユーザ装置１ａを示すブロック図である。ユーザ装置１ａは、処理装置２ａ、記憶装置３ａ、表示装置４、操作装置５、通信装置６、放音装置７、及び、集音装置８を具備するコンピュータシステムにより実現される。記憶装置３ａは、処理装置２ａが読取可能な記録媒体であり、処理装置２ａが実行する制御プログラムＰＲａを含む複数のプログラムを記憶する。 2.1. Function of the second embodiment FIG. 7 is a block diagram showing a user device 1a according to the second embodiment. The user device 1a is realized by a computer system including a processing device 2a, a storage device 3a, a display device 4, an operating device 5, a communication device 6, a sound emitting device 7, and a sound collecting device 8. The storage device 3a is a recording medium that can be read by the processing device 2a, and stores a plurality of programs including the control program PRa executed by the processing device 2a.

処理装置２ａは、記憶装置３ａから制御プログラムＰＲａを読み取り実行することによって、取得部２１、推定部２５ａ、及び、出力部２６として機能する。 The processing device 2a functions as an acquisition unit 21, an estimation unit 25a, and an output unit 26 by reading and executing the control program PRa from the storage device 3a.

図８は、第２実施形態にかかるユーザ装置１ａの機能の概要を示す図である。推定部２５ａは、音声認識処理部２５１、第１推定部２５２、第２推定部２５３、及び、感情推定部２５４ａを含む。 FIG. 8 is a diagram showing an outline of the function of the user device 1a according to the second embodiment. The estimation unit 25a includes a voice recognition processing unit 251, a first estimation unit 252, a second estimation unit 253, and an emotion estimation unit 254a.

感情推定部２５４ａは、発話者の感情が非興奮グループＧＥ４に属することを第２感情グループデータＧＤ２が示す場合、発話者の感情が悲しみであると推定する。
また、発話者の感情が興奮グループＧＥ３に属することを第２感情グループデータＧＤ２が示し、且つ、発話者の感情がポジティブグループＧＥ１に属することを第１感情グループデータＧＤ１が示す場合、感情推定部２５４ａは、発話者の感情が喜びであると推定する。
発話者の感情が興奮グループＧＥ３に属することを第２感情グループデータＧＤ２が示し、且つ、発話者の感情がネガティブグループＧＥ２に属することを第１感情グループデータＧＤ１が示す場合、感情推定部２５４ａは、発話者の感情が怒りであると推定する。 When the second emotion group data GD2 indicates that the speaker's emotion belongs to the non-excited group GE4, the emotion estimation unit 254a estimates that the speaker's emotion is sad.
Further, when the second emotion group data GD2 indicates that the speaker's emotion belongs to the excitement group GE3 and the first emotion group data GD1 indicates that the speaker's emotion belongs to the positive group GE1, the emotion estimation unit 254a estimates that the speaker's emotion is joy.
When the second emotion group data GD2 indicates that the speaker's emotion belongs to the excitement group GE3 and the first emotion group data GD1 indicates that the speaker's emotion belongs to the negative group GE2, the emotion estimation unit 254a estimates that the speaker's emotion is anger.

２．２．第２実施形態の動作
次に、推定部２５ａが実行する処理について、図９を用いて説明する。 2.2. Operation of the Second Embodiment Next, the process executed by the estimation unit 25a will be described with reference to FIG.

図９は、第２実施形態にかかる推定部２５ａの処理を示すフローチャートである。図９に示すステップＳ２２、ステップＳ２３、ステップＳ２６、ステップＳ２７、及び、ステップＳ２８の処理が、感情推定部２５４ａに相当する。第２推定部２５３は、音声データＶＤに対して音声感情推定処理を実行し、第２感情グループデータＧＤ２を出力する（ステップＳ２１）。 FIG. 9 is a flowchart showing the processing of the estimation unit 25a according to the second embodiment. The processes of step S22, step S23, step S26, step S27, and step S28 shown in FIG. 9 correspond to the emotion estimation unit 254a. The second estimation unit 253 executes the voice emotion estimation process on the voice data VD and outputs the second emotion group data GD2 (step S21).

感情推定部２５４ａは、第２感情グループデータＧＤ２が非興奮グループＧＥ４を示すか否かを判定する（ステップＳ２２）。換言すればステップＳ２２において、感情推定部２５４ａは、第２感情グループデータＧＤ２が興奮グループＧＥ３と非興奮グループＧＥ４との何れを示すか判定する。第２感情グループデータＧＤ２が非興奮グループＧＥ４を示し、ステップＳ２２の判定結果が肯定となる場合、感情推定部２５４ａは、発話者の感情が悲しみであると推定する（ステップＳ２３）。 The emotion estimation unit 254a determines whether or not the second emotion group data GD2 indicates the non-excited group GE4 (step S22). In other words, in step S22, the emotion estimation unit 254a determines whether the second emotion group data GD2 indicates the excited group GE3 or the non-excited group GE4. When the second emotion group data GD2 indicates the non-excited group GE4 and the determination result in step S22 is affirmative, the emotion estimation unit 254a estimates that the speaker's emotion is sad (step S23).

一方、第２感情グループデータＧＤ２が興奮グループＧＥ３を示し、ステップＳ２２の判定結果が否定となる場合、音声認識処理部２５１は、音声データＶＤに対して音声認識処理を施して、認識文字列ＳＤを得る（ステップＳ２４）。次に、第１推定部２５２は、認識文字列ＳＤに対して文字列感情推定処理を実行し、第１感情グループデータＧＤ１を出力する（ステップＳ２５）。 On the other hand, when the second emotion group data GD2 indicates the excitement group GE3 and the determination result in step S22 is negative, the voice recognition processing unit 251 performs voice recognition processing on the voice data VD to obtain the recognition character string SD (step S24). Next, the first estimation unit 252 executes the character string emotion estimation process on the recognition character string SD and outputs the first emotion group data GD1 (step S25).

感情推定部２５４ａは、第１感情グループデータＧＤ１がポジティブグループＧＥ１を示すか否かを判定する（ステップＳ２６）。換言すればステップＳ２６において、感情推定部２５４ａは、第１感情グループデータＧＤ１がポジティブグループＧＥ１とネガティブグループＧＥ２との何れを示すか判定する。 The emotion estimation unit 254a determines whether or not the first emotion group data GD1 indicates a positive group GE1 (step S26). In other words, in step S26, the emotion estimation unit 254a determines whether the first emotion group data GD1 indicates a positive group GE1 or a negative group GE2.

第２感情グループデータＧＤ２が興奮グループＧＥ３を示し、且つ、第１感情グループデータＧＤ１がポジティブグループＧＥ１を示す場合、感情推定部２５４ａは、発話者の感情が喜びであると推定する（ステップＳ２７）。一方、第２感情グループデータＧＤ２が興奮グループＧＥ３を示し、且つ、第１感情グループデータＧＤ１がネガティブグループＧＥ２を示す場合、感情推定部２５４ａは、発話者の感情が怒りであると推定する（ステップＳ２８）。 When the second emotion group data GD2 indicates the excitement group GE3 and the first emotion group data GD1 indicates the positive group GE1, the emotion estimation unit 254a estimates that the speaker's emotion is joy (step S27). On the other hand, when the second emotion group data GD2 indicates the excitement group GE3 and the first emotion group data GD1 indicates the negative group GE2, the emotion estimation unit 254a estimates that the speaker's emotion is anger (step S28).

ステップＳ２３、ステップＳ２７、又は、ステップＳ２８の処理終了後、推定部２５ａは、図９に示す一連の処理を終了する。 After the processing of step S23, step S27, or step S28 is completed, the estimation unit 25a ends a series of processing shown in FIG.
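The flow of FIG. 9 reverses the order of the two estimations. A minimal Python sketch follows; as before, the callables and string identifiers are hypothetical stand-ins for the units 251, 252, and 253 and are not taken from the patent itself.

```python
def run_second_embodiment(vd, recognize, estimate_text_group, estimate_voice_group):
    """Sketch of steps S21 to S28: voice first, text only when the voice is excited."""
    gd2 = estimate_voice_group(vd)     # S21: "GE3" (excitement) or "GE4" (non-excitement)
    if gd2 == "GE4":                   # S22: non-excitement group?
        return "sadness"               # S23
    sd = recognize(vd)                 # S24: recognition runs only on the excited branch
    gd1 = estimate_text_group(sd)      # S25: "GE1" (positive) or "GE2" (negative)
    if gd1 == "GE1":                   # S26: positive group?
        return "joy"                   # S27
    return "anger"                     # S28
```

In this ordering both the voice recognition (S24) and the character string emotion estimation (S25) are skipped whenever the voice features already indicate the non-excitement group.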

２．３．第２実施形態の効果
以上説明したように、第２実施形態によれば、発話者の感情が非興奮グループＧＥ４に属することを第２感情グループデータＧＤ２が示す場合、感情推定部２５４ａは、発話者の感情が悲しみであると推定する。また、発話者の感情が興奮グループＧＥ３に属することを第２感情グループデータＧＤ２が示し、且つ、発話者の感情がポジティブグループＧＥ１に属することを第１感情グループデータＧＤ１が示す場合、感情推定部２５４ａは、発話者の感情が喜びであると推定する。発話者の感情が興奮グループＧＥ３に属することを第２感情グループデータＧＤ２が示し、且つ、発話者の感情がネガティブグループＧＥ２に属することを第１感情グループデータＧＤ１が示す場合、感情推定部２５４ａは、発話者の感情が怒りであると推定する。
以上によれば、感情推定部２５４ａは、発話者の感情が非興奮グループＧＥ４に属することを第２感情グループデータＧＤ２が示す場合、第１感情グループデータＧＤ１を参照することなく、発話者の感情を推定することが可能になる。また、発話者の感情が興奮グループＧＥ３に属することを第２感情グループデータＧＤ２が示す場合であっても、第１感情グループデータＧＤ１を参照することにより、発話者の感情を精度良く推定することが可能になる。 2.3. Effect of Second Embodiment As described above, according to the second embodiment, when the second emotion group data GD2 indicates that the speaker's emotion belongs to the non-excited group GE4, the emotion estimation unit 254a estimates that the speaker's emotion is sadness. Further, when the second emotion group data GD2 indicates that the speaker's emotion belongs to the excitement group GE3 and the first emotion group data GD1 indicates that the speaker's emotion belongs to the positive group GE1, the emotion estimation unit 254a estimates that the speaker's emotion is joy. When the second emotion group data GD2 indicates that the speaker's emotion belongs to the excitement group GE3 and the first emotion group data GD1 indicates that the speaker's emotion belongs to the negative group GE2, the emotion estimation unit 254a estimates that the speaker's emotion is anger.
According to the above, when the second emotion group data GD2 indicates that the speaker's emotion belongs to the non-excited group GE4, the emotion estimation unit 254a can estimate the speaker's emotion without referring to the first emotion group data GD1. Further, even when the second emotion group data GD2 indicates that the speaker's emotion belongs to the excitement group GE3, the emotion estimation unit 254a can accurately estimate the speaker's emotion by referring to the first emotion group data GD1.

３．変形例
本発明は、以上に例示した各実施形態に限定されない。具体的な変形の態様を以下に例示する。以下の例示から任意に選択された２以上の態様を併合してもよい。 3. Modifications The present invention is not limited to the embodiments exemplified above. Specific modifications are illustrated below. Two or more aspects arbitrarily selected from the following examples may be combined.

（１）上述した各形態において、第１推定部２５２は、（１）式において、左辺である「喜びのスコア」と、右辺である「α×（怒りのスコア＋悲しみのスコア）／２」との差の絶対値が所定値以上ある場合に、発話者の感情がポジティブグループＧＥ１とネガティブグループＧＥ２との何れかに属することを示す第１感情グループデータＧＤ１を出力し、前述の差の絶対値が所定値未満である場合に、発話者の感情が不明であることを示す第１感情グループデータＧＤ１を出力してもよい。
同様に、第２推定部２５３は、（２）式において、左辺である「（喜びのスコア＋怒りのスコア）／２」と、右辺である「β×悲しみのスコア」との差の絶対値が所定値以上ある場合に、発話者の感情が興奮グループＧＥ３と非興奮グループＧＥ４との何れに属することを示す第２感情グループデータＧＤ２を出力し、前述の差の絶対値が所定値未満である場合に、発話者の感情が不明であることを示す第２感情グループデータＧＤ２を出力してもよい。
第１感情グループデータＧＤ１及び第２感情グループデータＧＤ２の何れか一方が、発話者の感情が不明であることを示す場合、感情推定部２５４は、他方の感情グループデータに基づいて、発話者の感情を推定してもよい。
ここで、第１感情グループデータＧＤ１及び第２感情グループデータＧＤ２が、第２の態様であるとする。例えば、第１感情グループデータＧＤ１が、発話者の感情が不明であることを示す場合、感情推定部２５４は、第２感情グループデータＧＤ２に含まれる各感情のスコアのうち、最も大きいスコアを有する感情を、発話者の感情として推定する。同様に、第２感情グループデータＧＤ２が、発話者の感情が不明であることを示す場合、感情推定部２５４は、第１感情グループデータＧＤ１に含まれる各感情のスコアのうち、最も大きいスコアを有する感情を、発話者の感情として推定する。 (1) In each of the above-described forms, in the equation (1), the first estimation unit 252 has a “joy score” on the left side and an “α × (anger score + sadness score) / 2” on the right side. When the absolute value of the difference from is equal to or greater than a predetermined value, the first emotion group data GD1 indicating that the speaker's emotion belongs to either the positive group GE1 or the negative group GE2 is output, and the absolute value of the above difference is absolute. When the value is less than a predetermined value, the first emotion group data GD1 indicating that the speaker's emotion is unknown may be output.
Similarly, in the second estimation unit 253, in equation (2), the absolute value of the difference between the left side "(joy score + anger score) / 2" and the right side "β x sadness score" Is equal to or greater than a predetermined value, the second emotion group data GD2 indicating that the speaker's emotion belongs to either the excited group GE3 or the non-excited group GE4 is output, and the absolute value of the above difference is less than the predetermined value. In some cases, the second emotion group data GD2 indicating that the speaker's emotion is unknown may be output.
When either one of the first emotion group data GD1 and the second emotion group data GD2 indicates that the inventor's emotion is unknown, the emotion estimation unit 254 of the speaker is based on the other emotion group data. Emotions may be estimated.
Here, it is assumed that the first emotion group data GD1 and the second emotion group data GD2 are the second aspect. For example, when the first emotion group data GD1 indicates that the speaker's emotion is unknown, the emotion estimation unit 254 has the highest score among the scores of each emotion included in the second emotion group data GD2. Estimate the emotion as the speaker's emotion. Similarly, when the second emotion group data GD2 indicates that the speaker's emotion is unknown, the emotion estimation unit 254 determines the highest score among the scores of each emotion included in the first emotion group data GD1. Estimate the emotions that the speaker has.
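The "unknown" handling of modification (1) can be sketched as follows. This is only an illustration under stated assumptions: the names `GroupData`, `first_group`, `second_group`, and `fallback_emotion`, the default values of α, β, and the predetermined value (`margin`), and the direction of the comparison (left side larger means positive or excited) are all hypothetical and are not specified in this passage.

```python
from dataclasses import dataclass
from typing import Dict, Optional

Scores = Dict[str, float]  # keys: "joy", "anger", "sadness"


@dataclass
class GroupData:
    """Emotion group data in the second aspect: a group label plus per-emotion scores."""
    group: Optional[str]  # None stands for "emotion unknown"
    scores: Scores


def first_group(scores: Scores, alpha: float = 1.0, margin: float = 0.1) -> GroupData:
    """Equation (1): joy score versus alpha * (anger score + sadness score) / 2."""
    left = scores["joy"]
    right = alpha * (scores["anger"] + scores["sadness"]) / 2
    if abs(left - right) < margin:  # difference below the predetermined value
        return GroupData(None, scores)
    # Assumption: a larger left side means the positive group.
    return GroupData("positive" if left > right else "negative", scores)


def second_group(scores: Scores, beta: float = 1.0, margin: float = 0.1) -> GroupData:
    """Equation (2): (joy score + anger score) / 2 versus beta * sadness score."""
    left = (scores["joy"] + scores["anger"]) / 2
    right = beta * scores["sadness"]
    if abs(left - right) < margin:
        return GroupData(None, scores)
    # Assumption: a larger left side means the excited group.
    return GroupData("excited" if left > right else "non-excited", scores)


def fallback_emotion(gd1: GroupData, gd2: GroupData) -> Optional[str]:
    """If exactly one result is unknown, pick the top-scoring emotion of the other."""
    if gd1.group is None and gd2.group is not None:
        return max(gd2.scores, key=gd2.scores.get)
    if gd2.group is None and gd1.group is not None:
        return max(gd1.scores, key=gd1.scores.get)
    return None  # both known or both unknown: not covered by this sketch
```

For instance, text scores of joy 0.4, anger 0.5, sadness 0.35 fall inside the margin and yield "unknown", so the emotion is taken from the highest score in the voice-based data.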

(2) In the second embodiment, in step S25, the first estimation unit 252 may estimate whether the speaker's emotion belongs to the positive group GE1 or the negative group GE2 by comparing the character string group 331 classified as joy and the character string group 332 classified as anger, both included in the emotion classification data 33, with the recognition character string SD. In other words, the first estimation unit 252 need not compare the character string group 333 classified as sadness, which is included in the emotion classification data 33, with the recognition character string SD. Because that comparison is skipped, the time required for the character string emotion estimation process can be shortened.
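A minimal sketch of this restricted comparison is given below. The dictionary `EMOTION_CLASSIFICATION` is a hypothetical stand-in for the emotion classification data 33, its word lists are illustrative only, and the function name and the tie-breaking rule are assumptions of this sketch rather than part of the disclosure.

```python
from typing import Dict, Iterable, Set

# Hypothetical stand-in for emotion classification data 33 (illustrative words only).
EMOTION_CLASSIFICATION: Dict[str, Set[str]] = {
    "joy": {"happy", "pass", "win"},        # character string group 331
    "anger": {"annoying", "unforgivable"},  # character string group 332
    "sadness": {"sad", "lonely"},           # character string group 333 (never consulted below)
}


def polarity_for_excited_speaker(morphemes: Iterable[str]) -> str:
    """Decide positive vs. negative using only the joy and anger string groups."""
    tokens = list(morphemes)
    joy_hits = sum(t in EMOTION_CLASSIFICATION["joy"] for t in tokens)
    anger_hits = sum(t in EMOTION_CLASSIFICATION["anger"] for t in tokens)
    # Assumption: ties (including no hits at all) fall to the negative group.
    return "positive" if joy_hits > anger_hits else "negative"
```

The sadness group is kept in the data but never read, which is the source of the time saving described above.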

(3) In the first embodiment, when the first emotion group data GD1 indicates the positive group GE1 in step S3, the second estimation unit 253 does not execute the voice emotion estimation process; however, it may execute it. Similarly, in the second embodiment, when the second emotion group data GD2 indicates the non-excited group GE4 in step S22, the first estimation unit 252 does not execute the character string emotion estimation process; however, it may execute it. When the first emotion group data GD1 indicates the positive group GE1 and the second emotion group data GD2 indicates the non-excited group GE4, the emotion estimation unit 254 may determine that the speaker's emotion cannot be estimated.
When the estimation result of the first estimation unit 252 and the estimation result of the second estimation unit 253 are inconsistent in this way, this indicates that one of the two results is an erroneous estimation, so the emotion estimation unit 254 can avoid outputting an erroneous estimation result.
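The combination rule of modification (3) can be sketched as a small decision function. The group labels and the function name `combine` are hypothetical, and `None` is used here only as an illustrative marker for "cannot be estimated".

```python
from typing import Optional


def combine(first_group: str, second_group: str) -> Optional[str]:
    """Combine the text-based and voice-based group estimates into one emotion."""
    if first_group == "positive" and second_group == "excited":
        return "joy"
    if first_group == "negative" and second_group == "excited":
        return "anger"
    if first_group == "negative" and second_group == "non-excited":
        return "sadness"
    # Positive but non-excited matches none of joy, anger, or sadness, so the two
    # estimators disagree and no emotion is output.
    return None
```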

(4) As described above, when the first emotion group data GD1 indicates the positive group GE1 and the second emotion group data GD2 indicates the non-excited group GE4, the emotion estimation unit 254 may determine that the speaker's emotion cannot be estimated. When the emotion estimation unit 254 makes that determination, the output unit 26 outputs at least one of the following two character strings with pictograms. The first character string with a pictogram is obtained by adding a pictogram embodying joy to the recognition character string SD. The second character string with a pictogram is obtained by adding a pictogram embodying sadness to the recognition character string SD. The output unit 26 may output both the first and the second character string with a pictogram, or only one of them.
For example, assume that the first emotion group data GD1 and the second emotion group data GD2 are in the second aspect. The output unit 26 outputs the first character string with a pictogram when the joy score included in the first emotion group data GD1 is equal to or greater than a predetermined value, and outputs the second character string with a pictogram when the sadness score included in the second emotion group data GD2 is equal to or greater than a predetermined value.
The speaker, who is the user, looks at the character strings with pictograms displayed on the display device 4 and operates the operation device 5 to select the one closest to his or her own emotion.
As described above, even when the user device 1 determines that the speaker's emotion cannot be estimated, having the speaker make the selection allows an appropriate emotion close to the speaker's actual emotion to be selected.
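The candidate generation of modification (4) might look like the following sketch. The pictogram code points, the threshold value, and the function name are hypothetical. The rule of offering both candidates when neither score reaches the threshold is an assumption added so that at least one string is always output.

```python
from typing import Dict, List

JOY_PICTOGRAM = "\U0001F60A"      # smiling face, chosen only for illustration
SADNESS_PICTOGRAM = "\U0001F622"  # crying face, chosen only for illustration


def candidate_strings(
    recognized: str,
    gd1_scores: Dict[str, float],
    gd2_scores: Dict[str, float],
    threshold: float = 0.3,
) -> List[str]:
    """Build the pictogram-annotated strings offered when the emotion is not estimable."""
    first = recognized + JOY_PICTOGRAM       # first character string with a pictogram
    second = recognized + SADNESS_PICTOGRAM  # second character string with a pictogram
    candidates = []
    if gd1_scores["joy"] >= threshold:
        candidates.append(first)
    if gd2_scores["sadness"] >= threshold:
        candidates.append(second)
    # Assumption: if neither score clears the threshold, offer both strings.
    return candidates or [first, second]
```

The returned list would then be shown on the display device 4 so that the speaker can pick one with the operation device 5.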

(5) A service that supports communication with others may be provided by an emotion estimation system 100 that includes a user device 1c and a server device 101 accessible to the user device 1c.

図１０は、感情推定システム１００を示すブロック図である。感情推定システム１００は、ユーザ装置１ｃと、サーバ装置１０１とを含む。この変形例では、サーバ装置１０１が、「感情推定装置」の一例である。ユーザ装置１ｃが、「端末装置」の一例である。 FIG. 10 is a block diagram showing an emotion estimation system 100. The emotion estimation system 100 includes a user device 1c and a server device 101. In this modification, the server device 101 is an example of the “emotion estimation device”. The user device 1c is an example of a "terminal device".

ユーザ装置１ｃは、処理装置２ｃ、記憶装置３ｃ、表示装置４、操作装置５、通信装置６、放音装置７、及び、集音装置８を具備するコンピュータシステムにより実現される。記憶装置３ｃは、処理装置２ｃが読取可能な記録媒体であり、処理装置２ｃが実行する制御プログラムＰＲｃを含む複数のプログラムを記憶する。通信装置６は、ネットワークを介してサーバ装置１０１とアクセスする。 The user device 1c is realized by a computer system including a processing device 2c, a storage device 3c, a display device 4, an operating device 5, a communication device 6, a sound emitting device 7, and a sound collecting device 8. The storage device 3c is a recording medium that can be read by the processing device 2c, and stores a plurality of programs including the control program PRc executed by the processing device 2c. The communication device 6 accesses the server device 101 via the network.

処理装置２ｃは、記憶装置３ｃから制御プログラムＰＲｃを読み取り実行することによって、送信部２２、受信部２３、及び、出力部２６として機能する。 The processing device 2c functions as a transmitting unit 22, a receiving unit 23, and an output unit 26 by reading and executing the control program PRc from the storage device 3c.

送信部２２は、集音装置８によって得られた音声データＶＤを、サーバ装置１０１に送信する。受信部２３は、サーバ装置１０１から、認識文字列ＳＤと、感情データＥＤとを受信する。 The transmission unit 22 transmits the voice data VD obtained by the sound collecting device 8 to the server device 101. The receiving unit 23 receives the recognition character string SD and the emotion data ED from the server device 101.

サーバ装置１０１は、処理装置２Ｃ、記憶装置３Ｃ、及び通信装置６Ｃを具備するコンピュータシステムにより実現される。サーバ装置１０１の各要素は、情報を通信するための単体又は複数のバス９Ｃで相互に接続される。記憶装置３Ｃは、処理装置２Ｃが読取可能な記録媒体であり、処理装置２Ｃが実行する制御プログラムＰＲＣを含む複数のプログラム、解析用辞書データ３１、及び、感情分類データ３３を記憶する。通信装置６Ｃは、ネットワークを介してユーザ装置１ｃとアクセスする。 The server device 101 is realized by a computer system including a processing device 2C, a storage device 3C, and a communication device 6C. Each element of the server device 101 is connected to each other by a single unit or a plurality of buses 9C for communicating information. The storage device 3C is a recording medium that can be read by the processing device 2C, and stores a plurality of programs including the control program PRC executed by the processing device 2C, analysis dictionary data 31, and emotion classification data 33. The communication device 6C accesses the user device 1c via the network.

処理装置２Ｃは、記憶装置３Ｃから制御プログラムＰＲＣを読み取り実行することによって、取得部２１Ｃ、及び、推定部２５として機能する。 The processing device 2C functions as an acquisition unit 21C and an estimation unit 25 by reading and executing the control program PRC from the storage device 3C.

取得部２１Ｃは、ユーザ装置１ｃから、音声データＶＤを取得する。推定部２５は、音声データＶＤに基づいて発話者の感情を推定し、推定した感情を示す感情データＥＤと、認識文字列ＳＤとをユーザ装置１ｃに送信する。 The acquisition unit 21C acquires the voice data VD from the user device 1c. The estimation unit 25 estimates the emotion of the speaker based on the voice data VD, and transmits the emotion data ED indicating the estimated emotion and the recognition character string SD to the user device 1c.

According to this modification, the server device 101 estimates the speaker's emotion, so the load on the user device 1c can be reduced compared with the user device 1 in the first embodiment.
In this modification, the processing device 2c functions as the output unit 26; in other words, it executes, on the recognition character string SD, processing according to the emotion indicated by the emotion data ED. However, the processing device 2C may function as the output unit 26 instead. In that case, the server device 101 executes the processing according to the emotion indicated by the emotion data ED on the recognition character string SD and transmits the data obtained by this processing to the user device 1c.

（６）上述の各態様において、推定部２５は、第１推定部２５２と第２推定部２５３とを並列に実行してもよい。 (6) In each of the above-described aspects, the estimation unit 25 may execute the first estimation unit 252 and the second estimation unit 253 in parallel.
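Parallel execution of the two estimators could be sketched with a thread pool, as below. The callables `first_estimator` and `second_estimator` are hypothetical placeholders for the processing of the first estimation unit 252 and the second estimation unit 253. Whether threads, processes, or another mechanism is used is not stated in the source.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, Tuple


def estimate_groups_in_parallel(
    recognized_text: str,
    sound_features: Sequence[float],
    first_estimator: Callable[[str], str],
    second_estimator: Callable[[Sequence[float]], str],
) -> Tuple[str, str]:
    """Run the text-based and voice-based group estimations concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first_future = pool.submit(first_estimator, recognized_text)
        second_future = pool.submit(second_estimator, sound_features)
        # result() blocks until each estimator has finished.
        return first_future.result(), second_future.result()
```

Running both estimators every time gives up the early exit of the first and second embodiments in exchange for lower latency when both results are needed.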

（７）上述の各態様において、ユーザ装置１は、集音装置８を有さなくてもよい。集音装置８を有さない場合、ユーザ装置１は、通信装置６を介して音声データＶＤを取得してもよいし、記憶装置３に記憶された音声データＶＤを取得してもよい。 (7) In each of the above-described aspects, the user device 1 does not have to have the sound collecting device 8. When the sound collecting device 8 is not provided, the user device 1 may acquire the voice data VD via the communication device 6 or may acquire the voice data VD stored in the storage device 3.

（８）上述の各態様において、ユーザ装置１は、放音装置７を有さなくてもよい。 (8) In each of the above aspects, the user device 1 does not have to have the sound emitting device 7.

（９）上述の各態様において、ユーザ装置１は、スマートスピーカでもよい。ユーザ装置１がスマートスピーカである場合、ユーザ装置１は、表示装置４及び操作装置５を有さなくてもよい。 (9) In each of the above aspects, the user device 1 may be a smart speaker. When the user device 1 is a smart speaker, the user device 1 does not have to have the display device 4 and the operation device 5.

(10) In each of the above aspects, as shown in FIG. 4, the emotion classification data 33 classifies each of the inflected morphemes of a word, such as "勝つ" and "勝っ" (two inflected forms of "win"), into joy, anger, or sadness, but the classification is not limited to this. For example, the emotion classification data 33 may classify the character strings registered in the base-form (original form) data of the analysis dictionary data 31 into joy, anger, or sadness. For example, the emotion classification data 33 classifies the character strings "嬉しい" (happy), "合格" (pass), and "勝つ" (win), registered in the base-form data of the analysis dictionary data 31, as joy. The emotion score calculation processing unit 2522 decomposes the corrected recognition character string CSD into morphemes and converts each morpheme into the character string registered in the base-form data of the analysis dictionary data 31. Then, when a character string obtained by the conversion matches a character string included in the emotion classification data 33, the emotion score calculation processing unit 2522 increases the score of the emotion corresponding to that character string.
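The base-form matching of modification (10) can be illustrated with English stand-ins. The tables `BASE_FORMS` and `EMOTION_BY_BASE_FORM` are hypothetical excerpts playing the roles of the base-form data of the analysis dictionary data 31 and of the emotion classification data 33. The increment of 1 per match is an assumption, and morphological analysis itself is taken as already done.

```python
from typing import Dict, Iterable

# Hypothetical excerpt of base-form data: inflected form -> base form.
BASE_FORMS: Dict[str, str] = {"won": "win", "wins": "win", "passed": "pass"}

# Hypothetical excerpt of emotion classification data keyed by base form.
EMOTION_BY_BASE_FORM: Dict[str, str] = {"happy": "joy", "pass": "joy", "win": "joy"}


def score_emotions(morphemes: Iterable[str]) -> Dict[str, int]:
    """Count emotion matches after converting each morpheme to its base form."""
    scores = {"joy": 0, "anger": 0, "sadness": 0}
    for morpheme in morphemes:
        base = BASE_FORMS.get(morpheme, morpheme)  # unknown forms are kept as-is
        emotion = EMOTION_BY_BASE_FORM.get(base)
        if emotion is not None:
            scores[emotion] += 1
    return scores
```

With this layout the classification data needs one entry per word rather than one per inflected form.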

(11) In each of the above aspects, the emotion score calculation processing unit 2522 calculates the score for each emotion from the corrected recognition character string CSD, but it may instead calculate the score for each emotion from the recognition character string SD. However, the recognition character string SD includes character strings that are unnecessary for estimating emotions. Calculating the scores from the corrected recognition character string CSD therefore improves the emotion estimation accuracy compared with calculating them from the recognition character string SD.

(12) Each of the above aspects uses an example in which the speaker speaks Japanese, but the aspects can be applied regardless of the language the speaker speaks. For example, they can be applied even when the speaker speaks a language other than Japanese, such as English, French, or Chinese. For example, when the speaker speaks English, the analysis dictionary data 31 may be data on English morphemes, and the emotion classification data 33 may be data that classifies English words into joy, anger, or sadness.

(13) In each of the above aspects, the pre-trained learning model in the learning model execution processing unit 2532 may be a model that, when a sound feature amount is input, outputs second emotion group data GD2 indicating whether the speaker's emotion belongs to the excited group GE3 or the non-excited group GE4.

(14) Each of the above aspects may also be applied to human emotions other than joy, anger, and sadness, in accordance with the grouping of emotions. For example, healing may be treated as belonging to the positive group GE1 and to the non-excited group GE4.
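One way to picture this extension is a lookup table keyed by the pair of groups, as sketched below. The table, the group labels, and the function name are hypothetical. Placing joy in the excited group follows from equation (2), which averages the joy and anger scores on the excited side.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: (first group, second group) -> emotion.
EMOTION_BY_GROUPS: Dict[Tuple[str, str], str] = {
    ("positive", "excited"): "joy",
    ("negative", "excited"): "anger",
    ("negative", "non-excited"): "sadness",
    ("positive", "non-excited"): "healing",  # the additional emotion of this modification
}


def emotion_for(first_group: str, second_group: str) -> Optional[str]:
    """Look up the emotion for a pair of group estimates; None if the pair is unlisted."""
    return EMOTION_BY_GROUPS.get((first_group, second_group))
```

Adding a further emotion then only requires deciding which of the two groupings it falls into.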

(15) The block diagrams used in the description of each of the above aspects show blocks in functional units. These functional blocks (components) are realized by any combination of hardware and/or software. The means for realizing each functional block is not particularly limited. That is, each functional block may be realized by one device that is physically and/or logically coupled, or by two or more physically and/or logically separate devices that are connected directly and/or indirectly (for example, by wire and/or wirelessly).

（１６）上述した各態様における処理手順、シーケンス、フローチャートなどは、矛盾のない限り、順序を入れ替えてもよい。例えば、本明細書で説明した方法については、例示的な順序で様々なステップの要素を提示しており、提示した特定の順序に限定されない。 (16) The order of the processing procedures, sequences, flowcharts, etc. in each of the above-described aspects may be changed as long as there is no contradiction. For example, the methods described herein present elements of various steps in an exemplary order, and are not limited to the particular order presented.

（１７）上述した各態様において、入出力された情報等は特定の場所(例えば、メモリ)に保存されてもよいし、管理テーブルで管理してもよい。入出力される情報等は、上書き、更新、又は追記され得る。出力された情報等は削除されてもよい。入力された情報等は他の装置へ送信されてもよい。 (17) In each of the above-described aspects, the input / output information and the like may be stored in a specific place (for example, a memory) or may be managed by a management table. Input / output information and the like can be overwritten, updated, or added. The output information and the like may be deleted. The input information or the like may be transmitted to another device.

（１８）上述した各態様において、判定は、１ビットで表される値（０か１か）によって行われてもよいし、真偽値（Boolean：true又はfalse）によって行われてもよいし、数値の比較（例えば、所定の値との比較）によって行われてもよい。 (18) In each of the above-described aspects, the determination may be made by a value represented by 1 bit (0 or 1) or by a boolean value (Boolean: true or false). , May be done by numerical comparison (eg, comparison with a given value).

(19) In each of the above aspects, a portable information processing device such as a smartphone is given as an example of the user device 1, but the specific form of the user device 1 is arbitrary and is not limited to those examples. For example, a portable or stationary personal computer may be used as the user device 1.

(20) In each of the above aspects, the storage device 3 is a recording medium readable by the processing device 2, and ROM and RAM are given as examples; however, it may be a flexible disk, a magneto-optical disk (for example, a compact disc, a digital versatile disc, or a Blu-ray (registered trademark) disc), a smart card, a flash memory device (for example, a card, stick, or key drive), a CD-ROM (Compact Disc-ROM), a register, a removable disk, a hard disk, a floppy (registered trademark) disk, a magnetic strip, a database, a server, or any other suitable storage medium. The program may be transmitted from a network. The program may also be transmitted from a communication network via a telecommunication line.

(21) Each of the above aspects may be applied to systems that use LTE (Long Term Evolution), LTE-A (LTE-Advanced), SUPER 3G, IMT-Advanced, 4G, 5G, FRA (Future Radio Access), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi), IEEE 802.16 (WiMAX), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), or other suitable systems, and/or to next-generation systems extended based on them.

（２２）上述した各態様において、説明した情報及び信号などは、様々な異なる技術の何れかを使用して表されてもよい。例えば、上述の説明全体に渡って言及され得るデータ、命令、コマンド、情報、信号、ビット、シンボル、チップなどは、電圧、電流、電磁波、磁界若しくは磁性粒子、光場若しくは光子、又はこれらの任意の組み合わせによって表されてもよい。
なお、本明細書で説明した用語及び／又は本明細書の理解に必要な用語については、同一の又は類似する意味を有する用語と置き換えてもよい。 (22) In each of the above aspects, the information, signals, etc. described may be represented using any of a variety of different techniques. For example, data, instructions, commands, information, signals, bits, symbols, chips, etc. that may be referred to throughout the above description are voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, light fields or photons, or any of these. It may be represented by a combination of.
In addition, the terms described in the present specification and / or the terms necessary for understanding the present specification may be replaced with terms having the same or similar meanings.

（２３）図１、図７、及び、図１０に例示された各機能は、ハードウェア及びソフトウェアの任意の組み合わせによって実現される。また、各機能は、単体の装置によって実現されてもよいし、相互に別体で構成された２個以上の装置によって実現されてもよい。 (23) Each of the functions illustrated in FIGS. 1, 7, and 10 is realized by any combination of hardware and software. In addition, each function may be realized by a single device, or may be realized by two or more devices configured as separate bodies from each other.

(24) The programs exemplified in the above embodiments should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, execution threads, procedures, functions, and the like, regardless of whether they are called software, firmware, middleware, microcode, hardware description language, or some other name.
Software, instructions, and the like may also be transmitted and received via a transmission medium. For example, when software is transmitted from a website, server, or other remote source using wired technologies such as coaxial cable, fiber optic cable, twisted pair, and digital subscriber line (DSL) and/or wireless technologies such as infrared, radio, and microwave, these wired and/or wireless technologies are included within the definition of transmission medium.

（２５）上述した各実施形態において、情報、パラメータなどは、絶対値で表されてもよいし、所定の値からの相対値で表されてもよいし、対応する別の情報で表されてもよい。 (25) In each of the above-described embodiments, the information, parameters, etc. may be represented by absolute values, relative values from a predetermined value, or other corresponding information. May be good.

（２６）上述したパラメータに使用する名称はいかなる点においても限定的なものではない。さらに、これらのパラメータを使用する数式等は、本明細書で明示的に開示したものと異なる場合もある。 (26) The names used for the above-mentioned parameters are not limited in any respect. Further, mathematical formulas and the like using these parameters may differ from those expressly disclosed herein.

(27) In each of the above embodiments, the user device 1 may be a mobile station. A mobile station may also be referred to by those skilled in the art as a subscriber station, mobile unit, subscriber unit, wireless unit, remote unit, mobile device, wireless device, wireless communication device, remote device, mobile subscriber station, access terminal, mobile terminal, wireless terminal, remote terminal, handset, user agent, mobile client, client, or some other suitable term.

(28) In each of the above embodiments, the phrase "based on" does not mean "based only on" unless otherwise specified. In other words, the phrase "based on" means both "based only on" and "based at least on".

（２９）本明細書で使用する「第１」、「第２」などの呼称を使用した要素へのいかなる参照も、それらの要素の量又は順序を全般的に限定するものではない。これらの呼称は、２つ以上の要素間を区別する便利な方法として本明細書で使用され得る。従って、第１及び第２の要素への参照は、２つの要素のみがそこで採用され得ること、又は何らかの形で第１の要素が第２の要素に先行しなければならないことを意味しない。 (29) Any reference to elements using designations such as "first", "second" as used herein does not generally limit the quantity or order of those elements. These designations can be used herein as a convenient way to distinguish between two or more elements. Thus, references to the first and second elements do not mean that only two elements can be adopted there, or that the first element must somehow precede the second element.

(30) To the extent that the terms "including", "comprising", and variations thereof are used in the above embodiments, in this specification, or in the claims, these terms are intended to be inclusive, in the same way as the term "provided with". Furthermore, the term "or" as used in this specification or in the claims is not intended to be an exclusive OR.

(31) Throughout this application, where articles such as a, an, and the in English have been added by translation, these articles include the plural unless the context clearly indicates otherwise.

(32) It is apparent to those skilled in the art that the present invention is not limited to the embodiments described herein. The present invention can be implemented with modifications and variations without departing from the gist and scope of the invention as defined by the claims. Accordingly, the description herein is for illustrative purposes and has no restrictive meaning with respect to the present invention. A plurality of aspects selected from those illustrated herein may also be combined.

１，１ａ，１ｃ…ユーザ装置、８…集音装置、２１、２１Ｃ…取得部、２２…送信部、２５、２５ａ…推定部、２６…出力部、２５１…音声認識処理部、２５２…第１推定部、２５３…第２推定部、２５４、２５４ａ…感情推定部、ＥＤ…感情データ、ＧＥ１…ポジティブグループ、ＧＥ２…ネガティブグループ、ＧＥ３…興奮グループ、ＧＥ４…非興奮グループ、ＳＤ…認識文字列、ＶＤ…音声データ。 1,1a, 1c ... user device, 8 ... sound collector, 21, 21C ... acquisition unit, 22 ... transmission unit, 25, 25a ... estimation unit, 26 ... output unit, 251 ... voice recognition processing unit, 252 ... first Estimating unit, 253 ... 2nd estimation unit, 254, 254a ... Emotion estimation unit, ED ... Emotion data, GE1 ... Positive group, GE2 ... Negative group, GE3 ... Excitement group, GE4 ... Non-excitement group, SD ... Recognition character string, VD ... Voice data.

Claims

An emotion estimation device comprising:
an acquisition unit that acquires voice data indicating a sound including a speaker's voice;
a first estimation unit that estimates, based on a recognition character string obtained by performing voice recognition processing on the voice data, whether the speaker's emotion belongs to a first group to which positive emotions belong or a second group to which negative emotions belong;
a second estimation unit that estimates, based on a sound feature amount indicated by the voice data, whether the speaker's emotion belongs to a third group to which emotions of an excited state belong or a fourth group to which emotions of a non-excited state belong; and
an emotion estimation unit that estimates the speaker's emotion based on the estimation result of the first estimation unit and the estimation result of the second estimation unit.

The emotion estimation device according to claim 1, wherein the emotion estimation unit:
estimates that the speaker's emotion is joy when the estimation result of the first estimation unit indicates that the speaker's emotion belongs to the first group;
estimates that the speaker's emotion is anger when the estimation result of the first estimation unit indicates that the speaker's emotion belongs to the second group and the estimation result of the second estimation unit indicates that the speaker's emotion belongs to the third group; and
estimates that the speaker's emotion is sadness when the estimation result of the first estimation unit indicates that the speaker's emotion belongs to the second group and the estimation result of the second estimation unit indicates that the speaker's emotion belongs to the fourth group.

The emotion estimation device according to claim 1, wherein the emotion estimation unit:
estimates that the speaker's emotion is sadness when the estimation result of the second estimation unit indicates that the speaker's emotion belongs to the fourth group;
estimates that the speaker's emotion is joy when the estimation result of the second estimation unit indicates that the speaker's emotion belongs to the third group and the estimation result of the first estimation unit indicates that the speaker's emotion belongs to the first group; and
estimates that the speaker's emotion is anger when the estimation result of the second estimation unit indicates that the speaker's emotion belongs to the third group and the estimation result of the first estimation unit indicates that the speaker's emotion belongs to the second group.

The emotion estimation device according to claim 3, wherein,
when the estimation result of the second estimation unit indicates that the speaker's emotion belongs to the third group, the first estimation unit refers to emotion classification data that classifies character strings into joy, anger, or sadness, and estimates whether the speaker's emotion belongs to the first group or the second group by comparing the character strings of joy or anger included in the emotion classification data with the recognition character string.

The emotion estimation device according to claim 1, wherein
the emotion estimation unit determines that the speaker's emotion cannot be estimated when the estimation result of the first estimation unit indicates that the speaker's emotion belongs to the first group and the estimation result of the second estimation unit indicates that the speaker's emotion belongs to the fourth group.

The emotion estimation device according to claim 5, further comprising an output unit that performs processing of adding, to the recognition character string, a figure embodying the emotion estimated by the emotion estimation unit, wherein,
when the emotion estimation unit determines that the emotion cannot be estimated, the output unit outputs at least one of a character string with a figure obtained by adding a figure embodying joy to the recognition character string and a character string with a figure obtained by adding a figure embodying sadness to the recognition character string.

An emotion estimation system comprising the emotion estimation device according to any one of claims 1 to 5 and a terminal device capable of communicating with the emotion estimation device,
wherein the terminal device comprises:
a sound collecting unit that collects a sound including the speaker's voice;
a transmission unit that transmits, to the emotion estimation device, the voice data indicating the sound including the speaker's voice;
a receiving unit that receives, from the emotion estimation device, the recognition character string and emotion data indicating the speaker's emotion estimated by the emotion estimation unit; and
an output unit that outputs data obtained by processing the recognition character string according to the emotion indicated by the emotion data.

An emotion estimation method in which a computer:
acquires voice data indicating a sound including a speaker's voice;
estimates, based on a recognition character string obtained by performing voice recognition processing on the voice data, whether the speaker's emotion belongs to a first group to which positive emotions belong or a second group to which negative emotions belong;
estimates, based on a sound feature amount indicated by the voice data, whether the speaker's emotion belongs to a third group to which emotions of an excited state belong or a fourth group to which emotions of a non-excited state belong; and
estimates the speaker's emotion based on the estimation result indicating whether the speaker's emotion belongs to the first group or the second group and the estimation result indicating whether the speaker's emotion belongs to the third group or the fourth group.