JP2021012285A

JP2021012285A - Emotion estimation device, and emotion estimation system

Info

Publication number: JP2021012285A
Application number: JP2019126106A
Authority: JP
Inventors: 秀行窪田; Hideyuki Kubota; 博子進藤; Hiroko Shindo
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2019-07-05
Filing date: 2019-07-05
Publication date: 2021-02-04
Anticipated expiration: 2039-07-05
Also published as: JP7379788B2

Abstract

PROBLEM TO BE SOLVED: To accurately estimate the emotions held by a user even when the user's voice carries little emotional expression.

SOLUTION: An emotion estimation device comprises: a generation unit that generates a feature amount of a user's voice based on voice information of the user; a first evaluation unit that generates, based on the feature amount, a first voice evaluation value indicating the intensity with which the user holds a first emotion and a second voice evaluation value indicating the intensity with which the user holds a second emotion; a recognition unit that generates, based on the voice information, a recognized character string indicating the content of the user's utterance; a second evaluation unit that generates, based on the recognized character string, a first character evaluation value indicating the intensity with which the user holds the first emotion and a second character evaluation value indicating the intensity with which the user holds the second emotion; a correction unit that corrects the first voice evaluation value, the second voice evaluation value, the first character evaluation value, and the second character evaluation value based on situation information indicating the situation of the user; and an estimation unit that estimates one or more emotions held by the user based on the correction result of the correction unit.

SELECTED DRAWING: Figure 6

Description

The present invention relates to an emotion estimation device and an emotion estimation system.

In recent years, techniques have become known for estimating the emotions a user holds, such as joy, anger, sadness, and a normal (neutral) state, while taking the user's situation into account. For example, Patent Document 1 discloses a technique in which a feature amount corrected according to the user's situation is input to a learning model that has learned the relationship between feature amounts of a user's voice information and emotions, and the learning model outputs the emotion held by the user.

Patent Document 1: Japanese Unexamined Patent Publication No. 2018-072876

However, when the user's voice shows little emotional expression, it is difficult to determine whether the user is in a situation where emotions are hard to express in the voice, or whether the user is simply not putting emotion into the voice in the first place. It is therefore difficult to judge the emotions held by the user accurately.

An emotion estimation device according to a preferred aspect of the present invention includes: a generation unit that generates a feature amount of a user's voice based on voice information of the user; a first evaluation unit that generates, based on the feature amount, a first voice evaluation value indicating the intensity with which the user holds a first emotion and a second voice evaluation value indicating the intensity with which the user holds a second emotion; a recognition unit that generates, based on the voice information, a recognized character string indicating the content of the user's utterance; a second evaluation unit that generates, based on the recognized character string, a first character evaluation value indicating the intensity with which the user holds the first emotion and a second character evaluation value indicating the intensity with which the user holds the second emotion; a correction unit that corrects the first voice evaluation value, the second voice evaluation value, the first character evaluation value, and the second character evaluation value based on situation information indicating the situation of the user; and an estimation unit that estimates one or more emotions held by the user based on the correction result of the correction unit.

An emotion estimation system according to a preferred aspect of the present invention includes the emotion estimation device described above and a terminal device capable of communicating with the emotion estimation device. The terminal device includes: a sound collecting device that collects the voice of the user; a situation information generation unit that generates situation information indicating the situation of the user; a communication device that transmits the voice information indicating the user's voice and the situation information to the emotion estimation device, and receives from the emotion estimation device the recognized character string and emotion information indicating the one or more emotions held by the user as estimated by the estimation unit; and an output unit that outputs information obtained by executing, on the recognized character string, processing corresponding to the emotion indicated by the emotion information.

According to the present invention, the emotions held by the user can be estimated accurately even when the user's voice shows little emotional expression.

A diagram showing an outline of the user device 1.
A block diagram showing the configuration of the user device 1 according to the first embodiment.
A diagram showing an example of the stored contents of the analysis dictionary information 31.
A diagram showing an example of the stored contents of the emotion classification information 33.
A diagram showing an example of the stored contents of the schedule information 35.
A diagram showing an outline of the functions of the user device 1.
A diagram showing an example of the stored contents of the situation relation information 37.
A flowchart showing the operation of the user device 1.
A block diagram showing the user device 1a according to the second embodiment.
A diagram showing an outline of the functions of the user device 1a in the second embodiment.
A diagram showing an example of the stored contents of the character string relation information 38.
A flowchart showing the operation of the user device 1a.
A diagram showing the overall configuration of the emotion estimation system SYS.
A block diagram showing the configuration of the user device 1b.
A block diagram showing the configuration of the server device 10.

1. First Embodiment
FIG. 1 is a diagram showing an outline of the user device 1. The user device 1 is assumed to be a smartphone. The user device 1 is an example of an "emotion estimation device". However, any information processing device can be adopted as the user device 1; for example, it may be a terminal-type information device such as a personal computer, or a portable information terminal such as a notebook computer, a wearable terminal, or a tablet terminal.

The user device 1 has a function of transmitting, to a device used by another person, a recognized character string obtained by executing voice recognition processing on voice information indicating sound that includes the voice of the user U who carries the user device 1, or a function of emitting sound representing the recognized character string so that another person near the user U can hear it. Furthermore, the user device 1 estimates the emotion held by the user U based on the voice of the user U, and either adds a figure corresponding to the estimated emotion to the recognized character string or emits the sound representing the recognized character string with intonation corresponding to the estimated emotion. In this way, the emotional expression needed for communication can be added.
In the example of FIG. 1, the user U says "こんにちは" ("Hello"), and the user device 1 adds a figure PI corresponding to the estimated emotion to the image representing the recognized character string.

FIG. 2 is a block diagram showing the configuration of the user device 1 according to the first embodiment. The user device 1 is realized by a computer system including a processing device 2, a storage device 3, an input device 4, an output device 5, a communication device 6, an inertial sensor 7, and a GPS (Global Positioning System) device 8. The elements of the user device 1 are connected to one another by one or more buses 9 for communicating information. The term "device" in this specification may be read as another term such as circuit, device, or unit. Each element of the user device 1 is composed of one or more pieces of equipment, and some elements of the user device 1 may be omitted.

The processing device 2 is a processor that controls the entire user device 1, and is composed of, for example, one or more chips. The processing device 2 is composed of, for example, a central processing unit (CPU) including an interface with peripheral devices, an arithmetic unit, registers, and the like. Some or all of the functions of the processing device 2 may be realized by hardware such as a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The processing device 2 executes various processes in parallel or sequentially.

The storage device 3 is a recording medium readable by the processing device 2. It stores a plurality of programs including the control program PR executed by the processing device 2, as well as the analysis dictionary information 31, the emotion classification information 33, the schedule information 35, the situation relation information 37, and the learning model LM. The storage device 3 is composed of, for example, one or more types of storage circuits such as ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), and RAM (Random Access Memory).

FIG. 3 is a diagram showing an example of the stored contents of the analysis dictionary information 31. The analysis dictionary information 31 associates, for each morpheme, a part of speech, a part-of-speech subclassification, and original form information with one another. A morpheme is a character string that is the smallest unit of expression carrying meaning. The part of speech is the type of a word classified by its grammatical properties, such as noun, verb, or adjective. The part-of-speech subclassification is an item that further subdivides the part of speech. The original form information is, when the morpheme is a word that inflects, a character string indicating the base form of the word, and, when the morpheme is a word that does not inflect, the same character string as the morpheme itself.
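The record layout described above can be pictured with a short Python sketch. This is only an illustration: the class name `MorphemeEntry`, the field names, the helper `base_form`, and the two sample entries are assumptions, not the contents of FIG. 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MorphemeEntry:
    """One hypothetical record of the analysis dictionary information 31."""
    surface: str     # the morpheme as it appears in text
    pos: str         # part of speech (noun, verb, adjective, ...)
    pos_detail: str  # part-of-speech subclassification
    base_form: str   # original form; equals `surface` for non-inflecting words


# Sample entries chosen for illustration only.
ANALYSIS_DICTIONARY = {
    "勝っ": MorphemeEntry("勝っ", "verb", "independent", "勝つ"),
    "試合": MorphemeEntry("試合", "noun", "general", "試合"),
}


def base_form(surface: str) -> str:
    """Return the original form of a morpheme, or the morpheme itself if unknown."""
    entry = ANALYSIS_DICTIONARY.get(surface)
    return entry.base_form if entry else surface
```

Falling back to the surface string for unknown morphemes is a design choice of this sketch; the source does not say how unregistered morphemes are handled.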

FIG. 4 is a diagram showing an example of the stored contents of the emotion classification information 33. The emotion classification information 33 classifies character strings into one of joy, anger, sadness, and normal (neutral). In the example of FIG. 4, the character string group 331 classified as joy includes "嬉しい" ("happy"), "合格" ("passing"), "勝つ" ("win"), and "勝っ" (an inflected stem of "win"), among others. Similarly, the character string group 332 classified as anger includes "イライラ" ("irritated") and "むかっ腹" ("fit of anger"), among others. The character string group 333 classified as sadness includes "悲しい" ("sad") and "敗ける" ("lose"), among others. The character string group 334 classified as normal includes "安心" ("relief"), among others.

FIG. 5 is a diagram showing an example of the stored contents of the schedule information 35. The schedule information 35 indicates the schedule of the user U. The schedule information 35 shown in FIG. 5 has records 35_1 to 35_3. Record 35_1 indicates that the plan of the user U from 10:00 to 11:00 on April 10, 2019 is a meeting with a client. Record 35_2 indicates that the plan of the user U from 15:00 to 16:00 on April 12, 2019 is attendance at a department meeting. Record 35_3 indicates that the plan of the user U from 18:00 to 20:00 on April 15, 2019 is attendance at a class reunion.
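As a rough illustration of how such records could be queried for the plan at a given time, the following Python sketch encodes the three records of FIG. 5. The tuple layout, the function name `current_plan`, and the half-open interval (start inclusive, end exclusive) are assumptions made for this sketch.

```python
from datetime import datetime

# Records 35_1 to 35_3 as (start, end, plan) tuples; the layout is hypothetical.
SCHEDULE = [
    (datetime(2019, 4, 10, 10), datetime(2019, 4, 10, 11), "meeting with a client"),
    (datetime(2019, 4, 12, 15), datetime(2019, 4, 12, 16), "department meeting"),
    (datetime(2019, 4, 15, 18), datetime(2019, 4, 15, 20), "class reunion"),
]


def current_plan(now: datetime):
    """Return the plan whose time range contains `now`, or None if there is none."""
    for start, end, plan in SCHEDULE:
        if start <= now < end:  # assumption: the end time is exclusive
            return plan
    return None
```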

The description now returns to FIG. 2. The situation relation information 37 is used when estimating emotions. The learning model LM has learned the relationship between feature amounts of human voice and the intensity of each of a plurality of emotions. It is preferable that the learning model LM has learned this relationship for a plurality of people.

The input device 4 is equipment for inputting information used by the user device 1 to the processing device 2. The input device 4 includes a sound collecting device 41 and a touch panel 43. The output device 5 is equipment for outputting information. The output device 5 includes a display device 51 and a sound emitting device 53.

The sound collecting device 41 is composed of, for example, a microphone and an AD converter, and collects sound including the voice of the user U under the control of the processing device 2. The microphone converts the collected sound into an electric signal. The AD converter performs AD conversion on the electric signal from the microphone to obtain the voice information VI shown in FIG. 6. The sound indicated by the voice information VI may include, in addition to the speaker's voice, noise emitted from the speaker's surroundings. The touch panel 43 detects contact with the display surface of the display device 51. The touch panel 43 may include a plurality of controls operable by the user U.

The display device 51 displays various images under the control of the processing device 2. Various display panels, such as a liquid crystal display panel or an organic EL (electro-luminescence) display panel, are suitable for use as the display device 51. The sound emitting device 53 is composed of, for example, a speaker, and emits sound under the control of the processing device 2.

The communication device 6 is hardware (a transmission/reception device) for communicating with other devices via a network. The communication device 6 is also called, for example, a network device, a network controller, a network card, or a communication module.

The inertial sensor 7 measures the inertial force applied to the user device 1 and outputs the inertial information IFI shown in FIG. 6, which is obtained from the measurement result. For example, the inertial sensor 7 is one or both of an acceleration sensor and an angular velocity sensor.

The GPS device 8 receives radio waves from a plurality of satellites and generates the position information PoI shown in FIG. 6. The position information PoI may be in any format as long as a position can be specified from it. The position information PoI indicates, for example, the latitude and longitude of the user device 1. Although this embodiment illustrates the position information PoI as being obtained from the GPS device 8, the user device 1 may acquire the position information PoI by any method. For example, the user device 1 may acquire, as the position information PoI, the cell ID (IDentifier) assigned to the base station with which the user device 1 communicates. The cell ID is identification information that uniquely identifies a base station. Furthermore, when the user device 1 communicates with a wireless LAN (Local Area Network) access point, the position information PoI may be acquired by referring to a database that associates the identification address on the network assigned to the access point (the MAC (Media Access Control) address) with its actual address (location).

1.1. Functions of the First Embodiment
The processing device 2 reads the control program PR from the storage device 3 and executes it, thereby functioning as an acquisition unit 21, a situation information generation unit 23, an emotion information generation unit 25, and an output unit 27.
The functions realized by the processing device 2 are described with reference to FIG. 6.

FIG. 6 is a diagram showing an outline of the functions of the user device 1. The acquisition unit 21 acquires the voice information VI output by the sound collecting device 41.

The situation information generation unit 23 generates situation information SiI indicating the situation of the user U based on the position information PoI obtained from the GPS device 8, the inertial information IFI obtained from the inertial sensor 7, and the schedule information 35. More specifically, the situation information generation unit 23 identifies the location of the user U based on the position information PoI. Locations include, for example, the home of the user U, the workplace of the user U, and points along the commuting route from the home of the user U to the workplace. The situation information generation unit 23 also determines, based on the inertial information IFI, whether the user U is stationary, walking, or moving in a vehicle. When the identified location is the home, the situation information generation unit 23 generates, as the situation information SiI, identification information indicating that the user U is at home. When the identified location is on the commuting route and the user U is moving in a vehicle, the situation information generation unit 23 generates, as the situation information SiI, identification information indicating that the user U is using a train. The situation of using a train is an example of "a situation of using public transportation". Further, when the schedule information 35 indicates a meeting or a conference as the plan of the user U at the current time, the situation information generation unit 23 generates, as the situation information SiI, identification information indicating that the user U is in a meeting.
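The three rules described for the situation information generation unit 23 can be sketched as a small Python function. This is a minimal illustration under stated assumptions: the label strings (`"home"`, `"commute_route"`, `"in_vehicle"`, `"at_home"`, `"on_train"`, `"in_meeting"`), the function name, and in particular the order in which the rules are checked are all hypothetical, since the source does not specify a priority among them.

```python
from typing import Optional


def generate_situation(place: str, motion: str, plan: Optional[str] = None) -> Optional[str]:
    """Return a situation identifier, or None when no rule applies.

    place  -- location label derived from the position information PoI
    motion -- movement label derived from the inertial information IFI
    plan   -- plan at the current time taken from the schedule information 35
    """
    # Assumption: the schedule rule is checked first; the source gives no priority.
    if plan is not None and "meeting" in plan:
        return "in_meeting"
    if place == "home":
        return "at_home"
    if place == "commute_route" and motion == "in_vehicle":
        return "on_train"
    return None
```

For example, `generate_situation("commute_route", "in_vehicle")` yields `"on_train"`, while walking along the same route matches none of the three rules in this sketch.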

The emotion information generation unit 25 estimates, from among a plurality of emotions that the user U may hold, one or more emotions that the user U actually holds. In the first embodiment, the plurality of emotions is described as the four emotions of joy, anger, sadness, and normal. Joy, anger, sadness, and normal are one example of the plurality of emotions.

The emotion information generation unit 25 includes a feature amount generation unit 251, a first evaluation unit 252, a recognition unit 254, a second evaluation unit 255, a correction unit 257, and an estimation unit 258. The feature amount generation unit 251 is an example of the "generation unit".

The feature amount generation unit 251 generates a feature amount of the voice of the user U from the voice information VI. The feature amount is, for example, some or all of a total of 47 values: 12-dimensional MFCC (Mel-Frequency Cepstrum Coefficients), loudness, fundamental frequency (F0), voice probability, zero crossing rate, and HNR (Harmonics-to-Noise Ratio); the first derivatives of all of these; and the second derivatives of the MFCC and loudness. Loudness is the magnitude of a sound and indicates the intensity of the sound as perceived by human hearing. The voice probability indicates the probability that the sound indicated by the voice information VI contains voice. The zero crossing rate is the number of times the sound pressure becomes zero. The feature amount generation unit 251 may also execute correction processing on the voice information VI and extract the feature amount from the corrected voice information obtained by that processing. The correction processing is, for example, one or both of processing that removes silent portions from the voice information VI and processing that removes noise contained in the sound indicated by the voice information VI.
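The total of 47 follows from 17 base values, 17 first derivatives, and 13 second derivatives (12 MFCC plus loudness). The Python sketch below only enumerates the feature names to confirm this count; the name strings themselves are hypothetical labels, and no audio processing is performed.

```python
# Base descriptors: 12 MFCC dimensions plus five scalar descriptors (17 in total).
BASE_FEATURES = [f"mfcc_{i}" for i in range(1, 13)] + [
    "loudness",
    "f0",
    "voice_probability",
    "zero_crossing_rate",
    "hnr",
]

# First derivatives are taken of every base descriptor (17 more).
FIRST_DERIVATIVES = [f"d_{name}" for name in BASE_FEATURES]

# Second derivatives are taken only of the MFCC dimensions and loudness (13 more).
SECOND_DERIVATIVES = [
    f"dd_{name}"
    for name in BASE_FEATURES
    if name.startswith("mfcc_") or name == "loudness"
]

ALL_FEATURES = BASE_FEATURES + FIRST_DERIVATIVES + SECOND_DERIVATIVES
print(len(BASE_FEATURES), len(FIRST_DERIVATIVES), len(SECOND_DERIVATIVES), len(ALL_FEATURES))
```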

The first evaluation unit 252 generates voice evaluation values EV1 to EV4, one for each of the plurality of emotions, based on the feature amount. More specifically, the first evaluation unit 252 generates the voice evaluation value EV1 corresponding to joy, the voice evaluation value EV2 corresponding to anger, the voice evaluation value EV3 corresponding to sadness, and the voice evaluation value EV4 corresponding to normal. A voice evaluation value EV indicates the intensity with which the user U holds the emotion.
In the following description, when elements of the same kind are distinguished, full reference signs are used, as in the voice evaluation value EV1 corresponding to joy and the voice evaluation value EV2 corresponding to anger. When elements of the same kind are not distinguished, only the common part of the reference sign is used, as in the voice evaluation value EV.
Each of joy, anger, sadness, and normal is an example of the "first emotion". The "second emotion" is any one of joy, anger, sadness, and normal that differs from the "first emotion". The voice evaluation value EV1 corresponding to joy, the voice evaluation value EV2 corresponding to anger, the voice evaluation value EV3 corresponding to sadness, and the voice evaluation value EV4 corresponding to normal are each examples of the "first voice evaluation value". The voice evaluation value corresponding to the second emotion, which differs from the first emotion, is an example of the "second voice evaluation value".
For example, the first evaluation unit 252 generates the voice evaluation values EV1 to EV4 by either of the two modes described below.

In the first mode, the first evaluation unit 252 generates the voice evaluation values EV1 to EV4 by comparing the feature amount with predetermined values. For example, when joy or anger is expressed in the voice, the fundamental frequency tends to be higher and the loudness tends to be greater than when joy or anger is not expressed in the voice. Accordingly, when the fundamental frequency is greater than a predetermined value and the loudness is greater than a predetermined value, the first evaluation unit 252 sets the voice evaluation value EV1 and the voice evaluation value EV2 to larger values than when the fundamental frequency is smaller than the predetermined value and the loudness is smaller than the predetermined value.
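The threshold comparison of the first mode might look like the following Python sketch. The threshold values, the output values 1.0 and 0.0, and the function name are all hypothetical; the source only states that EV1 and EV2 become larger when both features exceed their predetermined values. The handling of mixed cases (one feature above and one below) is also an assumption of this sketch.

```python
F0_THRESHOLD_HZ = 200.0    # hypothetical predetermined value for the fundamental frequency
LOUDNESS_THRESHOLD = 0.5   # hypothetical predetermined value for loudness


def joy_anger_voice_values(f0_hz: float, loudness: float) -> tuple:
    """Return (EV1, EV2), the voice evaluation values for joy and anger.

    EV3 (sadness) and EV4 (normal) are not covered here because the source
    gives no threshold rule for them.
    """
    if f0_hz > F0_THRESHOLD_HZ and loudness > LOUDNESS_THRESHOLD:
        return 1.0, 1.0  # both features high: larger EV1 and EV2
    # Assumption: any other combination, including mixed cases, gets the low value.
    return 0.0, 0.0
```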

In the second mode, to generate the voice evaluation values EV1 to EV4, the first evaluation unit 252 inputs the feature amount generated by the feature amount generation unit 251 into the learning model LM and acquires the voice evaluation values EV1 to EV4 from the learning model LM.

The recognition unit 254 generates a recognized character string RT indicating the content of the utterance of the user U based on the voice information VI. More specifically, the recognition unit 254 executes voice recognition processing, which includes a technique for recognizing character strings from voice using, for example, an acoustic model and a language model prepared in advance, and outputs the recognized character string RT.

The second evaluation unit 255 generates a character evaluation value ET for each of the plurality of emotions based on the recognized character string RT. More specifically, the second evaluation unit 255 generates the character evaluation value ET1 corresponding to joy, the character evaluation value ET2 corresponding to anger, the character evaluation value ET3 corresponding to sadness, and the character evaluation value ET4 corresponding to normal. A character evaluation value ET indicates the intensity with which the user U holds the emotion.
The character evaluation value ET1 corresponding to joy, the character evaluation value ET2 corresponding to anger, the character evaluation value ET3 corresponding to sadness, and the character evaluation value ET4 corresponding to normal are each examples of the "first character evaluation value". The character evaluation value corresponding to the second emotion, which differs from the first emotion, is an example of the "second character evaluation value".

More specifically, the second evaluation unit 255 includes an analysis unit 2552 and a calculation unit 2554. The analysis unit 2552 refers to the analysis dictionary information 31, executes morphological analysis processing on the recognized character string RT, and outputs a corrected recognized character string CRT. The morphological analysis processing decomposes the recognized character string RT into morphemes, using the part of speech and the part-of-speech subclassification in the analysis dictionary information 31. The corrected recognized character string CRT is the character string that remains after removing strings, such as fillers, that are unnecessary for estimating the emotion held by the user U. Fillers are words inserted between utterances, such as "ええと" ("um"), "あの" ("uh"), and "まあ" ("well").
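The filler removal step can be illustrated with a short Python sketch. It assumes the morphological analysis has already produced (surface, part-of-speech) pairs; the tag name `"filler"`, the other tag names, and the function name are hypothetical, and no real morphological analyzer is invoked.

```python
def remove_fillers(tagged_morphemes):
    """Drop morphemes tagged as fillers and return the remaining surface strings.

    tagged_morphemes -- list of (surface, pos) pairs, assumed to come from a
                        morphological analysis step that is not shown here
    """
    return [surface for surface, pos in tagged_morphemes if pos != "filler"]


# Hypothetical analysis result for "ええと今日試合に勝った".
tagged = [
    ("ええと", "filler"),
    ("今日", "noun"),
    ("試合", "noun"),
    ("に", "particle"),
    ("勝っ", "verb"),
    ("た", "auxiliary"),
]
corrected = remove_fillers(tagged)
```

Filtering on the part-of-speech tag rather than on the surface string is a deliberate choice here, because a word such as "あの" can also appear as an ordinary determiner.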

The calculation unit 2554 calculates the character evaluation value ET of each emotion by comparing the character strings included in the emotion classification information 33 with the corrected recognition character string CRT. Specifically, when the corrected recognition character string CRT contains a character string included in the emotion classification information 33, the calculation unit 2554 increases the character evaluation value ET of the emotion corresponding to that character string.
For example, if the corrected recognition character string CRT is 「今日試合に勝った」 ("I won the game today"), the calculation unit 2554 outputs the following character evaluation values ET.

Character evaluation value corresponding to joy: ET1 = 1
Character evaluation value corresponding to anger: ET2 = 0
Character evaluation value corresponding to sadness: ET3 = 0
Character evaluation value corresponding to normal: ET4 = 0

In the above example, the corrected recognition character string CRT contains 「勝っ」 ("won"), which is included in the emotion classification information 33, so the calculation unit 2554 increases the character evaluation value ET1 of joy, the emotion corresponding to 「勝っ」, by 1. The increment is not limited to 1 and may differ for each character string included in the emotion classification information 33. For example, the increment for a character string that indicates joy more strongly may be set to 2. Further, when the corrected recognition character string CRT contains both a character string included in the emotion classification information 33 and a character string that emphasizes the content, the calculation unit 2554 may use a larger increment. For example, if the corrected recognition character string CRT is 「今日試合に勝ててとても嬉しい」 ("I am very happy that I won the game today"), it contains 「嬉しい」 ("happy"), which is included in the emotion classification information 33, and also the emphasizing character string 「とても」 ("very"), so the calculation unit 2554 increases the character evaluation value ET1 of joy by, for example, 2. Whether a character string in the corrected recognition character string CRT emphasizes the content can be determined from the morphemes obtained by the morphological analysis processing. In the following examples, for ease of explanation, the increment is assumed to be 1.
Further, when the corrected recognition character string CRT contains both a character string included in the emotion classification information 33 and a character string that negates the content, the calculation unit 2554 may execute processing other than increasing the character evaluation value ET corresponding to the contained character string. For example, if the corrected recognition character string CRT is 「今日試合に勝つことができなかった」 ("I could not win the game today"), it contains 「勝つ」 ("win"), which is included in the emotion classification information 33, but also the negating character string 「なかっ」 ("not"), so the calculation unit 2554 increases, for example, the character evaluation value ET3 of sadness by 1. Whether a character string in the corrected recognition character string CRT negates the content can be determined from the morphemes obtained by the morphological analysis processing. In this way, the morphological analysis processing makes it possible to estimate whether the content of the corrected recognition character string CRT is affirmative or negative. In the following examples, for ease of explanation, whenever the corrected recognition character string CRT contains a character string included in the emotion classification information 33, the character evaluation value ET corresponding to that character string is assumed to be increased.
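The scoring rules above (a base increment of 1, a larger increment under emphasis, and a different treatment under negation) can be sketched as follows. The lexicon contents, the sentence-level handling of emphasis and negation, and all names are assumptions made for illustration; the actual contents of the emotion classification information 33 are not given in this section.

```python
from typing import Dict, List

EMOTIONS = ("joy", "anger", "sadness", "normal")

# Hypothetical stand-in for the emotion classification information 33.
EMOTION_WORDS: Dict[str, str] = {"勝っ": "joy", "勝つ": "joy", "嬉しい": "joy"}
EMPHASIS_WORDS = {"とても"}   # assumed emphasizing morphemes
NEGATION_WORDS = {"なかっ"}   # assumed negating morphemes


def character_scores(morphemes: List[str]) -> Dict[str, int]:
    """Compute ET values from the morphemes of the corrected string CRT."""
    scores = {emotion: 0 for emotion in EMOTIONS}
    emphasized = any(m in EMPHASIS_WORDS for m in morphemes)
    negated = any(m in NEGATION_WORDS for m in morphemes)
    for morpheme in morphemes:
        emotion = EMOTION_WORDS.get(morpheme)
        if emotion is None:
            continue
        if negated:
            # Follows the text's example: a negated "win" raises sadness by 1.
            scores["sadness"] += 1
        else:
            scores[emotion] += 2 if emphasized else 1
    return scores


print(character_scores(["今日", "試合", "に", "勝っ", "た"]))
```

Treating emphasis and negation as flags on the whole utterance is a simplification; a fuller version would look at which morpheme each modifier attaches to.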

The correction unit 257 corrects the voice evaluation values EV1 to EV4 and the character evaluation values ET1 to ET4 based on the situation information SiI. More specifically, the correction unit 257 corrects the voice evaluation values EV1, EV2, EV3, and EV4 and outputs the corrected voice evaluation values CEV1, CEV2, CEV3, and CEV4. Similarly, the correction unit 257 corrects the character evaluation values ET1, ET2, ET3, and ET4 and outputs the corrected character evaluation values CET1, CET2, CET3, and CET4.

More specifically, the correction unit 257 outputs the corrected voice evaluation values CEV1 to CEV4 and the corrected character evaluation values CET1 to CET4 by executing the following calculations.

CEV1 = kv1 × EV1
CEV2 = kv2 × EV2
CEV3 = kv3 × EV3
CEV4 = kv4 × EV4
CET1 = kt1 × ET1
CET2 = kt2 × ET2
CET3 = kt3 × ET3
CET4 = kt4 × ET4

Here, the weighting coefficients kv1 to kv4 and the weighting coefficients kt1 to kt4 are real numbers from 0 to 1 inclusive. In addition, the weighting coefficients kv1 to kv4 and kt1 to kt4 satisfy the following condition.

kv1 + kt1 = kv2 + kt2 = kv3 + kt3 = kv4 + kt4

The correction unit 257 sets the weighting coefficients kv1 to kv4 and the weighting coefficients kt1 to kt4 by referring to, for example, the situation-related information 37.
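The correction above amounts to an element-wise multiplication under two constraints on the weights. A minimal sketch, assuming the four emotions are keyed by the hypothetical names "joy", "anger", "sadness", and "normal" (corresponding to indices 1 to 4), might look like this:

```python
from math import isclose
from typing import Dict, Tuple

EMOTIONS = ("joy", "anger", "sadness", "normal")
Scores = Dict[str, float]


def check_weights(kv: Scores, kt: Scores) -> None:
    """Check that each weight is in [0, 1] and that kv + kt is the same for every emotion."""
    for emotion in EMOTIONS:
        if not (0.0 <= kv[emotion] <= 1.0 and 0.0 <= kt[emotion] <= 1.0):
            raise ValueError(f"weight out of range for {emotion}")
    sums = [kv[emotion] + kt[emotion] for emotion in EMOTIONS]
    if not all(isclose(s, sums[0]) for s in sums):
        raise ValueError("kv + kt must be equal for every emotion")


def correct(ev: Scores, et: Scores, kv: Scores, kt: Scores) -> Tuple[Scores, Scores]:
    """Return (CEV, CET), where CEV = kv * EV and CET = kt * ET per emotion."""
    check_weights(kv, kt)
    cev = {emotion: kv[emotion] * ev[emotion] for emotion in EMOTIONS}
    cet = {emotion: kt[emotion] * et[emotion] for emotion in EMOTIONS}
    return cev, cet
```

The comparison of the sums uses `isclose` because decimal weights such as 0.7 and 0.3 are not exactly representable in floating point.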

FIG. 7 is a diagram showing an example of the stored contents of the situation-related information 37. The situation-related information 37 associates identification information indicating a situation a person may be in with the weighting coefficients, set according to that situation, for each of the plurality of voice evaluation values EV and each of the plurality of character evaluation values ET.

The situations a person may be in include a situation in which the user is in a private space, which others are prohibited from entering without the user's permission, and a situation in which the user is in a non-private space, which others may enter without the user's permission. A private space is, for example, the inside of the user's home. Non-private spaces are, for example, the inside of public transportation and the workplace.

In the situation-related information 37 shown in FIG. 7, "home", "train", and "meeting" are registered as identification information indicating situations a person may be in. The identification information "home" indicates that the user U is at home. The identification information "train" indicates that the user U is riding a train. The identification information "meeting" indicates that the user U is in a meeting at work.

For the weighting coefficients related to the identification information "home", the situation-related information 37 indicates that the weighting coefficient kv1 is larger than the weighting coefficient kt1. Similarly, it indicates that kv2 is larger than kt2, kv3 is larger than kt3, and kv4 is larger than kt4. Specifically, the weighting coefficients related to the identification information "home" shown in FIG. 7 are as follows.
Weighting coefficient kv1 = 0.7
Weighting coefficient kt1 = 0.3
Weighting coefficient kv2 = 0.8
Weighting coefficient kt2 = 0.2
Weighting coefficient kv3 = 0.7
Weighting coefficient kt3 = 0.3
Weighting coefficient kv4 = 0.8
Weighting coefficient kt4 = 0.2

For the weighting coefficients related to the identification information "train", the situation-related information 37 indicates that the weighting coefficient kt1 is larger than the weighting coefficient kv1. Similarly, it indicates that kt2 is larger than kv2, kt3 is larger than kv3, and kt4 is larger than kv4. Specifically, the weighting coefficients related to the identification information "train" shown in FIG. 7 are as follows.
Weighting coefficients kv1 = kv2 = kv3 = kv4 = 0.1
Weighting coefficients kt1 = kt2 = kt3 = kt4 = 0.9

For the weighting coefficients related to the identification information "meeting", the situation-related information 37 indicates that the weighting coefficient kt1 is larger than the weighting coefficient kv1. Similarly, it indicates that kt2 is larger than kv2, kt3 is larger than kv3, and kt4 is larger than kv4. Specifically, the weighting coefficients related to the identification information "meeting" shown in FIG. 7 are as follows.
Weighting coefficient kv1 = 0.4
Weighting coefficient kt1 = 0.6
Weighting coefficient kv2 = 0.2
Weighting coefficient kt2 = 0.8
Weighting coefficient kv3 = 0.3
Weighting coefficient kt3 = 0.7
Weighting coefficient kv4 = 0.2
Weighting coefficient kt4 = 0.8
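For reference, the FIG. 7 values listed above can be held in a small lookup table. The dictionary layout and helper names below are assumptions for illustration; only the numeric values come from the text. Each tuple is ordered as (joy, anger, sadness, normal), that is, indices 1 to 4.

```python
from math import isclose
from typing import Dict, Tuple

Weights = Tuple[float, float, float, float]

# Values from FIG. 7 as listed in the text; the structure is hypothetical.
SITUATION_WEIGHTS: Dict[str, Dict[str, Weights]] = {
    "home":    {"kv": (0.7, 0.8, 0.7, 0.8), "kt": (0.3, 0.2, 0.3, 0.2)},
    "train":   {"kv": (0.1, 0.1, 0.1, 0.1), "kt": (0.9, 0.9, 0.9, 0.9)},
    "meeting": {"kv": (0.4, 0.2, 0.3, 0.2), "kt": (0.6, 0.8, 0.7, 0.8)},
}


def voice_dominant(situation: str) -> bool:
    """True if every voice weight kv exceeds its character weight kt."""
    entry = SITUATION_WEIGHTS[situation]
    return all(kv > kt for kv, kt in zip(entry["kv"], entry["kt"]))


def sums_are_equal(situation: str) -> bool:
    """Check the condition kv1 + kt1 = kv2 + kt2 = kv3 + kt3 = kv4 + kt4."""
    entry = SITUATION_WEIGHTS[situation]
    sums = [kv + kt for kv, kt in zip(entry["kv"], entry["kt"])]
    return all(isclose(s, sums[0]) for s in sums)


for name in SITUATION_WEIGHTS:
    print(name, voice_dominant(name), sums_are_equal(name))
```

In all three listed situations each pair kv + kt sums to 1, so the equality condition on the weighting coefficients holds.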

The description now returns to FIG. 6. The estimation unit 258 estimates one or more emotions held by the user U based on the correction result of the correction unit 257, that is, the corrected voice evaluation values CEV1 to CEV4 and the corrected character evaluation values CET1 to CET4. For example, the estimation unit 258 executes the following calculations to generate an evaluation value E1 corresponding to joy, an evaluation value E2 corresponding to anger, an evaluation value E3 corresponding to sadness, and an evaluation value E4 corresponding to normal.
E1 = CEV1 + CET1
E2 = CEV2 + CET2
E3 = CEV3 + CET3
E4 = CEV4 + CET4

The estimation unit 258 then estimates the emotion held by the user U according to, for example, one of the following two modes. In the first mode, the estimation unit 258 outputs emotion information EI indicating the emotion corresponding to the largest of the evaluation values E1, E2, E3, and E4. In the second mode, the estimation unit 258 compares each of the evaluation values E1 to E4 with a threshold and outputs emotion information EI indicating every emotion whose evaluation value E exceeds the threshold. In the second mode, the emotion information EI may indicate a plurality of emotions, for example both anger and sadness.
The emotion information EI takes, for example, one of the following two forms. In the first form, the emotion information EI is an identifier indicating each of the one or more estimated emotions held by the user U. The identifiers include an identifier indicating joy, an identifier indicating anger, an identifier indicating sadness, and an identifier indicating normal. In the second form, the emotion information EI consists of an identifier indicating each of the one or more estimated emotions held by the user U together with the evaluation value E of each estimated emotion.
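The two estimation modes described above can be sketched as follows. The function names and the sample numbers are hypothetical; only the sum E = CEV + CET, the largest-value rule, and the threshold rule come from the text.

```python
from typing import Dict, List

Scores = Dict[str, float]


def combine(cev: Scores, cet: Scores) -> Scores:
    """E = CEV + CET for each emotion."""
    return {emotion: cev[emotion] + cet[emotion] for emotion in cev}


def strongest_emotion(e_values: Scores) -> str:
    """First mode: the emotion with the largest evaluation value E."""
    return max(e_values, key=e_values.get)


def emotions_above(e_values: Scores, threshold: float) -> List[str]:
    """Second mode: every emotion whose evaluation value E exceeds the threshold."""
    return [emotion for emotion, value in e_values.items() if value > threshold]


# Hypothetical corrected values, chosen only to show both modes.
cev = {"joy": 0.02, "anger": 0.30, "sadness": 0.20, "normal": 0.05}
cet = {"joy": 0.00, "anger": 0.40, "sadness": 0.40, "normal": 0.00}
e_values = combine(cev, cet)
print(strongest_emotion(e_values))
print(emotions_above(e_values, threshold=0.5))
```

With these sample values the first mode selects anger alone, while the second mode with a threshold of 0.5 reports both anger and sadness. On a tie, `max` returns the first emotion in dictionary order, a detail the text does not specify.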

The output unit 27 outputs information obtained by executing, on the recognition character string RT obtained by the recognition unit 254, processing according to the one or more emotions indicated by the emotion information EI. The processing according to emotion has, for example, the following two modes.
The first mode of the processing according to emotion adds a figure embodying the emotion to the recognition character string RT. Figures embodying an emotion are, for example, emoji embodying the emotion and emoticons embodying the emotion. An emoji is an image associated with a character code. The character code is, for example, Unicode. An emoticon is a character string that depicts a face by combining symbols and characters. In the following description, the figure embodying an emotion is assumed to be an emoji. The emoji embodying joy is, for example, a smiling face. The emoji embodying anger is, for example, an angry face. The emoji embodying sadness is, for example, a crying face. Further, when the emotion information EI is in the second form, the output unit 27 may select, as the emoji to be added to the recognition character string RT, an emoji embodying the emotion indicated by the emotion information EI with an intensity according to the evaluation value E included in the emotion information EI. For example, when the emotion indicated by the emotion information EI is sadness and the evaluation value E included in the emotion information EI is less than or equal to a predetermined threshold, the output unit 27 selects an emoji of a face shedding tears. On the other hand, when the emotion indicated by the emotion information EI is sadness and the evaluation value E is greater than the predetermined threshold, the output unit 27 selects an emoji of a face crying loudly. The loudly crying face embodies more intense sadness than the face shedding tears.
The output unit 27 outputs the emoji-annotated character string obtained by adding the emoji to the recognition character string RT. The emoji is added at, for example, one of the following two positions. The first position is the end of the recognition character string RT. The second position is immediately after the character string in the recognition character string RT that is included in the emotion classification information 33. The display device 51 displays an image based on the emoji-annotated character string output by the output unit 27.
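A minimal sketch of the emoji selection and placement described above follows. The specific Unicode code points, the default threshold, the choice to add nothing for the normal state, and the function names are all assumptions; the text only describes the kinds of faces and the two insertion positions.

```python
from typing import Optional

# Assumed code points: smiling face, angry face, crying face, loudly crying face.
EMOJI = {"joy": "\U0001F60A", "anger": "\U0001F620"}
SAD_MILD = "\U0001F622"
SAD_STRONG = "\U0001F62D"


def pick_emoji(emotion: str, value: Optional[float] = None, threshold: float = 1.0) -> str:
    """Choose an emoji; sadness is graded by the evaluation value E when given."""
    if emotion == "sadness":
        if value is not None and value > threshold:
            return SAD_STRONG
        return SAD_MILD
    # Assumption: no emoji is added for emotions without an entry (e.g. normal).
    return EMOJI.get(emotion, "")


def add_emoji(text: str, emotion: str, value: Optional[float] = None,
              keyword: Optional[str] = None, threshold: float = 1.0) -> str:
    """Insert the emoji right after `keyword` if present, else at the end of the text."""
    emoji = pick_emoji(emotion, value, threshold)
    if keyword and keyword in text:
        cut = text.index(keyword) + len(keyword)
        return text[:cut] + emoji + text[cut:]
    return text + emoji


print(add_emoji("今日試合に勝った", "joy"))
print(add_emoji("今日試合に勝った", "joy", keyword="勝っ"))
```

The `keyword` argument stands in for the matched character string from the emotion classification information 33; when it is omitted, the emoji goes at the first position (the end of the string).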

The second mode of the processing according to emotion generates a synthetic voice that reads the text aloud with intonation based on the emotion. Intonation here means, for example, raising or lowering the reading speed, or raising or lowering the volume. Intonation based on joy is, for example, a faster reading speed. Intonation based on anger is, for example, a louder volume. Intonation based on sadness is, for example, a quieter volume. The output unit 27 adds the emotion-based intonation to the synthetic voice indicated by the generated data and outputs information indicating the resulting synthetic voice. The sound emitting device 53 emits the synthetic voice indicated by the data output by the output unit 27.
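The mapping from emotion to intonation described above can be sketched as a small table of speech parameters. The multiplier values and the `Prosody` type are hypothetical; the text states only the direction of each change (faster for joy, louder for anger, quieter for sadness), and no particular speech synthesis API is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prosody:
    rate: float = 1.0    # reading-speed multiplier (1.0 = unchanged)
    volume: float = 1.0  # volume multiplier (1.0 = unchanged)


# Assumed multipliers; only the direction of each change comes from the text.
PROSODY_BY_EMOTION = {
    "joy": Prosody(rate=1.2),
    "anger": Prosody(volume=1.3),
    "sadness": Prosody(volume=0.7),
}


def prosody_for(emotion: str) -> Prosody:
    """Return the prosody for an emotion, defaulting to neutral settings."""
    return PROSODY_BY_EMOTION.get(emotion, Prosody())


print(prosody_for("joy"))
```

A speech synthesizer would then be driven with these parameters; how that is done depends on the synthesizer and is outside this sketch.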

1.2. Operation of the First Embodiment
Next, the operation of the user device 1 will be described with reference to FIG. 8.

FIG. 8 is a flowchart showing the operation of the user device 1. The acquisition unit 21 acquires the voice information VI (step S1). The feature amount generation unit 251 generates a feature amount from the voice information VI (step S2). The first evaluation unit 252 then inputs the feature amount into the learning model LM and acquires the voice evaluation value EV of each emotion (step S3).

Meanwhile, the recognition unit 254 outputs the recognition character string RT based on the voice information VI (step S4). Next, the analysis unit 2552 executes the morphological analysis processing and outputs the corrected recognition character string CRT (step S5). The calculation unit 2554 then generates the character evaluation values ET1 to ET4 of the respective emotions from the corrected recognition character string CRT (step S6).

The situation information generation unit 23 generates the situation information SiI based on the position information PoI obtained from the GPS device 8, the inertial information IFI obtained from the inertial sensor 7, and the schedule information 35 (step S7). The correction unit 257 then corrects the voice evaluation values EV1 to EV4 and the character evaluation values ET1 to ET4 based on the weighting coefficients kv1 to kv4 and kt1 to kt4 for the situation indicated by the situation information SiI (step S8).

The estimation unit 258 estimates one or more emotions held by the user U based on the corrected voice evaluation values CEV1 to CEV4 and the corrected character evaluation values CET1 to CET4, and outputs the emotion information EI (step S9). The output unit 27 outputs information obtained by executing, on the recognition character string RT, processing according to the emotion indicated by the emotion information EI (step S10). After step S10 is completed, the user device 1 ends the series of processing shown in FIG. 8.
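The flow of steps S1 to S10 can be sketched as one function whose stages are passed in as callables. Everything here is a placeholder: the callables stand in for the units described above, the voice scores in the demo are invented, and only the "home" and "train" weights are taken from the values given for FIG. 7.

```python
from typing import Callable, Dict, Tuple

Scores = Dict[str, float]


def run_once(
    voice_info: bytes,                                    # S1: acquired voice information VI
    voice_scores: Callable[[bytes], Scores],              # S2-S3: features + learning model
    recognize: Callable[[bytes], str],                    # S4: recognition string RT
    text_scores: Callable[[str], Scores],                 # S5-S6: analysis + ET values
    current_situation: Callable[[], str],                 # S7: situation information
    weights_for: Callable[[str], Tuple[Scores, Scores]],  # lookup of (kv, kt)
    render: Callable[[str, str], str],                    # S10: emotion-dependent output
) -> str:
    ev = voice_scores(voice_info)
    rt = recognize(voice_info)
    et = text_scores(rt)
    kv, kt = weights_for(current_situation())
    # S8 and the sum used in S9: E = kv * EV + kt * ET for each emotion.
    e_values = {k: kv[k] * ev[k] + kt[k] * et[k] for k in ev}
    emotion = max(e_values, key=e_values.get)  # S9, first estimation mode
    return render(rt, emotion)


def demo(situation: str) -> str:
    """Run the sketch with stub stages and hypothetical voice scores."""
    names = ("joy", "anger", "sadness", "normal")
    table = {
        "home": (dict(zip(names, (0.7, 0.8, 0.7, 0.8))), dict(zip(names, (0.3, 0.2, 0.3, 0.2)))),
        "train": (dict(zip(names, (0.1,) * 4)), dict(zip(names, (0.9,) * 4))),
    }
    return run_once(
        b"",
        voice_scores=lambda _: {"joy": 0.2, "anger": 0.1, "sadness": 0.1, "normal": 0.6},
        recognize=lambda _: "今日試合に勝った",
        text_scores=lambda _: {"joy": 1.0, "anger": 0.0, "sadness": 0.0, "normal": 0.0},
        current_situation=lambda: situation,
        weights_for=table.__getitem__,
        render=lambda rt, emotion: f"{rt} [{emotion}]",
    )


print(demo("train"))
print(demo("home"))
```

With the same flat-sounding voice and the same words, the train weights let the text decide (joy), whereas the home weights trust the voice more (normal). This is the behavior the situation-dependent correction is meant to produce.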

1.3. Effects of the First Embodiment
As described above, the user device 1 corrects the voice evaluation values EV1 to EV4 and the character evaluation values ET1 to ET4 based on the situation information SiI. When the situation information SiI indicates a situation in which emotion is unlikely to be expressed in the voice, the accuracy of the voice evaluation values EV1 to EV4 can be said to be reduced. Therefore, when the situation information SiI indicates such a situation, correcting the voice evaluation values EV1 to EV4 and the character evaluation values ET1 to ET4 in a way that lowers the voice evaluation values EV1 to EV4 allows the emotion held by the user U to be estimated with high accuracy.

Further, the user device 1 inputs the feature amount generated by the feature amount generation unit 251 into the learning model LM, which has learned the relationship between a plurality of feature amounts derived from human voice and the intensity of each of a plurality of emotions held by the person who uttered that voice, and acquires the voice evaluation values EV1 to EV4 from the learning model LM. By using the learning model LM, the user device 1 can acquire accurate voice evaluation values EV1 to EV4.

Further, the learning model LM has learned, for a plurality of people, the relationship between a plurality of feature amounts derived from a person's voice and the intensity of each of a plurality of emotions held by the person who uttered that voice. In other words, the learning model LM is generated from teacher data based on the voice information VI of a plurality of users. The learning model LM is therefore a general-purpose model that is not tuned to a specific individual. Because the first embodiment can use a general-purpose learning model LM, it is easier to introduce than a configuration that uses a learning model tuned to a specific individual.

Further, the correction unit 257 refers to the situation-related information 37 and sets a weighting coefficient for each of the voice evaluation values EV1 to EV4 and the character evaluation values ET1 to ET4 according to the situation indicated by the situation information SiI. By referring to the situation-related information 37, it is possible to accurately determine whether the user U is in a situation in which emotion is unlikely to be expressed.

For the weighting coefficients related to identification information indicating a situation in which the user U is in a private space, the situation-related information 37 indicates that the weighting coefficient kv1 is larger than the weighting coefficient kt1. It further indicates that kv2 is larger than kt2, kv3 is larger than kt3, and kv4 is larger than kt4.
When the user U is in a private space such as the home, the user U does not attract the attention of others and therefore readily expresses emotion in the voice. Accordingly, because kv1 is larger than kt1, kv2 is larger than kt2, kv3 is larger than kt3, and kv4 is larger than kt4, the user device 1 can accurately estimate joy, anger, sadness, and normal.

For the weighting coefficients related to identification information indicating a situation in which the user U is in a non-private space, the situation-related information 37 indicates that the weighting coefficient kt1 is larger than the weighting coefficient kv1. It further indicates that kt2 is larger than kv2, kt3 is larger than kv3, and kt4 is larger than kv4.
As one example of the user U being in a non-private space, when the user U is riding a train, speaking with emotion would attract the attention of others, so people generally speak without emotion. Riding a train is therefore one situation in which the user U is unlikely to express emotion in the voice. As another example, when the user U is in a meeting, the user U rarely speaks with emotion. Being in a meeting is likewise a situation in which the user U is unlikely to express emotion in the voice.
Accordingly, when the user U is in a non-private space, because kt1 is larger than kv1, kt2 is larger than kv2, kt3 is larger than kv3, and kt4 is larger than kv4, the user device 1 can accurately estimate joy, anger, sadness, and normal.

2. Second Embodiment
The user device 1a according to the second embodiment differs from the user device 1 according to the first embodiment in that it determines the corrected voice evaluation values CEV1 to CEV4 and the corrected character evaluation values CET1 to CET4 based on the degree to which emotion is expressed in the voice depending on the recognition character string RT. For elements of the second embodiment described below whose operation or function is equivalent to that of the first embodiment, the reference signs used in the above description are reused and their detailed descriptions are omitted as appropriate.

2.1. Functions of the Second Embodiment
FIG. 9 is a block diagram showing the user device 1a according to the second embodiment. The user device 1a is realized by a computer system including a processing device 2a, a storage device 3a, an input device 4, an output device 5, a communication device 6, an inertial sensor 7, and a GPS device 8. The storage device 3a is a recording medium readable by the processing device 2a and stores a plurality of programs including a control program PRa executed by the processing device 2a, the analysis dictionary information 31, the emotion classification information 33, the schedule information 35, the situation-related information 37, character string-related information 38, and the learning model LM.

The character string-related information 38 is used to determine the degree to which emotion is expressed in the voice depending on the recognition character string RT. By reading the control program PRa from the storage device 3a and executing it, the processing device 2a functions as an acquisition unit 21, a situation information generation unit 23, an emotion information generation unit 25a, and an output unit 27.

FIG. 10 is a diagram showing an outline of the functions of the user device 1a in the second embodiment. The emotion information generation unit 25a includes a feature amount generation unit 251, a first evaluation unit 252, a recognition unit 254, a second evaluation unit 255, a correction unit 257a, and an estimation unit 258.

The correction unit 257a corrects the voice evaluation values EV1 to EV4 and the character evaluation values ET1 to ET4 based on the situation information SiI and on the degree to which emotion is expressed in the voice of the user U when the user U utters the recognition character string RT. For example, the correction unit 257a performs this correction according to one of the two aspects described below.

The correction unit 257a in the first aspect classifies character strings into those for which the degree of emotional expression in the voice of the user U when uttered is low and those for which it is high. When the recognition character string RT contains a character string with a low degree of emotional expression, the correction unit 257a determines that the degree to which emotion is expressed in the voice of the user U when uttering the recognition character string RT is low. When the recognition character string RT contains no such character string, the correction unit 257a determines that the degree is high. When the degree is low, the correction unit 257a lowers the weighting coefficients kv1 to kv4 corresponding to the voice evaluation values EV1 to EV4 compared with the case where the degree is high.
For example, "thank you" (ありがとう) is a general-purpose expression of gratitude. When the user U utters "thank you", joy is strong among the emotions held by the user U. However, commonly used character strings are so familiar to speakers that they are often uttered flatly, so the degree to which emotion is expressed in the voice can be said to be low. Therefore, when the recognition character string RT contains "thank you", the correction unit 257a lowers the weighting coefficients kv1 to kv4 corresponding to the voice evaluation values EV1 to EV4. For example, the correction unit 257a subtracts a predetermined value from the weighting coefficients kv1 to kv4 and adds the predetermined value to the weighting coefficients kt1 to kt4 corresponding to the character evaluation values ET1 to ET4.
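The first aspect can be illustrated with a minimal Python sketch. The phrase list, the shift of 0.2, and all function and variable names are assumptions made for illustration; the description only states that a predetermined value is subtracted from kv1 to kv4 and added to kt1 to kt4.

```python
# Hypothetical sketch of the first aspect; names and values are illustrative only.
LOW_EXPRESSION_STRINGS = {"thank you"}  # assumed set of habitually flat phrases
SHIFT = 0.2                             # assumed "predetermined value"


def adjust_weights_first_aspect(recognized, kv, kt, shift=SHIFT):
    """Return (kv, kt) after moving weight from voice to text when needed.

    recognized: recognition character string RT
    kv, kt:     weighting coefficients kv1..kv4 and kt1..kt4 as lists
    """
    text = recognized.lower()
    low_expression = any(s in text for s in LOW_EXPRESSION_STRINGS)
    if not low_expression:
        # High degree of emotional expression: leave the weights unchanged.
        return list(kv), list(kt)
    # Low degree: trust the voice less and the recognized text more.
    return [v - shift for v in kv], [t + shift for t in kt]
```

Calling `adjust_weights_first_aspect("Thank you very much", [0.5] * 4, [0.5] * 4)` would lower each voice weight to about 0.3 and raise each text weight to about 0.7 under these assumed values.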

The correction unit 257a in the second aspect sets, based on the character string-related information 38 and the situation-related information 37, the weighting coefficients kv1 to kv4 and kt1 to kt4 for the voice evaluation values EV1 to EV4 and the character evaluation values ET1 to ET4, respectively, according to the character string contained in the recognition character string RT and the situation indicated by the situation information SiI. The correction unit 257a then corrects the voice evaluation values EV1 to EV4 and the character evaluation values ET1 to ET4 based on these weighting coefficients.

FIG. 11 is a diagram showing an example of the stored contents of the character string-related information 38. The character string-related information 38 indicates the relationship between a character string uttered by a person and the increase/decrease values Δkv1 to Δkv4 and Δkt1 to Δkt4 of the weighting coefficients kv1 to kv4 and kt1 to kt4 for the voice evaluation values EV1 to EV4 and the character evaluation values ET1 to ET4. The increase/decrease values are set based on the degree to which emotion is expressed in a person's voice when that character string is uttered.
The increase/decrease values Δkv1 to Δkv4 and Δkt1 to Δkt4 satisfy the following condition.

Δkv1 + Δkt1 = Δkv2 + Δkt2 = Δkv3 + Δkt3 = Δkv4 + Δkt4 = 0

The character string-related information 38 shown in FIG. 11 indicates that, when a person utters the character string "thank you", the increase/decrease values Δkv1 to Δkv4 and Δkt1 to Δkt4 are set as follows.
Increase/decrease value Δkv1 = -0.2
Increase/decrease value Δkt1 = 0.2
Increase/decrease value Δkv2 = -0.2
Increase/decrease value Δkt2 = 0.2
Increase/decrease value Δkv3 = -0.2
Increase/decrease value Δkt3 = 0.2
Increase/decrease value Δkv4 = -0.1
Increase/decrease value Δkt4 = 0.1
The increase/decrease values Δkv1 to Δkv4 and Δkt1 to Δkt4 may include values of 0. For example, for the character string "thank you", the increase/decrease values Δkv1 and Δkt1 may be the values given above while the increase/decrease values Δkv2 to Δkv4 and Δkt2 to Δkt4 are 0.
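As a rough illustration, the table of increase/decrease values and the zero-sum condition could be held and checked as in the following Python sketch. The dictionary layout and the function names are assumptions; only the numeric values for "thank you" come from the description.

```python
# Hypothetical in-memory form of the character string-related information 38.
# List index 0..3 corresponds to x = 1..4 in the description.
STRING_RELATION_INFO = {
    "thank you": {
        "dkv": [-0.2, -0.2, -0.2, -0.1],  # increase/decrease values for kv1..kv4
        "dkt": [0.2, 0.2, 0.2, 0.1],      # increase/decrease values for kt1..kt4
    },
}


def satisfies_zero_sum(entry, tol=1e-9):
    """Check the condition dkv_x + dkt_x = 0 for every x."""
    return all(abs(dv + dt) <= tol for dv, dt in zip(entry["dkv"], entry["dkt"]))


def lookup_deltas(recognized):
    """Return the entry for the first registered string found in RT, else None."""
    for registered, entry in STRING_RELATION_INFO.items():
        if registered in recognized:
            return entry
    return None
```

The zero-sum check makes explicit that whatever weight is taken from the voice evaluation for an emotion is given to the character evaluation for the same emotion.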

When a character string included in the character string-related information 38 is contained in the recognition character string RT, the correction unit 257a outputs the corrected voice evaluation values CEV1 to CEV4 and the corrected character evaluation values CET1 to CET4 by executing the following calculations.

CEV1 = (kv1 + Δkv1) × EV1
CEV2 = (kv2 + Δkv2) × EV2
CEV3 = (kv3 + Δkv3) × EV3
CEV4 = (kv4 + Δkv4) × EV4
CET1 = (kt1 + Δkt1) × ET1
CET2 = (kt2 + Δkt2) × ET2
CET3 = (kt3 + Δkt3) × ET3
CET4 = (kt4 + Δkt4) × ET4

Here, the increase/decrease values Δkv1 to Δkv4 and Δkt1 to Δkt4 are the values that the character string-related information 38 associates with the character string contained in the recognition character string RT. When the sum of the weighting coefficient kvx and the increase/decrease value Δkvx is less than 0, the correction unit 257a outputs 0 as the corrected voice evaluation value CEVx; when the sum is greater than 1, it outputs the voice evaluation value EVx unchanged as the corrected voice evaluation value CEVx. Similarly, when the sum of the weighting coefficient ktx and the increase/decrease value Δktx is less than 0, the correction unit 257a outputs 0 as the corrected character evaluation value CETx; when the sum is greater than 1, it outputs the character evaluation value ETx unchanged as the corrected character evaluation value CETx. Here, x is an integer from 1 to 4.
In the following description, the correction unit 257a is assumed to operate according to the second aspect.
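The calculation and its two boundary rules are equivalent to clamping the adjusted weight to the range 0 to 1 before multiplying, which the following Python sketch shows. The function names and list-based representation are assumptions for illustration.

```python
def clamp01(weight):
    """Limit an adjusted weight to the range [0, 1]."""
    return max(0.0, min(1.0, weight))


def correct(values, weights, deltas):
    """Compute corrected values, e.g. CEVx = clamp(kvx + dkvx) * EVx.

    A sum below 0 yields 0, and a sum above 1 yields the original
    evaluation value, matching the boundary rules in the description.
    """
    return [clamp01(k + d) * v for v, k, d in zip(values, weights, deltas)]
```

The same function serves both the voice evaluation values (with kv and Δkv) and the character evaluation values (with kt and Δkt).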

2.2. Operation of the Second Embodiment
Next, the operation of the user device 1a will be described with reference to FIG. 12.

FIG. 12 is a flowchart showing the operation of the user device 1a. The processes of steps S21 to S27 shown in FIG. 12 are the same as the processes of steps S1 to S7 shown in FIG. 8, respectively, so their description is omitted.

After the processing of step S27 is completed, the correction unit 257a determines whether a character string in the character string-related information 38 is contained in the recognition character string RT (step S28). When the determination result in step S28 is affirmative, the correction unit 257a corrects the voice evaluation values EV1 to EV4 and the character evaluation values ET1 to ET4 based on the weighting coefficients kv1 to kv4 and kt1 to kt4 that correspond to the character string contained in the recognition character string RT and to the situation indicated by the situation information SiI (step S29). When the determination result in step S28 is negative, the correction unit 257a corrects the voice evaluation values EV1 to EV4 and the character evaluation values ET1 to ET4 based on the weighting coefficients kv1 to kv4 and kt1 to kt4 that correspond to the situation indicated by the situation information SiI (step S30).

After the processing of step S29 or step S30 is completed, the estimation unit 258 estimates one or more emotions held by the user U based on the corrected voice evaluation values CEV1 to CEV4 and the corrected character evaluation values CET1 to CET4, and outputs the emotion information EI (step S31). The output unit 27 outputs the information obtained by executing, on the recognition character string RT, the processing corresponding to the emotion indicated by the emotion information EI (step S32). After the processing of step S32 is completed, the user device 1a ends the series of processing shown in FIG. 12.
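The branch formed by steps S28 to S30 can be sketched in Python as follows. This is only an illustration under stated assumptions: the situation-dependent weights kv and kt are taken as already determined, the table layout and all names are hypothetical, and the estimation of step S31 is left out because its rule is not described in this section.

```python
def correct_evaluations(recognized, ev, et, kv, kt, string_info):
    """Steps S28 to S30: choose the branch, then weight the evaluation values.

    string_info maps a registered string to {"dkv": [...], "dkt": [...]}.
    Returns the branch taken and the corrected values (CEV, CET).
    """
    # S28: is any registered character string contained in RT?
    entry = next((e for s, e in string_info.items() if s in recognized), None)
    if entry is not None:
        # S29: weights depend on both the character string and the situation.
        branch, dkv, dkt = "S29", entry["dkv"], entry["dkt"]
    else:
        # S30: weights depend on the situation only.
        branch, dkv, dkt = "S30", [0.0] * len(ev), [0.0] * len(et)

    def clamp(weight):
        return max(0.0, min(1.0, weight))

    cev = [clamp(k + d) * v for v, k, d in zip(ev, kv, dkv)]
    cet = [clamp(k + d) * v for v, k, d in zip(et, kt, dkt)]
    return branch, cev, cet
```

Treating step S30 as the case where every increase/decrease value is 0 lets both branches share one calculation.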

2.3. Effects of the Second Embodiment
As described above, the user device 1a corrects the voice evaluation values EV1 to EV4 and the character evaluation values ET1 to ET4 based on the situation information SiI and on the degree to which emotion is expressed in the voice of the user U when the user U utters the recognition character string RT. In general, some character strings express emotion strongly when uttered and others express it weakly. Correcting the voice evaluation values EV1 to EV4 and the character evaluation values ET1 to ET4 according to the degree to which emotion is expressed when the string is uttered therefore allows the emotion held by the user U to be estimated accurately.

In addition, the user device 1a refers to the situation-related information 37 and the character string-related information 38 to set the weighting coefficients kv1 to kv4 and kt1 to kt4 for the voice evaluation values EV1 to EV4 and the character evaluation values ET1 to ET4, respectively, according to the character string contained in the recognition character string RT and the situation indicated by the situation information SiI. Setting the weighting coefficients in this way allows the emotion held by the user U to be estimated accurately.

3. Modifications
The present invention is not limited to the embodiments exemplified above. Specific modifications are illustrated below. Two or more aspects arbitrarily selected from the following examples may be combined.

(1) In each of the above aspects, the processing device 2 of the user device 1 functions as the acquisition unit 21, the situation information generation unit 23, the emotion information generation unit 25, and the output unit 27, but the present invention is not limited to this. In the first modification, the acquisition unit 21, the situation information generation unit 23, the emotion information generation unit 25, and the output unit 27 are distributed between a user device 1b and a server device 10.

FIG. 13 is a diagram showing the overall configuration of the emotion estimation system SYS. The emotion estimation system SYS includes the user device 1b carried by the user U, a network NW, and the server device 10. The user device 1b is an example of a "terminal device". The server device 10 is an example of the "emotion estimation device" in the first modification.

FIG. 14 is a block diagram showing the configuration of the user device 1b. The user device 1b is realized by a computer system including a processing device 2b, a storage device 3b, an input device 4, an output device 5, a communication device 6, an inertial sensor 7, and a GPS device 8. The storage device 3b is a recording medium readable by the processing device 2b, and stores the schedule information 35 and a plurality of programs including the control program PRb executed by the processing device 2b.

By reading the control program PRb from the storage device 3b and executing it, the processing device 2b functions as the acquisition unit 21, the situation information generation unit 23, and the output unit 27.

The communication device 6 transmits the voice information VI and the situation information SiI to the server device 10, and receives the recognition character string RT and the emotion information EI from the server device 10.

FIG. 15 is a block diagram showing the configuration of the server device 10. The server device 10 is realized by a computer system including a processing device 2B, a storage device 3B, and a communication device 6B. The elements of the server device 10 are connected to one another by one or more buses 9B for communicating information. The storage device 3B is a recording medium readable by the processing device 2B, and stores a plurality of programs including the control program PRB executed by the processing device 2B, the analysis dictionary information 31, the emotion classification information 33, the situation-related information 37, and the learning model LM.

By reading the control program PRB from the storage device 3B and executing it, the processing device 2B functions as the emotion information generation unit 25.

The communication device 6B receives the voice information VI and the situation information SiI from the user device 1b, and transmits the recognition character string RT and the emotion information EI to the user device 1b.

As described above, according to the first modification, the acquisition unit 21, the situation information generation unit 23, the emotion information generation unit 25, and the output unit 27 can be distributed between the user device 1b and the server device 10.
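The exchange between the user device 1b and the server device 10 can be sketched in Python as follows. The JSON encoding, the field names, and the `emotion_info_generator` callable that stands in for the emotion information generation unit 25 are all assumptions; the description specifies only which items of information are sent in each direction.

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass
class EstimationRequest:       # user device 1b -> server device 10
    voice_info: bytes          # voice information VI
    situation_info: str        # situation information SiI


@dataclass
class EstimationResponse:      # server device 10 -> user device 1b
    recognized_string: str     # recognition character string RT
    emotion_info: list         # emotion information EI


def encode_request(request):
    """Serialize a request on the user device side (assumed JSON format)."""
    return json.dumps({
        "voice_info": base64.b64encode(request.voice_info).decode("ascii"),
        "situation_info": request.situation_info,
    })


def handle_request(payload, emotion_info_generator):
    """Server side: decode, delegate to the generator, and encode the reply."""
    data = json.loads(payload)
    voice = base64.b64decode(data["voice_info"])
    recognized, emotions = emotion_info_generator(voice, data["situation_info"])
    return json.dumps(asdict(EstimationResponse(recognized, emotions)))
```

Passing the generator in as a parameter keeps the sketch independent of how recognition and estimation are actually implemented on the server.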

(2) In the second embodiment, the correction unit 257a corrects the voice evaluation values EV1 to EV4 and the character evaluation values ET1 to ET4 based on the situation information SiI and on the degree to which emotion is expressed in the voice of the user U when the user U utters the recognition character string RT, but the present invention is not limited to this. For example, the correction unit 257a may correct the voice evaluation values EV1 to EV4 and the character evaluation values ET1 to ET4 based only on the degree to which emotion is expressed in the voice of the user U when the user U utters the recognition character string RT.

(3) The inside of the user's home was described as an example of a private space, but the private space is not limited to this. For example, the private space may be the inside of a hotel room in which the user U is staying. Assume, for example, that the user device 1 has a function of controlling the locking and unlocking of the door of the hotel room. On this assumption, when the user device 1 has instructed the door of the hotel room to unlock and the range of movement of the user U stays within a predetermined range from the time of that instruction until a predetermined time has elapsed, the user device 1 determines that the user U is inside the hotel room in which the user U is staying.
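The hotel room determination in modification (3) can be sketched in Python as follows. The sample format, the planar coordinates in meters, the default window and radius, and the choice to measure movement from the first position after unlocking are all assumptions; the description states only that the range of movement must stay within a predetermined range for a predetermined time after the unlock instruction.

```python
import math


def is_in_hotel_room(unlock_time, samples, window_s=600.0, max_range_m=10.0):
    """Decide whether the user is in the hotel room after an unlock instruction.

    unlock_time: time of the unlock instruction, in seconds
    samples:     chronologically ordered (timestamp_s, x_m, y_m) positions
                 in a local planar frame (assumed representation)
    """
    in_window = [s for s in samples
                 if unlock_time <= s[0] <= unlock_time + window_s]
    if not in_window:
        return False  # no position data, so no positive determination
    _, x0, y0 = in_window[0]
    # Movement range: every later position stays within max_range_m of the first.
    return all(math.hypot(x - x0, y - y0) <= max_range_m
               for _, x, y in in_window)
```

In practice the positions would come from the inertial sensor 7 or the GPS device 8, and the thresholds would need tuning for room size and sensor error.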

(4) The inside of public transportation and the inside of a workplace were described as non-private spaces, but non-private spaces are not limited to these. For example, non-private spaces also include the inside of a school, a hospital, or a library.

(5) Using a train was given as an example of a situation in which public transportation is used, but such situations are not limited to using a train. For example, being inside a station may be included among situations in which public transportation is used. For example, when the user device 1 has a transportation IC (Integrated Circuit) card function, the user device 1 may determine that the user U is using public transportation when the user U enters a station by means of the transportation IC card function. Public transportation is not limited to trains and also includes fixed-route buses, taxis, ferries, passenger aircraft, and the like.

(6) The output unit 27 outputs the information obtained by executing, on the recognition character string RT, the processing corresponding to the emotion indicated by the emotion information EI, but the present invention is not limited to this. For example, the output unit 27 may output a character string representing the emotion indicated by the emotion information EI to the display device 51, or may output a pictogram indicated by the emotion information EI to the display device 51.

(7) The user device 1 need not have the sound collecting device 41. In that case, the user device 1 may acquire the voice information VI via the communication device 6, or may acquire the voice information VI stored in the storage device 3.

(8) The user device 1 need not have the sound emitting device 53.

(9) The user device 1 may be a smart speaker. When the user device 1 is a smart speaker, the user device 1 need not have the touch panel 43 and the display device 51.

(10) As shown in FIG. 4, the emotion classification information 33 classifies each of a plurality of morphemes that are conjugated forms of a word, such as "勝つ" and "勝っ" (two forms of the verb meaning "win"), into one of joy, anger, sadness, and normal, but the present invention is not limited to this. For example, the emotion classification information 33 may instead classify the character strings registered in the base form data of the analysis dictionary information 31 into one of joy, anger, sadness, and normal. For example, the emotion classification information 33 classifies the character strings "嬉しい" ("happy"), "合格" ("pass"), and "勝つ" ("win") registered in the base form data of the analysis dictionary information 31 as joy. The calculation unit 2554 decomposes the corrected recognition character string CRT into morphemes and converts each morpheme into the character string registered in the base form data of the analysis dictionary information 31. When a character string obtained by this conversion matches a character string included in the emotion classification information 33, the calculation unit 2554 increases the character evaluation value ET of the emotion corresponding to that character string.
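The base-form lookup in modification (10) can be sketched in Python as follows. The two small dictionaries are English stand-ins for the analysis dictionary information 31 and the emotion classification information 33, the input is assumed to be already split into morphemes, and the increment of 1 per match is an assumption; the description says only that the value is increased.

```python
# Hypothetical stand-in for the base form data of the analysis dictionary
# information 31: conjugated form -> base form.
BASE_FORM_OF = {"won": "win", "wins": "win", "passed": "pass"}

# Hypothetical stand-in for the emotion classification information 33,
# keyed by base form.
EMOTION_OF_BASE_FORM = {"happy": "joy", "pass": "joy", "win": "joy"}


def character_evaluation(morphemes):
    """Return a character evaluation value per emotion for a morpheme list."""
    scores = {"joy": 0, "anger": 0, "sadness": 0, "normal": 0}
    for morpheme in morphemes:
        base = BASE_FORM_OF.get(morpheme, morpheme)   # convert to base form
        emotion = EMOTION_OF_BASE_FORM.get(base)      # look up the emotion
        if emotion is not None:
            scores[emotion] += 1                      # assumed increment of 1
    return scores
```

Keying the classification by base form means each word needs only one entry, regardless of how many conjugated forms it has.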

(11) The calculation unit 2554 calculates the character evaluation value ET for each emotion from the corrected recognition character string CRT, but it may instead calculate the character evaluation value ET for each emotion from the recognition character string RT. However, the recognition character string RT contains character strings that are unnecessary for estimating emotion. Calculating the character evaluation value ET for each emotion from the corrected recognition character string CRT therefore improves the accuracy of emotion estimation compared with calculating it from the recognition character string RT.

(12) An example in which the user U speaks Japanese was used, but each of the above aspects can be applied regardless of the language the user speaks. For example, each of the above aspects can be applied even when the user U speaks a language other than Japanese, such as English, French, or Chinese. For example, when the user U speaks English, the analysis dictionary information 31 may be information on English morphemes, and the emotion classification information 33 may be information that classifies English words into one of joy, anger, sadness, and normal.

(13) The block diagrams used in the description of each of the above aspects show blocks in functional units. These functional blocks (components) are realized by any combination of hardware and/or software. The means for realizing each functional block is not particularly limited. That is, each functional block may be realized by one device that is physically and/or logically coupled, or by two or more physically and/or logically separate devices that are connected directly and/or indirectly (for example, by wire and/or wirelessly).

(14) The order of the processing procedures, sequences, flowcharts, and the like in each of the above aspects may be changed as long as no contradiction arises. For example, the methods described in this specification present the elements of the various steps in an exemplary order and are not limited to the specific order presented.

(15) In each of the above aspects, input and output information and the like may be stored in a specific location (for example, a memory) or may be managed in a management table. Input and output information and the like may be overwritten, updated, or appended. Output information and the like may be deleted. Input information and the like may be transmitted to another device.

(16) In each of the above aspects, a determination may be made based on a value represented by one bit (0 or 1), based on a Boolean value (true or false), or based on a numerical comparison (for example, a comparison with a predetermined value).

(17) In each of the above aspects, a portable information processing device such as a smartphone was given as an example of the user device 1, but the specific form of the user device 1 is arbitrary and is not limited to the examples given above. For example, a portable or stationary personal computer may be used as the user device 1.

(18) In each of the above aspects, the storage device 3 is a recording medium readable by the processing device 2, and a ROM and a RAM were given as examples. However, the storage device 3 may be a flexible disk, a magneto-optical disk (for example, a compact disc, a digital versatile disc, or a Blu-ray (registered trademark) disc), a smart card, a flash memory device (for example, a card, a stick, or a key drive), a CD-ROM (Compact Disc-ROM), a register, a removable disk, a hard disk, a floppy (registered trademark) disk, a magnetic strip, a database, a server, or any other suitable storage medium. The program may also be transmitted from a network. The program may also be transmitted from a communication network via a telecommunication line.

(19) Each of the above aspects may be applied to systems that use LTE (Long Term Evolution), LTE-A (LTE-Advanced), SUPER 3G, IMT-Advanced, 4G, 5G, FRA (Future Radio Access), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi), IEEE 802.16 (WiMAX), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), or other suitable systems, and/or to next-generation systems extended based on them.

(20) In each of the above aspects, the information, signals, and the like described may be represented using any of a variety of different technologies. For example, the data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be mentioned throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination of these.
The terms described in this specification and/or the terms necessary for understanding this specification may be replaced with terms having the same or similar meanings.

(21) Each of the functions illustrated in FIGS. 2, 6, 9, 10, 14, and 15 is realized by any combination of hardware and software. Each function may be realized by a single device, or by two or more devices configured separately from one another.

(22) The programs exemplified in the above-described embodiments, whether referred to as software, firmware, middleware, microcode, or hardware description language, or by any other name, should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, and the like.
Software, instructions, and the like may also be transmitted and received via a transmission medium. For example, when software is transmitted from a website, a server, or another remote source using wired technologies such as coaxial cable, fiber-optic cable, twisted pair, and digital subscriber line (DSL) and/or wireless technologies such as infrared, radio, and microwave, these wired and/or wireless technologies are included within the definition of a transmission medium.

(23) In each of the above-described embodiments, information, parameters, and the like may be represented by absolute values, by values relative to a predetermined value, or by other corresponding information.

(24) The names used for the above-described parameters are not limiting in any respect. Furthermore, mathematical formulas and the like that use these parameters may differ from those explicitly disclosed in this specification.

(25) In each of the above-described embodiments, the user device 1 may be a mobile station. A mobile station may also be referred to by those skilled in the art as a subscriber station, a mobile unit, a subscriber unit, a wireless unit, a remote unit, a mobile device, a wireless device, a wireless communication device, a remote device, a mobile subscriber station, an access terminal, a mobile terminal, a wireless terminal, a remote terminal, a handset, a user agent, a mobile client, a client, or some other suitable term.

(26) In each of the above-described embodiments, the phrase "based on" does not mean "based only on" unless otherwise specified. In other words, the phrase "based on" means both "based only on" and "based at least on".

(27) Any reference to elements using designations such as "first" and "second" as used in this specification does not generally limit the quantity or order of those elements. These designations may be used in this specification as a convenient way of distinguishing between two or more elements. Accordingly, a reference to first and second elements does not mean that only two elements may be employed, or that the first element must precede the second element in some way.

(28) To the extent that "including", "comprising", and variations thereof are used in the above-described embodiments, in this specification, or in the claims, these terms are intended to be inclusive, in the same way as the term "equipped with". Furthermore, the term "or" as used in this specification or in the claims is intended not to be an exclusive OR.

(29) Throughout this application, where articles such as a, an, and the in English are added by translation, these articles include the plural unless the context clearly indicates otherwise.

(30) It will be apparent to those skilled in the art that the present invention is not limited to the embodiments described in this specification. The present invention can be implemented in modified and altered forms without departing from the spirit and scope of the invention as determined by the claims. Accordingly, the description in this specification is for illustrative purposes and has no restrictive meaning with respect to the present invention. In addition, a plurality of aspects selected from those exemplified in this specification may be combined.

1, 1a, 1b ... user device; 6 ... communication device; 10 ... server device; 21 ... acquisition unit; 23 ... situation information generation unit; 25, 25a ... emotion information generation unit; 27 ... output unit; 37 ... situation-related information; 38 ... character string relationship information; 41 ... sound collecting device; 251 ... feature amount generation unit; 252 ... first evaluation unit; 254 ... recognition unit; 255 ... second evaluation unit; 257, 257a ... correction unit; 258 ... estimation unit; CET1, CET2, CET3, CET4 ... corrected character evaluation value; CEV1, CEV2, CEV3, CEV4 ... corrected voice evaluation value; EI ... emotion information; ET1, ET2, ET3, ET4 ... character evaluation value; EV1, EV2, EV3, EV4 ... voice evaluation value; kt1, kt2, kt3, kt4 ... weighting coefficient; kv1, kv2, kv3, kv4 ... weighting coefficient.

Claims

An emotion estimation device comprising:
a generation unit that generates a feature amount of a user's voice based on voice information of the user;
a first evaluation unit that generates, based on the feature amount, a first voice evaluation value indicating the intensity of a first emotion held by the user and a second voice evaluation value indicating the intensity of a second emotion held by the user;
a recognition unit that generates, based on the voice information, a recognition character string indicating the utterance content of the user;
a second evaluation unit that generates, based on the recognition character string, a first character evaluation value indicating the intensity of the first emotion held by the user and a second character evaluation value indicating the intensity of the second emotion held by the user;
a correction unit that corrects the first voice evaluation value, the second voice evaluation value, the first character evaluation value, and the second character evaluation value based on situation information indicating the situation of the user; and
an estimation unit that estimates one or more emotions held by the user based on the correction result of the correction unit.
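
The estimation step recited above can be pictured with a minimal sketch. The claim does not state how the estimation unit combines the corrected values, so the rule used here (add the corrected voice and character values per emotion and return every emotion that attains the highest total) is an assumption, as are the function name `estimate_emotions` and the emotion labels.

```python
from typing import List, Sequence


def estimate_emotions(
    labels: Sequence[str],
    corrected_voice: Sequence[float],
    corrected_text: Sequence[float],
) -> List[str]:
    """Return the emotion label(s) with the highest combined score.

    Assumption: the combined score of an emotion is the sum of its
    corrected voice evaluation value and corrected character evaluation
    value. Ties yield more than one emotion.
    """
    totals = [v + t for v, t in zip(corrected_voice, corrected_text)]
    best = max(totals)
    return [label for label, total in zip(labels, totals) if total == best]


# Hypothetical corrected values for two emotions.
print(estimate_emotions(("joy", "anger"), (0.5, 1.0), (0.25, 0.125)))
# prints ['anger']
```

Returning a list rather than a single label mirrors the wording "one or more emotions".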

The emotion estimation device according to claim 1, wherein
the first evaluation unit inputs the feature amount generated by the generation unit into a learning model that has learned the relationship between a feature amount corresponding to a human voice and the intensity of each of the first emotion and the second emotion held by the person who uttered the voice, and
acquires the first voice evaluation value and the second voice evaluation value from the learning model.

The emotion estimation device according to claim 2, wherein
the learning model has learned, for a plurality of humans, the relationship between a plurality of feature amounts corresponding to human voices and the intensity of each of the first emotion and the second emotion held by the person who uttered the voice.

The emotion estimation device according to any one of claims 1 to 3, wherein the correction unit:
refers to situation-related information indicating the relationship between identification information indicating a situation that a human can be in and the weighting coefficients set, according to that situation, for each of the first voice evaluation value, the second voice evaluation value, the first character evaluation value, and the second character evaluation value;
sets the weighting coefficient for each of the first voice evaluation value, the second voice evaluation value, the first character evaluation value, and the second character evaluation value according to the situation indicated by the situation information; and
corrects the first voice evaluation value, the second voice evaluation value, the first character evaluation value, and the second character evaluation value based on the weighting coefficients set for each of them.
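
As an illustration of the correction recited above, the following minimal sketch assumes that correcting an evaluation value means multiplying it by its weighting coefficient; the claim itself only requires that the correction be based on the coefficients. The table contents, the situation identifiers `"home"` and `"office"`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Weights:
    voice: Tuple[float, float]  # (kv1, kv2): coefficients for the voice evaluation values
    text: Tuple[float, float]   # (kt1, kt2): coefficients for the character evaluation values


# Hypothetical situation-related information: situation identifier -> coefficients.
SITUATION_TABLE: Dict[str, Weights] = {
    "home": Weights(voice=(2.0, 2.0), text=(0.5, 0.5)),
    "office": Weights(voice=(0.5, 0.5), text=(2.0, 2.0)),
}


def correct(
    situation_id: str,
    voice_values: Tuple[float, float],
    text_values: Tuple[float, float],
) -> Tuple[Tuple[float, ...], Tuple[float, ...]]:
    """Look up the coefficients for the situation and scale each value.

    Assumption: corrected value = weighting coefficient * evaluation value.
    """
    w = SITUATION_TABLE[situation_id]
    corrected_voice = tuple(k * v for k, v in zip(w.voice, voice_values))
    corrected_text = tuple(k * v for k, v in zip(w.text, text_values))
    return corrected_voice, corrected_text


# With the hypothetical "home" row, voice values are doubled and character values halved.
print(correct("home", (0.25, 0.5), (0.5, 0.25)))
# prints ((0.5, 1.0), (0.25, 0.125))
```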

The emotion estimation device according to claim 4, wherein
the situation-related information indicates that, among the weighting coefficients associated with identification information indicating a situation in which the user is in a private space that cannot be entered without the user's permission, the weighting coefficient for the first voice evaluation value is larger than the weighting coefficient for the first character evaluation value, and the weighting coefficient for the second voice evaluation value is larger than the weighting coefficient for the second character evaluation value.

The emotion estimation device according to claim 4, wherein
the situation-related information indicates that, among the weighting coefficients associated with identification information indicating a situation in which the user is in a non-private space that can be entered without the user's permission, the weighting coefficient for the first character evaluation value is larger than the weighting coefficient for the first voice evaluation value, and the weighting coefficient for the second character evaluation value is larger than the weighting coefficient for the second voice evaluation value.
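
The two preceding claims constrain the ordering of the coefficients for private and non-private spaces. A table of situation-related information could be checked against those orderings with a small helper such as the sketch below; the function name `satisfies_space_rule` and the sample coefficients are hypothetical.

```python
from typing import Sequence


def satisfies_space_rule(
    is_private: bool,
    kv: Sequence[float],
    kt: Sequence[float],
) -> bool:
    """Check the coefficient ordering for one row of situation-related information.

    Private space:     every voice coefficient exceeds its character counterpart.
    Non-private space: every character coefficient exceeds its voice counterpart.
    """
    pairs = list(zip(kv, kt))
    if is_private:
        return all(v > t for v, t in pairs)
    return all(t > v for v, t in pairs)


# Hypothetical rows: (kv1, kv2) and (kt1, kt2).
print(satisfies_space_rule(True, (2.0, 2.0), (0.5, 0.5)))   # prints True
print(satisfies_space_rule(False, (2.0, 2.0), (0.5, 0.5)))  # prints False
```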

The emotion estimation device according to any one of claims 1 to 3, wherein
the correction unit corrects the first voice evaluation value, the second voice evaluation value, the first character evaluation value, and the second character evaluation value based on the situation information and on the degree to which emotion is expressed in the user's voice when the user utters the recognition character string.

The emotion estimation device according to any one of claims 1 to 3, wherein the correction unit:
refers to
situation-related information indicating the relationship between identification information indicating a situation that a human can be in and the weighting coefficients set, according to that situation, for each of the first voice evaluation value, the second voice evaluation value, the first character evaluation value, and the second character evaluation value, and
character string relationship information indicating the relationship between a character string uttered by a human and increase/decrease values of the weighting coefficients for each of the first voice evaluation value, the second voice evaluation value, the first character evaluation value, and the second character evaluation value, the increase/decrease values being set based on the degree to which emotion is expressed in a human voice when the character string is uttered;
sets, for each of the first voice evaluation value, the second voice evaluation value, the first character evaluation value, and the second character evaluation value, a weighting coefficient according to the character string included in the recognition character string and the situation indicated by the situation information; and
corrects the first voice evaluation value, the second voice evaluation value, the first character evaluation value, and the second character evaluation value based on the weighting coefficients so set.
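
The coefficient-setting step of the preceding claim can be sketched as follows. The sketch assumes that the increase/decrease values are added to the situation-dependent coefficients and that a registered character string is detected by simple substring matching; neither detail is fixed by the claim. The table names, the phrase `"thank you"`, and all numbers are hypothetical.

```python
from typing import Dict, Tuple

Coeffs = Dict[str, Tuple[float, float]]  # {"kv": (kv1, kv2), "kt": (kt1, kt2)}

# Hypothetical situation-related information: base coefficients per situation.
SITUATION_TABLE: Dict[str, Coeffs] = {
    "home": {"kv": (1.0, 1.0), "kt": (1.0, 1.0)},
}

# Hypothetical character string relationship information:
# registered phrase -> increase/decrease value for each coefficient.
STRING_TABLE: Dict[str, Coeffs] = {
    "thank you": {"kv": (-0.25, 0.0), "kt": (0.25, 0.0)},
}


def weights_for(situation_id: str, recognized: str) -> Tuple[Tuple[float, ...], Tuple[float, ...]]:
    """Start from the situation's coefficients, then apply the increase/decrease
    values of every registered phrase found in the recognition character string."""
    kv = list(SITUATION_TABLE[situation_id]["kv"])
    kt = list(SITUATION_TABLE[situation_id]["kt"])
    for phrase, delta in STRING_TABLE.items():
        if phrase in recognized:
            kv = [k + d for k, d in zip(kv, delta["kv"])]
            kt = [k + d for k, d in zip(kt, delta["kt"])]
    return tuple(kv), tuple(kt)


print(weights_for("home", "well, thank you very much"))
# prints ((0.75, 1.0), (1.25, 1.0))
```

In this hypothetical table, the phrase lowers the weight of the first voice evaluation value and raises the weight of the first character evaluation value, which shifts the estimate toward what was said rather than how it sounded.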

An emotion estimation system comprising the emotion estimation device according to any one of claims 1 to 8 and a terminal device capable of communicating with the emotion estimation device, wherein
the terminal device comprises:
a sound collecting device that collects the user's voice;
a situation information generation unit that generates the situation information;
a communication device that transmits voice information indicating the user's voice and the situation information to the emotion estimation device, and receives from the emotion estimation device the recognition character string and emotion information indicating the one or more emotions held by the user as estimated by the estimation unit; and
an output unit that outputs information obtained by executing, on the recognition character string, processing according to the emotion indicated by the emotion information.