JP2010244190A - Communication terminal and information display method

Info

Publication number: JP2010244190A
Application number: JP2009090176A
Authority: JP
Inventors: Hanano Kanakubo; 花野金窪
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2009-04-02
Filing date: 2009-04-02
Publication date: 2010-10-28
Anticipated expiration: 2029-04-02
Also published as: JP4884496B2

Abstract

PROBLEM TO BE SOLVED: To provide appropriate information responsive to the psychological state of a user by accurately estimating that psychological state.

SOLUTION: A communication terminal 1 includes: a sound collection unit 11 that collects surrounding sound and acquires it as sound information; a user voice extraction unit 15 that extracts, from the sound information, user voice information representing the voice of the user; an other person voice extraction unit 16 that extracts, from the sound information, other person voice information representing the voice of a person other than the user; an emotion analysis unit 17 that identifies the emotion level of the user based on the user voice information and the emotion level of the other person based on the other person voice information; an output information determination unit 19 that determines the information to be provided to the user based on the emotion level of the user and the emotion level of the other person; and an information display unit 20 that displays the output information to present it to the user.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a communication terminal and an information display method.

For conventional communication terminals such as mobile phones, a technique is known that analyzes the voice during a call to estimate the speaker's emotion (for example, Patent Document 1). A technique is also known that uses such emotion estimation, as described in Patent Document 1, in a voice-input mail composition function to select pictograms or symbols corresponding to the emotion of the speaker (user) as appropriate and add them to the text entered by voice (for example, Patent Document 2).

Patent Document 1: JP 2006-106711 A
Patent Document 2: JP 2006-277567 A

Here, to improve the convenience and user affinity of a communication terminal, it is desirable that the terminal be able to present an appropriate message on the display, or change character conversion candidates and pictograms, according to the situation in which the user is placed.

However, even if the user's emotion is estimated using only the user's own voice information, as in the emotion estimation methods described in Patent Documents 1 and 2, it is highly likely that the situation in which the user is placed cannot be fully understood. This is because that situation is considered to change not only with the user's own emotion but also under the influence of, for example, the surrounding environment and the user's relative relationship with the other people in the conversation. Therefore, even if the information to be provided to the user is selected based only on the estimated emotion of the user, as described in Patent Document 2, the information may not match the situation the user is actually in and may not be appropriate for the user.

The present invention has been made to solve the above problem, and its object is to provide a communication terminal and an information display method that can accurately estimate the situation in which the user is placed and provide appropriate information according to that situation.

To solve the above problem, a communication terminal according to the present invention comprises: sound collecting means for collecting surrounding sound and acquiring it as sound information; user voice extraction means for extracting user voice information indicating the user's voice from the sound information acquired by the sound collecting means; other person voice extraction means for extracting other person voice information indicating the voice of a person other than the user from the sound information acquired by the sound collecting means; emotion analysis means for identifying the user's emotion level based on the user voice information extracted by the user voice extraction means and identifying the other person's emotion level based on the other person voice information extracted by the other person voice extraction means; information determination means for determining information to be provided to the user based on the user's emotion level and the other person's emotion level identified by the emotion analysis means; and display means for displaying the information determined by the information determination means and presenting it to the user.

Similarly, to solve the above problem, an information display method according to the present invention comprises: a sound collection step of collecting surrounding sound and acquiring it as sound information; a user voice extraction step of extracting user voice information indicating the user's voice from the sound information acquired in the sound collection step; an other person voice extraction step of extracting other person voice information indicating the voice of a person other than the user from the sound information acquired in the sound collection step; an emotion analysis step of identifying the user's emotion level based on the user voice information extracted in the user voice extraction step and identifying the other person's emotion level based on the other person voice information extracted in the other person voice extraction step; an information determination step of determining information to be provided to the user based on the user's emotion level and the other person's emotion level identified in the emotion analysis step; and a display step of displaying the information determined in the information determination step and presenting it to the user.

According to such a communication terminal and information display method, the user's emotion level is identified based on the user's voice, and the emotion level of another person is also identified based on that person's voice extracted from the sound information around the user. The information to be provided to the user is then determined based on the user's emotion level and the other person's emotion level. Consequently, by comparing the user's own emotion level with that of others nearby, and thereby taking into account the influence of the relative relationship between the user and those others, the situation in which the user is placed can be estimated accurately. As a result, appropriate information can be provided according to the user's situation.

In the communication terminal according to the present invention, the sound collecting means preferably collects surrounding sound at all times and acquires it as sound information. This makes it possible to recognize the situation in which the user is placed even when no call is in progress, and to provide the user with even more appropriate information.

According to the communication terminal and information display method of the present invention, the user's psychological state can be accurately estimated, and appropriate information can be provided according to that psychological state.

FIG. 1 is a functional block diagram of a communication terminal according to an embodiment of the present invention.
FIG. 2 is a hardware configuration diagram of the communication terminal.
FIG. 3 is a diagram showing an example of the configuration of the output information storage unit.
FIG. 4 is a diagram showing, for each item of output information shown in FIG. 3, the situation assumed from the emotion levels of the user and the other person, and the rationale for selecting that output information.
FIG. 5 is a diagram showing an example of the configuration of the output information storage unit.
FIG. 6 is a diagram showing, for each item of output information shown in FIG. 5, the situation assumed from the emotion levels of the user and the other person, and the rationale for selecting that output information.
FIG. 7 is a flowchart showing the sample voice registration process executed in the communication terminal of the embodiment.
FIG. 8 is a flowchart showing the information display process executed in the communication terminal of the embodiment.

Embodiments of the present invention will now be described in detail with reference to the drawings. In the description of the drawings, identical or equivalent elements are given the same reference numerals, and duplicate description is omitted.

FIG. 1 is a functional block diagram of a communication terminal according to an embodiment of the present invention. As shown in FIG. 1, the communication terminal 1 includes a sound collection unit (sound collecting means) 11, a voice registration unit 12, a user voice storage unit 13, a sound information storage unit 14, a user voice extraction unit (user voice extraction means) 15, an other person voice extraction unit (other person voice extraction means) 16, an emotion analysis unit (emotion analysis means) 17, an output information storage unit 18, an output information determination unit (information determination means) 19, and an information display unit (display means) 20.

FIG. 2 is a hardware configuration diagram of the communication terminal 1. Physically, the communication terminal 1 is configured as a terminal device having a CPU (Central Processing Unit) 111, a RAM (Random Access Memory) 112 and a ROM (Read Only Memory) 113 serving as main storage, an operation unit 114 such as input keys serving as an input device, a display 115, a wireless communication unit 116, a microphone 117, and the like. Each function of the communication terminal 1 shown in FIG. 1 is realized by loading predetermined computer software onto hardware such as the CPU 111 and the RAM 112 shown in FIG. 2, thereby operating the operation unit 114, the display 115, the wireless communication unit 116, and the microphone 117 under the control of the CPU 111, and by reading and writing data in the RAM 112 and the ROM 113.

Next, each function of the communication terminal 1 shown in FIG. 1 will be described.

The sound collection unit 11 collects surrounding sound at all times and acquires it as sound information. The sound collection unit 11 transmits the acquired sound information to the sound information storage unit 14. When the communication terminal 1 executes the user voice registration process described later, the sound collection unit 11 transmits the acquired sound information to the voice registration unit 12. The sound collection unit 11 is typically the microphone 117 provided with the communication terminal 1.

The voice registration unit 12 registers the user's voice. In the user voice registration process described later, the voice registration unit 12 receives the sound information acquired by the sound collection unit 11 and extracts voice information from it. It then transmits the extracted voice information to the user voice storage unit 13, where it is stored as sample voice information indicating a sample of the voice of the user of the communication terminal 1.

The user voice storage unit 13 holds the sample voice information indicating a sample of the voice of the user of the communication terminal 1. On receiving sample voice information from the voice registration unit 12, the user voice storage unit 13 updates the sample voice information it holds. The user voice storage unit 13 also provides the held sample voice information to the user voice extraction unit 15.

The sound information storage unit 14 holds the sound information acquired by the sound collection unit 11. In this embodiment, the sound information storage unit 14 is configured to always hold the latest five minutes of the sound information transmitted from the sound collection unit 11. The sound information storage unit 14 provides the held sound information to the user voice extraction unit 15.
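The five-minute retention described above behaves like a rolling window. The following is only a minimal sketch of that idea, not the patent's implementation: the class name `RollingSoundBuffer`, the use of timestamped byte chunks, and the 300-second default are all assumptions made for illustration.

```python
from collections import deque


class RollingSoundBuffer:
    """Keeps only the most recent `window_seconds` of audio chunks (hypothetical helper)."""

    def __init__(self, window_seconds: float = 300.0) -> None:
        self.window_seconds = window_seconds
        self._chunks = deque()  # entries are (timestamp_seconds, samples)

    def append(self, timestamp: float, samples: bytes) -> None:
        """Store a new chunk, then drop chunks that fell out of the window."""
        self._chunks.append((timestamp, samples))
        cutoff = timestamp - self.window_seconds
        while self._chunks and self._chunks[0][0] < cutoff:
            self._chunks.popleft()

    def snapshot(self) -> list:
        """Return the retained chunks, oldest first."""
        return list(self._chunks)
```

A consumer such as the user voice extraction unit would periodically call `snapshot()` and analyze whatever is currently retained.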

The user voice extraction unit 15 extracts user voice information indicating the user's voice from the sound information held in the sound information storage unit 14. Specifically, the user voice extraction unit 15 periodically acquires sound information from the sound information storage unit 14 and uses the sample voice information held in the user voice storage unit 13 to check whether the sound information contains the user's voice. If it does, the user voice extraction unit 15 extracts the user voice information from the sound information, transmits the extracted user voice information to the emotion analysis unit 17, and transmits the sound information with the user voice information removed to the other person voice extraction unit 16.

The other person voice extraction unit 16 extracts, from the sound information, other person voice information indicating the voice of a person other than the user of the communication terminal 1. Specifically, on receiving from the user voice extraction unit 15 the sound information with the user voice information removed, the other person voice extraction unit 16 analyzes this sound information to extract the other person voice information and transmits it to the emotion analysis unit 17.
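The two extraction stages can be pictured as a two-way split of the buffered sound. The source does not say how a segment is matched against the registered sample, so the sketch below swaps in a simple cosine similarity over hypothetical per-segment feature vectors; `Segment`, `split_voices`, the `is_speech` flag, and the 0.9 threshold are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    features: list   # hypothetical voice features for this stretch of audio
    is_speech: bool  # hypothetical flag: does the segment contain a human voice?


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def split_voices(segments, sample_features, threshold=0.9):
    """Return (user_segments, other_segments).

    Speech that resembles the registered sample is treated as the user's voice;
    the remainder is passed on, and any speech left in it is treated as the
    voice of another person.
    """
    user, remainder = [], []
    for seg in segments:
        if seg.is_speech and cosine(seg.features, sample_features) >= threshold:
            user.append(seg)
        else:
            remainder.append(seg)
    others = [seg for seg in remainder if seg.is_speech]
    return user, others
```

Non-speech segments fall into the remainder but are not reported as another person's voice, which mirrors the order of the two units: user voice is removed first, then the rest is analyzed.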

In this embodiment, "other person" means a person who is near the user, within a distance at which the sound collection unit 11 can pick up their voice, who is either talking directly with the user or talking with someone else near the user, and who can have no small influence on the user's psychological state.

The emotion analysis unit 17 identifies the emotion level of the user of the communication terminal 1 based on the user voice information extracted by the user voice extraction unit 15, and identifies the emotion level of the other person near the user based on the other person voice information extracted by the other person voice extraction unit 16. Specifically, the emotion analysis unit 17 identifies the emotion level using information such as the volume, waveform, pitch, and phonemes of the voice information. In this embodiment, four broad patterns are set as emotion levels, "joy", "anger", "sorrow", and "pleasure", and "anger" is further subdivided into three patterns: "intense anger", "anger", and "inner anger". A well-known technique may be used for emotion estimation in the emotion analysis unit 17; such a technique is described, for example, in International Publication No. WO 00/62279.
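The emotion levels named above form a small two-tier label set. As a hedged sketch of one way to represent it, the enums below use assumed English names (`PLEASURE` is one rendering of the fourth label, 楽 in the Japanese original), and `EmotionLevel` is a hypothetical container, not something the source defines.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Emotion(Enum):
    """The four broad emotion levels of the embodiment."""
    JOY = "joy"
    ANGER = "anger"
    SORROW = "sorrow"
    PLEASURE = "pleasure"


class AngerDegree(Enum):
    """The three sub-patterns into which 'anger' is divided."""
    INTENSE = "intense anger"
    ANGER = "anger"
    INNER = "inner anger"


@dataclass(frozen=True)
class EmotionLevel:
    """An identified emotion, with an optional degree when the emotion is anger."""
    emotion: Emotion
    anger_degree: Optional[AngerDegree] = None

    def __post_init__(self) -> None:
        # A degree of anger is meaningless for the other three emotions.
        if self.anger_degree is not None and self.emotion is not Emotion.ANGER:
            raise ValueError("anger_degree only applies to Emotion.ANGER")
```

Making the value frozen lets it serve directly as a dictionary key when looking up output information.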

The output information storage unit 18 holds output information corresponding to the user's emotion level and the other person's emotion level. In other words, the output information storage unit 18 defines the rules for what output is appropriate for each combination of the user's emotion and the emotions of the people around the user.

A specific configuration of the output information storage unit 18 will be described with reference to FIGS. 3 to 6. FIG. 3 is a diagram showing an example of the configuration of the output information storage unit 18. As shown in FIG. 3, the output information storage unit 18 holds a total of 16 patterns of output information, each corresponding to one of the user's four emotion levels ("joy", "anger", "sorrow", "pleasure") and one of the other person's four emotion levels. FIG. 4 shows, for each of the 16 patterns of output information in FIG. 3, the situation of the place the user is in, as assumed from the emotion levels of the user and the other person, and the rationale for selecting that output information. In this example, when the user's emotion level is "joy" and the other person's emotion level is also "joy", the output information is "That's great" (よかったね), on the rationale that "both the user and those nearby are pleased, so rejoice together" (see FIG. 4). If the other person's emotion level is instead "anger", the output information is "Stay calm!" (冷静に！), on the rationale that "the user is pleased but those nearby are angry, so caution the user, who is likely to be getting carried away". In this way, the output information storage unit 18 is configured so that, even when the user's emotion level is the same, different output information is selected according to the situation of the place the user is in when the other person's emotion level differs.

FIG. 5 is a diagram showing the output information when the emotion levels are further subdivided for the portion 31 of FIG. 3, in which the emotion levels of both the user and the other person are "anger". As shown in FIG. 5, for the emotion level "anger", the output information storage unit 18 further subdivides the emotion levels of the user and the other person into three patterns each ("intense anger", "anger", "inner anger") and holds a total of nine patterns of output information according to those levels. As with FIG. 4, FIG. 6 shows, for each of the nine patterns of output information of the portion 31 in FIG. 5, the situation of the place the user is in, as assumed from the emotion levels of the user and the other person, and the rationale for selecting that output information. As shown in FIGS. 5 and 6, even when the user and the other person are both angry, the situation the user is in is considered to differ according to each party's degree of anger, and the output information storage unit 18 is configured so that different output information is selected for each situation.
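The storage unit described with FIGS. 3 to 6 amounts to a table keyed by a pair of emotion levels. The sketch below is an assumption-laden illustration: `OUTPUT_TABLE` and `decide_output` are hypothetical names, the English keys are our labels, and only the two entries the text actually quotes are filled in (よかったね, roughly "That's great", and 冷静に！, roughly "Stay calm!"). The remaining cells of FIG. 3 and the nine anger patterns of FIG. 5 are not reproduced in the text, so they are left out rather than guessed.

```python
# (user emotion, other person's emotion) -> message to display.
# Only the two combinations quoted in the description are included.
OUTPUT_TABLE = {
    ("joy", "joy"): "よかったね",   # both pleased: rejoice together
    ("joy", "anger"): "冷静に！",   # user pleased, others angry: caution the user
}


def decide_output(user_emotion, other_emotion):
    """Return the stored message for this pair, or None if no entry is known."""
    return OUTPUT_TABLE.get((user_emotion, other_emotion))
```

A finer-grained table for the anger/anger cell could be keyed the same way on the pair of anger degrees and consulted before falling back to this one.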

Returning to FIG. 1, the output information determination unit 19 refers to the output information storage unit 18 and determines the output information to be provided to the user, based on the emotion levels of the user and the other person identified by the emotion analysis unit 17.

The information display unit 20 displays the output information determined by the output information determination unit 19 and presents it to the user. The information display unit 20 is typically the display 115 of the communication terminal 1.

Next, the processing executed in the communication terminal 1 of this embodiment, and the information display method according to this embodiment, will be described using the flowcharts shown in FIGS. 7 and 8. FIG. 7 is a flowchart showing the sample voice registration process executed in the communication terminal 1 of this embodiment. This sample voice registration process is executed mainly by the voice registration unit 12.

First, when the sample voice registration process is started by the voice registration unit 12 (S101), an instruction prompting the user to speak so that the user's voice can be registered is displayed, for example, on the display 115. Sound collection is then started by the sound collection unit 11 (S101), and sound information containing the collected voice of the user is transmitted to the voice registration unit 12.

Next, the voice registration unit 12 extracts voice information from the sound information received from the sound collection unit 11 (S103) and checks whether exactly one kind of voice was extracted (S104). If the extracted voice is not of exactly one kind, it cannot be registered as the user's voice, so the process returns to step S103 to redo the sound collection. If exactly one kind of voice was extracted, that voice is registered in the user voice storage unit 13 as the sample voice used for extracting the user's voice (S105).
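The registration flow above is a retry loop that accepts a recording only when it contains a single voice. A minimal sketch under stated assumptions: `record`, `extract_voices`, and `store` are hypothetical callables standing in for the sound collection unit, the voice extraction, and the user voice storage unit, and the `max_attempts` cap is our addition (the flowchart as described simply loops).

```python
def register_sample_voice(record, extract_voices, store, max_attempts=3):
    """Try to register a sample voice; return True on success.

    record()              -> captured sound information
    extract_voices(sound) -> list of distinct voices found in that sound
    store(voice)          -> saves the voice as the user's sample
    """
    for _ in range(max_attempts):
        sound = record()
        voices = extract_voices(sound)
        if len(voices) == 1:
            # Exactly one voice: safe to treat it as the user's sample.
            store(voices[0])
            return True
        # Zero or several voices: cannot tell which is the user, so retry.
    return False
```

Passing the three stages in as callables keeps the loop testable without a microphone or a real voice model.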

FIG. 8 is a flowchart showing the information display process executed in the communication terminal 1 of this embodiment. This information display process is performed after the sample voice registration process shown in FIG. 7 has been executed and the user's sample voice has been registered in the user voice storage unit 13.

First, the user voice extraction unit 15 acquires from the sound information storage unit 14 the latest five minutes of sound information around the communication terminal 1 collected by the sound collection unit 11, and extracts voice information from this sound information (S201: sound collection step). This voice information is then compared with the sample voice information held in the user voice storage unit 13 (S202).

Next, based on the comparison result of step S202, it is checked whether the sound information in the sound information storage unit 14 contains the user's voice (S203). If the sound information contains the user's voice, the process proceeds to step S204. If it does not, the process returns to step S201 and waits until the user voice extraction unit 15 again detects that the sound information in the sound information storage unit 14 contains voice.

If it is determined in step S203 that the sound information contains the user's voice, the user voice extraction unit 15 extracts the user voice information from the sound information (S204: user voice extraction step), transmits the extracted user voice information to the emotion analysis unit 17, and transmits the sound information with the user voice information removed to the other person voice extraction unit 16.

Next, the other person voice extraction unit 16 extracts the other person voice information from the sound information from which the user voice extraction unit 15 has removed the user voice information (S205: other person voice extraction step), and transmits the extracted other person voice information to the emotion analysis unit 17.

The emotion analysis unit 17 then identifies the emotion level of the user of the communication terminal 1 based on the user voice information, and identifies the emotion level of the other person near the user based on the other person voice information (S206: emotion analysis step).

The output information determination unit 19 then refers to the output information storage unit 18 and determines the output information to be provided to the user based on the emotion levels of the user and the other person (S207: information determination step), and the information display unit 20 displays the output information determined by the output information determination unit 19 on the display 115 to present it to the user (S208: display step).
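Steps S201 to S208 chain the units together in a fixed order. The function below is a rough sketch of one pass through that chain, not the patented implementation: every parameter is a hypothetical callable standing in for one unit, and returning `None` when no user voice is found is our stand-in for the "return to S201 and wait" branch.

```python
def information_display_step(sound, extract_user, extract_others,
                             analyze, decide, display):
    """Run one pass of the information display process on buffered sound.

    extract_user(sound)       -> (user_voice or None, remaining sound)
    extract_others(remainder) -> other person's voice
    analyze(voice)            -> emotion level
    decide(user_lvl, other_lvl) -> output information
    display(message)          -> shows the message to the user
    """
    user_voice, remainder = extract_user(sound)   # S202-S204
    if user_voice is None:
        return None                               # S203 "no": caller retries later
    other_voice = extract_others(remainder)       # S205
    user_level = analyze(user_voice)              # S206
    other_level = analyze(other_voice)            # S206
    message = decide(user_level, other_level)     # S207
    display(message)                              # S208
    return message
```

A caller would invoke this periodically on the latest buffered sound and simply try again on the next tick whenever it returns `None`.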

As described above, according to the communication terminal 1 and the information display method of this embodiment, the user's emotion level is identified based on the user's voice, and the emotion level of another person is also identified based on that person's voice extracted from the sound information around the user. The information to be provided to the user is then determined based on the user's emotion level and the other person's emotion level. Consequently, by comparing the user's own emotion level with that of others nearby, and thereby taking into account the influence of the relative relationship between the user and those others, the situation in which the user is placed can be estimated accurately. As a result, appropriate information can be provided according to the user's situation.

In addition, because the sound collection unit 11 collects surrounding sound at all times and acquires it as sound information, the situation in which the user is placed can be recognized even when no call is in progress, and even more appropriate information can be provided to the user.

The communication terminal and information display method according to the present invention have been described above with reference to a preferred embodiment, but the present invention is not limited to that embodiment. For example, in the above embodiment the output information provided to the user is a message presented on the display 115, but character conversion candidates or pictograms in a character input function may also be used.

Further, the emotion levels of the user and the other person used to determine the output information are not limited to "joy", "anger", "sorrow", and "pleasure" exemplified in the above embodiment; other emotions such as "rage", "disgust", and "affection" may also be applied.

Further, the terminal may be configured so that the sound information held in the sound information storage unit 14 is analyzed to determine the content of the conversation, making it possible to identify more detailed information such as the relationship between the user and the other person (friend, boss, subordinate, lover, family member, etc.) and the place of the conversation (office, home, department store, etc.). A plurality of output information storage units 18 that can be selected as appropriate according to this information may then be provided, so that output information even better suited to the user's psychological state can be provided.
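The variation above, in which one of several output information storage units 18 is chosen from the identified relationship and place, can be sketched as a two-stage lookup. Everything named here is an assumption for illustration: the `TABLES_BY_CONTEXT` mapping, the `FALLBACK_TABLE`, the string labels, the messages, and both functions are hypothetical, and how the relationship and place are recognized from the conversation is left out.

```python
from typing import Dict, Tuple

# (user emotion, other person's emotion) -> message
EmotionTable = Dict[Tuple[str, str], str]

# Hypothetical set of output information storage units, one per context.
# Keys are (relationship, place); contents are invented examples.
TABLES_BY_CONTEXT: Dict[Tuple[str, str], EmotionTable] = {
    ("boss", "office"): {
        ("anger", "anger"): "Stay calm and summarize the facts.",
    },
    ("friend", "home"): {
        ("anger", "anger"): "Maybe change the subject for a while.",
    },
}

# Assumption: a generic table is used when the context is not recognized.
FALLBACK_TABLE: EmotionTable = {
    ("anger", "anger"): "Both of you seem heated. A short break may help.",
}


def select_table(relationship: str, place: str) -> EmotionTable:
    """Stage 1: choose the storage unit that matches the identified context."""
    return TABLES_BY_CONTEXT.get((relationship, place), FALLBACK_TABLE)


def decide_output_in_context(
    relationship: str, place: str, user_emotion: str, other_emotion: str
) -> str:
    """Stage 2: look up the message for the emotion pair in the chosen table."""
    table = select_table(relationship, place)
    return table.get((user_emotion, other_emotion), "")


if __name__ == "__main__":
    print(decide_output_in_context("boss", "office", "anger", "anger"))
```

With this arrangement the same pair of emotion levels can produce different output for a conversation with a boss at the office than for one with a friend at home.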

DESCRIPTION OF SYMBOLS
1 ... communication terminal, 11 ... sound collection unit (sound collecting means), 12 ... voice registration unit, 13 ... user voice storage unit, 14 ... sound information storage unit, 15 ... user voice extraction unit (user voice extraction means), 16 ... other-person voice extraction unit (other-person voice extraction means), 17 ... emotion analysis unit (emotion analysis means), 18 ... output information storage unit, 19 ... output information determination unit (information determining means), 20 ... information display unit (display means).

Claims

A communication terminal comprising:
sound collecting means for collecting surrounding sound and acquiring it as sound information;
user voice extraction means for extracting user voice information indicating the voice of a user from the sound information acquired by the sound collecting means;
other-person voice extraction means for extracting other-person voice information indicating the voice of another person other than the user from the sound information acquired by the sound collecting means;
emotion analysis means for identifying an emotion level of the user based on the user voice information extracted by the user voice extraction means, and identifying an emotion level of the other person based on the other-person voice information extracted by the other-person voice extraction means;
information determining means for determining information to be provided to the user based on the emotion level of the user and the emotion level of the other person identified by the emotion analysis means; and
display means for displaying the information determined by the information determining means and presenting it to the user.

The communication terminal according to claim 1, wherein the sound collecting means constantly collects surrounding sound and acquires it as sound information.

An information display method comprising:
a sound collection step of collecting surrounding sound and acquiring it as sound information;
a user voice extraction step of extracting user voice information indicating the voice of a user from the sound information acquired in the sound collection step;
an other-person voice extraction step of extracting other-person voice information indicating the voice of another person other than the user from the sound information acquired in the sound collection step;
an emotion analysis step of identifying an emotion level of the user based on the user voice information extracted in the user voice extraction step, and identifying an emotion level of the other person based on the other-person voice information extracted in the other-person voice extraction step;
an information determination step of determining information to be provided to the user based on the emotion level of the user and the emotion level of the other person identified in the emotion analysis step; and
a display step of displaying the information determined in the information determination step and presenting it to the user.