JP2015050470A

JP2015050470A - Terminal device and speech communication data processing method

Info

Publication number: JP2015050470A
Application number: JP2013178425A
Authority: JP
Inventors: 香織上田; Kaori Ueda
Original assignee: Kyocera Corp
Current assignee: Kyocera Corp
Priority date: 2013-08-29
Filing date: 2013-08-29
Publication date: 2015-03-16
Anticipated expiration: 2033-08-29
Also published as: JP6189684B2

Abstract

PROBLEM TO BE SOLVED: To make use, in a terminal device, of call data obtained during a call. SOLUTION: A terminal device records the call audio of a caller and, from the caller's voice level (voice pitch and volume) in the recorded call data, extracts the time during which the caller is in a specific emotional state (a normal state or one of the emotional states). Based on that time and on the voice level while the caller is in the emotional state, the terminal device determines the caller's emotion degree after the call ends, associates the determined emotion degree with the other party, and records it in a call record. In addition, the terminal device changes the display mode of a predetermined screen of an application used when communicating with the other party (for example, an application providing a phone book, call history, or contact list) in accordance with the emotion degree recorded in the call record.

Description

The present invention relates to a terminal device, and more particularly to a terminal device that processes call data and to a call data processing method.

Various extensions of call functions have been proposed for mobile phones and smartphones in order to improve communication. For example, Patent Document 1 below discloses a technique in which character data is stored in a terminal, the level of an audio signal received by the mobile phone terminal is detected, and a character animation is displayed on the screen of the mobile phone terminal in response to the level of that audio signal.

Patent Document 2 below discloses a technique in which the user chooses whether to make a mood-mode call, in which sound effects or music are added during a voice call. When a mood-mode call is made, the user's emotional state detected from a biometric sensor or from the text of mail is analyzed to set a mood mode and a mood level, and sound effects or music are selected or changed according to that mood mode and mood level.

Patent Document 1: JP 2005-64939 A
Patent Document 2: JP 2012-4885 A

However, the techniques described in Patent Documents 1 and 2 both have the following problems. First, because they aim to improve the communication effect during a call, they cannot be applied to situations other than during a call. Second, displaying a character or playing sound effects in step with the call voice is not necessarily effective for the user on the call, and it is hard to say that it always improves communication. Third, many people use mobile phones to get in touch with someone quickly, but how much they interact with that person can only be seen application by application, for example through the call history or the number of mail and SMS (Short Message Service) exchanges.

The present invention has been made to solve the above problems, and its object is to make use of call data obtained during a call even after the call, so as to help improve communication.

To solve the above problems, a terminal device according to the present invention comprises: a call recording unit that records the call voice of a caller; a state extraction unit that extracts, from the caller's call voice recorded by the call recording unit, the time during which the caller is in a specific emotional state; a determination unit that determines the caller's emotion degree after the call ends, based on the time in the specific emotional state and the voice level during that time; and a presentation unit that, after the call ends, presents a predetermined screen containing information about the other party with its display mode changed according to the emotion degree determined by the determination unit.

In the present invention, the voice level is the pitch and loudness of the voice the caller utters during the call; the emotion degree is a laughter degree; and the predetermined screen includes at least one of the display screens of a phone book, call history, outgoing call history, incoming call history, and contact list.

Furthermore, the present invention comprises a storage unit that stores images, and the presentation unit includes an image extraction unit that, based on the determined emotion degree, extracts an image corresponding to that emotion degree from the storage unit and displays it on the predetermined screen.

According to the present invention, data obtained during a call can be used after the call to help check the state of communication with others.

FIG. 1 is a diagram showing the concept of the basic operation of the voice call terminal according to the embodiment of the present invention.
FIG. 2 is a diagram showing the hardware configuration of the voice call terminal according to the embodiment of the present invention.
FIG. 3 is a diagram showing the functional blocks of the control unit of the voice call terminal according to the embodiment of the present invention.
FIG. 4 is a diagram showing the concept of laughter degree determination according to the embodiment of the present invention.
FIG. 5 is a diagram showing the display screen of the phone book application according to the embodiment of the present invention.
FIG. 6 is a diagram showing the display screens of other applications according to the embodiment of the present invention.

A mode for carrying out the present invention (hereinafter, "this embodiment") is described in detail below with reference to the accompanying drawings. The same elements are given the same reference numbers throughout the description of this embodiment.

(Basic operation concept of the embodiment)
FIG. 1 is a diagram showing the concept of the basic operation of a voice call terminal, which is an example of the terminal device according to the embodiment of the present invention. In the present invention, "emotion degree" means the degree of the caller's emotional state during a call (a normal state or one of joy, anger, sorrow, and pleasure). To make the description concrete, the main steps of the voice call terminal's processing flow are described below for the case where "laughter degree" is chosen as the emotion degree.

As illustrated, first, in step S1, the laughter of the user (the owner of the terminal) and of the other party during the call is recorded in the storage unit in the terminal.

Next, in step S2, the "level of laughter" (the pitch and/or loudness of the laughter) for each unit time during the call and the "time during which laughter was detected" are extracted.

Then, in step S3, a graph such as the one shown on the right of the figure (called the emotion degree determination graph), with the level of laughter on the X axis and the time during which laughter was detected on the Y axis, is prepared in advance. The sum (Xa) of the per-unit-time levels of laughter extracted in step S2 and the total time (Ya) during which laughter was detected over the whole call are obtained and compared against the graph.

Finally, in step S4, a UI (user interface) element such as the one illustrated, corresponding to the laughter degree into which the point (Xa, Ya) falls on the graph (for example, one of five levels: 0 (standard), 1, 2, 3, 4), is associated with the other party and stored in the phone book.
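The aggregation in steps S2 and S3 can be sketched as follows. This is only an illustration under stated assumptions: the `UnitSample` structure, the `aggregate` function, and the choice of seconds as the time unit are hypothetical and do not come from the patent, which does not specify how the per-unit-time values are stored.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class UnitSample:
    """One unit-time slice of the recorded call (hypothetical structure)."""
    laughing: bool  # whether laughter was detected in this slice
    level: float    # level of laughter (pitch and/or loudness), arbitrary units


def aggregate(samples: Iterable[UnitSample],
              unit_seconds: float = 1.0) -> Tuple[float, float]:
    """Return (Xa, Ya): summed laughter level and total laughing time."""
    xa = 0.0  # sum of per-unit-time laughter levels
    ya = 0.0  # total time during which laughter was detected
    for sample in samples:
        if sample.laughing:
            xa += sample.level
            ya += unit_seconds
    return xa, ya
```

Slices without laughter contribute to neither total, so a long but quiet call and a short call with the same amount of laughter yield the same point on the graph.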

In this way, the phone book, which at present serves only to register the other party's information, becomes an entry point for communicating with that party: it visualizes how much the user has been in contact with them and extends the terminal's call function. For example, looking at the phone book entry of a party with whom the user laughed for a long time reminds the user of having had an enjoyable conversation. Conversely, if the user angered the other party in the previous call, the display serves as a reminder to take care the next time they communicate.

(Configuration of the embodiment)
FIG. 2 is a diagram showing the hardware configuration of the voice call terminal according to the embodiment of the present invention. As illustrated, the voice call terminal 10 according to this embodiment is exemplified by a mobile phone or smartphone. The voice call terminal 10 includes a control unit 11 containing a CPU 12 (Central Processing Unit) and a storage unit 13 (memory), a display unit 14 (display) having a display screen, a key operation unit 15 that accepts the user's key operations, a wireless communication unit 16 that controls wireless communication, and a voice input/output unit 17 including a microphone and a speaker. It may additionally include an imaging unit 18 (camera) and various sensors 19.

The control unit 11 exercises overall control of the operation of the voice call terminal 10. The control unit 11 includes a computer (microprocessor) that executes processing based on programs (OS, application programs, and so on) stored in the storage unit 13, and performs processing according to the procedures specified in those programs.

The storage unit 13 holds various data used in processing by the CPU 12. Specifically, in addition to the programs executed by the CPU 12, it holds various data such as temporary data used in the course of program processing. The storage unit 13 is composed of, for example, a non-volatile storage device (non-volatile semiconductor memory such as an SSD (Solid State Drive) or an SD memory card (Secure Digital memory card)) and a volatile random-access storage device (for example, SRAM or DRAM). The specific operation of the control unit 11 according to this embodiment is described later.

The display unit 14 is built using a display device such as a liquid crystal panel or an organic EL (Electro-Luminescence) panel, and displays images corresponding to the video signal generated by the control unit 11. The key operation unit 15 and the display unit 14 may be implemented as a touch panel in which the operation unit and the display unit are formed as one piece.

The key operation unit 15 has keys or buttons to which various functions are assigned, such as a power key, call keys (outgoing and incoming), volume keys, numeric keys, character keys, a selection key, and an enter key. For example, when the user operates a numeric key, a signal corresponding to that operation is generated and input to the control unit 11. The key operation unit 15 may be partly or wholly implemented with a touch sensor, in which case a signal corresponding to the touch operation is generated and input to the control unit 11.

The wireless communication unit 16 performs wireless communication with a base station (not shown) connected to communication networks such as a voice call network and a data communication network. For example, it applies predetermined modulation to transmission data supplied from the control unit 11, converts it into a radio signal, and sends it out via an antenna. It also applies predetermined demodulation to a radio signal from the base station received at the antenna, converts it into reception data, and outputs it to the control unit 11.

The voice input/output unit 17 is used mainly for voice calls, sound recording, and sound playback, and handles input and output of the audio signal entering through the microphone and the audio signal output from the speaker. That is, it amplifies the sound input from the microphone, performs analog-to-digital conversion, applies further signal processing such as encoding, converts the result into digital audio data, and outputs it to the control unit 11. It also applies signal processing such as decoding, digital-to-analog conversion, and amplification to audio data output from the control unit 11, converts it into an analog audio signal, and outputs it to the speaker. It may also have a speech recognition function that converts the input audio signal into text. Instead of notification by sound, incoming mail or calls can also be signaled by vibration of the housing generated by a vibration generating unit (not shown).

In addition to taking photos and videos, the imaging unit 18 (camera) can capture text for OCR and barcodes, which allows character input by means other than key operation or speech recognition, as well as input of various images. The various sensors 19 may include, for example, an acceleration sensor, a proximity sensor, a temperature sensor, biometric sensors (perspiration sensor, pulse sensor, and so on), and an orientation sensor.

Specifically, the voice call terminal 10 described above is composed of the CPU 12 and peripheral LSIs including the storage unit 13 (memory). The functions of the control unit 11, which is the control center, are realized by the CPU 12 sequentially reading and executing the programs recorded in the memory. The functions of the display unit 14, the key operation unit 15, the wireless communication unit 16, and the voice input/output unit 17 are realized by peripheral LSIs that are programmably controlled by the CPU 12, and carry out the functions of the control unit 11 via input/output ports (not shown).

(Operation of the embodiment)

The operation of the mobile terminal device of this embodiment is described in detail below with reference to FIGS. 3 and 4. FIG. 3 is a diagram showing the functional blocks of the control unit of the voice call terminal according to the embodiment of the present invention. FIG. 4 is a diagram showing the concept of emotion degree determination according to the embodiment of the present invention.

As shown in FIG. 3, the control unit 11 can be divided, as processing units, into the functional blocks of a call recording unit 11a, a state extraction unit 11b, a determination unit 11c, a presentation unit 11d, and an image extraction unit 11e, each described below. The functional blocks may be merged or further divided. The storage unit 13 included in the control unit 11 stores call voice data 13a, an emotion degree determination table 13b, call records 13c, per-emotion-degree illustrations 13d, and a user photo collection 13e, also described below.

The call recording unit 11a receives voice input from the microphone of the voice input/output unit 17 and from the wireless communication unit 16, records the voice call between the owner of the voice call terminal 10 (the user) and the other party, and stores it in the storage unit 13 as call voice data 13a. If the various sensors 19 include sensors that detect the caller's biometric information, such as a perspiration sensor or a pulse sensor, that biometric information may be recorded separately for the duration of the call. The call voice data 13a may contain the entire call or may omit silent portions.
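One way to omit silent portions before storing the call voice data is an energy threshold on short frames. The sketch below is an assumption, not the method of the patent, which does not say how silence is identified; the frame representation (lists of samples in the range -1.0 to 1.0), the `drop_silence` name, and the threshold value are all hypothetical.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def drop_silence(frames: Sequence[Sequence[float]],
                 threshold: float = 0.01) -> List[Sequence[float]]:
    """Keep only frames whose RMS amplitude reaches the threshold."""
    return [frame for frame in frames if rms(frame) >= threshold]
```

A fixed threshold is the simplest choice; a real recorder would more likely adapt it to the background noise of the call.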

The state extraction unit 11b analyzes the recorded call voice data 13a and first separates the utterances of the user from those of the other party. Then, for each speaker, it detects a specific emotional state during the call from the voice level (pitch and loudness) of the speech uttered during the call, and records the start time and end time of that state. If information from a biometric sensor or user profile information (gender, age group, and so on) is available, it may be used in detecting the emotional state. The specific emotional state is either the state in which the caller's emotions are normal or the state of one emotion selected from joy, anger, sorrow, and pleasure. Detection of the emotional state is shown conceptually in FIG. 4(a); known techniques may be used. In the figure, reference signs A and B indicate the start time and end time of a specific emotional state (in this example, pleasure (laughter)), and reference sign C indicates how long that state persisted. The information from the biometric sensor mentioned above may also be referred to when extracting the emotional state.
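Recording the start time A, end time B, and duration C of each emotional episode can be sketched as run-length extraction over per-unit-time detection flags. The flags are assumed to come from some detector that the patent leaves to known techniques; the function name `emotion_intervals` and the seconds-based time unit are hypothetical.

```python
from typing import List, Sequence, Tuple


def emotion_intervals(flags: Sequence[bool],
                      unit_seconds: float = 1.0
                      ) -> List[Tuple[float, float, float]]:
    """Return (start A, end B, duration C) for each run of True flags."""
    intervals: List[Tuple[float, float, float]] = []
    start = None  # index where the current run began, if any
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            a, b = start * unit_seconds, i * unit_seconds
            intervals.append((a, b, b - a))
            start = None
    if start is not None:  # the state lasted until the end of the call
        a, b = start * unit_seconds, len(flags) * unit_seconds
        intervals.append((a, b, b - a))
    return intervals
```

Summing the third element of each tuple gives the total time the state persisted over the whole call.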

When the extraction processing of the state extraction unit 11b finishes, the determination unit 11c is invoked. The determination unit 11c obtains the time during which the extracted specific emotional state persisted and detects, for each unit time, the voice level (pitch and loudness) during that time. It then obtains, over the whole call, the total time (Xa) during which the specific emotion persisted and the sum (Ya) of the per-unit-time voice levels during that time. After the call ends, it looks up the obtained totals (Xa, Ya) in the emotion degree determination table 13b to determine the emotion degree, which is the degree to which that emotion was heightened. The emotion degree determined by the determination unit 11c is stored in the storage unit in association with the other party's information (identification information of the other party). The emotion degree is determined as a stepped value (level); for example, the normal state (standard) is emotion degree 0, and heightened states of the specific emotion are emotion degrees 1 to 4. The emotion degree determination table 13b is the emotion degree determination graph described with FIG. 1 (shown again in FIG. 4(c)) expressed in table form so that a computer can process it easily (see FIG. 4(b)). The curves in the graph of FIG. 4(c) are drawn as circular arcs of radius r_n (n = 0 to 4) for simplicity, but in general they may be any combination of straight lines and curves.
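For the simplified case of circular arcs of radius r_n, the lookup reduces to comparing the distance of the point (Xa, Ya) from the origin against the radii. The sketch below assumes, without support from the patent, that degree n is the innermost arc enclosing the point, that points beyond the outermost arc get the highest degree, and that both totals have already been scaled to comparable units. The radii in the usage note are invented for illustration.

```python
import math
from bisect import bisect_left
from typing import Sequence


def emotion_degree(xa: float, ya: float, radii: Sequence[float]) -> int:
    """Map the point (xa, ya) to a degree in 0..len(radii)-1.

    radii must be sorted in ascending order (r_0 < r_1 < ...).
    """
    distance = math.hypot(xa, ya)
    # Index of the first arc whose radius is at least the distance.
    index = bisect_left(radii, distance)
    # Points outside every arc are clamped to the highest degree.
    return min(index, len(radii) - 1)
```

With hypothetical radii `[1, 2, 3, 4, 5]`, the point (1.5, 0) lies between the first and second arcs and maps to degree 1. Because the comparison uses only the distance from the origin, the result does not depend on which total is placed on which axis.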

The presentation unit 11d serves to provide the result of the emotion degree determination to applications such as the phone book as a UI (user interface). In the case of the phone book application, for example, providing a UI means presenting a display mode within a predetermined screen that contains information about the other party, such as a telephone number, according to the result of the emotion degree determination. Presenting a display mode means, for example, adding an image for effect, changing the size, font, or decoration (italic, bold, underline, and so on) of text such as the telephone number or the other party's name, or changing the color of the text or background, and displaying the result on the application's screen.
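A display mode that varies with the emotion degree can be sketched as a lookup from degree to style attributes. Everything concrete here is an assumption made for illustration: the `DisplayStyle` fields, the five styles, and the colors are not taken from the patent, which only lists the kinds of attributes that may change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayStyle:
    """Style applied to one contact entry (hypothetical fields)."""
    font_scale: float  # multiplier for the text size
    bold: bool         # whether the name and number are shown in bold
    background: str    # background color as a hex string


# One style per emotion degree, 0 (standard) to 4.
_STYLES = {
    0: DisplayStyle(1.0, False, "#ffffff"),
    1: DisplayStyle(1.1, False, "#fff8e1"),
    2: DisplayStyle(1.2, False, "#ffecb3"),
    3: DisplayStyle(1.3, True, "#ffe082"),
    4: DisplayStyle(1.4, True, "#ffd54f"),
}


def style_for(degree: int) -> DisplayStyle:
    """Return the style for a degree, clamping out-of-range values."""
    clamped = max(0, min(degree, max(_STYLES)))
    return _STYLES[clamped]
```

Keeping the mapping in a table means an application can swap in its own styles without touching the determination logic.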

The presentation unit 11d has an API (Application Programming Interface) and can be called from each application. If an emotion degree is recorded in the call records 13c, the presentation unit 11d extracts, from the per-emotion-degree illustrations 13d prepared in advance, the illustration image corresponding to the emotion degree of the most recent call record or to the average emotion degree over a certain period. Alternatively, instead of an illustration image, it may extract a face photo of the other party corresponding to the emotion degree recorded in the call records 13c from the user photo collection 13e, which contains photos of the other party and is stored in a storage unit (the storage unit 13 inside the terminal, optionally combined with a database on an external server). The extracted illustration or face photo is displayed on the display unit 14 (display screen) of the voice call terminal 10 by the presentation unit 11d or by the application that called it. In this way, the voice data obtained during a call can be applied not only during the call but also in other applications, which makes it more effective for improving communication.

FIG. 5 is a diagram showing the display screen of the phone book application according to the embodiment of the present invention. FIG. 5(a) shows an example of a screen in which a prepared illustration image changes in steps according to the laughter degree (emotion degree) recorded in the call record, and the text size of the telephone number and the background color also change in steps. FIG. 5(c) shows an example of a screen in which a face photo of the other party, extracted from the user photo collection according to the laughter degree, is displayed instead of the illustration image. The face photo need not show the other party alone; it may be one in which the other party appears among several people. A photo showing the user and the other party together is ideal. If the photo collection contains no image matching the other party's emotion degree, such as an angry face, the illustration image is used.
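The choice between the most recent and the averaged emotion degree, and the fallback from a face photo to an illustration, can be sketched as below. The function names, the use of plain dictionaries keyed by degree, and the file names in the usage note are hypothetical; the patent does not describe how the photo collection or the call records are indexed.

```python
from statistics import mean
from typing import Mapping, Optional, Sequence


def representative_degree(history: Sequence[int],
                          use_average: bool = False) -> Optional[int]:
    """Pick the degree to display from past call records, oldest first."""
    if not history:
        return None  # no emotion degree has been recorded yet
    if use_average:
        return round(mean(history))
    return history[-1]  # most recent call


def pick_image(degree: int,
               photos: Mapping[int, str],
               illustrations: Mapping[int, str]) -> str:
    """Prefer a face photo for the degree, else use the illustration."""
    return photos.get(degree, illustrations[degree])
```

For example, with photos `{4: "smile.jpg"}` and a full set of illustrations, degree 4 yields the photo while degree 2 falls back to the illustration for degree 2.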

FIG. 6 is a diagram showing the display screens of other applications according to the embodiment of the present invention. As examples of applications other than the phone book, FIG. 6(a) shows a call history screen (including outgoing and incoming records for which no call actually took place), and FIG. 6(b) shows a contact list screen such as is used for group conversations. The contact list may display not only the emotion degree for one-to-one calls but also the emotion degrees of all members of a group conversation.

(Effects of the embodiment)
As described above, the phone book, which at present serves only to register the other party's information, can visualize how much the user has been in contact with a party when the user is about to communicate with them. For example, a change in the display mode of the phone book entry for a party with whom the user laughed for a long time visualizes the fact that the conversation was enjoyable. If the user angered the other party in the previous call, the display also serves as a reminder to be considerate the next time they communicate.

Moreover, the voice data obtained can be used not only during the call but also after the call ends, when other applications are in use (besides the phone book, the call history, contact list, and so on), which makes it more effective for improving communication. That is, because the results of communication during a call are reflected in the UIs of various application functions, users can be expected to have more occasions to use those functions, and communication can be expected to deepen further.

In the description of the hardware configuration of the embodiment in FIG. 2, a mobile phone and a smartphone were given as examples of the voice call terminal, but the terminal need not be portable. It may be a PC (Personal Computer), tablet terminal, PDA (Personal Digital Assistant), or the like that has a voice call function and a phone book function.

The present invention may also be applied to a videophone. Specifically, instead of the time during which the specific emotional state persisted in the above embodiment, the extraction unit extracts the time during which a smile persisted, based on images of the caller acquired by the imaging unit during the videophone call. The determination unit then determines the smile level of the caller according to the time extracted by the extraction unit and the degree of smiling during that time. Further, the presenting unit changes the display mode of a predetermined screen including information on the caller based on the smile level determined by the determination unit, and presents the screen.
The audio and video of the videophone call may be recorded in the storage unit. Alternatively, they need not be recorded, and only the smile level determined during the videophone call and the time during which the smile persisted may be recorded.
According to this aspect, data obtained during a videophone call can be used after the call ends to check the state of communication with other people.
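
The videophone variant above can be illustrated with a minimal sketch. It assumes that some face analysis step, not specified in this document, has already produced one smile score between 0 and 1 per analysed video frame. The function name `summarize_smile`, the threshold, the frame interval, and the bucketing rule are all hypothetical values chosen for illustration, and the sketch counts the total time above the threshold as one possible reading of "the time during which a smile persisted".

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SmileSummary:
    duration_s: float  # total time during which the caller was smiling
    mean_score: float  # average smile score over that time
    smile_level: int   # coarse level (0-3) used to choose a display mode


def summarize_smile(scores: Sequence[float],
                    frame_s: float = 0.5,
                    smile_threshold: float = 0.6) -> SmileSummary:
    """Reduce per-frame smile scores to the summary kept after the call.

    scores: one smile score in [0, 1] per analysed frame (assumed input).
    frame_s: seconds between analysed frames (assumed value).
    """
    # Extraction: keep only the frames in which the caller counts as smiling.
    smiling = [s for s in scores if s >= smile_threshold]
    duration = len(smiling) * frame_s
    mean = sum(smiling) / len(smiling) if smiling else 0.0

    # Determination: hypothetical rule that weights the smiling time by the
    # mean score and buckets the result into four levels.
    weighted = duration * mean
    if weighted >= 60:
        level = 3
    elif weighted >= 20:
        level = 2
    elif weighted > 0:
        level = 1
    else:
        level = 0
    return SmileSummary(duration, mean, level)
```

Because the function returns only the duration, the mean score, and the level, it matches the option in which the audio and video themselves are not stored and only the summary is recorded.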

The preferred embodiments of the present invention have been described in detail above, but it goes without saying that the technical scope of the present invention is not limited to the scope described in those embodiments. It is apparent to those skilled in the art that various modifications or improvements can be made to the above embodiments. It is also apparent from the claims that embodiments with such modifications or improvements can be included in the technical scope of the present invention.

DESCRIPTION OF SYMBOLS
10 ... voice call terminal, 11 ... control unit, 11a ... call recording unit, 11b ... state extraction unit, 11c ... determination unit, 11d ... presenting unit, 11e ... image extraction unit, 12 ... CPU, 13 ... storage unit, 13a ... call voice data, 13b ... emotion level determination table, 13c ... call record, 13d ... illustrations by emotion level, 13e ... user photo collection, 14 ... display unit, 15 ... key operation unit, 16 ... wireless communication unit, 17 ... audio output unit, 18 ... imaging unit, 19 ... various sensors

Claims

1. A terminal device comprising:
a call recording unit that records a caller's call voice;
a state extraction unit that extracts, from the call voice of the caller recorded by the call recording unit, a time during which the caller is in a specific emotional state;
a determination unit that determines an emotion level of the caller based on the time in the specific emotional state and the voice level during that time; and
a presenting unit that, after the call ends, changes a display mode of a predetermined screen including information on the other party of the call according to the emotion level determined by the determination unit, and presents the screen.

2. The terminal device according to claim 1, wherein the voice level is the pitch and the volume of the voice that the caller utters during a call.

3. The terminal device according to claim 1, wherein the emotion level is a laughter level.

4. The terminal device according to claim 1, wherein the predetermined screen includes at least one of a phone book, a call history, and a contact list display screen.

5. The terminal device according to any one of claims 1 to 4, further comprising a storage unit that stores images,
wherein the presenting unit includes an image extraction unit that extracts, from the storage unit, an image corresponding to the emotion level based on the determination result of the emotion level, and displays the image on the predetermined screen.

6. A call data processing method comprising:
a call recording step of recording a caller's call voice;
a state extraction step of extracting, from the call voice of the caller recorded in the call recording step, a time during which the caller is in a specific emotional state;
a determination step of determining, after the call ends, an emotion level of the caller based on the time in the specific emotional state and the voice level during that time; and
a presentation step of, after the call ends, changing a display mode of a predetermined screen including information on the other party of the call according to the emotion level determined in the determination step, and presenting the screen.
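
As a non-authoritative illustration of the sequence of steps recited in the method claim, the following sketch processes a recorded call that has already been reduced to per-second pitch and volume measurements. All names (`Frame`, `EMOTION_TABLE`, `DISPLAY_MODE`, `process_call`), the thresholds, and the table contents are hypothetical; `EMOTION_TABLE` merely stands in for an emotion level determination table such as 13b, and the dictionary passed as `call_record` stands in for a call record such as 13c.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Frame(NamedTuple):
    pitch_hz: float   # voice pitch measured for this interval
    volume_db: float  # voice volume measured for this interval


# Hypothetical determination table:
# (minimum seconds in the emotional state, minimum mean volume in dB, level)
EMOTION_TABLE: List[Tuple[float, float, int]] = [
    (60.0, 70.0, 3),
    (20.0, 65.0, 2),
    (5.0, 0.0, 1),
]

# Hypothetical display modes, one per emotion level.
DISPLAY_MODE: Dict[int, str] = {
    0: "plain", 1: "smile_icon", 2: "laugh_icon", 3: "big_laugh_icon",
}


def extract_state(frames: Sequence[Frame], frame_s: float = 1.0,
                  pitch_min: float = 250.0,
                  volume_min: float = 60.0) -> Tuple[float, float]:
    """State extraction: time in the emotional state and mean volume there."""
    hits = [f for f in frames
            if f.pitch_hz >= pitch_min and f.volume_db >= volume_min]
    seconds = len(hits) * frame_s
    mean_volume = sum(f.volume_db for f in hits) / len(hits) if hits else 0.0
    return seconds, mean_volume


def determine_level(seconds: float, mean_volume: float) -> int:
    """Determination: first table row whose two minimums are both met."""
    for min_s, min_vol, level in EMOTION_TABLE:
        if seconds >= min_s and mean_volume >= min_vol:
            return level
    return 0


def process_call(partner: str, frames: Sequence[Frame],
                 call_record: Dict[str, int]) -> str:
    """Run after the call ends; returns the display mode for the partner."""
    seconds, mean_volume = extract_state(frames)
    level = determine_level(seconds, mean_volume)
    call_record[partner] = level  # associate the level with the other party
    return DISPLAY_MODE[level]
```

For example, a call with 30 seconds of high-pitched, loud speech followed by 30 seconds of quiet speech would, under these assumed thresholds, be recorded at level 2 and shown with the `laugh_icon` mode the next time the partner's entry is displayed.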