JP4957119B2 - Information processing device

Info

Publication number: JP4957119B2
Application number: JP2006219778A
Authority: JP
Inventors: 宴克辰巳
Original assignee: Fujitsu Toshiba Mobile Communication Ltd
Current assignee: Fujitsu Mobile Communications Ltd
Priority date: 2006-08-11
Filing date: 2006-08-11
Publication date: 2012-06-20
Anticipated expiration: 2026-08-11
Also published as: JP2008048030A

Description

The present invention relates to an information processing apparatus, and more particularly to an information processing apparatus capable of making videophone calls.

In recent years, videophone systems in which users exchange large volumes of image data and audio data with one another via the Internet have become known.

For example, when user A and user B use a videophone system, user A's computer A displays the image of user B transmitted from user B's computer B and outputs the accompanying sound. Likewise, user B's computer B displays the image of user A transmitted from user A's computer A and outputs the accompanying sound. User A and user B can thus communicate by image and sound. Such a videophone system can be applied not only to one-to-one calls but also to one-to-many calls.

Besides videophone systems, video conference systems and the like are also known as means of communicating by image and sound.

For video conference systems that provide one-to-many communication by image and sound, a technique has been proposed in which video material stored in advance is read out in response to a command from a terminal and combined with the video from each terminal into a single video. This makes it possible to display and use stored document video, past video, and the like in a desired form together with the video from the terminals, and to accumulate the conference history and display and browse it in a desired form (see, for example, Patent Document 1).

Meanwhile, for mobile phones as well, techniques have been proposed in recent years for making videophone calls between mobile phones by wireless communication via a base station or the like. In particular, one-to-one videophone calls between mobile phones are already in practical use.
Patent Document 1: JP 2001-313915 A

However, when a one-to-many videophone call (that is, a multipoint videophone call) is made on a mobile phone, the size of the display unit provided on the mobile phone is limited. If the faces of all the other parties are to be shown on the display unit, the display area allotted to each person becomes small, and it is difficult to tell who a party is even when that party's face is displayed.

This problem cannot be solved by the technique proposed in Patent Document 1 either.

The present invention has been made in view of this situation, and its object is to provide an information processing apparatus that, when a plurality of users make a multipoint videophone call, can suitably control the arrangement of the user images displayed on the display unit according to the call situation.

To solve the problem described above, the information processing apparatus of the present invention comprises: voice recognition means for acquiring an image signal and an audio signal from each of a plurality of information processing apparatuses and performing voice recognition processing based on the acquired audio signals; display means for displaying images based on the plurality of image signals; analysis means for analyzing, based on the utterances recognized by the voice recognition means, the number of times each name is called, and for calculating, based on the analysis result, a priority for the arrangement of the images based on the plurality of image signals displayed by the display means; and control means for controlling the arrangement of the images based on the plurality of image signals displayed by the display means according to the priority calculated by the analysis means.

In the information processing apparatus of the present invention, an image signal and an audio signal are acquired from each of a plurality of information processing apparatuses, voice recognition processing is performed based on the acquired audio signals, and images based on the plurality of image signals are displayed. The number of times each name is called is analyzed based on the recognized utterances, a priority for the arrangement of the displayed images is calculated based on the analysis result, and the arrangement of the displayed images is controlled according to the calculated priority.

According to the present invention, when a plurality of users make a multipoint videophone call, the arrangement of the user images displayed on the display unit can be suitably controlled according to the call situation.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 shows the internal configuration of a mobile phone 1 that can be used as the information processing apparatus according to the present invention.

As shown in FIG. 1, in the mobile phone 1, a power supply circuit unit 22, an operation input control unit 23, an image encoder 24, a camera interface unit 25, an LCD (Liquid Crystal Display) control unit 26, a demultiplexing unit 28, a modulation/demodulation circuit unit 29, an audio codec 30, a storage unit 37, and a music control unit 38 are connected via a main bus 31 to a main control unit 21 that centrally controls each unit of the mobile phone 1. In addition, the image encoder 24, an image decoder 27, the demultiplexing unit 28, the modulation/demodulation circuit unit 29, the audio codec 30, and a recording/reproducing unit 35 are connected to one another via a synchronization bus 32.

When the end-call/power key is turned on by a user operation, the power supply circuit unit 22 supplies power from the battery pack to each unit, thereby starting the mobile phone 1 in an operable state.

The main control unit 21 includes a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The CPU executes various processes according to programs stored in the ROM or application programs loaded from the storage unit 37 into the RAM, and centrally controls the mobile phone 1 by generating various control signals and supplying them to each unit. The RAM stores, as appropriate, the data the CPU needs to execute these processes.

The main control unit 21 also has a built-in timer that keeps the current date and time.

The application programs executed by the CPU can be installed in advance in the ROM or the storage unit 37. They can also be installed in the storage unit 37 by downloading them to the mobile phone 1 by communication via a base station (not shown). Furthermore, they can be recorded on a memory card 36, read out by the recording/reproducing unit 35, and installed in the storage unit 37.

Under the control of the main control unit 21, in voice call mode the mobile phone 1 converts the audio signal picked up by a microphone 15 into a digital audio signal and compresses it with the audio codec 30, applies spread spectrum processing in the modulation/demodulation circuit unit 29, applies digital-to-analog conversion and frequency conversion in a transmission/reception circuit unit 33, and then transmits the result via an antenna 34.

In voice call mode, the mobile phone 1 also amplifies the signal received by the antenna 34, applies frequency conversion and analog-to-digital conversion, applies spectrum despreading in the modulation/demodulation circuit unit 29, decompresses the result with the audio codec 30, converts it into an analog audio signal, and outputs that analog audio signal through a speaker 16.

When it does not transmit an image signal, the mobile phone 1 displays the image signal captured by a CCD camera 12 directly on a liquid crystal display 13 via the camera interface unit 25 and the LCD control unit 26.

When it transmits an image signal in data communication mode (or during a videophone call), the mobile phone 1 supplies the image signal captured by the CCD camera 12 to the image encoder 24 via the camera interface unit 25.

The image encoder 24 converts the image signal supplied from the CCD camera 12 into an encoded image signal by compression-encoding it with a predetermined encoding method such as MPEG (Moving Picture Experts Group) 4, and sends the encoded image signal to the demultiplexing unit 28. At the same time, the mobile phone 1 sends the sound picked up by the microphone 15 during image capture by the CCD camera 12 to the demultiplexing unit 28 as a digital audio signal via the audio codec 30.

The demultiplexing unit 28 multiplexes the encoded image signal supplied from the image encoder 24 and the audio signal supplied from the audio codec 30 by a predetermined method. The resulting multiplexed signal undergoes spread spectrum processing in the modulation/demodulation circuit unit 29 and digital-to-analog conversion and frequency conversion in the transmission/reception circuit unit 33, and is then transmitted via the antenna 34.

When it receives moving image file data in data communication mode (or during a videophone call), the mobile phone 1 applies spectrum despreading in the modulation/demodulation circuit unit 29 to the signal received from a base station (not shown) via the antenna 34, and sends the resulting multiplexed signal to the demultiplexing unit 28.

The demultiplexing unit 28 separates the multiplexed signal into an encoded image signal and an audio signal, and supplies the encoded image signal to the image decoder 27 and the audio signal to the audio codec 30 via the synchronization bus 32. The image decoder 27 generates a reproduced moving image signal by decoding the encoded image signal with a decoding method corresponding to the predetermined encoding method, such as MPEG4, and supplies the reproduced moving image signal to the liquid crystal display 13 via the LCD control unit 26. As a result, the moving image data contained in, for example, a moving image file is displayed.

At the same time, the audio codec 30 converts the audio signal into an analog audio signal and supplies it to the speaker 16, so that the audio signal contained in, for example, the moving image file is reproduced.

The storage unit 37 consists of, for example, a flash memory element, which is an electrically rewritable and erasable nonvolatile memory, and stores the application programs executed by the CPU of the main control unit 21 together with various data. As needed, the storage unit 37 also stores e-mail received in response to user operations, moving image data contained in moving image files linked from received Web pages, and the like.

The music control unit 38 controls playback and pause of the audio data stored in the storage unit 37, as well as rewind, fast-forward, volume-down, and volume-up operations.

FIG. 2 shows the functional configuration that can be executed by the mobile phone 1 used as the information processing apparatus according to the present invention.

The voice recognition function 41 is realized by, for example, the main control unit 21 in FIG. 1. Within a preset time (for example, 5 minutes), it acquires the audio signal (digital signal) decompressed by the audio codec 30, removes invalid sounds and noise from the decompressed signal, and performs voice recognition processing on the cleaned signal. Specifically, the voice recognition function 41 extracts feature values from the audio signal after the invalid sounds and noise have been removed, selects a word string as the utterance (speech) of a videophone participant based on the extracted feature values, and sequentially supplies utterance data, that is, data on the selected utterance of the participant, to the analysis function 42.

The analysis function 42 is realized by, for example, the main control unit 21 in FIG. 1. It sequentially acquires the utterance data of the videophone participants supplied from the voice recognition function 41 and performs a predetermined analysis on it (for example, the number of utterances by each participant and the number of times keywords are uttered). Based on the analysis result, the analysis function 42 updates, at preset intervals, the utterance history database 43, in which each videophone participant is registered together with that participant's utterance counts and similar values. It then refers to the updated utterance history database 43, calculates a priority for the arrangement of the participant images displayed on the liquid crystal display 13 according to the current call situation, and supplies priority data, that is, data on the calculated priority, to the image arrangement control function 44. The utterance history database 43 is realized by, for example, the storage unit 37 in FIG. 1.
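The priority calculation described for the analysis function 42 can be illustrated with a minimal sketch. The passage above does not state a formula, so the weighted sum below, the weight values, and the names HistoryEntry, WEIGHTS, priority, and rank are all assumptions made for illustration. The sample counts are the ones given later for participants A and B in FIG. 6.

```python
from dataclasses import dataclass


@dataclass
class HistoryEntry:
    """One row of an utterance history table (hypothetical structure)."""
    name: str
    utterances: int          # number of utterances
    keyword_utterances: int  # number of keyword utterances
    times_named: int         # number of times the name was called
    replies: int             # number of replies


# Assumed weights; the source does not specify how the counts are combined.
WEIGHTS = {
    "utterances": 1.0,
    "keyword_utterances": 2.0,
    "times_named": 2.0,
    "replies": 1.0,
}


def priority(entry: HistoryEntry) -> float:
    """Weighted sum of the counts (an illustrative formula only)."""
    return sum(weight * getattr(entry, field) for field, weight in WEIGHTS.items())


def rank(entries):
    """Return the entries ordered from highest to lowest priority."""
    return sorted(entries, key=priority, reverse=True)


history = [HistoryEntry("B", 1, 1, 2, 2), HistoryEntry("A", 7, 3, 6, 5)]
ranked_names = [entry.name for entry in rank(history)]
```

With these assumed weights, participant A ranks above participant B. Any other monotonic combination of the counts would fit the description equally well.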

The image arrangement control function 44 is realized by, for example, the main control unit 21 in FIG. 1. It acquires the priority data supplied from the analysis function 42, generates, based on that data, an image arrangement control signal for controlling the arrangement of the participant images displayed on the liquid crystal display 13 during a videophone call, and supplies the generated image arrangement control signal to the LCD control unit 26.
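As a rough illustration of what an image arrangement derived from priority data might look like, the sketch below gives the highest-priority participant a large tile and lets the others share a strip beneath it. This particular layout, the function name layout, and the 240 x 320 display size are assumptions; the passage above only says that the arrangement is controlled according to the priority.

```python
def layout(ranked_names, width, height):
    """Map each participant to a rectangle (x, y, w, h) on the display.

    ranked_names is ordered from highest to lowest priority. The first
    participant gets the upper two thirds of the screen; the rest share
    the lower third equally. This is one hypothetical policy.
    """
    if not ranked_names:
        return {}
    if len(ranked_names) == 1:
        return {ranked_names[0]: (0, 0, width, height)}

    main_height = height * 2 // 3
    rects = {ranked_names[0]: (0, 0, width, main_height)}

    others = ranked_names[1:]
    tile_width = width // len(others)
    for index, name in enumerate(others):
        rects[name] = (index * tile_width, main_height, tile_width, height - main_height)
    return rects


rects = layout(["A", "C", "D", "B"], 240, 320)
```

Re-running layout whenever the priority order changes keeps the person at the center of the conversation recognizable even on a small display.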

The flowchart of FIG. 3 shows the image arrangement control process in the mobile phone 1 of FIG. 2. This process starts in parallel at the moment a plurality of users (for example, four users: Mr. A, Mr. B, Mr. C, and Mr. D) begin a videophone call (or a video conference or the like).

In step S1, the voice recognition function 41 sequentially acquires, within a preset time (for example, 5 minutes), the audio signal (digital signal) decompressed by the audio codec 30. When, for example, four users (Mr. A, Mr. B, Mr. C, and Mr. D) hold a videophone call, the audio signals and image signals acquired from the plurality of mobile phones 1 (including the user's own mobile phone 1 as well as those of the other users) carry at least control information indicating which mobile phone 1 each signal came from (for example, the telephone number of that mobile phone 1). Based on this control information, it is possible to determine which of the mobile phones 1 taking part in the videophone call an audio signal or image signal came from.
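The role of the control information can be sketched as follows. The Packet structure and the function group_by_terminal are hypothetical; the passage above only states that each signal carries control information such as the telephone number of the originating mobile phone. The telephone numbers are masked placeholders.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """A received signal tagged with control information (hypothetical)."""
    phone_number: str  # control information identifying the source terminal
    kind: str          # "audio" or "image"
    payload: bytes


def group_by_terminal(packets):
    """Sort received signals into per-terminal audio and image streams."""
    streams = defaultdict(lambda: {"audio": [], "image": []})
    for packet in packets:
        streams[packet.phone_number][packet.kind].append(packet.payload)
    return streams


received = [
    Packet("090-2345-xxxx", "audio", b"a1"),
    Packet("090-7523-xxxx", "image", b"i1"),
    Packet("090-2345-xxxx", "image", b"i2"),
]
streams = group_by_terminal(received)
```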

In step S2, the voice recognition function 41 removes invalid sounds and noise from the acquired decompressed audio signal and performs voice recognition processing on the cleaned signal. Specifically, it extracts feature values from the audio signal after the invalid sounds and noise have been removed and, based on the extracted feature values, selects a word string as the utterance (speech) of a videophone participant (one of Mr. A, Mr. B, Mr. C, and Mr. D).

For example, when Mr. B says to Mr. A, "It's XX, isn't it, Mr. A?", the voice recognition processing selects "It's XX, isn't it, Mr. A?" as the utterance of one of the four videophone participants.

The voice recognition function 41 sequentially supplies utterance data, that is, data on the selected utterance of the participant, to the analysis function 42. The utterance data includes, for example, data on the utterance "It's XX, isn't it, Mr. A?" selected as the utterance of one of the four participants, and control information indicating which mobile phone 1 the audio signal came from (for example, the telephone number of that mobile phone 1).

In step S3, the analysis function 42 sequentially acquires the utterance data of the videophone participants supplied from the voice recognition function 41 and performs a predetermined analysis on it (for example, the number of utterances by each participant and the number of times keywords are uttered).

Specifically, the analysis function 42 first identifies the videophone participants based on the acquired utterance data. For example, as shown in FIG. 4, when Mr. B says to Mr. A, "It's XX, isn't it, Mr. A?" and Mr. A then replies to Mr. B, "Yes, that's right.", the participant who made that reply can be identified as "Mr. A". Mr. B, Mr. C, and Mr. D can be identified by the same process.
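The call-and-response identification described above can be sketched as follows. The sketch assumes that recognized utterances arrive as (telephone number, text) pairs, that names appear in the translated form "Mr. X" or "Ms. X", and that the next utterance from a different terminal is the reply of the person who was addressed. The regular expression and the function name identify_participants are illustrative assumptions, not the method of the source.

```python
import re

# Assumed form of address in the recognized text, e.g. "Mr. A".
ADDRESS_PATTERN = re.compile(r"(?:Mr\.|Ms\.)\s+(\w+)")


def identify_participants(utterances):
    """Map telephone numbers to participant names from call and response.

    utterances is a list of (phone_number, text) pairs in time order.
    """
    names = {}
    pending = None  # (addressed name, phone number of the addresser)
    for number, text in utterances:
        # A reply from another terminal is attributed to the addressed name.
        if pending is not None and number != pending[1]:
            names.setdefault(number, pending[0])
        match = ADDRESS_PATTERN.search(text)
        pending = (match.group(1), number) if match else None
    return names


conversation = [
    ("090-7523-xxxx", "It's XX, isn't it, Mr. A?"),
    ("090-2345-xxxx", "Yes, that's right."),
]
names = identify_participants(conversation)
```

A practical system would need to handle cases where someone other than the addressed person speaks next; this sketch ignores that possibility.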

FIG. 4 shows the correspondence between the telephone numbers of the mobile phones 1 owned by the users participating in the videophone call and the participant names.

The first and second columns of the table in FIG. 4 contain, from left to right, "telephone number" and "participant name". These indicate the telephone number of the mobile phone 1 owned by a participant (user) in the videophone call and the participant name corresponding to that telephone number, respectively.

In the first row of the table in FIG. 4, the "telephone number" is "090-2345-××××", indicating that the telephone number of the mobile phone 1 owned by a participating user is "090-2345-××××". The "participant name" is "A", indicating that the participant name corresponding to that telephone number ("090-2345-××××") is "A".

In the second row of the table in FIG. 4, the "telephone number" is "090-7523-××××", indicating that the telephone number of the mobile phone 1 owned by a participating user is "090-7523-××××". The "participant name" is "B", indicating that the participant name corresponding to that telephone number ("090-7523-××××") is "B".

The same applies to the third and fourth rows of the table in FIG. 4, so their description is omitted to avoid repetition.

Next, the analysis function 42 identifies the current speaker among the videophone participants based on the control information contained in the acquired utterance data (for example, the telephone number of the mobile phone 1). For example, when the current speaker is Mr. A (the user who owns the mobile phone 1 with the telephone number "090-2345-××××"), the current speaker is identified as Mr. A based on the control information contained in the utterance data (the telephone number "090-2345-××××").

FIG. 5 shows the correspondence among the telephone numbers of the mobile phones 1 owned by the users participating in the videophone call, the participant names, and the speaker who is currently speaking. The "telephone number" and "participant name" in the first and second columns of the table in FIG. 5 are the same as those in the first and second columns of the table in FIG. 4, so their description is omitted to avoid repetition.

The third column of the table in FIG. 5 contains "speaking", which indicates whether the participant is the speaker who is currently speaking.

In the first row of the table in FIG. 5, "speaking" is "○", indicating that the user with the participant name "A" is the speaker who is currently speaking.

In the second row of the table in FIG. 5, "speaking" is "-", indicating that the user with the participant name "B" is not currently speaking.

The same applies to the third and fourth rows of the table in FIG. 5, so their description is omitted to avoid repetition.

Of course, when more than one participant (for example, two) is currently speaking, all of them are identified as current speakers based on the control information contained in the utterance data.
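The table of FIG. 5 can be pictured as a small data structure. In the sketch below, the dictionary of participants and the function speaking_table are assumptions; the only behavior taken from the text is that the control information (telephone number) attached to the utterance data marks which participants are currently speaking, and that more than one participant may be marked at once.

```python
def speaking_table(participants, active_numbers):
    """Build rows of (telephone number, participant name, speaking flag).

    participants maps telephone numbers to names (as in FIG. 4);
    active_numbers is the set of numbers found in the control information
    of the utterance data currently being received.
    """
    return [
        (number, name, number in active_numbers)
        for number, name in participants.items()
    ]


participants = {"090-2345-xxxx": "A", "090-7523-xxxx": "B"}
table = speaking_table(participants, {"090-2345-xxxx"})
both = speaking_table(participants, {"090-2345-xxxx", "090-7523-xxxx"})
```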

Furthermore, based on the acquired utterance data, the analysis function 42 analyzes, for each of the four participants A to D within the preset time, the number of utterances, the number of keyword utterances, the number of times the participant's name was called, and the number of replies (including brief acknowledgments). As a result, even a participant who is not the current speaker can be recognized as the central figure of the conversation (or someone close to the central figure) if, for example, that participant's number of utterances, number of keyword utterances, and number of times named are high, and can be recognized as the other party in the conversation if the number of replies is high.
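The four counts can be gathered with a routine along the following lines. How a reply or a name call is detected is not specified in the passage above, so the prefix test in REPLY_STARTS, the "Mr. X" form of address, and the function name analyze are assumptions chosen only to make the sketch concrete.

```python
from collections import Counter

# Assumed reply openers; the source does not define how replies are detected.
REPLY_STARTS = ("yes", "i see", "right")


def analyze(utterances, names, keywords):
    """Count utterances, keyword utterances, name calls, and replies.

    utterances is a list of (speaker name, recognized text) pairs.
    """
    stats = {name: Counter() for name in names}
    for speaker, text in utterances:
        lowered = text.lower()
        stats[speaker]["utterances"] += 1
        # Count every occurrence of every keyword in the utterance.
        stats[speaker]["keyword_utterances"] += sum(
            lowered.count(keyword.lower()) for keyword in keywords
        )
        if lowered.startswith(REPLY_STARTS):
            stats[speaker]["replies"] += 1
        # Credit a name call to each other participant who is addressed.
        for name in names:
            if name != speaker and f"Mr. {name}" in text:
                stats[name]["times_named"] += 1
    return stats


stats = analyze(
    [
        ("B", "The budget looks fine, Mr. A."),
        ("A", "Yes, the budget is fine."),
        ("C", "I see."),
    ],
    names=["A", "B", "C", "D"],
    keywords=["budget"],
)
```

Because the values are Counter objects, a count that was never incremented reads as 0, which keeps silent participants such as D in the table.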

The keywords may be words that are used repeatedly in the videophone conversation, or they may be set in advance according to the user's preference.

FIG. 6 shows the correspondence among the telephone numbers of the mobile phones 1 owned by the users participating in the videophone call, the participant names, the speaker who is currently speaking, the number of utterances, the number of keyword utterances, the number of times the name was called, and the number of replies. The "telephone number", "participant name", and "speaking" in the first to third columns of the table in FIG. 6 are the same as those in the first to third columns of the table in FIG. 5, so their description is omitted to avoid repetition.

The fourth to seventh columns of the table in FIG. 6 contain, from left to right, "number of utterances", "number of keyword utterances", "number of times named", and "number of replies". These indicate, respectively, the number of times the participant spoke during the conversation, the number of times the participant uttered a keyword related to the predetermined theme of the videophone call, the number of times the participant's name was called by other participants, and the number of times the participant replied to other participants.

In the first row of the table in FIG. 6, the "number of utterances" is "7", indicating that the participant (participant A) spoke 7 times during the conversation. The "number of keyword utterances" is "3", indicating that the participant uttered a keyword related to the predetermined theme of the videophone call 3 times. The "number of times named" is "6", indicating that other participants called the name "Mr. A" 6 times. The "number of replies" is "5", indicating that the participant replied to other participants 5 times.

In the second row of the table of FIG. 6, the "number of utterances" is "1", indicating that the participant (participant B) spoke once during the conversation. The "number of keyword utterances" is "1", indicating that the participant uttered a keyword related to the predetermined theme of the videophone call once. The "number of times the name is called" is "2", indicating that other participants called the participant's name ("Mr. B") twice. The "number of replies" is "2", indicating that the participant replied to other participants twice.

The same applies to the third and fourth rows of the table of FIG. 6, so their description is not repeated.
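As a rough illustration, one row of a table like that of FIG. 6 could be modeled as a small record. The sketch below is only an assumption about how such data might be held: the class name, field names, and phone-number placeholders are hypothetical, and only the four counts for participants A and B come from the description.

```python
from dataclasses import dataclass


@dataclass
class ParticipantStats:
    """One row of a FIG. 6-style table (all field names are hypothetical)."""
    phone_number: str            # placeholder; the real numbers are not given here
    name: str                    # participant name
    speaking: bool               # True if this participant is the current speaker
    utterances: int = 0          # number of utterances
    keyword_utterances: int = 0  # number of keyword utterances
    times_named: int = 0         # number of times the name was called by others
    replies: int = 0             # number of replies to other participants


# Counts for A and B are the ones stated in the description.
# The "speaking" flag for A is assumed, matching the FIG. 7 example.
table = [
    ParticipantStats("<number of A>", "A", True, 7, 3, 6, 5),
    ParticipantStats("<number of B>", "B", False, 1, 1, 2, 2),
]

for row in table:
    print(row.name, row.utterances, row.keyword_utterances, row.times_named, row.replies)
```

Keeping the counts as separate fields leaves the later priority calculation free to add them directly or to weight them individually.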

In step S4, the analysis function 42 updates the utterance history database 43 at each predetermined time interval set in advance, based on the analysis result. The utterance history database 43 registers each videophone participant in association with the number of utterances and the other counts.

For example, when the four participants A to D hold a videophone call and the utterance history database 43 is updated based on an analysis result such as that shown in the table of FIG. 6, the utterance history database 43 is updated as shown in FIG. 7.

When a videophone call among, for example, the four participants A to D is started, the image arrangement control processing is repeated at each predetermined time interval set in advance. After the utterance history database 43 has been updated as shown in FIG. 7, once the predetermined time elapses again, the utterance history database 43 is updated as shown in, for example, FIG. 8.

In the example of FIG. 8, the user with participant name "D" is the current speaker, and the "number of utterances", "number of keyword utterances", "number of times the name is called", and "number of replies" have each been updated.

In the embodiment of the present invention, speech recognition processing is performed on the audio signals acquired within a predetermined time set in advance, the recognized utterances are analyzed collectively, and, based on the analysis result, the utterance history database 43, in which each videophone participant is registered in association with the number of utterances and the other counts, is updated collectively at each predetermined time interval. The invention is not limited to this case, however. Speech recognition may instead be performed sequentially on the audio signals as they are acquired, the recognized utterances analyzed sequentially, and the utterance history database 43 updated sequentially.
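The difference between the collective (batch) update and the sequential update can be sketched as follows. This is a simplified illustration, not the patented analysis: the keyword set, the whitespace tokenization, the name-matching rule, and all function names are assumptions, and reply detection is left out because the description does not say how a reply is recognized.

```python
from collections import Counter, defaultdict

KEYWORDS = {"budget"}                 # hypothetical theme keywords
PARTICIPANTS = {"A", "B", "C", "D"}   # participant names used in the examples


def apply_utterance(db, speaker, text):
    """Update the history for one recognized utterance."""
    words = set(text.split())
    db[speaker]["utterances"] += 1
    if words & KEYWORDS:
        db[speaker]["keyword_utterances"] += 1
    # Count a name as "called" when another participant says it.
    for name in (PARTICIPANTS & words) - {speaker}:
        db[name]["times_named"] += 1


def update_batch(db, buffered):
    """Batch mode: apply everything recognized during one interval at once."""
    for speaker, text in buffered:
        apply_utterance(db, speaker, text)


recognized = [
    ("A", "the budget looks fine"),
    ("B", "A what about the budget"),
    ("A", "yes"),
]

batch_db = defaultdict(Counter)
update_batch(batch_db, recognized)     # one update per predetermined interval

stream_db = defaultdict(Counter)
for speaker, text in recognized:       # sequential mode: one update per utterance
    apply_utterance(stream_db, speaker, text)

print(batch_db == stream_db)
```

Both modes arrive at the same counts. They differ only in when the database, and therefore the displayed arrangement, can change.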

In step S5, the analysis function 42 refers to the updated utterance history database 43 and calculates, according to the current call status, a priority for the arrangement of the images of the videophone participants (for example, A to D) displayed on the liquid crystal display 13. The priority is the degree to which a participant's image is displayed preferentially on the display screen of the liquid crystal display 13 because, for example, the frequency and importance of that participant's conversation in the videophone call among the four participants A to D are high.

Specifically, in the example of FIG. 7, participant A is the current speaker, so among the four participants A to D the highest priority is calculated for participant A, and progressively lower priorities are calculated for the remaining participants in the order C, D, B (participant B receives the lowest priority). In calculating the priority, the number of utterances, the number of replies, and the other counts may simply be added together, or each count may be weighted before the counts are added.
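A minimal sketch of such a calculation, covering both simple addition and weighted addition, is given below. The counts for A and B are those stated for FIG. 6. The counts for C and D, the weights, and the large bonus given to the current speaker are all assumptions chosen only so that the result reproduces the order A, C, D, B described above.

```python
# Rows for A and B use the counts stated for FIG. 6; rows for C and D are
# made-up values chosen only so that the order matches the text (A, C, D, B).
rows = {
    "A": {"speaking": True,  "utterances": 7, "keyword_utterances": 3, "times_named": 6, "replies": 5},
    "B": {"speaking": False, "utterances": 1, "keyword_utterances": 1, "times_named": 2, "replies": 2},
    "C": {"speaking": False, "utterances": 5, "keyword_utterances": 2, "times_named": 4, "replies": 3},
    "D": {"speaking": False, "utterances": 3, "keyword_utterances": 1, "times_named": 2, "replies": 3},
}

COUNT_FIELDS = ("utterances", "keyword_utterances", "times_named", "replies")


def priority(row, weights=None, speaker_bonus=1000):
    """Weighted sum of the counts; the current speaker gets a large bonus (assumption)."""
    if weights is None:
        weights = {field: 1 for field in COUNT_FIELDS}   # simple addition
    score = sum(weights[field] * row[field] for field in COUNT_FIELDS)
    return score + (speaker_bonus if row["speaking"] else 0)


# Simple addition of all counts.
simple = sorted(rows, key=lambda name: priority(rows[name]), reverse=True)

# Weighted addition with hypothetical weights.
example_weights = {"utterances": 1, "keyword_utterances": 2, "times_named": 2, "replies": 1}
weighted = sorted(rows, key=lambda name: priority(rows[name], example_weights), reverse=True)

print(simple)
print(weighted)
```

With these illustrative values both variants rank the participants A, C, D, B. Different weights could change the order among the non-speaking participants.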

The analysis function 42 supplies priority data, which is data on the calculated priorities, to the image arrangement control function 44.

In step S6, the image arrangement control function 44 acquires the priority data supplied from the analysis function 42. Based on the acquired priority data, it generates an image arrangement control signal for controlling the arrangement of the images of the participants (A to D in the example of FIG. 7) displayed on the liquid crystal display 13 during the videophone call, and supplies the generated image arrangement control signal to the LCD control unit 26.

For example, in the case of FIG. 7, participant A is the current speaker, so the highest priority among the four participants A to D is calculated for participant A. An image arrangement control signal is then generated so that, as shown in FIG. 9 for example, the image of participant A is displayed in main X-1 while the other participants B to D are displayed in sub X-2 to X-4 below main X-1.
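The mapping from priorities to display positions in a FIG. 9-style layout could look like the sketch below. The function name and the priority values are hypothetical, and placing the remaining participants in their listing order is an assumption, since the description only states that B to D occupy sub X-2 to X-4.

```python
def assign_slots(priorities):
    """Map each participant to a slot: top priority gets main 'X-1', the rest sub slots."""
    top = max(priorities, key=priorities.get)
    layout = {top: "X-1"}
    # Keep the remaining participants in listing order (an assumption).
    others = [name for name in priorities if name != top]
    for index, name in enumerate(others, start=2):
        layout[name] = f"X-{index}"
    return layout


# Hypothetical priority values in which the current speaker A dominates.
layout = assign_slots({"A": 1021, "B": 6, "C": 14, "D": 9})
print(layout)
```

The resulting mapping plays the role of the image arrangement control signal in this sketch. An actual signal format is not specified in the description.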

The LCD control unit 26 acquires the plurality of decoded image signals supplied from the image decoder 27 and, based on the image arrangement control signal supplied from the image arrangement control function 44, arranges the images of the participants (for example, A to D) based on those decoded image signals at the desired positions and displays them on the liquid crystal display 13.

In step S7, the liquid crystal display 13, under the control of the LCD control unit 26, displays the images of the participants (for example, A to D) based on the plurality of decoded image signals as shown in FIG. 9, updating them at each predetermined time interval set in advance.

As a result, the user can easily pick out the face of the user who is at the center of the conversation in the videophone call (or the user who is currently speaking), and can easily recognize who that user is.

In steps S6 and S7, in the example of FIG. 7, participant A is the current speaker and is given the highest priority, followed by participants C, D, and B in descending order of priority. An image arrangement control signal may therefore be generated so that, as shown in FIG. 10 for example, the image of participant A is displayed in main X-1, the image of participant C in a slightly larger sub X-2 below main X-1, the image of participant D in a slightly larger sub X-3 below main X-1, and the image of participant B in a slightly larger sub X-4 below main X-1.
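One possible reading of a FIG. 10-style layout, in line with the claim that images are arranged larger as the priority is higher, is to order the slots by priority and shrink the image with each rank. The sketch below follows that reading. The scale factors, the step size, and the function name are hypothetical, and the description does not give actual sizes.

```python
def ranked_layout(priorities, main_scale=1.0, sub_scale=0.5, step=0.1):
    """Return (slot, participant, scale) tuples ordered from highest to lowest priority."""
    ranked = sorted(priorities, key=priorities.get, reverse=True)
    layout = []
    for rank, name in enumerate(ranked):
        if rank == 0:
            scale = main_scale                                # main image
        else:
            scale = max(sub_scale - step * (rank - 1), 0.1)   # sub images shrink with rank
        layout.append((f"X-{rank + 1}", name, round(scale, 2)))
    return layout


# Hypothetical priorities reproducing the order A, C, D, B.
layout = ranked_layout({"A": 1021, "B": 6, "C": 14, "D": 9})
for slot, name, scale in layout:
    print(slot, name, scale)
```

In this sketch participant A occupies X-1 at full scale, followed by C, D, and B in X-2 to X-4 at progressively smaller scales.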

As a further example, when ten people hold a videophone call and participant A is called by the other participants many times, so that the conversation is recognized as centering on participant A, an image arrangement control signal may be generated so that, as shown in FIG. 11 for example, the image of participant A is displayed in main X-4 and the images of the other high-priority participants (B to G) who are recognized as participant A's conversation partners are displayed in sub X-1 to X-3 and X-5 to X-7.
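A FIG. 11-style arrangement, with the central participant in the middle slot and the next-ranked participants on either side, might be sketched as follows. The function name is hypothetical, and which of B to G goes into which sub slot is an assumption, since the description only names the set of slots.

```python
def centered_layout(ranked, slots=7):
    """Place the top-ranked participant in the middle slot and the next ones around it."""
    center = slots // 2 + 1   # slot 4 when there are 7 slots
    side_slots = [s for s in range(1, slots + 1) if s != center]
    layout = {f"X-{center}": ranked[0]}
    # zip stops at the shorter list, so participants beyond the slot count are not shown.
    for slot, name in zip(side_slots, ranked[1:]):
        layout[f"X-{slot}"] = name
    return layout


# Ten participants, already ordered from highest to lowest priority (hypothetical order).
ranked = list("ABCDEFGHIJ")
layout = centered_layout(ranked)
print(layout)
```

With ten participants and seven slots, the three lowest-priority participants are simply not displayed in this sketch.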

Thereafter, the processing returns to step S1, and the processing from step S1 onward is repeated.

In the embodiment of the present invention, speech recognition processing is performed on the audio signals acquired from the plurality of mobile phones 1, and a predetermined analysis is performed on utterance data, which is data on the recognized utterances. Based on the analysis result, a priority for the arrangement of the images of the videophone participants displayed on the liquid crystal display 13 is calculated, and the arrangement of the participants' images displayed on the liquid crystal display 13 is controlled based on the calculated priority. Consequently, when a plurality of users hold a multipoint videophone call, the arrangement of the users' images displayed on the display unit (liquid crystal display 13) can be suitably controlled according to the call status. Even when the size of the display unit is limited, as in the mobile phone 1, the user can easily pick out the face of the user who is at the center of the conversation (or the user who is currently speaking) and can easily recognize who that user is. The convenience of making a videophone call can therefore be improved.

The "call status" is defined as the various circumstances of a videophone call held by a plurality of users, for example, who the current speaker is, how many times that speaker has spoken, who the central person in the conversation is, and what the conversation is about.

In the image arrangement control processing described with reference to FIG. 3, the arrangement of the participants' images displayed on the liquid crystal display 13 is controlled according to the current call status at each predetermined time interval set in advance (for example, every five minutes). The interval at which the image arrangement control processing is repeated may, however, be changed according to the content of the videophone call or its participants.

In addition to the mobile phone 1, the present invention can also be applied to a PDA (Personal Digital Assistant), a personal computer, and other information processing apparatuses.

The series of processes described in the embodiment of the present invention can be executed by software, and can also be executed by hardware.

Furthermore, in the embodiment of the present invention, the steps of the flowchart have been described as processing performed in time series in the stated order. They need not be processed in time series, however, and also include processing executed in parallel or individually.

FIG. 1 is a block diagram showing the internal configuration of a mobile phone applicable to the information processing apparatus according to the present invention.

FIG. 2 is a block diagram showing the functional configuration that can be executed by a mobile phone applicable to the information processing apparatus according to the present invention.

FIG. 3 is a flowchart explaining the image arrangement control processing in the mobile phone of FIG. 2.

FIG. 4 is a diagram showing the correspondence between the telephone numbers of the mobile phones owned by the users participating in the videophone call and the participant names.

FIG. 5 is a diagram showing the correspondence among the telephone numbers of the mobile phones owned by the users participating in the videophone call, the participant names, and the current speaker.

FIG. 6 is a diagram showing the correspondence among the telephone numbers of the mobile phones owned by the users participating in the videophone call, the participant names, the current speaker, the number of utterances, the number of keyword utterances, the number of times the name is called, and the number of replies.

FIG. 7 is a diagram showing a configuration example of the utterance history database of FIG. 2.

FIG. 8 is a diagram showing another configuration example of the utterance history database of FIG. 2.

FIG. 9 is a diagram showing an arrangement example of the participants' images displayed on the liquid crystal display of FIG. 1.

FIG. 10 is a diagram showing another arrangement example of the participants' images displayed on the liquid crystal display of FIG. 1.

FIG. 11 is a diagram showing another arrangement example of the participants' images displayed on the liquid crystal display of FIG. 1.

Explanation of symbols

1: mobile phone, 11: operation keys, 12: CCD camera, 13: liquid crystal display, 14: sub-display, 15: microphone, 16: speaker, 21: main control unit, 22: power supply circuit, 23: operation input control unit, 24: image encoder, 25: camera I/F unit, 26: LCD control unit, 27: image decoder, 28: demultiplexing unit, 29: modulation/demodulation circuit unit, 30: audio codec, 31: main bus, 32: synchronous bus, 33: transmission/reception circuit unit, 34: antenna, 35: recording/reproducing unit, 36: memory card, 37: storage unit, 38: music control unit, 41: speech recognition function, 42: analysis function, 43: utterance history database, 44: image arrangement control function.

Claims

An information processing apparatus that performs a videophone call among multiple points via wireless communication, comprising:
voice recognition means for acquiring image signals and voice signals from a plurality of information processing apparatuses and performing voice recognition processing based on the acquired voice signals;
display means for displaying images based on the plurality of image signals;
analysis means for analyzing, based on the utterances recognized by the voice recognition means, the number of times a name is called, and for calculating, based on the result of the analysis, a priority relating to the arrangement of the images based on the plurality of image signals displayed by the display means; and
control means for controlling the arrangement of the images based on the plurality of image signals displayed by the display means according to the priority calculated by the analysis means.

The information processing apparatus according to claim 1, wherein the analysis result obtained by the analysis means includes at least one of the number of utterances, the number of keyword utterances, and the number of replies.

The information processing apparatus according to claim 1, wherein the control means performs control such that each image based on the plurality of image signals displayed by the display means is arranged at a larger size as its priority is higher.

The information processing apparatus according to claim 1, further comprising storage means for storing an analysis result by the analysis means.