JP4850509B2

JP4850509B2 - Communication terminal device and image display method in communication terminal device

Info

Publication number: JP4850509B2
Application number: JP2005374049A
Authority: JP
Inventors: 克彦清水
Original assignee: Kyocera Corp
Current assignee: Kyocera Corp
Priority date: 2005-12-27
Filing date: 2005-12-27
Publication date: 2012-01-11
Anticipated expiration: 2025-12-27
Also published as: JP2007180657A

Description

The present invention relates to a communication terminal device and to an image display method for such a device.

Video conference systems are a known technique for multipoint communication. Such a system combines the image data sent from many terminals onto a single screen and transmits it, together with the audio data, to each terminal, thereby realizing a video conference that connects multiple sites.
When four people hold a meeting in such a system, for example, the screen is split in two both vertically and horizontally into four equal parts. Because this layout makes the speaker hard to identify, techniques such as those of Patent Documents 1 and 2 have been devised to make the speaker visually obvious. The technique of Patent Document 1 makes the display area of the region (screen) showing the speaker larger than the regions (screens) showing the other conference participants. The technique of Patent Document 2 makes the speaker's screen brighter than the other participants' screens.

As described above, a video conference system divides the screen into rectangles according to the number of conference participants. Depending on that number, however, the division becomes difficult, and in some cases part of the screen is left unused, so the screen cannot be used effectively.
Furthermore, when the speaker's screen is made larger than the others and the other participants' screens are to remain visible without being covered by it, those other screens must be made smaller, and it is difficult to arrange screens of different sizes well. In addition, when the speaker changes, the size relationship among the screens must be updated. With rectangular screens, rearranging the resized screens while changing their sizes continuously is not only difficult but also visually distracting.

To address these problems, the present applicant has filed Patent Documents 3 to 8. In these techniques, the face portion of each conference participant is cut out in an elliptical shape to form a screen, and the screens are displayed together on one display. The display area of each screen is changed according to the loudness of the voice of the corresponding participant, so that the speaker is displayed large while the images of the other participants are arranged suitably.

Meanwhile, mobile terminal devices such as mobile phones are becoming increasingly multifunctional, and functions that let several people talk while seeing one another's faces, like the video conference systems described above, have been under development in recent years. Because both the screen and the operation unit of a mobile terminal device are small, automatically controlling the size and arrangement of each screen without manual operation matters more than in other systems with well-equipped screens and operation units.
Patent Document 1: JP-A-6-141310
Patent Document 2: JP-A-6-141311
Patent Document 3: Japanese Patent Application No. 2005-344296
Patent Document 4: Japanese Patent Application No. 2005-344752
Patent Document 5: Japanese Patent Application No. 2005-344753
Patent Document 6: Japanese Patent Application No. 2005-344754
Patent Document 7: Japanese Patent Application No. 2005-344755
Patent Document 8: Japanese Patent Application No. 2005-344756

The speaker, however, is not necessarily a single person. In a meeting where a plan or research results are presented, for example, the presenter is the main speaker and the other participants are mainly listeners, but the listeners may still speak, for instance to give brief acknowledgments or to ask questions. If the conventional techniques above treat every such acknowledgment as a change of speaker, so that the screen sizes swap or the presenter's and listeners' screens flicker each time, the display becomes visually distracting and undesirably disturbs the user's concentration on the meeting.

Moreover, in the conventional examples above, if luminance is controlled without regard to screen size, current consumption rises sharply for a large screen. This shortens the battery life of a battery-driven communication terminal device and may cause inconveniences such as a shorter available talk time.
In other words, conventional techniques such as those of Patent Documents 1 and 2 were conceived for video conference systems that are installed and used in a fixed location, and they give no consideration to current consumption when controlling luminance. A battery-driven communication terminal device needs measures that limit the effect on battery life (available talk time and so on), such as keeping the current consumed by the display device at or below a fixed value.

The present invention has been made in view of the circumstances described above. Its object is to display the image of each communication partner so that the speaker is easy to identify, without disturbing the user's concentration.

To solve the above problems, the present invention adopts, as a first means relating to a communication terminal device, a communication terminal device that receives images and sounds of a plurality of communication partners, displays each image on a display means, and outputs the sounds, the device comprising: upper limit setting means that sets an upper limit on the luminance of each region based on the size of the region that each image occupies on the display means; and luminance control means that controls the luminance of each region within a range not exceeding the upper limit.

As a second means relating to the communication terminal device, in the first means, the luminance control means controls, based on the loudness of a sound, the luminance of the region occupied by the image received together with that sound.

As a third means relating to the communication terminal device, in the first or second means, the upper limit setting means sets the upper limit of the luminance within a range not exceeding the current consumption allowed for the display means, and the luminance control means takes the size of each region into account when controlling the luminance of that region.

As a fourth means relating to the communication terminal device, the device of any of the first to third means further comprises display area control means that changes, based on the loudness of a sound, the size of the region occupied by the image received together with that sound.

As a fifth means relating to the communication terminal device, in any of the first to fourth means, each pixel of the display means is constituted by a self-luminous element.

Further, the present invention adopts, as a first means relating to an image display method in a communication terminal device, an image display method for a communication terminal device that receives images and sounds of a plurality of communication partners, displays each image on a display means, and outputs the sounds, the method comprising: setting an upper limit on the luminance of each region based on the size of the region that each image occupies on the display means; and controlling the luminance of each region within a range not exceeding the upper limit.

As a second means relating to the image display method in the communication terminal device, in the first means, the luminance of the region occupied by the image received together with a sound is controlled based on the loudness of that sound.

As a third means relating to the image display method in the communication terminal device, in the first or second means, the upper limit of the luminance is set within a range not exceeding the current consumption allowed for the display means, and the size of each region is taken into account when controlling the luminance of that region.

As a fourth means relating to the image display method in the communication terminal device, in any of the first to third means, the size of the region occupied by the image received together with a sound is changed based on the loudness of that sound.

Further, the present invention adopts, as a sixth means relating to the communication terminal device, a communication terminal device that receives images and sounds of a plurality of communication partners and reproduces the images together with the sounds, wherein the device detects the start and end of each communication partner's utterance based on the loudness of the sound, sets the luminance at which an image is reproduced based on the loudness of the sound received together with that image, and, when the utterance ends, attenuates the luminance according to the duration of the utterance. The longer the utterance lasted, the faster the luminance is attenuated, and the shorter it lasted, the more slowly it is attenuated.

According to the present invention, when images of several people are displayed on one screen and reproduced together with sound, the luminance of a specific region extracted from each image can be controlled. For example, making the speaker's luminance relatively higher than the listeners' makes the speaker easy to identify. When the luminance is raised, keeping it below an upper limit set based on the size of the region prevents the waste of power that would result from further brightening a region that is already displayed large and receiving sufficient attention.

An embodiment of the present invention is described below with reference to the drawings and equations. FIGS. 1 and 2 show a configuration example of the main part of a mobile phone that is one embodiment of the communication terminal device of the present invention. FIG. 1 is a block diagram of the encoding device, and FIG. 2 is a block diagram of the decoding device.
The mobile phone has an encoding device 10 serving as the transmitting side and a decoding device 20 serving as the receiving side, and is configured for multipoint communication.

The encoding device 10 includes an audio input unit 101, an image input unit 102, an operation unit 103, an audio encoding processing unit 104, an image encoding processing unit 105, a terminal control unit 106, a control information generation unit 107, a transmission packet generation unit 108, and a network I/F (interface) 109.

The audio input unit 101 consists of a microphone or the like. The image input unit 102 consists of a digital camera or the like. The operation unit 103 has keys and accepts input from the user. The audio encoding processing unit 104 encodes the audio data input from the audio input unit 101. The image encoding processing unit 105 encodes the image data input from the image input unit 102. The terminal control unit 106 controls the mobile phone based on input from the operation unit 103. The control information generation unit 107 generates control information based on the control that the terminal control unit 106 performs on the mobile phone.
The transmission packet generation unit 108 assembles into transmission packets the audio data encoded by the audio encoding processing unit 104, the image data encoded by the image encoding processing unit 105, and the control information generated by the control information generation unit 107. The network I/F 109 transmits the packets generated by the transmission packet generation unit 108 over the network to the communication partner's mobile phone or to a server.

The decoding device 20 includes a network I/F 201, an operation unit 202, a received packet analysis unit 203, an audio decoding processing unit 204, an image decoding processing unit 205, an image control unit 206, a volume correction unit 207, an audio output unit 208, an image correction unit 209, an image output unit 210 (display means), and an own-terminal control unit 211.

The network I/F 201 receives packets containing audio data, image data, control information, and the like from the communication partner's mobile phone or from a server. The operation unit 202 has keys and accepts input from the user. The received packet analysis unit 203 analyzes the packets received by the network I/F 201 and extracts the audio data, image data, control information, and the like. The audio decoding processing unit 204 decodes the audio data extracted by the received packet analysis unit 203. The image decoding processing unit 205 decodes the image data extracted by the received packet analysis unit 203.
The image control unit 206 controls the display position, display size, luminance, and so on with which an image based on the image data is displayed on the image output unit 210. It does so based on the image data decoded by the image decoding processing unit 205, the control information, and the audio data decoded by the audio decoding processing unit 204. The volume correction unit 207 corrects the volume of the audio decoded by the audio decoding processing unit 204. The audio output unit 208 is a speaker or the like and outputs sound based on the audio data at the volume corrected by the volume correction unit 207. The image correction unit 209 corrects the image data controlled by the image control unit 206. The image output unit 210, in which each pixel is an organic EL (electro-luminescence) element (a self-luminous element), displays the images of the plurality of communication partners based on the image data corrected by the image correction unit 209. The own-terminal control unit 211 supplies the image control unit 206 with control information, such as the orientation (top and bottom) of the own terminal, based on input from the operation unit 202.

In the encoding device 10 and the decoding device 20, the operation unit 103 and the operation unit 202, the network I/F 109 and the network I/F 201, and the terminal control unit 106 and the own-terminal control unit 211 can each be shared.

The image control unit 206 is described next. It has an orientation correction unit 2061, a face area detection unit 2062 (extraction means), a screen determination unit 2063, a cutout processing unit 2064, a size/luminance calculation unit 2065 (upper limit setting means, luminance control means, and display area control means), a reduction/enlargement processing unit 2066, a display position calculation unit 2067, and a mapping processing unit 2068.

The orientation correction unit 2061 corrects the top-bottom orientation of the image data so that it matches that of the screen of the image output unit 210, based on the orientation information associated with the image data decoded by the image decoding processing unit 205. The face area detection unit 2062 detects and extracts the face area from the image data. The screen determination unit 2063 determines the region to be shown in the screen that displays the speaker, so that it includes the face area detected by the face area detection unit 2062. The cutout processing unit 2064 cuts the corresponding region out of the image data based on the determination by the screen determination unit 2063. The size/luminance calculation unit 2065 calculates, according to the volume of the audio data, the display size of the screen and the luminance at which the screen is displayed on the image output unit 210. The reduction/enlargement processing unit 2066 reduces or enlarges the image data cut out by the cutout processing unit 2064 based on the result from the size/luminance calculation unit 2065. The display position calculation unit 2067 calculates the display position of the image data reduced or enlarged by the reduction/enlargement processing unit 2066. The mapping processing unit 2068 maps the image data obtained by the reduction/enlargement processing unit 2066 onto the position on the image output unit 210 obtained by the display position calculation unit 2067.

When the face area detection unit 2062 detects a face area, the screen determination unit 2063 selects, as the screen shape, an ellipse that contains the face area while minimizing the portion outside it. When the face area detection unit 2062 detects no face area, the screen determination unit 2063 judges the image to be scenery and selects a rectangle.
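The shape selection above can be sketched roughly as follows. This is only an illustration under stated assumptions: the face area is taken to be an axis-aligned bounding box, the ellipse is taken to be axis-aligned and centred on that box, and the function name and return format are hypothetical. The source does not say how the ellipse is actually computed.

```python
import math
from typing import Dict, Optional, Tuple

# (x, y, width, height) of an axis-aligned box; this representation is an assumption.
Rect = Tuple[float, float, float, float]


def choose_screen_shape(face: Optional[Rect], frame: Rect) -> Dict[str, object]:
    """Pick an elliptical screen around a detected face, else a rectangle."""
    if face is None:
        # No face detected: treat the image as scenery and keep the full frame.
        x, y, w, h = frame
        return {"shape": "rectangle", "x": x, "y": y, "width": w, "height": h}

    x, y, w, h = face
    # Among axis-aligned ellipses centred on the box, the one of smallest area
    # that still contains all four corners has semi-axes w/sqrt(2) and h/sqrt(2).
    return {
        "shape": "ellipse",
        "cx": x + w / 2.0,
        "cy": y + h / 2.0,
        "rx": w / math.sqrt(2.0),
        "ry": h / math.sqrt(2.0),
    }
```

With a face box of 40 by 60 pixels, the corners of the box lie exactly on the resulting ellipse, so the area outside the face is as small as this family of ellipses allows.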

The size/luminance calculation unit 2065 calculates the size and luminance of each screen based on the volume of the audio data and on the volume of audio data received in the past. The calculation of screen luminance, which is the characteristic feature of this embodiment, is described below.

FIG. 3 is a front view of the screen of the image output unit 210 during a video conference. To keep the explanation simple, the example assumes that one person is the main speaker and the others are mainly listeners, as in a presentation of a plan or research results.
Four screens, S(0) to S(3), are displayed. The size relationship among these screens S(j) is fixed so that the roles of the participants in this conference are visually clear. The large screen S(0) shows the main speaker, that is, the presenter of the plan or research results, and the three small screens S(1) to S(3) each show one of the three listeners.

Three values are associated with each screen S(j): its area R(j), its luminance L(j) on the display screen 1, and its volume V(j). Let A(j) be the degree of attention that screen S(j) should receive. Because the screen showing a person who is speaking should attract attention, control is performed so that the ratio of the attention degrees A(j) equals the ratio of the volumes V(j). This is expressed by equation (1).

$$A(0) : A(1) : \cdots : A(n-1) : A(n) = V(0) : V(1) : \cdots : V(n-1) : V(n) \tag{1}$$

The attention degree A(j) is considered to rise as the area R(j) grows and as the luminance L(j) rises, so it is taken to be the product of the area R(j) and the luminance L(j). This is expressed by equation (2).

$$A(j) = R(j) \times L(j) \tag{2}$$

Equation (3) follows from equations (1) and (2).

$$R(0) \times L(0) : R(1) \times L(1) : \cdots : R(n-1) \times L(n-1) : R(n) \times L(n) = V(0) : V(1) : \cdots : V(n-1) : V(n) \tag{3}$$

To obtain the ratio of the luminances L(j) from equation (3), let V(j)/R(j) = p_j. Equation (4) then follows.

$$L(0) : L(1) : \cdots : L(n-1) : L(n) = p_0 : p_1 : \cdots : p_{n-1} : p_n \tag{4}$$

The image output unit 210 has a prescribed initial luminance value L_ini. Let the smallest of the luminances L(j) be L_min = α × L_ini (α is a constant), and let the ratio of L_min to any other luminance L(j) be p_min : p_j. Each luminance L(j) is then given by equation (5).

$$L(j) = \frac{p_j}{p_{min}} \times \alpha \times L_{ini} \tag{5}$$

The luminance of the organic EL elements that form the pixels of the image output unit 210 is generally determined by the current value. That is, the luminance L(j) is proportional to the current I(j) per unit area. If I_ini is the current per unit area at luminance L_ini, equation (6) follows from equation (5).

$$I(j) = \frac{p_j}{p_{min}} \times \alpha \times I_{ini} \tag{6}$$

The total current W consumed by the image output unit 210 is therefore given by equation (7).

$$W = \sum_{j=0}^{n} R(j) \times I(j) = \frac{\alpha \times I_{ini}}{p_{min}} \sum_{j=0}^{n} R(j) \times p_j \tag{7}$$

The image output unit 210 has a prescribed maximum allowable current W_max. Because the total current W, the sum of R(j) × I(j) over all screens, must satisfy W ≤ W_max, the constant α is given by equation (8).

$$\alpha \le \frac{W_{max} \times p_{min}}{I_{ini} \sum_{j=0}^{n} R(j) \times p_j} \tag{8}$$

Setting the luminance of each screen S(j) with this α yields a luminance setting that matches the attention degree A(j) each screen should receive, while keeping the current W consumed by the image output unit 210 at or below the maximum allowable current W_max.
The condition α ≤ 1 may be added so that the minimum luminance L_min never exceeds L_ini.

Applying the above equations to the example of FIG. 3, the ratio of the areas R(j) of the screens S(j) in this example is

$$R(0) : R(1) : R(2) : R(3) = 2 : 1 : 1 : 1.$$

Suppose that the person shown on screen S(0) and the person shown on screen S(2) are speaking, so that the ratio of the volumes V(j) of the screens S(j) is

$$V(0) : V(1) : V(2) : V(3) = 2 : 1 : 2 : 1.$$

Equation (3) then gives

$$2L(0) : L(1) : L(2) : L(3) = 2 : 1 : 2 : 1.$$

From this,

$$2L(0) : L(1) = 2 : 1, \quad 2L(0) : L(2) = 2 : 2, \quad 2L(0) : L(3) = 2 : 1,$$

that is,

$$L(0) : L(1) = 1 : 1, \quad L(0) : L(2) = 1 : 2, \quad L(0) : L(3) = 1 : 1,$$

and therefore

$$L(0) : L(1) : L(2) : L(3) = 1 : 1 : 2 : 1.$$

From this ratio, if the luminance of the screens S(j) with the lowest luminance is α × L_ini (α ≤ 1), then

$$L(0) = L(1) = L(3) = \alpha \times L_{ini}, \qquad L(2) = 2\alpha \times L_{ini}.$$

From these values, if I_ini is the current per unit area at luminance L_ini, then

$$I(0) = I(1) = I(3) = \alpha \times I_{ini}, \qquad I(2) = 2\alpha \times I_{ini}.$$

From these values and equation (7), the total current consumed by the image output unit 210 is

$$W = R(0) \times I(0) + R(1) \times I(1) + R(2) \times I(2) + R(3) \times I(3) = \alpha \times I_{ini} \times (R(0) + R(1) + 2R(2) + R(3)).$$

Since W ≤ W_max,

$$\alpha \le \frac{W_{max}}{I_{ini} \times (R(0) + R(1) + 2R(2) + R(3))} \quad \text{and} \quad \alpha \le 1.$$

α can be determined from these conditions, and with it the luminance to be set for each screen S(j).
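The derivation and the FIG. 3 example above can be sketched in code as follows. This is a minimal illustration, not the patented implementation: the function name, the choice of the largest permitted α, and the numeric values of L_ini, I_ini and W_max in the usage lines are assumptions made only for the example, and all volumes are assumed to be positive.

```python
from typing import List, Sequence, Tuple


def screen_luminances(
    areas: Sequence[float],    # R(j)
    volumes: Sequence[float],  # V(j), assumed strictly positive
    l_ini: float,              # initial luminance L_ini
    i_ini: float,              # current per unit area at L_ini
    w_max: float,              # maximum allowable total current W_max
) -> Tuple[float, List[float]]:
    """Return (alpha, [L(0), ..., L(n)]) for the given screens."""
    if len(areas) != len(volumes) or len(areas) == 0:
        raise ValueError("areas and volumes must be non-empty and equally long")
    if any(r <= 0 for r in areas) or any(v <= 0 for v in volumes):
        raise ValueError("areas and volumes must be positive")

    p = [v / r for v, r in zip(volumes, areas)]   # p_j = V(j) / R(j)
    p_min = min(p)
    ratio = [pj / p_min for pj in p]              # L(j) / L_min, as in equation (5)

    # Total current at alpha = 1: I_ini * sum of R(j) * p_j / p_min.
    current_at_alpha_one = i_ini * sum(r * k for r, k in zip(areas, ratio))

    # Largest alpha that respects both W <= W_max and alpha <= 1 (an assumed choice).
    alpha = min(1.0, w_max / current_at_alpha_one)
    return alpha, [k * alpha * l_ini for k in ratio]


# FIG. 3 example: areas 2:1:1:1, volumes 2:1:2:1, with hypothetical device values.
alpha, luminances = screen_luminances([2, 1, 1, 1], [2, 1, 2, 1],
                                      l_ini=100.0, i_ini=1.0, w_max=3.0)
```

With these hypothetical values the current at α = 1 would be 6, twice the budget of 3, so α becomes 0.5 and the luminances come out in the ratio 1 : 1 : 2 : 1 derived above.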

As the volume V(j) in the equations above, the size/luminance calculation unit 2065 uses the volume of the audio data input from the audio decoding processing unit 204 after processing it for luminance calculation as shown in FIGS. 4 and 5.
FIGS. 4 and 5 are graphs of volume over time. In these graphs, the solid line shows the volume of the audio data input from the audio decoding processing unit 204 (the input volume V_receive), and the broken line shows the volume derived from it for luminance calculation (the processed volume V_display).
The processed volume V_display is used because calculating luminance directly from the input volume V_receive causes two problems. First, the luminance is unstable and appears to flicker. Second, for a short utterance such as a brief acknowledgment, the luminance rises and falls again so quickly that it is high only briefly, which makes it hard to recognize which screen S(j) shows the person who spoke.
The processed volume V_display is therefore the average of the input volume V_receive while the utterance continues and, after the utterance ends, decays gently from the processed volume V_display reached during the utterance. The shorter the utterance lasted, the gentler this decay is made.
FIG. 4 shows the case of a short utterance such as a brief acknowledgment, and FIG. 5 shows the case of an utterance lasting at least a predetermined time. In FIGS. 4 and 5, T_speak is the period of speaking (utterance period) and T_silent is the period of silence (silent period). V_level denotes volume thresholds set at predetermined intervals. FIGS. 4(a) and 5(a) show the case where the volume during the utterance is relatively high, and FIGS. 4(b) and 5(b) show the case where it is relatively low.

The method of calculating the processed volume V_display according to the utterance duration is described with reference to the flowcharts of FIGS. 6 and 7. In the processing of these flowcharts, the current condition is classified, based on the changes in the input volume V_receive and the processed volume V_display, as a rising state (state 1), a falling (or continued falling) state (state 2), or a stable state (state 3), and V_display is calculated by the equation suited to that state.
Here t is the current time, V_receive(t) is the current input volume, V_receive(t−Δt) is the input volume a time Δt before the present, V_{level,i} is the threshold above V_receive(t−Δt), V_{level,i−1} is the threshold below V_receive(t−Δt), t_base is the time at which a change (rise or fall) was detected, and V_base is the reference volume used in the calculation.
The processing below is performed by the size / luminance calculation unit 2065 and is repeated at predetermined intervals.

First, when it is detected that the input volume V_receive(t) has risen across the threshold V_{level,i} between time (t−Δt) and time t (Yes in S1), the threshold up to which it has risen is determined (S2, S3), and the time at which the volume finished rising is stored (S4).
If the input volume V_receive(t) has not risen across the threshold V_{level,i} between time (t−Δt) and time t (No in S1), and it is detected that V_receive(t) has fallen across the threshold V_{level,i−1} (Yes in S5), the threshold down to which it has fallen is determined (S6, S7), and the time t at which the volume finished falling is stored as t_base (S4).
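The threshold-crossing check of steps S1 to S7 can be sketched as follows. This is a minimal Python illustration, not the implementation of the specification: the names `level_index` and `detect_crossing`, the use of a sorted threshold list, and the sample thresholds are all assumptions made for the example.

```python
from bisect import bisect_right


def level_index(volume, thresholds):
    """Number of thresholds at or below `volume` (thresholds sorted ascending)."""
    return bisect_right(thresholds, volume)


def detect_crossing(v_prev, v_now, thresholds):
    """Compare the input volume at t - dt and at t.

    Returns ("rise", level), ("fall", level) or ("none", level), where
    level is the band the current volume lies in. A caller would store
    t as t_base whenever the result is "rise" or "fall".
    """
    before = level_index(v_prev, thresholds)
    after = level_index(v_now, thresholds)
    if after > before:
        return "rise", after
    if after < before:
        return "fall", after
    return "none", after


# Assumed thresholds at equal intervals, as V_level is described.
thresholds = [10.0, 20.0, 30.0]
rise = detect_crossing(5.0, 25.0, thresholds)
fall = detect_crossing(25.0, 12.0, thresholds)
same = detect_crossing(12.0, 15.0, thresholds)
```

Comparing band indices rather than individual thresholds lets a single step report a jump across several thresholds at once, which corresponds to determining "up to which threshold" the volume moved.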

Next, if the input volume V_receive(t) has risen across the threshold V_{level,j} between time (t−Δt) and time t (Yes in S11), the state is set to the rising state (state 1) (S12), and the processed volume V_display is calculated by equation (9) (S13).

V_display = V_receive … (9)

Then state is saved in state_pre (S14), and the processing ends.

If the input volume V_receive(t) has not risen across the threshold V_{level,j} between time (t−Δt) and time t (No in S11), the previous state state_pre is 1 or 3 (Yes in S15), and V_receive(t) has fallen across the threshold V_{level,j−1} between time (t−Δt) and time t (Yes in S16), the state is set to the falling state (state 2) (S17), the processed volume V_display of a time Δt earlier is assigned to the calculation reference volume V_base (S18), and V_display is calculated by equation (10) (S19).

V_display = V_{level,j−1} + (V_base − V_{level,j−1}) × exp(−a_j × T_speak × (t − t_base)) … (10)

As a result, the processed volume V_display attenuates more gradually than the input volume V_receive.
Then state is saved in state_pre (S14), and the processing ends.
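Equation (10) is an exponential decay from V_base toward the lower threshold, with a rate proportional to the utterance period T_speak. The Python sketch below evaluates it for two assumed utterance lengths; the function name `decayed_volume` and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def decayed_volume(v_floor, v_base, a_j, t_speak, t, t_base):
    """Equation (10): decay from v_base toward v_floor (the lower threshold).

    The decay rate is a_j * t_speak, so a longer utterance period
    gives a faster decay and a shorter one a slower decay.
    """
    return v_floor + (v_base - v_floor) * math.exp(-a_j * t_speak * (t - t_base))


# Assumed sample values: same elapsed time, different utterance periods.
v_floor, v_base, a_j = 10.0, 30.0, 0.5
at_start = decayed_volume(v_floor, v_base, a_j, t_speak=1.0, t=0.0, t_base=0.0)
short_utterance = decayed_volume(v_floor, v_base, a_j, t_speak=1.0, t=2.0, t_base=0.0)
long_utterance = decayed_volume(v_floor, v_base, a_j, t_speak=4.0, t=2.0, t_base=0.0)
```

At t = t_base the result equals V_base, and for the same elapsed time the value after the short utterance stays higher than after the long one, so a brief interjection keeps its screen bright for longer.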

If, in S16, the input volume V_receive(t) has not fallen across the threshold V_{level,j−1} between time (t−Δt) and time t (No in S16), the state is set to the stable state (state 3) (S20), and the processed volume V_display is calculated by equation (11) (S21).

If, in S15, the previous state state_pre is 2 (No in S15) and the processed volume V_display of a time Δt earlier has not yet attenuated to a value equivalent to the input volume V_receive (Yes in S22), the state is set to the continued falling state (state 2) (S23). If the processed volume V_display(t−Δt) has come sufficiently close to the threshold V_{level,j−1}, that is, if the difference between V_display(t−Δt) and V_{level,j−1} is less than a predetermined value ΔV (Yes in S24), the reference is lowered to the threshold below (S25), and V_display is calculated by equation (10) (S26).

If, in S22, the processed volume V_display of a time Δt earlier has attenuated to a value equivalent to the input volume V_receive (No in S22), the state is set to the stable state (state 3) (S27), and V_display is calculated by equation (11) (S28).
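The state selection of steps S11 to S28 can be summarized as a small decision function. The Python sketch below covers only the choice of state, not the volume equations; the name `next_state`, its boolean inputs, and the tolerance `tol` used to decide whether V_display has decayed to a value "equivalent to" V_receive are assumptions for illustration.

```python
RISING, FALLING, STABLE = 1, 2, 3


def next_state(rose, fell, state_pre, v_display_prev, v_receive, tol=0.0):
    """Choose state 1, 2 or 3 following the branches S11, S15, S16 and S22.

    rose / fell    -- whether V_receive crossed a threshold upward / downward
                      between t - dt and t
    state_pre      -- state chosen in the previous cycle
    v_display_prev -- processed volume at t - dt
    tol            -- hypothetical margin for "equivalent to V_receive"
    """
    if rose:                               # S11: Yes
        return RISING
    if state_pre in (RISING, STABLE):      # S15: Yes
        return FALLING if fell else STABLE  # S16
    # state_pre was FALLING (S15: No): keep falling until V_display
    # has come down to the level of V_receive (S22).
    if v_display_prev > v_receive + tol:
        return FALLING
    return STABLE


s_rise = next_state(True, False, STABLE, 0.0, 0.0)
s_fall = next_state(False, True, RISING, 30.0, 12.0)
s_hold = next_state(False, False, RISING, 30.0, 30.0)
s_cont = next_state(False, False, FALLING, 20.0, 5.0)
s_done = next_state(False, False, FALLING, 5.0, 5.0)
```

After each call the returned value would be saved as state_pre for the next cycle, as in step S14.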

In this embodiment, every increase or decrease in volume is converted into an increase or decrease in luminance. In practice, however, changes in volume may be distributed not only to changes in luminance but also to enlargement or reduction of the screen.

As described above, according to this embodiment, which takes as an example a communication terminal device capable of multi-screen (multi-window) display, luminance control is performed according to the size of the screen showing each communication partner and the volume of the sound associated with that screen. This gives each screen an appropriate degree of prominence and avoids a large increase in current consumption caused by applying excessively high luminance to a large screen.
In addition, even when luminance control is applied to the screens, the current consumption can be kept at or below a fixed value (the maximum allowable current, that is, the upper limit) that does not exceed the current consumption allowed for the display device. This avoids adversely affecting device performance, such as the available talk time (battery life) of a battery-driven communication terminal device.
Furthermore, because luminance is controlled by the ratio of the sound volumes rather than by identifying the single loudest speaker, the display makes it visually easy to see who is talking with whom even when several speakers are speaking at roughly the same volume, and it avoids a hard-to-watch display in which the brightened screen switches frequently.
Moreover, in response to changes in the received volume, the luminance is increased immediately when the received volume rises sharply, that is, at the start of an utterance. When the luminance is attenuated after the utterance ends, the utterance period is taken into account: the longer the utterance period, the faster the attenuation, and the shorter the utterance period, the slower the attenuation. As a result, the luminance rises immediately for a new utterance, while for short utterances such as brief interjections and replies the luminance decays gently, which eliminates distracting luminance fluctuations in which a screen brightens and dims within a short time.

FIG. 1 is a block diagram showing a configuration example of a portable terminal device in one embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration example of a portable terminal device in one embodiment of the present invention.
FIG. 3 is a front view of the screen in one embodiment of the present invention.
FIG. 4 is a graph showing the change in volume over time in one embodiment of the present invention.
FIG. 5 is a graph showing the change in volume over time in one embodiment of the present invention.
FIG. 6 is a flowchart showing the process of obtaining the volume for luminance control according to the utterance duration in one embodiment of the present invention.
FIG. 7 is a flowchart showing the process of obtaining the volume for luminance control according to the utterance duration in one embodiment of the present invention.

Explanation of symbols

10: encoding device, 101: audio input unit, 102: image input unit, 103: operation unit, 104: audio encoding processing unit, 105: image encoding processing unit, 106: terminal control unit, 107: control information generation unit, 108: transmission packet generation unit, 109: network I/F
20: decoding device, 201: network I/F, 202: operation unit, 203: received packet analysis unit, 204: audio decoding processing unit, 205: image decoding processing unit, 206: image control unit, 207: volume correction unit, 208: audio output unit, 209: image correction unit, 210: image output unit (display means), 211: own-terminal control unit
2061: top-bottom correction unit, 2062: face area detection unit (extraction means), 2063: screen determination unit, 2064: cutout processing unit, 2065: size / luminance calculation unit (upper limit setting means, luminance control means, display area control means), 2066: reduction / enlargement processing unit, 2067: display position calculation unit, 2068: mapping processing unit

Claims

1. A communication terminal device that receives images and sounds of a plurality of communication partners, displays each image on display means, and outputs the sound, the communication terminal device comprising:
upper limit setting means for setting an upper limit value of the luminance of each area based on the size of the area occupied by each image on the display means; and
luminance control means for controlling the luminance of each area within a range not exceeding the upper limit value.

2. The communication terminal device according to claim 1, wherein the luminance control means controls the luminance of the area occupied by the image received together with a sound based on the volume of that sound.

3. The communication terminal device according to claim 1 or 2, wherein the upper limit setting means sets the upper limit value of luminance within a range that does not exceed the current consumption allowed for the display means, and the luminance control means takes the size of the area into account in controlling the luminance of the area.

4. The communication terminal device according to any one of claims 1 to 3, further comprising display area control means for changing the size of the area occupied by the image received together with a sound based on the volume of that sound.

5. The communication terminal device according to claim 1, wherein each pixel of the display means is configured by a self-luminous element.

6. An image display method in a communication terminal device that receives images and sounds of a plurality of communication partners, displays each image on display means, and outputs the sound, the method comprising:
setting an upper limit value of the luminance of each area based on the size of the area occupied by each image on the display means; and
controlling the luminance of each area within a range not exceeding the upper limit value.

7. The image display method in the communication terminal device according to claim 6, wherein the luminance of the area occupied by the image received together with a sound is controlled based on the volume of that sound.

8. The image display method in the communication terminal device, wherein the upper limit value of luminance is set within a range not exceeding the current consumption allowed for the display means, and the size of the area is taken into account in controlling the luminance of the area.

9. The image display method in the communication terminal device according to claim 6, wherein the size of the area occupied by the image received together with a sound is changed based on the volume of that sound.

10. A communication terminal device that receives images and sounds of a plurality of communication partners and reproduces the images together with the sounds, wherein:
the start and end of a communication partner's utterance are detected based on the volume of the sound;
the luminance at which the image is reproduced is set based on the volume of the sound received together with the image, and when the utterance has ended, the luminance is attenuated according to the duration of the utterance; and
when the utterance has ended, the luminance is attenuated faster the longer the utterance lasted and slower the shorter the utterance lasted.