JPH09116883A

JPH09116883A - Video telephone system

Info

Publication number: JPH09116883A
Application number: JP7270004A
Authority: JP
Inventors: Koichi Terada; 光一寺田; Yoshitake Kurokawa; 能毅黒川; Yasumasa Hattori; 康政服部; Toshio Tanaka; 利男田中; Takashi Yoshitomi; 隆吉冨
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1995-10-18
Filing date: 1995-10-18
Publication date: 1997-05-02

Abstract

PROBLEM TO BE SOLVED: To make it possible to recognize which of several call partners is speaking by finding, for each piece of video information, the physical position on the video display part of the partial image that has changed most from the corresponding partial image of the previous frame. SOLUTION: A reception part 72 receives the video information of a plurality of terminals separately and receives mixed audio information in which the audio information of those terminals is mixed. Audio output means 41 and 42 correspond to the plurality of terminals and are arranged near video display means 21 and 22. Position information detection means 142, 143 and 144 obtain the amount of change from the previous frame's partial images in each piece of video information, find the physical positions on the video display parts 21 and 22 of the partial images with the largest change, and count them. Based on the output of the position information detection means, a volume control means 612 controls the volume of the decoded signal of the mixed audio information so that the sound from each of the audio output means 41 and 42 corresponds to the image on the corresponding video display means, and sends the signal to the audio output means 41 and 42.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to an audio output device that is used as one component of a videophone apparatus, a videoconferencing apparatus, or the like, and that mainly reproduces the voices of the users.

[0002]

[Prior Art] In recent years, with the build-out of communication infrastructure and improvements in image processing technology, videophone apparatuses have come into practical use. In particular, as the processing performance of personal computers (hereinafter abbreviated as PCs) has improved, there has been a noticeable move toward using a PC as a videophone component in place of a conventional dedicated videophone apparatus.

[0003] One such example is the Intel ProShare Video System 200, announced by Intel Corporation of the United States in 1994. However, PC-based videoconferencing apparatuses of this kind have had the drawbacks described below.

[0004]

[Problem to Be Solved by the Invention] A PC-based videoconferencing apparatus normally assumes a call between two or more people. In a call among three or more people, the multi-window system installed on the PC displays two or more call partners on the screen at the same time. Showing the call partners in separate windows in this way is intended to let the user tell who is currently speaking.

[0005] Users also identify a call partner by recognizing the partner's voice. In actual use, however, conditions such as the state of the communication line can delay image updates or degrade audio quality, so that it has sometimes been hard to tell which call partner is speaking during a call.

[0006]

[Means for Solving the Problem] The above problem is solved by providing: receiving means that receives the video information of a plurality of terminals separately and receives mixed audio information in which the audio information of the plurality of terminals is mixed; a plurality of video display means corresponding to the plurality of terminals; a plurality of audio output means corresponding to the plurality of terminals and arranged near the video display means; means for obtaining, for each piece of video information, the amount of change from the partial images of the previous frame; position information detecting means that finds and counts the physical positions, on the video display means, of the partial images with the largest amount of change; and volume control means that, based on the output of the position information detecting means, controls the volume of the decoded signal of the mixed audio information and sends it to the plurality of audio output means so that the sound from each audio output means corresponds to the image on the corresponding video display means.

[0007]

[Embodiments of the Invention] An embodiment of the present invention is described below with reference to the drawings.

[0008] First, the configuration of the videophone apparatus to which the present invention relates is described. FIG. 8 is an external view of a typical PC-based videophone apparatus. In FIG. 8, 1 is a PC, 2 is a display unit, 3 is a video input unit, 4 is an audio output unit, and 5 is an audio input unit. The figure shows the display unit 2 with the images of two call partners displayed side by side. The description below assumes this image arrangement.

[0009] FIG. 9 is an internal block diagram of a typical PC-based videophone apparatus. In FIG. 9, 1 is a PC, 2 is a video display unit, 3 is a video input unit, 4 is an audio output unit, 5 is an audio input unit, 6 is a video/audio processing unit, and 7 is a communication processing unit. The PC 1 comprises a CPU 11, a main memory 12, an HDD 13, and a video reproduction unit 14. The video/audio processing unit 6 comprises an audio reproduction unit 61, a video compression unit 62, and an audio compression unit 63. The communication processing unit 7 comprises a transmission unit 71 and a reception unit 72.

[0010] Although this figure shows the video reproduction unit 14 as a component of the PC 1, a different combination of components may be used. Likewise, although the audio reproduction unit 61, the video compression unit 62, and the audio compression unit 63 reside on the same component, they may reside on separate components. And although the video/audio processing unit 6 and the communication processing unit 7 are shown as separate components, they may be integrated on a single component.

[0011] Next, a conventional example based on the prior art is described with reference to FIG. 10. In FIG. 10, 72 is a reception unit, 73 is a data distribution unit, 6111 is an audio 1 decoding unit, 6112 is an audio 2 decoding unit, 1411 is a video 1 decoding unit, 1412 is a video 2 decoding unit, 21 is a video 1 display unit, 22 is a video 2 display unit, 41 is a left audio output unit, 42 is a right audio output unit, and 612 is a volume control unit. The video 1 display unit 21 and the video 2 display unit 22 are drawn as separate components, but they represent two regions that physically lie on a single component.

[0012] FIG. 11 is a conceptual diagram of the information flow in a videophone system that uses the conventional example shown in FIG. 10. In FIG. 11, 201 is terminal 1, 202 is terminal 2, 203 is terminal 3, and 204 is a host device. In addition, 341 is the terminal 1 uplink information sent from terminal 1 to the host device, 342 is the terminal 2 uplink information sent from terminal 2 to the host device, and 343 is the terminal 3 downlink information sent from the host device to terminal 3. For simplicity, FIG. 11 shows only the flow of information received by terminal 3; in actual operation, the information received by terminals 1 and 2 flows in the same way.

[0013] Next, the operation of this conventional example is described. As shown in FIG. 11, terminal 1 sends its video information V(1) and audio information A(1) to the host device. Similarly, terminal 2 sends its video information V(2) and audio information A(2) to the host device. The host device receives these pieces of information and forwards each one to the terminals that need it.

[0014] In this example, the video and audio information of terminal 1 and the video and audio information of terminal 2 are each forwarded to terminal 3 as the downlink information 343. In terminal 3, the data distribution unit 73 separates the information received by the reception unit 72 into its individual streams, and each stream is sent to the corresponding decoding unit.

[0015] In FIG. 10, the video information of terminal 1 is sent to the video 1 decoding unit and displayed on the video 1 display unit 21. Similarly, the audio information of terminal 1 is sent to the audio 1 decoding unit and output via the volume control unit 612. The same applies to the video and audio information of terminal 2. Because this scheme sends every stream separately, the amount of information grows and high-speed transmission becomes essential. A good way to address this is to mix the multiple audio streams before transmission, since in a videophone system it is rare for two or more participants to speak at the same time.
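The mixing step mentioned above can be illustrated with a short sketch. The text does not say how the host device mixes the audio streams; the version below assumes 16-bit PCM samples held in equal-length lists and mixes them by summation with clipping. The function name `mix_audio` and the sample format are hypothetical choices made only for this illustration.

```python
def mix_audio(streams):
    """Mix equal-length 16-bit PCM sample lists by summation with clipping.

    Assumption: the source does not define the mixing method; plain
    summation with saturation is used here only as an illustration.
    """
    if not streams:
        return []
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all streams must have the same length")
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in streams)
        # Saturate to the signed 16-bit range instead of wrapping around.
        mixed.append(max(-32768, min(32767, total)))
    return mixed


a1 = [100, -200, 30000]   # hypothetical samples A(1) from terminal 1
a2 = [50, 50, 10000]      # hypothetical samples A(2) from terminal 2
print(mix_audio([a1, a2]))
```

Because only one participant usually speaks at a time, one of the inputs is typically near silence and the mixed stream stays close to the active speaker's signal.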

[0016] Next, an embodiment of the present invention that uses the principle described above is explained. FIG. 1 is a block diagram of the first embodiment of the present invention. In FIG. 1, 72 is a reception unit, 73 is a data distribution unit, 611 is an audio decoding unit, 1411 is a video 1 decoding unit, 1412 is a video 2 decoding unit, 21 is a video 1 display unit, 22 is a video 2 display unit, 41 is a left audio output unit, 42 is a right audio output unit, 612 is a volume control unit, and 613 is a left/right balance decoding unit. The video 1 display unit 21 and the video 2 display unit 22 are drawn as separate components, but they represent two regions that physically lie on a single component.

[0017] FIG. 2 is a conceptual diagram of the information flow in the videophone system of the present invention shown in FIG. 1. In FIG. 2, 201 is terminal 1, 202 is terminal 2, 203 is terminal 3, and 204 is a host device. In addition, 351 is the terminal 1 uplink information sent from terminal 1 to the host device, 352 is the terminal 2 uplink information sent from terminal 2 to the host device, and 353 is the terminal 3 downlink information sent from the host device to terminal 3. For simplicity, FIG. 2 shows only the flow of information received by terminal 3; in actual operation, the information received by terminals 1 and 2 flows in the same way.

[0018] Next, the operation of the first embodiment is described. As shown in FIG. 2, terminal 1 sends its video information V(1) and audio information A(1) to the host device. Similarly, terminal 2 sends its video information V(2) and audio information A(2) to the host device.

[0019] The host device receives these pieces of information and forwards each one to the terminals that need it. In this example, the video information of terminal 1 and terminal 2, the audio information obtained by mixing the audio of terminal 1 and terminal 2, and the left/right balance information for that audio are each forwarded to terminal 3 as the downlink information 343. In terminal 3, the data distribution unit 73 separates the information received by the reception unit 72 into its individual streams, and each stream is sent to the corresponding decoding unit.

[0020] In FIG. 1, the video information of terminal 1 is sent to the video 1 decoding unit and displayed on the video 1 display unit 21. The video information of terminal 2 is sent to the video 2 decoding unit and displayed on the video 2 display unit 22. The mixed audio information is sent through the audio decoding unit 611 to the volume control unit 612. Based on the signal from the left/right balance decoding unit 613, the volume control unit sends output information to the left audio output unit 41 and the right audio output unit 42.
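As a rough illustration of what the volume control unit does with the decoded balance signal, the sketch below splits a decoded mono (mixed) signal into left and right outputs. The encoding of the balance value is not given in the text; here it is assumed to be a number in [-1.0, 1.0], where -1.0 means fully left and 1.0 means fully right. The name `apply_balance` and the linear gain law are likewise assumptions.

```python
def apply_balance(mono_samples, balance):
    """Split a mono signal into (left, right) using a balance value.

    Assumption: balance is in [-1.0, 1.0]; -1.0 is fully left, 0.0 is
    centered, 1.0 is fully right. The source does not define this encoding.
    """
    if not -1.0 <= balance <= 1.0:
        raise ValueError("balance must be within [-1.0, 1.0]")
    left_gain = (1.0 - balance) / 2.0
    right_gain = (1.0 + balance) / 2.0
    left = [s * left_gain for s in mono_samples]
    right = [s * right_gain for s in mono_samples]
    return left, right


# A centered balance sends half of the signal level to each speaker.
print(apply_balance([100, -100], 0.0))
# A fully-left balance silences the right speaker.
print(apply_balance([100], -1.0))
```

A single balance value per audio frame is far smaller than a second audio stream, which is the saving the first embodiment relies on.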

[0021] This configuration makes it possible to reduce the amount of downlink information sent to terminal 3 substantially, because the left/right balance information is very small compared with audio information.

[0022] Next, another embodiment of the present invention is described. FIG. 3 is a block diagram of the second embodiment of the present invention. In FIG. 3, 72 is a reception unit, 145 is a data distribution unit, 611 is an audio decoding unit, 1411 is a video 1 decoding unit, 1412 is a video 2 decoding unit, 21 is a left video display unit, 22 is a right video display unit, 41 is a left audio output unit, 42 is a right audio output unit, 142 is a left video counter, 143 is a right video counter, 144 is a comparison unit, and 612 is a volume control unit.

[0023] The left video display unit 21 and the right video display unit 22 are drawn as separate components, but they represent two regions that physically lie on a single component. In the figure, the left video display unit 21 and the right video display unit 22 are each divided into rectangles labeled "I" or "P"; a rectangle labeled "I" is an I picture, and a rectangle labeled "P" is a P picture.

[0024] An I picture is a rectangular image information format that represents a partial image compressed on its own; it is a compressed-image format that contains no correlation information with the preceding image. A P picture is a rectangular image information format that contains correlation information with the partial image that occupied the same region immediately before. Which format is used for which rectangle of the video depends on the compression method used on the sending side.

[0025] In general, P pictures compress more efficiently than I pictures and yield smaller data. A P picture must be correlated with the preceding partial image, but no correlation can be obtained when the partial image has changed greatly. The sending side therefore compresses, on a per-partial-image basis, greatly changed parts as I pictures and little-changed parts as P pictures. The left video counter 142 and the right video counter 143 count the number of I pictures and the number of P pictures in the left and right video display units, respectively.
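The sending-side decision described above can be sketched as follows. This is not an actual codec: it only labels each rectangular block "I" or "P" from the mean absolute pixel difference against the previous frame. The function name `classify_blocks`, the block size, the threshold, and the difference measure are all assumptions made for illustration; real encoders use their own mode-decision criteria.

```python
def classify_blocks(prev_frame, cur_frame, block_size, threshold):
    """Label each block "I" (large change) or "P" (small change).

    Frames are 2D lists of grayscale values with identical dimensions.
    Assumption: mean absolute difference above `threshold` stands in for
    "correlation with the previous partial image cannot be obtained".
    """
    rows = len(cur_frame) // block_size
    cols = len(cur_frame[0]) // block_size
    labels = []
    for br in range(rows):
        row_labels = []
        for bc in range(cols):
            diff = 0
            for y in range(br * block_size, (br + 1) * block_size):
                for x in range(bc * block_size, (bc + 1) * block_size):
                    diff += abs(cur_frame[y][x] - prev_frame[y][x])
            mean_diff = diff / (block_size * block_size)
            row_labels.append("I" if mean_diff > threshold else "P")
        labels.append(row_labels)
    return labels


prev = [[0, 0, 0, 0],
        [0, 0, 0, 0]]
cur = [[100, 100, 0, 0],
       [100, 100, 1, 0]]
print(classify_blocks(prev, cur, block_size=2, threshold=10))
```

The receiver never has to repeat this comparison; it only reads the resulting block type from the compressed stream.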

[0026] Next, the operation of the second embodiment is described. The video information that passes from the reception unit 72 through the data distribution unit 145 and is decoded by the video decoding units is sent to the left and right video display units. The left counter 142 and the right counter 143 then count the number of I pictures in their respective video display units. The counts give, for each of the left and right images, the proportion of the image that is changing greatly. The comparison unit 144 compares the counts, and the volume of the left and right audio output units is controlled via the volume control unit 612. In FIG. 3, the left video display unit 21 contains more I pictures, so the volume of the left audio output unit 41 is made higher than that of the right audio output unit 42.
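The counting and comparison steps can be sketched in a few lines. Each window is represented as a grid of "I"/"P" labels, as drawn in FIG. 3. The text only states that the side with more I pictures is made louder; the proportional gain rule used in `gains_from_counts` below is an assumption, and both function names are hypothetical.

```python
def count_i_blocks(labels):
    """Count the blocks labeled "I" in one window (role of counters 142/143)."""
    return sum(row.count("I") for row in labels)


def gains_from_counts(left_count, right_count):
    """Return (left_gain, right_gain) from the two counts (role of 144 and 612).

    Assumption: gains are proportional to the I-block counts; the source
    only requires that the side with more I pictures be louder.
    """
    total = left_count + right_count
    if total == 0:
        return 0.5, 0.5  # no detected motion: keep the balance centered
    return left_count / total, right_count / total


left_window = [["I", "I"],
               ["P", "I"]]
right_window = [["P", "P"],
                ["I", "P"]]
left_n = count_i_blocks(left_window)
right_n = count_i_blocks(right_window)
print(left_n, right_n, gains_from_counts(left_n, right_n))
```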

[0027] The principle of this scheme is that the participant who is mainly speaking usually moves the face and mouth or makes gestures, so detecting the amount of image change yields an appropriate left/right volume balance. Many current image compression techniques have compressed-video formats equivalent to the I and P pictures described above. Detecting the amount of image change generally requires complex processing, but when such compression techniques are in use, the ratio of image change can be obtained easily with nothing more than simple counters.

[0028] Compared with the first embodiment, the second embodiment has the following features. Because left/right balance information no longer needs to be transmitted, the amount of transmitted data is reduced. The scheme can also be applied to a ring-connected videophone system that uses no host device. This point is explained next.

[0029] FIG. 4 is a block diagram of a videophone system that uses a host device together with the second embodiment. In FIG. 4, 201 is terminal 1, 202 is terminal 2, 203 is terminal 3, and 204 is a host device. In addition, 311 is the terminal 1 uplink information sent from terminal 1 to the host device, 312 is the terminal 2 uplink information sent from terminal 2 to the host device, 313 is the terminal 3 uplink information sent from terminal 3 to the host device, 301 is the terminal 1 downlink information sent from the host device to terminal 1, 302 is the terminal 2 downlink information sent from the host device to terminal 2, and 303 is the terminal 3 downlink information sent from the host device to terminal 3.

[0030] Each uplink carries the audio information and video information of its own terminal. Each downlink carries the video information of each remote terminal and the mixed audio information of all remote terminals. Because no audio left/right balance information is transmitted, the amount of transmitted data is lower than before. A configuration is also possible in which a signal that mixes the video signals from multiple terminals is received and then separated into the individual video signals.

[0031] FIG. 5 is a block diagram of a ring-connected videophone system that uses no host device, again with the second embodiment. In FIG. 5, 201 is terminal 1, 202 is terminal 2, and 203 is terminal 3. In addition, 321 is the mixed audio information sent from terminal 1 to terminal 3, 322 is the mixed audio information sent from terminal 2 to terminal 1, 323 is the mixed audio information sent from terminal 3 to terminal 2, 331 is the video information sent from terminal 3 to terminal 1, 332 is the video information sent from terminal 1 to terminal 2, and 333 is the video information sent from terminal 2 to terminal 3.

[0032] A ring-connected videophone system of this kind has the advantages that it needs no host device and that line costs can be kept down. With a ring connection, processing such as audio mixing by a host device is not available. Processing such as the left/right balance of the reproduced audio must therefore be performed independently by each terminal. The scheme described in the second embodiment is particularly effective in such cases.

[0033] Another advantage of the second embodiment is that the left/right audio balance processing can be added by changing only the receiving side, with no change to the sending side. In a method that generates the left/right balance information on the sending side or in the host device, the balance information has to be handled as part of the information transmitted by the videophone system.

[0034] In such a system, the host device must be prevented from sending the left/right balance information to apparatuses that cannot process it. The apparatus of the second embodiment has no such problem, so the system can be made to adjust the left/right volume balance automatically simply by replacing the receiving-side apparatus.

[0035] Next, yet another embodiment of the present invention is described. FIG. 6 is a block diagram of the third embodiment of the present invention. In FIG. 6, 72 is a reception unit, 611 is an audio decoding unit, 141 is a video decoding unit, 2 is a video display unit, 21 is a left video, 22 is a right video, 41 is a left audio output unit, 42 is a right audio output unit, 142 is a left video counter, 143 is a right video counter, 144 is a comparison unit, and 612 is a volume control unit.

[0036] In this embodiment, the left video 21 and the right video 22 exist as the left half and the right half of a single video. Consequently, unlike the preceding embodiments, there is only one video decoding unit 141.

[0037] Next, the operation of the third embodiment is described. The data input from the reception unit 72 is decoded by the video decoding unit 141 and sent to the video display unit 2. The left counter 142 and the right counter 143 then count the number of I pictures in the left half and the right half of the video display unit, respectively. The counts give, for each of the left and right images, the proportion of the image that is changing greatly. The comparison unit 144 compares the counts, and the volume of the left and right audio output units is controlled via the volume control unit 612. In FIG. 6, the left video 21 contains more I pictures, so the volume of the left audio output unit 41 is made higher than that of the right audio output unit 42.
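For the single-video case, the two counters can be sketched as one pass over the block grid of the whole frame, splitting each row at its midpoint. The function name `count_i_by_half` is hypothetical, and assigning the middle column of an odd-width grid to the right half is an arbitrary assumption; the text does not address that case.

```python
def count_i_by_half(labels):
    """Return (left_count, right_count) of "I" blocks in one combined frame.

    Assumption: for an odd number of columns the middle column is counted
    with the right half; the source does not specify this.
    """
    left = 0
    right = 0
    for row in labels:
        mid = len(row) // 2
        left += row[:mid].count("I")
        right += row[mid:].count("I")
    return left, right


frame_labels = [["I", "I", "P", "P"],
                ["I", "P", "P", "I"]]
print(count_i_by_half(frame_labels))
```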

[0038] Next, yet another embodiment of the present invention is described. FIG. 7 is a block diagram of the fourth embodiment of the present invention. In FIG. 7, 72 is a reception unit, 145 is a data distribution unit, 611 is an audio decoding unit, 1411 is a video 1 decoding unit, 1412 is a video 2 decoding unit, 21 is a left video display unit, 22 is a right video display unit, 41 is a left audio output unit, 42 is a right audio output unit, 146 is a left extraction unit, 147 is a right extraction unit, 144 is a comparison unit, and 612 is a volume control unit.

[0039] The left extraction unit 146 extracts the partial video in the central portion of the left video display unit 21 and counts the number of I pictures in it. Similarly, the right extraction unit 147 extracts the partial video in the central portion of the right video display unit 22 and counts the number of I pictures in it.

[0040] Next, the operation of the fourth embodiment is described. The data input from the reception unit 72 is separated by purpose in the data distribution unit 145, decoded by the video decoding units 1411 and 1412, and sent to the video display units 21 and 22. The left extraction unit 146 and the right extraction unit 147 extract the partial video in the central portion and count the number of I pictures it contains. The comparison unit 144 compares the counts, and the volume of the left and right audio output units is controlled via the volume control unit 612.

[0041] In FIG. 7, the left extraction unit 146 finds more I pictures than the right extraction unit 147, so the volume of the left audio output unit 41 is made higher than that of the right audio output unit 42. This scheme reduces the amount of data processing, in part because comparing the counts becomes simpler. In addition, restricting detection to the central portion of the image makes the processing insensitive to background motion, and because the call partner's mouth is located in the central portion of the image, the processing is well suited to identifying the speaker.
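The extraction units can be sketched as a count restricted to the inner part of the block grid. How large the "central portion" is remains unspecified in the text; the sketch assumes it is the grid with a border of `margin` blocks removed on every side. The name `count_center_i` and the `margin` parameter are hypothetical.

```python
def count_center_i(labels, margin=1):
    """Count "I" blocks in the central region of one window (role of 146/147).

    Assumption: the central region is the grid minus `margin` blocks on
    each edge; the source does not define its exact extent.
    """
    if margin < 0:
        raise ValueError("margin must be non-negative")
    center_rows = labels[margin:len(labels) - margin]
    return sum(row[margin:len(row) - margin].count("I") for row in center_rows)


window = [["I", "I", "I", "I"],
          ["P", "I", "P", "P"],
          ["P", "I", "I", "P"],
          ["I", "P", "P", "I"]]
# Only the inner 2x2 region is examined, so the border "I" blocks
# (for example, background motion) are ignored.
print(count_center_i(window, margin=1))
```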

[0042] Furthermore, although this example shows a simple comparison of the central portions, it is also possible to detect that only the image in the central portion is changing greatly. In that case, when a call partner sways the whole body, the motion is not reflected in the volume balance control, and balance control is applied only in situations where the image changes in the central portion alone, as with speech or gestures.
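One way to read this variant is as a test that the central region contains changed blocks while the surrounding region contains few or none. The sketch below follows that reading; the function name `is_center_only_change`, the `margin` definition of the central region, and the two thresholds are assumptions, not values given in the text.

```python
def is_center_only_change(labels, margin=1, min_center=1, max_outer=0):
    """Return True when change is concentrated in the central region.

    Assumptions: the center is the grid minus `margin` blocks per edge;
    `min_center` and `max_outer` are illustrative thresholds.
    """
    total = sum(row.count("I") for row in labels)
    center_rows = labels[margin:len(labels) - margin]
    center = sum(row[margin:len(row) - margin].count("I") for row in center_rows)
    outer = total - center
    return center >= min_center and outer <= max_outer


talking = [["P", "P", "P", "P"],
           ["P", "I", "I", "P"],
           ["P", "I", "P", "P"],
           ["P", "P", "P", "P"]]
swaying = [["I", "I", "I", "I"],
           ["I", "I", "I", "I"],
           ["I", "I", "I", "I"],
           ["I", "I", "I", "I"]]
print(is_center_only_change(talking), is_center_only_change(swaying))
```

A window that fails this test would simply be left out of the balance update for that frame.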

[0043] In the embodiments described above, the volume balance is adjusted directly from the video information. However, the control need not be direct; for example, the integral or the average over a fixed past period may be used instead. Such a calculation prevents unnaturally abrupt volume changes.
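
The averaging over a fixed past period can be sketched as a simple moving average. In the Python example below, the class name `BalanceSmoother`, the window length, and the convention that the balance runs from -1.0 (fully left) to +1.0 (fully right) are assumptions made for illustration.

```python
from collections import deque


class BalanceSmoother:
    """Moving average of the most recent raw balance values."""

    def __init__(self, window: int = 5) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque with maxlen discards the oldest value automatically.
        self._history = deque(maxlen=window)

    def update(self, raw_balance: float) -> float:
        """Add one raw balance value and return the smoothed balance."""
        self._history.append(raw_balance)
        return sum(self._history) / len(self._history)
```

With a window of two frames, a jump of the raw balance from -1.0 to +1.0 is reported as 0.0 on the first frame after the jump and as +1.0 only on the next one, so the speaker balance moves in steps rather than switching at once.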

[0044] The means for realizing the present invention need not be hardware as described in the embodiments; part or all of it may be realized in software.

[0045] Although the description assumes that two windows are displayed, the number of windows need not be two; three or more windows may be displayed.

[0046] As described above, windows displaying two or more call parties are arranged horizontally, the receiving device determines in which window the party who is speaking is displayed, and it adjusts the left-right balance of the speakers provided on the left and right of the display device. This controls the direction on the screen from which the voice is heard, so that the user can intuitively tell who is speaking.
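
For two or more windows arranged horizontally, the position of the speaking party's window can be mapped to a left-right balance. The Python sketch below uses a linear panning rule based on the horizontal center of the window; this rule and the function name `pan_gains` are assumptions, since the description does not specify how the balance is derived for three or more windows.

```python
from typing import Tuple


def pan_gains(window_index: int, window_count: int) -> Tuple[float, float]:
    """Return (left gain, right gain) for the window of the speaking party.

    Windows are numbered from 0 (leftmost) to window_count - 1 (rightmost).
    """
    if window_count < 1 or not 0 <= window_index < window_count:
        raise ValueError("window_index out of range")
    # Horizontal center of the window as a fraction of the screen width.
    position = (window_index + 0.5) / window_count
    return (1.0 - position, position)
```

For two windows this gives gains of (0.75, 0.25) when the left party speaks and (0.25, 0.75) when the right party speaks; with three windows the middle party is reproduced equally from both speakers.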

[0047] The embodiments described above assume two types of attribute information for the video information: the I picture, a rectangular image information format that contains no correlation information with other images, and the P picture, a rectangular image information format that contains correlation information with the immediately preceding image. However, the attribute information need not be limited to two types. For example, there may be three types: the I picture, which contains no correlation information with other images; the P picture, which contains correlation information with the immediately preceding image; and the B picture, a rectangular image information format that contains correlation information with both the immediately preceding and the immediately following images.
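
The point that the counting does not depend on how many attribute types exist can be illustrated with a short Python sketch. The enum `PictureType` and the helper names are hypothetical; the sketch assumes that each partial image carries exactly one attribute value.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class PictureType(Enum):
    I = "I"  # no correlation information with other images
    P = "P"  # correlation with the immediately preceding image
    B = "B"  # correlation with the preceding and following images


def attribute_counts(blocks: Iterable[PictureType]) -> Counter:
    """Count the partial images of each attribute type."""
    return Counter(blocks)


def intra_count(blocks: Iterable[PictureType]) -> int:
    """Number of I pictures, whichever predicted types are also present."""
    return attribute_counts(blocks)[PictureType.I]
```

A counter keyed by attribute type works unchanged for streams with two types (I and P) and for streams with three types (I, P and B).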

【００４８】[0048]

[Effects of the Invention] As described above, according to the present invention, a simple extension to the receiving-side device is enough to automatically adjust the left-right volume balance that helps identify the speaker, without increasing the amount of data transmitted. With this function, the user can more intuitively determine which call party is speaking, which removes the sense of unnaturalness when using a videophone system. The function can also be applied to a ring-connected videophone system in which no host device is installed.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of the first embodiment of the present invention.

FIG. 2 is a conceptual data flow diagram of an example system to which the first embodiment of the present invention is applied.

FIG. 3 is a configuration diagram of the second embodiment of the present invention.

FIG. 4 is a conceptual data flow diagram (part 1) of an example system to which the second embodiment of the present invention is applied.

FIG. 5 is a conceptual data flow diagram (part 2) of an example system to which the second embodiment of the present invention is applied.

FIG. 6 is a configuration diagram of the third embodiment of the present invention.

FIG. 7 is a configuration diagram of the fourth embodiment of the present invention.

FIG. 8 is an external view of an example device to which the present invention is applied.

FIG. 9 is an internal configuration diagram of an example device to which the present invention is applied.

FIG. 10 is a configuration diagram of a conventional example.

FIG. 11 is a conceptual data flow diagram of an example system to which the conventional example is applied.

[Description of symbols]
1 Personal computer (PC)
2 Video display unit
3 Video input unit
4 Audio output unit
5 Audio input unit
6 Video/audio processing unit
7 Communication processing unit
11 CPU
14 Video playback unit
21 Left video display unit
22 Right video display unit
41 Left audio output unit
42 Right audio output unit
141 Video decoding unit
142 Left counter
143 Right counter
144 Comparator
145 Data distribution unit
146 Left extraction unit
147 Right extraction unit
201 Terminal 1
202 Terminal 2
203 Terminal 3
204 Host device
301 Terminal 1 downstream information
302 Terminal 2 downstream information
303 Terminal 3 downstream information
311 Terminal 1 upstream information
312 Terminal 2 upstream information
313 Terminal 3 upstream information
321 Ring-type mixed audio information
322 Ring-type mixed audio information
323 Ring-type mixed audio information
331 Ring-type video information
332 Ring-type video information
333 Ring-type video information
341 Terminal 1 upstream information
342 Terminal 2 upstream information
343 Terminal 3 downstream information
351 Terminal 1 upstream information
352 Terminal 2 upstream information
353 Terminal 3 downstream information
611 Audio decoding unit
612 Volume control unit
613 Left-right balance decoding unit
1411 Video 1 decoding unit
1412 Video 2 decoding unit
6111 Audio 1 decoding unit
6112 Audio 2 decoding unit

Continuation of the front page
(72) Inventor Toshio Tanaka, Office Systems Division, Hitachi, Ltd., 810 Shimoimaizumi, Ebina City, Kanagawa Prefecture
(72) Inventor Takashi Yoshitomi, Office Systems Division, Hitachi, Ltd., 810 Shimoimaizumi, Ebina City, Kanagawa Prefecture

Claims

1. A videophone terminal device comprising: receiving means for receiving video information of a plurality of terminals, mixed audio information obtained by mixing the audio information of the plurality of terminals, and balance information of the audio information; distribution means for distributing the various kinds of information received by the receiving means to corresponding decoding means; a plurality of video display means corresponding to the plurality of terminals; a plurality of audio output means corresponding to the plurality of terminals and arranged in the vicinity of the video display means; audio decoding means for decoding the mixed audio information; and volume control means for controlling the volume of the output from the audio decoding means, based on the decoded signal of the balance information, so that the audio from each audio output means corresponds to the video from the corresponding video display means, and for sending the output to the plurality of audio output means.

2. A videophone terminal device comprising: receiving means for separately receiving video information of a plurality of terminals and for receiving mixed audio information obtained by mixing the audio information of the plurality of terminals; a plurality of video display means corresponding to the plurality of terminals; a plurality of audio output means corresponding to the plurality of terminals and arranged in the vicinity of the video display means; means for obtaining, in each item of video information, the amount of change of each partial image relative to the preceding frame; position information detection means for obtaining the physical position on the video display unit of the partial image with the largest amount of change; and volume control means for controlling the volume of the decoded signal of the mixed audio information, based on the output from the position information detection means, so that the audio from each audio output means corresponds to the video from the corresponding video display means, and for sending the signal to the plurality of audio output means.

3. The videophone terminal device according to claim 2, wherein the means for obtaining the amount of change of a partial image detects and obtains the amount of each type of attribute information in a video signal having two types of attribute information: an information format of a compressed image that does not include correlation information with the immediately preceding image, and an information format of a compressed image that includes correlation information with the immediately preceding image.

4. The videophone terminal device according to claim 2, wherein the means for obtaining the amount of change of a partial image detects and obtains the amount of each type of attribute information in a video signal having at least two types of attribute information: an information format of a compressed image that does not include correlation information with the immediately preceding or immediately following image, and an information format of a compressed image that includes correlation information with the immediately preceding or immediately following image.

5. A videophone terminal device comprising: receiving means for receiving video information of a plurality of terminals and for receiving mixed audio information obtained by mixing the audio information of the plurality of terminals; a plurality of video display means corresponding to the plurality of terminals; a plurality of audio output means corresponding to the plurality of terminals and arranged in the vicinity of the video display means; means for obtaining video information in which each partial image has attribute information indicating either an information format of a compressed image that does not include correlation information with the immediately preceding image or an information format of a compressed image that includes correlation information with the immediately preceding image; means for obtaining, among the attribute information of all the partial images displaying the video information from each terminal, the ratio of partial images whose attribute information includes correlation information with the immediately preceding image; and volume control means for controlling the volume of the decoded signal of the mixed audio information, based on the output from the means for obtaining the ratio, so that the audio from each audio output means corresponds to the video from the corresponding video display means, and for sending the signal to the plurality of audio output means.

6. A videophone terminal device comprising: receiving means for receiving video information of a plurality of terminals and for receiving mixed audio information obtained by mixing the audio information of the plurality of terminals; a plurality of video display means corresponding to the plurality of terminals; a plurality of audio output means corresponding to the plurality of terminals and arranged in the vicinity of the video display means; means for obtaining video information in which each partial image has attribute information indicating either an information format of a compressed image that does not include correlation information with the immediately preceding image or an information format of a compressed image that includes correlation information with the immediately preceding image; means for obtaining, among the attribute information of all the partial images displaying the video information from each terminal, the number of partial images located in the central part whose attribute information includes correlation information with the immediately preceding image; and volume control means for controlling the volume of the decoded signal of the mixed audio information, based on the output from the means for obtaining the number of partial images, so that the audio from each audio output means corresponds to the video from the corresponding video display means, and for sending the signal to the plurality of audio output means.

7. A videophone host device comprising: means for connecting three or more terminals; means for distributing video signals to each terminal; means for mixing the audio signals of the terminals and transmitting the mixed signal to all terminals; and position information detecting means for obtaining, for each terminal, the physical position on the video display unit of the partial image with the largest amount of change at that terminal, wherein volume balance information between the terminals is obtained from the position information and the balance information is distributed to each terminal.

8. A videophone terminal device comprising: means for separately receiving video signals of a plurality of terminals; means for receiving a signal obtained by mixing the audio signals of a plurality of terminals; position information detecting means for obtaining the physical position, on the video display unit of the own terminal, of the partial image with the largest amount of change; means for transmitting the position information to a host device; and means for receiving volume balance information between terminals from the host device.

9. A videophone terminal device comprising: means for receiving a signal in which the video signals of a plurality of terminals are mixed; means for receiving a signal in which the audio signals of a plurality of terminals are mixed; means for separating the plurality of received video signals; position information detecting means for obtaining the physical position on the video display unit of the partial image with the largest amount of change; means for transmitting the position information to a host device; and means for receiving volume balance information between terminals from the host device.

10. The videophone host device or videophone terminal device according to claim 7, 8 or 9, wherein the position information giving the physical position on the video display unit of the partial image with the largest amount of change is obtained based on the amount of each type of attribute information in a video signal having two types of attribute information: an information format of a compressed image that does not include correlation information with the immediately preceding image, and an information format of a compressed image that includes correlation information with the immediately preceding image.

11. A videophone terminal device according to any one of claims 1 to 6, wherein the plurality of audio output means are left and right audio output means.

12. The videophone host device or the videophone terminal device according to claim 7, wherein the volume balance information is left and right volume balance information.