JP2630041B2 - Video conference image display control method

Info

Publication number: JP2630041B2
Application number: JP2228772A
Authority: JP
Inventors: 斉小山
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1990-08-29
Filing date: 1990-08-29
Publication date: 1997-07-16
Anticipated expiration: 2012-07-16
Also published as: JPH04109784A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial Application Field]

The present invention is used in videophone and video conference systems. In particular, it relates to camera and screen control means in a video conference system intended for a plurality of people.

[Overview]

The present invention concerns means for controlling the display of the image of the person who is speaking in a video conference. By automatically selecting the person who is speaking, it makes manual camera operation unnecessary.

[Conventional technology]

In some conventional video conference systems, the imaging camera is aimed at the speaker among the conference attendees by means of a joystick or the like, operated on either the imaging (transmitting) side or the display (receiving) side. In such a system, a dedicated operator is stationed on the imaging side or the display side, or the conference participants themselves control the camera or switch the display toward the speaker as needed, so that the speaker's image is captured and the desired effect is obtained. However, when neither a dedicated operator nor a participant performs these operations, an appropriate image cannot be obtained, and images unrelated to the speaker are exchanged. A conventional example in which the camera is controlled on the imaging side is described below with reference to FIG. 6, and a conventional example in which the camera is controlled on the display side is described with reference to the overall block diagram of FIG. 7.

In FIG. 6, 10, 11 and 12 are conference participants; 20 and 21 are microphones; 30 is an imaging camera; 31 is a controller for the imaging camera; 50 is an image coder that encodes the image captured by the camera 30; 51 is an audio coder that encodes the audio picked up by the microphones 20 and 21; 60 is a multiplexer that multiplexes the encoded audio and image and places them on the line; 101 is a transmission line; 100 is a reception line that receives data such as codes; 80 is a demultiplexer that separates the received data into an audio stream and an image stream; 70 is an audio decoder that decodes the separated audio code into an audio signal; 71 is an image decoder that decodes the image code into an image signal; 90 is a display TV that displays the image signal; and 40 and 41 are loudspeakers that reproduce the audio signal.

Components in FIG. 7 bearing the same numbers as in FIG. 6 basically have the same functions and perform the same operations, so their description is omitted. FIG. 7 differs from FIG. 6 in the multiplexer 61 and the demultiplexer 81. The multiplexer 61 multiplexes and transmits, in addition to the image code and the audio code, the control signal of the camera controller 31 that controls the camera of the other party. The demultiplexer 81 demultiplexes and separates, in addition to the image code and the audio code, the camera control signal sent from the other party.

In a conventional video conference system, for example, a person on the imaging side of FIG. 6 controls the shooting range so that an appropriate picture is captured. The same effect can also be obtained when a person on the receiving and display side controls the camera.

[Problems to be solved by the invention]

Such conventional video conference systems have no means of automatically pointing the camera at the speaker and therefore require human operation. Either an extra person must be assigned to control the camera or the participants must control it themselves, which causes problems such as additional personnel or participants being unable to concentrate on the conference because of camera control. Furthermore, when the camera is controlled on the image receiving side, the person controlling the transmitting-side camera has to learn in advance the seating order and voice characteristics of the people on the other side, identify the direction of the speaker from memory, and control the camera accordingly; there is no way of avoiding this, and the direction has to be found by trial and error. In addition, when a dedicated operator is used, besides the increase in personnel there is the problem that the content of the conference is heard by a third party.

The present invention eliminates these drawbacks, and its object is to provide a video conference image display control apparatus and method that make manual camera operation unnecessary.

[Means for solving the problem]

The present invention is a video conference image display control method comprising: a first step in which a camera that captures a scene including a plurality of speakers and sound collecting means that collect the voices of the speakers generate an image signal and an audio signal, respectively; a second step of compressing and encoding the signals generated in the first step and then multiplexing and transmitting them; and a third step of separating an incoming multiplexed signal, decompressing each of the separated signals to restore the image signal and the audio signal, and displaying the image signal. The method is characterized in that the first step includes a step of placing the sound collecting means at mutually separated positions, identifying the position of the speaker based on the time difference between the sounds arriving at the respective sound collecting means, selecting the speaker at the identified position, and generating an image signal relating to the selected speaker.

Here, the first step may include a step of placing the sound collecting means at mutually separated positions, identifying the position of the speaker based on the time difference between the sounds arriving at the respective sound collecting means, generating a selection signal indicating the identified position, and generating, together with this selection signal, an image signal corresponding to the scene including the plurality of speakers; and the third step may include a step of decompressing and displaying, based on the incoming selection signal, the image signal relating to the selected speaker out of the incoming image signal.

Alternatively, the first step may include a step of placing the sound collecting means at mutually separated positions, identifying the position of the speaker based on the time difference between the sounds arriving at the respective sound collecting means, generating a selection signal indicating the identified position, and generating, together with this selection signal, an image signal corresponding to the scene including the plurality of speakers; and the third step may include a step of decompressing the incoming image signal and, based on the incoming selection signal, selecting from the decompressed image signal the image signal relating to the speaker and displaying it enlarged.

[Action]

The direction of the speaker is identified from the sound input from at least two microphones. According to the result from the speaker position specifying means, only the image signal of a specific portion of the image input from the camera is automatically selected, compressed, and transmitted. Also, the result from the speaker position specifying means is transmitted; the receiving side receives this position result and, based on it, controls the decompression of the compressed signal of the television picture so that only the signal of the specific image portion is automatically decompressed and displayed.

[Embodiments]

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is an overall block diagram of the first embodiment, in which a specific area of the image captured by a single camera is encoded and transmitted. Components in FIG. 1 bearing the same numbers as in FIGS. 6 and 7 have basically the same functions, so their description is omitted. In FIG. 1, 200 is a direction determination circuit that determines the direction of the source of a sound, such as a voice, from the difference in arrival time of the sound input from the two microphones 20 and 21 and from the spacing between the microphones 20 and 21, and generates and outputs a signal indicating the area to be encoded. 300 is an encoding control circuit that causes a specific area of the camera image to be encoded in accordance with the signal output from the direction determination circuit 200.
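The area selection of the first embodiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patent does not show how FIG. 4 divides the picture, so the sketch assumes equal-width vertical strips numbered from 1, and the names `area_columns` and `crop_for_encoding` are hypothetical.

```python
def area_columns(area_number, frame_width, area_count=3):
    """Return the (start, stop) column range of a 1-based area number.

    Assumption: the picture is split into `area_count` equal vertical
    strips; the actual layout of FIG. 4 is not given in the text.
    """
    if not 1 <= area_number <= area_count:
        raise ValueError("area_number out of range")
    start = (area_number - 1) * frame_width // area_count
    stop = area_number * frame_width // area_count
    return start, stop


def crop_for_encoding(frame, area_number, area_count=3):
    """Keep only the pixels of the selected area.

    `frame` is a list of rows of pixel values. Pixels outside the area
    are dropped, so they would never reach the image coder.
    """
    start, stop = area_columns(area_number, len(frame[0]), area_count)
    return [row[start:stop] for row in frame]
```

In this sketch the role of the encoding control circuit 300 is played by `crop_for_encoding`: whatever it returns is the only image data a coder would receive.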

FIG. 2 is an overall block diagram of the second embodiment, in which the image captured by the camera is encoded and transmitted as it is, but information designating the decoding area is added as side information. Components in FIG. 2 bearing the same numbers as in FIGS. 1, 6 and 7 have the same functions, so their description is omitted. In the second embodiment, the multiplexer 60 multiplexes the result of the direction determination circuit 200 together with the encoded audio and image signals. Conversely, the demultiplexer 80 separates, in addition to the encoded audio and image signals, the side information identifying the coding area. The decoding control circuit 400 extracts the encoded image signal of the designated area based on the side information designating the decoding area and sends it to the image decoder 71. The image decoder 71 decodes the encoded data for that part of the original image and sends it to the display TV 90.
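To make the role of the side information concrete, the following Python sketch multiplexes an area number together with the image and audio codes and separates them again. The packet layout (a 9-byte header holding the area number and two lengths) is purely an assumption for illustration; the patent does not define a frame format, and `multiplex` and `demultiplex` are hypothetical names.

```python
import struct

# Assumed header: 1-byte area number, 4-byte image length, 4-byte audio
# length, all big-endian with no padding.
_HEADER = struct.Struct(">BII")


def multiplex(area_number, image_code, audio_code):
    """Pack side information, image code and audio code into one packet."""
    header = _HEADER.pack(area_number, len(image_code), len(audio_code))
    return header + image_code + audio_code


def demultiplex(packet):
    """Recover (area_number, image_code, audio_code) from a packet."""
    area_number, image_len, audio_len = _HEADER.unpack_from(packet)
    body = packet[_HEADER.size:]
    image_code = body[:image_len]
    audio_code = body[image_len:image_len + audio_len]
    return area_number, image_code, audio_code
```

On the receiving side, a decoding control stage would use the recovered area number to decide which part of the image code to hand to the image decoder.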

FIG. 3 is an overall block diagram of the third embodiment, in which the image captured by the camera 30 is encoded and transmitted as it is, information designating the decoding area is added as side information and transmitted, and the receiving side, after decoding, identifies the display area and displays it enlarged. Components in FIG. 3 bearing the same numbers as in FIGS. 1, 2, 6 and 7 have the same functions, so their description is omitted. In the third embodiment, the multiplexer 60 multiplexes the result of the direction determination circuit 200 together with the encoded audio and image signals. Conversely, the demultiplexer 80 separates, in addition to the encoded audio and image signals, the side information identifying the coding area. The image decoder 71 decodes the transmitted image data. The display control circuit 500 extracts the image signal of the designated area based on the transmitted side information designating the display area, enlarges it, and sends it to the display TV 90.
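The extract-and-enlarge step of the third embodiment might look like the following Python sketch. It assumes the display area is given as a column range and that enlargement is done by simple nearest-neighbour pixel replication with an integer factor; the patent specifies neither, and `enlarge_area` is a hypothetical name.

```python
def enlarge_area(frame, start_col, stop_col, scale):
    """Cut columns [start_col, stop_col) out of a decoded frame and
    enlarge the result by an integer factor.

    `frame` is a list of rows of pixel values. Each pixel is repeated
    `scale` times horizontally and each row `scale` times vertically
    (nearest-neighbour replication, an assumed method).
    """
    enlarged = []
    for row in frame:
        wide = [pixel for pixel in row[start_col:stop_col] for _ in range(scale)]
        enlarged.extend(list(wide) for _ in range(scale))
    return enlarged
```

Unlike the first two embodiments, the whole frame is decoded here, so the selection costs nothing in the decoder and is applied only at display time.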

That is, this embodiment comprises: the camera 30, which captures images; the microphones 20 and 21, which are sound collecting means; the image coder 50, which is image signal compression means that compresses the image signal supplied from the camera 30 to generate a compressed image signal; the audio coder 51, which is audio signal compression means that compresses the audio signal supplied from the sound collecting means to generate a compressed audio signal; the multiplexer 60, which is multiplexing transmission means that multiplexes the compressed image signal and the compressed audio signal to generate and transmit a multiplexed signal; the demultiplexer 80, which is signal separating means that separates the multiplexed signal arriving from the multiplexing transmission means into a compressed image signal and a compressed audio signal; the image decoder 71, which is image signal decompression means that decompresses the compressed image signal separated by the signal separating means; the audio decoder 70, which is audio signal decompression means that decompresses the compressed audio signal separated by the signal separating means; the display TV 90, which is display means that displays an image signal relating to the image signal decompressed by the image signal decompression means; and the loudspeakers 40 and 41, which are loudspeaker means that reproduce the audio signal decompressed by the audio signal decompression means.

Further, as the means characterizing the present invention, the embodiment comprises the direction determination circuit 200, which is position specifying means that identifies the position of the speaker based on the sound collected by the sound collecting means. Here, the first invention comprises the encoding control circuit 300, which is image signal transmission prohibiting means that prohibits image signals other than those in the vicinity of the position identified by the position specifying means from being supplied to the image signal compression means. The second invention includes, in the multiplexer 60, specified-position information transmitting means that transmits position information indicating the position identified by the position specifying means, and comprises, in place of the image signal compression means, the decoding control circuit 400, which is image signal decompression prohibiting means that prohibits compressed image signals other than the portion indicated by the incoming position information from being supplied to the image signal decompression means. The third invention comprises, in place of the image signal decompression prohibiting means, the display control circuit 500, which is display enlarging means that, after the incoming compressed image signal has been decompressed, enlarges the portion indicated by the incoming position information and supplies it to the display means.

Next, the operation is described. For simplicity, the positional (distance) relationship between the persons 10, 11 and 12 and the microphones 20 and 21 of FIG. 1 is explained using the arrangement shown in FIG. 8. In FIG. 8, the microphones 20 and 21 pick up the utterances of the conference participants, convert them into electric signals, and output them. When person 10 is speaking, the audio signal input to microphone 21 lags the audio signal input to microphone 20 by T seconds because of the difference in distance. Therefore, if the speed of sound is S, the relation Y = T × S + X holds. An example of the outputs of the microphones 20 and 21 is shown in FIG. 9. In FIG. 9, the upper trace is the signal waveform output by microphone 20 and the lower trace is the signal waveform output by microphone 21. In the example of FIG. 8, when the signal waveform of microphone 21 is observed relative to that of microphone 20, the following three cases exist.

(1) The delay T of the signal waveform of microphone 21 is positive: person 10 is speaking.
(2) The delay T between the signal waveforms of microphones 20 and 21 is zero: person 11 is speaking.
(3) The delay T of the signal waveform of microphone 21 is negative: person 12 is speaking.

In this embodiment, to keep the explanation simple, there are three people and two microphones, and the mutual distances are equal. Even when there are more people or microphones and the distances between people and microphones are not uniform, the direction can be identified on the same basic principle, provided that the people are lined up in a roughly straight line nearly perpendicular to the optical axis of the camera (to the extent that shooting with the TV camera is not hindered), that the microphones are arranged nearly parallel or nearly orthogonal to the row of people, and that the distance between the microphones and the arrival time difference of the signals are known.

The direction determination circuit 200 identifies the direction of the speaker and, according to the identified direction, generates a control signal indicating the coding area, or side information indicating the decoding or display area. For example, in case (1) above, the first embodiment encodes and transmits coding area 1 of FIG. 4 out of the camera image. The second embodiment generates and transmits side information indicating decoding area 1 of FIG. 5. The third embodiment transmits side information in the same way as the second embodiment; the receiving side decodes the received image and then, according to the side information, enlarges and displays only the portion corresponding to decoding area 1 of FIG. 5.
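The delay-based decision described above can be illustrated with a short Python sketch. It rests on several assumptions that are not in the patent: the delay T is estimated by brute-force cross-correlation of sampled signals, the speed of sound S is taken as 343 m/s, X and Y in Y = T × S + X are read as the distances from the speaker to microphones 20 and 21, a small tolerance decides when T counts as zero, and areas 2 and 3 are assigned to persons 11 and 12 (the text only gives area 1 for person 10). All function names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for S (air at about 20 degrees C)


def estimate_delay_samples(mic20, mic21, max_lag):
    """Return the lag, in samples, of mic21 relative to mic20.

    A positive result means the mic21 waveform is delayed with respect
    to the mic20 waveform. Brute-force cross-correlation over the lags
    -max_lag..max_lag (an assumed method).
    """
    best_lag, best_score = 0, float("-inf")
    n = min(len(mic20), len(mic21))
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += mic20[i] * mic21[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def path_difference_m(delay_samples, sample_rate_hz, speed=SPEED_OF_SOUND_M_S):
    """Y - X = T * S, with T = delay_samples / sample_rate_hz."""
    return (delay_samples / sample_rate_hz) * speed


def select_area(delay_t_s, tolerance_s=1e-4):
    """Map the delay T to (speaking person, area number)."""
    if delay_t_s > tolerance_s:
        return 10, 1  # case (1): T positive, person 10, area 1
    if delay_t_s < -tolerance_s:
        return 12, 3  # case (3): T negative, person 12; area 3 is assumed
    return 11, 2      # case (2): T about zero, person 11; area 2 is assumed
```

For instance, with an assumed 8 kHz sample rate, a lag of 3 samples gives T = 0.375 ms, which `select_area` classifies as case (1).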

[Effect of the invention]

As described above, the present invention makes it unnecessary for a TV camera operator to select the captured image and avoids operation by a dedicated operator or by the participants. The conference therefore proceeds more smoothly, and the problem of confidential information leaking through an operator is avoided.

[Brief description of the drawings]

FIG. 1 is an overall block diagram showing the first embodiment of the present invention. FIG. 2 is an overall block diagram showing the second embodiment. FIG. 3 is an overall block diagram showing the third embodiment. FIG. 4 is a diagram showing an example of coding areas. FIG. 5 is a diagram showing an example of decoding areas. FIG. 6 is an overall block diagram showing a conventional example. FIG. 7 is an overall block diagram showing a conventional example. FIG. 8 is a diagram showing an example of the positional relationship between the microphones and the conference participants. FIG. 9 is a diagram showing the signal waveforms of the sound input to the microphones.

10, 11, 12 ... persons; 20, 21 ... microphones; 30 ... camera; 31 ... camera controller; 40, 41 ... loudspeakers; 50 ... image coder; 51 ... audio coder; 60, 61 ... multiplexers; 70 ... audio decoder; 71 ... image decoder; 80, 81 ... demultiplexers; 90 ... display TV; 100 ... reception line; 101 ... transmission line; 200 ... direction determination circuit; 300 ... encoding control circuit; 400 ... decoding control circuit; 500 ... display control circuit.

Claims

(57) [Claims]

1. A video conference image display control method comprising: a first step in which a camera that captures a scene including a plurality of speakers and sound collecting means that collect the voices of the speakers generate an image signal and an audio signal, respectively; a second step of compressing and encoding the signals generated in the first step and then multiplexing and transmitting them; and a third step of separating an incoming multiplexed signal, decompressing each of the separated signals to restore the image signal and the audio signal, and displaying the image signal, wherein the first step includes a step of placing the sound collecting means at mutually separated positions, identifying the position of the speaker based on the time difference between the sounds arriving at the respective sound collecting means, selecting the speaker at the identified position, and generating an image signal relating to the selected speaker.

2. A video conference image display control method comprising: a first step in which a camera that captures a scene including a plurality of speakers and sound collecting means that collect the voices of the speakers generate an image signal and an audio signal, respectively; a second step of compressing and encoding the signals generated in the first step and then multiplexing and transmitting them; and a third step of separating an incoming multiplexed signal, decompressing each of the separated signals to restore the image signal and the audio signal, and displaying the image signal, wherein the first step includes a step of placing the sound collecting means at mutually separated positions, identifying the position of the speaker based on the time difference between the sounds arriving at the respective sound collecting means, generating a selection signal indicating the identified position, and generating, together with the selection signal, an image signal corresponding to the scene including the plurality of speakers, and wherein the third step includes a step of decompressing and displaying, based on the incoming selection signal, the image signal relating to the selected speaker out of the incoming image signal.

3. A video conference image display control method comprising: a first step in which a camera that captures a scene including a plurality of speakers and sound collecting means that collect the voices of the speakers generate an image signal and an audio signal, respectively; a second step of compressing and encoding the signals generated in the first step and then multiplexing and transmitting them; and a third step of separating an incoming multiplexed signal, decompressing each of the separated signals to restore the image signal and the audio signal, and displaying the image signal, wherein the first step includes a step of placing the sound collecting means at mutually separated positions, identifying the position of the speaker based on the time difference between the sounds arriving at the respective sound collecting means, generating a selection signal indicating the identified position, and generating, together with the selection signal, an image signal corresponding to the scene including the plurality of speakers, and wherein the third step includes a step of decompressing the incoming image signal and, based on the incoming selection signal, selecting from the decompressed image signal the image signal relating to the speaker and displaying it enlarged.