JPH04109784A

JPH04109784A - Video conference picture display controller and its method

Info

Publication number: JPH04109784A
Application number: JP22877290A
Authority: JP
Inventors: Hitoshi Koyama; 小山　斉
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1990-08-29
Filing date: 1990-08-29
Publication date: 1992-04-10
Anticipated expiration: 2012-07-16
Also published as: JP2630041B2

Abstract

PURPOSE: To eliminate the need for a dedicated operator or for operation by the participants by identifying the direction of a speaker from voice signals input from two or more microphones, automatically selecting, compressing and sending only the picture signal of a specific part, and automatically expanding and displaying only that specific picture part. CONSTITUTION: The controller is provided with a direction decision circuit 200, a position specification means that identifies the location of the speaker from the voice collected by the microphones. When a person 10 speaks, the voice signal input to a microphone 21 is delayed by T seconds, owing to the difference in distance, relative to the voice signal input to a microphone 20. The direction decision circuit 200 identifies the direction of the speaker and, according to the identified direction, generates a control signal representing a coded area, or a decoding or display area as side information, and coded area 1 is coded and sent. On the receiver side, after the received picture is decoded, only the part corresponding to decoded area 1 is magnified and displayed according to the side information.

Description

[Detailed description of the invention]

[Industrial application field]

The present invention is used in video telephone and video conference systems. In particular, it relates to means for controlling the camera and the screen in a video conference system for multiple participants.

[Overview]

The present invention concerns means for controlling the display of the image of the participant who is currently speaking in a video conference. By selecting the current speaker automatically, it makes manual camera operation unnecessary.

[Conventional technology]

In some conventional video conference systems, the imaging camera is pointed at the speaker among the conference participants by means of a joystick or the like, operated on either the imaging (transmitting) side or the display (receiving) side. In such systems, the speaker's image is captured and the desired effect obtained either by stationing a dedicated operator on the imaging or display side, or by having the conference participants themselves control the camera or switch the display toward the speaker as needed. Without such operation by a dedicated operator or a participant, however, an appropriate image is not obtained and images unrelated to the speaker are exchanged. A conventional example in which the camera is controlled on the imaging side is described below with reference to FIG. 6, and one in which the camera is controlled on the display side with reference to the overall block diagram of FIG. 7.

In FIG. 6, 10, 11 and 12 are conference participants; 20 and 21 are microphones; 30 is an imaging camera; 31 is a controller for the imaging camera; 50 is an image coder that encodes the image captured by the camera 30; 51 is an audio coder that encodes the sound collected by the microphones 20 and 21; 60 is a multiplexer that multiplexes the encoded audio and image and places them on the line; 101 is a transmission line; 100 is a receiving line that receives the coded data; 80 is a demultiplexer that separates the received data into audio and image streams; 70 is an audio decoder that decodes the separated audio code into an audio signal; 71 is an image decoder that decodes the image code into an image signal; 90 is a display TV that displays the image signal; and 40 and 41 are loudspeakers that reproduce the audio signal.

Components in FIG. 7 bearing the same numbers as in FIG. 6 perform essentially the same functions and operations, so their description is omitted. FIG. 7 differs from FIG. 6 in the multiplexer 61 and the demultiplexer 81. In addition to the image code and the audio code, the multiplexer 61 also multiplexes and sends the control signal of the camera controller 31 that controls the other party's camera. The demultiplexer 81 demultiplexes and separates, in addition to the image code and the audio code, the camera control signal sent from the other party.

In a conventional video conference system, for example, a person on the imaging side in FIG. 6 can control the shooting range so that an appropriate picture is captured. A similar effect can be obtained by having a person on the receiving (display) side control the camera.

[Problem to be solved by the invention]

Such conventional video conference systems have no means of automatically pointing the camera at the speaker and therefore require human operation. Either an extra person must be assigned to control the camera or the participants must control it themselves, which causes problems such as additional personnel or participants being unable to concentrate on the meeting because of camera control.

Furthermore, when the camera is controlled from the image receiving side, the operator must learn in advance the seating order and voice characteristics of the people on the other side in order to control the transmitting-side camera, and must then rely on that memory to identify the speaker's direction and control the camera. There was no way to avoid this, and the direction had to be found by trial and error. When a dedicated operator is used, there is the further problem that, besides the increase in personnel, the content of the meeting is overheard by a third party.

The present invention eliminates these drawbacks, and its object is to provide a video conference image display control device and method that make manual camera operation unnecessary.

[Means to solve the problem]

The present invention is an image encoding and display control device of the video conference type comprising: a camera that captures images; sound collecting means that collects sound; image signal compression means that compresses the image signal supplied from the camera to generate a compressed image signal; audio signal compression means that compresses the audio signal supplied from the sound collecting means to generate a compressed audio signal; multiplexing transmission means that multiplexes the compressed image signal and the compressed audio signal to generate and transmit a multiplexed signal; signal separating means that separates the multiplexed signal arriving from the multiplexing transmission means into a compressed image signal and a compressed audio signal; image signal expansion means that expands the compressed image signal separated by the signal separating means; audio signal expansion means that expands the compressed audio signal separated by the signal separating means; display means that displays an image signal derived from the image signal expanded by the image signal expansion means; and loudspeaker means that reproduces the audio signal expanded by the audio signal expansion means. The device is characterized in that a plurality of the sound collecting means are provided, together with position specifying means that identifies the position of the speaker from the sound collected by the plurality of sound collecting means.

Here, it is desirable to provide image signal transmission prohibiting means that prohibits supplying to the image signal compression means any image signal other than the image signal in the vicinity of the position specified by the position specifying means.

It is also desirable to provide specific position information transmitting means that transmits position information indicating the position specified by the position specifying means and, in place of the image signal transmission prohibiting means, image signal expansion prohibiting means that prohibits supplying to the image signal expansion means any compressed image signal other than the part indicated by the incoming position information.

It is also desirable to provide, in place of the image signal expansion prohibiting means, display enlargement means that, after the incoming compressed image signal has been expanded, enlarges the part indicated by the incoming position information and supplies it to the display means.

[Effect]

The direction of the speaker is identified from the sound input from at least two microphones and, according to the result from the speaker position specifying means, only the image signal of a specific part of the image input from the camera is automatically selected, compressed and transmitted. In addition, the result from the speaker position specifying means is transmitted; this result is received, and expansion of the compressed television picture signal is controlled on the basis of the received result so that only the signal of the specific image part is automatically expanded and displayed.

[Example]

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is an overall block diagram of the first embodiment, in which a specific area of the image captured by a single camera is encoded and transmitted. Components in FIG. 1 bearing the same numbers as in FIGS. 6 and 7 have essentially the same functions, so their description is omitted. In FIG. 1, 200 is a direction determination circuit that determines the direction of the source of a voice or other sound from the difference in its arrival time at the two microphones 20 and 21 and from the spacing between the microphones 20 and 21, and that generates and outputs a signal indicating the area to be encoded. 300 is an encoding control circuit that causes a specific area of the camera image to be encoded according to the signal output from the direction determination circuit 200.
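The selection of an encoding area in the first embodiment can be illustrated with a minimal sketch. It assumes, purely for illustration, that the camera image is divided into equal vertical strips and that the direction determination result is an index into those strips; the patent does not specify the shape of the areas (they appear only in FIG. 4), and the function names `area_bounds` and `select_encoding_area` are hypothetical.

```python
def area_bounds(frame_width, area_count, area_index):
    """Return (left, right) pixel columns of one of `area_count` equal strips.

    The equal-strip layout is an assumption made for this sketch only.
    """
    left = frame_width * area_index // area_count
    right = frame_width * (area_index + 1) // area_count
    return left, right


def select_encoding_area(frame, area_count, area_index):
    """Keep only the pixels of the selected strip; only these go to the coder."""
    left, right = area_bounds(len(frame[0]), area_count, area_index)
    return [row[left:right] for row in frame]


# A tiny 2 x 6 "image" whose pixel values are just their positions.
frame = [
    [0, 1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10, 11],
]
middle = select_encoding_area(frame, area_count=3, area_index=1)
print(middle)
```

In this sketch only the selected strip would be handed to the image coder 50, which corresponds to prohibiting the rest of the image from reaching the compression stage.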

FIG. 2 is an overall block diagram of the second embodiment, in which the image captured by the camera is encoded and transmitted as is, but information designating the decoding area is added as side information. Components in FIG. 2 bearing the same numbers as in FIGS. 1, 6 and 7 have the same functions, so their description is omitted. In the second embodiment, the multiplexer 60 multiplexes the result from the direction determination circuit 200 together with the encoded audio and image signals. Conversely, the demultiplexer 80 separates the side information specifying the coding area in addition to the encoded audio and image signals. The decoding control circuit 400 extracts the encoded image signal of the designated area on the basis of the side information designating the decoding area and sends it to the image decoder 71. The image decoder 71 decodes the encoded data for that part of the source image and sends it to the display TV 90.
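How side information might travel alongside the coded image and audio can be sketched as follows. The packet layout (a one-byte area identifier followed by two length fields) is entirely hypothetical; the patent does not define a multiplex format, and `multiplex` and `demultiplex` are illustrative names only.

```python
import struct

# Hypothetical header: area id (1 byte), image length and audio length (4 bytes each).
HEADER = struct.Struct(">BII")


def multiplex(area_id, image_code, audio_code):
    """Pack side information, coded image and coded audio into one packet."""
    return HEADER.pack(area_id, len(image_code), len(audio_code)) + image_code + audio_code


def demultiplex(packet):
    """Split a packet back into (area_id, image_code, audio_code)."""
    area_id, image_len, audio_len = HEADER.unpack_from(packet)
    start = HEADER.size
    image_code = packet[start:start + image_len]
    audio_code = packet[start + image_len:start + image_len + audio_len]
    return area_id, image_code, audio_code


packet = multiplex(1, b"IMAGE", b"AU")
area_id, image_code, audio_code = demultiplex(packet)
print(area_id, image_code, audio_code)
```

On the receiving side, the recovered area identifier would play the role of the side information that the decoding control circuit 400 uses to decide which part of the coded image to pass to the image decoder 71.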

FIG. 3 is an overall block diagram of the third embodiment, in which the image captured by the camera 30 is encoded and transmitted as is, information designating the decoding area is added as side information and transmitted, and the receiving side, after decoding, identifies the display area and displays it enlarged. Components in FIG. 3 bearing the same numbers as in FIGS. 1, 2, 6 and 7 have the same functions, so their description is omitted. In the third embodiment, the multiplexer 60 multiplexes the result from the direction determination circuit 200 together with the encoded audio and image signals. Conversely, the demultiplexer 80 separates the side information specifying the coding area in addition to the encoded audio and image signals. The image decoder 71 decodes the transmitted image data.

The display control circuit 500 extracts the image signal of the designated area on the basis of the transmitted side information designating the display area, enlarges it, and sends it to the display TV 90.
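The extract-and-enlarge behaviour of the display control circuit 500 can be illustrated with a small sketch. It assumes the display area is a rectangle given in pixel coordinates and that enlargement is done by simple pixel repetition with an integer factor; the patent specifies neither, and `crop` and `enlarge` are hypothetical helper names.

```python
def crop(frame, left, top, width, height):
    """Extract the designated rectangle from a decoded frame (list of rows)."""
    return [row[left:left + width] for row in frame[top:top + height]]


def enlarge(region, factor):
    """Enlarge a region by repeating each pixel `factor` times in both directions."""
    enlarged = []
    for row in region:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


decoded = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
region = crop(decoded, left=2, top=0, width=2, height=1)
shown = enlarge(region, factor=2)
print(shown)
```

Pixel repetition is chosen here only because it is the simplest form of enlargement; any interpolation method could be substituted without changing the overall flow of decode, extract, enlarge, display.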

That is, this embodiment comprises: a camera 30 that captures images; microphones 20 and 21, which are sound collecting means that collect sound; an image coder 50, which is image signal compression means that compresses the image signal supplied from the camera 30 to generate a compressed image signal; an audio coder 51, which is audio signal compression means that compresses the audio signal supplied from the sound collecting means to generate a compressed audio signal; a multiplexer 60, which is multiplexing transmission means that multiplexes the compressed image signal and the compressed audio signal to generate and transmit a multiplexed signal; a demultiplexer 80, which is signal separating means that separates the multiplexed signal arriving from the multiplexing transmission means into a compressed image signal and a compressed audio signal; an image decoder 71, which is image signal expansion means that expands the compressed image signal separated by the signal separating means; an audio decoder 70, which is audio signal expansion means that expands the compressed audio signal separated by the signal separating means; a display TV, which is display means that displays an image signal derived from the image signal expanded by the image signal expansion means; and loudspeakers 40 and 41, which are loudspeaker means that reproduce the audio signal expanded by the audio signal expansion means.

As the means characterizing the present invention, it further comprises a direction determination circuit 200, which is position specifying means that identifies the position of the speaker from the sound collected by the sound collecting means. The first invention comprises an encoding control circuit 300, which is image signal transmission prohibiting means that prohibits supplying to the image signal compression means any image signal other than the image signal in the vicinity of the position specified by the position specifying means. In the second invention, the multiplexer 60 includes specific position information transmitting means that transmits position information indicating the position specified by the position specifying means, and, in place of the image signal compression means, a decoding control circuit 400 is provided, which is image signal expansion prohibiting means that prohibits supplying to the image signal expansion means any compressed image signal other than the part indicated by the incoming position information. The third invention comprises, in place of the image signal expansion prohibiting means, a display control circuit 500, which is display enlargement means that, after the incoming compressed image signal has been expanded, enlarges the part indicated by the incoming position information and supplies it to the display means.

Next, the operation is described. For simplicity, the positional (distance) relationship between the persons 10, 11 and 12 and the microphones 20 and 21 of FIG. 1 is assumed to be as shown in FIG. 8. In FIG. 8, the microphones 20 and 21 pick up the speech of the conference participants, convert it into electrical signals and output them. When the person 10 is speaking, the audio signal input to the microphone 21 lags the audio signal input to the microphone 20 by T seconds because of the difference in distance. Therefore, if the speed of sound is S, the relationship Y = T × S + X holds. An example of the outputs of the microphones 20 and 21 is shown in FIG. 9, where the upper trace is the signal waveform output by the microphone 20 and the lower trace is the signal waveform output by the microphone 21. In the example of FIG. 8, observing the signal waveform of the microphone 21 with the signal waveform of the microphone 20 as reference gives the following three cases.
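The delay T between the two microphone signals could be measured in many ways; the patent does not say how the direction determination circuit 200 obtains it. The sketch below uses a brute-force cross-correlation over sampled signals, which is a swapped-in technique chosen for illustration. It also assumes that X and Y denote the distances from the speaker to the microphones 20 and 21 respectively (they are defined only in FIG. 8), so that Y - X = T × S. The speed of sound value, the sample rate and the function names are all assumptions.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for S (air at roughly 20 degrees C)


def estimate_delay_samples(reference, other, max_lag):
    """Return the lag, in samples, at which `other` best matches `reference`.

    A positive lag means `other` arrives later than `reference`.
    Lags are tried in order of increasing magnitude so that ties favour zero.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in sorted(range(-max_lag, max_lag + 1), key=abs):
        score = 0.0
        for i, value in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += value * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def path_difference_m(delay_samples, sample_rate_hz):
    """Convert a delay in samples into the path difference Y - X = T * S."""
    t_seconds = delay_samples / sample_rate_hz
    return t_seconds * SPEED_OF_SOUND_M_PER_S


# Synthetic example: microphone 21 hears the same pulse 3 samples after microphone 20.
mic20 = [0, 0, 1, 4, 1, 0, 0, 0, 0, 0]
mic21 = [0, 0, 0, 0, 0, 1, 4, 1, 0, 0]
lag = estimate_delay_samples(mic20, mic21, max_lag=5)
difference = path_difference_m(lag, sample_rate_hz=8000)
print(lag, difference)
```

With the assumed 8000 Hz sample rate, a lag of three samples corresponds to T = 0.375 ms, that is, a path difference of about 0.13 m.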

(1) The delay T of the signal waveform of the microphone 21 is positive: the person 10 is speaking.
(2) The delay T between the signal waveforms of the microphones 20 and 21 is 0: the person 11 is speaking.
(3) The delay T of the signal waveform of the microphone 21 is negative: the person 12 is speaking.

In this embodiment, to simplify the explanation, there are three persons and two microphones, and the distances between them are equal. However, even when there are more persons or microphones and the distances between persons and microphones are not uniform, the direction can be identified by the same basic principle, provided that the persons are lined up in a roughly straight line nearly perpendicular to the optical axis of the camera, to an extent that does not hinder shooting with the TV camera, that the microphones are arranged nearly parallel or nearly orthogonal to the row of persons, and that the distance between the microphones and the difference in signal arrival time are known. The direction determination circuit 200 identifies the direction of the speaker and, according to the identified direction, generates a control signal indicating the encoding area, or the decoding or display area as side information. For example, in the case of (2) above, the first embodiment encodes and sends encoding area 1 of FIG. 4 out of the camera image. The second embodiment generates and sends side information indicating decoding area 1 of FIG. 5. The third embodiment sends side information in the same way as the second embodiment, and the receiving side, after decoding the received image, enlarges and displays only the part corresponding to decoding area 1 of FIG. 5 according to the side information.
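The three-way decision on the sign of the delay T can be written as a short sketch. The text compares T with exactly zero; a measured delay is rarely exactly zero, so the tolerance parameter below is an added assumption, as is the function name `speaker_from_delay`. The return values are the reference numerals of the persons in FIG. 8.

```python
def speaker_from_delay(delay_seconds, tolerance_seconds=1e-4):
    """Map the delay T of microphone 21 relative to microphone 20 to a speaker.

    T > 0 -> person 10, T about 0 -> person 11, T < 0 -> person 12.
    The tolerance band around zero is an assumption of this sketch.
    """
    if delay_seconds > tolerance_seconds:
        return 10
    if delay_seconds < -tolerance_seconds:
        return 12
    return 11


for delay in (0.001, 0.0, -0.001):
    print(delay, speaker_from_delay(delay))
```

The returned numeral would then be translated into an encoding, decoding or display area; that mapping depends on FIGS. 4 and 5 and is deliberately left out of the sketch.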

[Effect of the invention]

As explained above, the present invention makes it unnecessary for a TV camera operator to select the captured image and avoids the need for operation by a dedicated operator or by the participants. This allows the meeting to proceed smoothly and also avoids the problem of confidential information leaking through an operator.

[Brief explanation of drawings]

FIG. 1 is an overall block diagram showing the first embodiment of the present invention.
FIG. 2 is an overall block diagram showing the second embodiment.
FIG. 3 is an overall block diagram showing the third embodiment.
FIG. 4 is a diagram showing an example of the encoding areas.
FIG. 5 is a diagram showing an example of the decoding areas.
FIG. 6 is an overall block diagram showing a conventional example.
FIG. 7 is an overall block diagram showing a conventional example.
FIG. 8 is a diagram showing an example of the positional relationship between the microphones and the conference participants.
FIG. 9 is a diagram showing the signal waveforms of the audio input to the microphones.

10, 11, 12: persons; 20, 21: microphones; 30: camera; 31: camera controller; 40, 41: loudspeakers; 50: image coder; 51: audio coder; 60, 61: multiplexers; 70: audio decoder; 71: image decoder; 80, 81: demultiplexers; 90: display TV; 100: receiving line; 101: transmission line; 200: direction determination circuit; 300: encoding control circuit; 400: decoding control circuit; 500: display control circuit.

Claims

[Scope of Claims]

1. A video conference image display control device comprising: a camera that captures images; sound collecting means that collects sound; image signal compression means that compresses the image signal supplied from the camera to generate a compressed image signal; audio signal compression means that compresses the audio signal supplied from the sound collecting means to generate a compressed audio signal; multiplexing transmission means that multiplexes the compressed image signal and the compressed audio signal to generate and transmit a multiplexed signal; signal separating means that separates the multiplexed signal arriving from the multiplexing transmission means into a compressed image signal and a compressed audio signal; image signal expansion means that expands the compressed image signal separated by the signal separating means; audio signal expansion means that expands the compressed audio signal separated by the signal separating means; display means that displays an image signal derived from the image signal expanded by the image signal expansion means; and loudspeaker means that reproduces the audio signal expanded by the audio signal expansion means, characterized in that a plurality of the sound collecting means are provided, together with position specifying means that identifies the position of the speaker from the sound collected by the plurality of sound collecting means.

2. The video conference image display control device according to claim 1, further comprising image signal transmission prohibiting means that prohibits supplying to the image signal compression means any image signal other than the image signal in the vicinity of the position specified by the position specifying means.

3. The video conference image display control device according to claim 1, further comprising specific position information transmitting means that transmits position information indicating the position specified by the position specifying means and, in place of the image signal compression means, image signal expansion prohibiting means that prohibits supplying to the image signal expansion means any compressed image signal other than the part indicated by the incoming position information.

4. The video conference image display control device according to claim 3, further comprising, in place of the image signal expansion prohibiting means, display enlargement means that, after the incoming compressed image signal has been expanded, enlarges the part indicated by the incoming position information and supplies it to the display means.

5. A video conference image display control method comprising: a first step of generating an image signal and an audio signal with, respectively, a camera that images a scene including a plurality of speakers and sound collecting means that collects the voices of the speakers; a second step of compressing and encoding the generated signals, multiplexing them and sending them out; and a third step of separating the incoming multiplexed signal, expanding each separated signal to restore the image signal and the audio signal, and displaying the image signal, characterized in that the first step includes the steps of identifying the location of the speaker, selecting the speaker at the identified location, and generating an image signal related to the selected speaker.

6. A video conference image display control method comprising: a first step of generating an image signal and an audio signal with, respectively, a camera that images a scene including a plurality of speakers and sound collecting means that collects the voices of the speakers; a second step of compressing and encoding the generated signals, multiplexing them and sending them out; and a third step of separating the incoming multiplexed signal, expanding each separated signal to restore the image signal and the audio signal, and displaying the image signal, characterized in that the first step includes the steps of placing the sound collecting means at separate positions, identifying the location of the speaker based on the time difference, generating a selection signal indicating the identified location, and generating, together with the selection signal, an image signal corresponding to the scene including the plurality of speakers, and in that the third step includes the step of expanding and displaying, based on the incoming selection signal, the image signal related to the speaker selected from the incoming image signals.

7. A video conference image display control method comprising: a first step of generating an image signal and an audio signal with, respectively, a camera that images a scene including a plurality of speakers and sound collecting means that collects the voices of the speakers; a second step of compressing and encoding the generated signals, multiplexing them and sending them out; and a third step of separating the incoming multiplexed signal, expanding each separated signal to restore the image signal and the audio signal, and displaying the image signal, characterized in that the first step includes the steps of placing the sound collecting means at separate positions, identifying the location of the speaker based on the time difference, generating a selection signal indicating the identified location, and generating, together with the selection signal, an image signal corresponding to the scene including the plurality of speakers, and in that the third step includes the steps of expanding the incoming image signal and, based on the incoming selection signal, selecting the image signal related to the speaker from the expanded image signals and displaying it enlarged.