JPH06178295A

JPH06178295A - Picture signal processing unit for video conference and utterance party pattern magnification synthesis device

Info

Publication number: JPH06178295A
Application number: JP4329089A
Authority: JP
Inventors: Katsumi Kitajima; 克美北島
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1992-12-09
Filing date: 1992-12-09
Publication date: 1994-06-24

Abstract

PURPOSE: To simplify the setting of the field angle and to switch the display of the utterance party picture smoothly by calling out the data within a set picture frame in response to input from an utterance party discrimination device, magnifying the data, and synthesizing the magnified data onto the original entire picture. CONSTITUTION: The device is provided with a picture signal processing unit 4, which extracts an arbitrary picture range designated in advance by an operation device 3 as a closeup frame, moves it to another picture position, magnifies it, and synthesizes the magnified signal onto the original picture, and with an utterance party discrimination device 5, which automatically discriminates the utterance party through voice detection and calls out the set closeup frame. Preset frames, each designating the closeup of one conference participant, can be designated at any size and position and are stored. For example, when a conference participant speaks, the utterance party discrimination device 5 calls out the corresponding preset frame, and the picture segmented by that preset frame is magnified, inserted at a designated position in the original full-scene picture, and displayed.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an image signal processing device for video conferencing and a speaker screen enlargement and synthesis device, both used in a video conference system.

[0002]

[Prior Art] Conventionally, video conferences with multiple participants have used a motorized pan-tilt television camera, equipped with a camera swivel base for moving the imaging field of view and a zoom lens with a variable imaging range, to capture the speaker. To close up on each participant, the operator had to move the swivel base left, right, up, and down, set the zoom lens to wide-angle or telephoto, and adjust the focus. To avoid these operations, a method is used in which the optimum swivel base and zoom lens positions for each participant's close-up screen are stored in advance and recalled under preset control. In addition, an automatic voice tracking function, which determines the speaker automatically by voice detection and then performs preset control, has also been adopted.

[0003] FIG. 4 is a block diagram showing the configuration of the camera control device of a conventional video conference system. In FIG. 4, conference participants A, B, and C are each provided with a microphone 21-1 to 21-3 and a voice detector 22-1 to 22-3, and the system further comprises a television camera 23 equipped with a swivel base 24 and a zoom lens 25, an operation device 26, a camera swivel base control device 27, and a speaker determination device 28. First, to set the close-up screens of participants A, B, and C, the operation device 26 is operated to control the swivel base 24 and the zoom lens 25, and three positions are stored. These set positions are referred to here as Pa (screen of A), Pb (screen of B), and Pc (screen of C). The operation device 26 has single-operation keys for recalling these positions and performing preset control of the television camera. For example, pressing the Pa recall key drives the swivel base 24 and the zoom lens 25 to close up on A. When B speaks, the voice detector 22-2 responds to the speech and outputs a voice detection signal to the speaker determination device 28, which in turn outputs a Pb recall signal to the camera swivel base control device 27. The swivel base 24 and the zoom lens 25 then move to the Pb position, and B is shown in close-up. Similarly, C is shown in close-up when the Pc recall key on the remote controller is pressed or when C speaks and the voice is detected.
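The recall path of this conventional system, from voice detection signal to preset position, can be illustrated with a minimal Python sketch. The table name PRESETS, the function select_preset, and the rule of taking the first active detector are assumptions made for illustration; the source does not say how simultaneous speakers are resolved.

```python
from typing import Optional, Sequence

# Hypothetical preset table: voice detector index -> stored camera position.
# Detectors 22-1, 22-2, 22-3 are represented here by indices 0, 1, 2.
PRESETS = {0: "Pa", 1: "Pb", 2: "Pc"}


def select_preset(detections: Sequence[bool]) -> Optional[str]:
    """Return the preset to recall for the first detector reporting voice.

    Taking the first active detector is an assumption of this sketch.
    Returns None when nobody is speaking.
    """
    for index, active in enumerate(detections):
        if active:
            return PRESETS.get(index)
    return None


# B speaks: only detector 22-2 (index 1) reports voice.
print(select_preset([False, True, False]))
```

In the conventional device the returned position still has to be reached mechanically by the swivel base and zoom lens, which is the source of the delay discussed next.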

[0004]

[Problems to Be Solved by the Invention] However, in the conventional camera control device described above, operating the swivel base and zoom lens to set each close-up position is cumbersome, and various operation keys have to be used to obtain the desired imaging screen. In addition, it takes some time for the swivel base and zoom lens to move to a preset position, so the response to key operations and to speech is poor.

[0005] The present invention solves the above conventional problems. Its object is to provide an excellent speaker screen enlargement and synthesis device for video conferencing that does not rely on control of a swivel base or zoom lens. Using only the video from a television camera that constantly images all of the conference participants, it obtains each participant's close-up screen by image signal processing, enlarges it, and synthesizes it onto the original full view. This simplifies angle-of-view setting, gives good response to screen recall operations and to the speaker's voice, and allows the speaker screen display to be switched smoothly.

[0006]

[Means for Solving the Problems] To achieve this object, the present invention comprises a television camera that captures all of the conference participants, a monitor television that displays the video, an operation device for setting and recalling the close-up screen range of each conference participant within the full-view screen, and an image signal processing device that extracts any designated screen range as a close-up frame, moves that screen frame to another screen position, enlarges it, and synthesizes it onto the original screen.

[0007]

[Operation] With this configuration, the camera is fixed so that it captures all of the conference participants, and a close-up screen frame for each participant is set within the displayed screen and stored. A close-up screen can therefore be recalled quickly without operating a swivel base or zoom lens as in the conventional case. Furthermore, by setting any number of screen frames within the imaging range of a single television camera, the speaker's close-up screen frame can be enlarged, synthesized onto the original full-view video, and displayed.

[0008]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention. FIG. 2 illustrates the image processing of the embodiment: (a) is an image capturing all of the conference participants, (b) shows the close-up screen frame of each conference participant, and (c) shows a screen in which the speaker's close-up image is enlarged and synthesized onto the image of all the conference participants. FIG. 3 illustrates the pixel interpolation operation performed when a preset frame screen is enlarged by the video digital signal processor (hereinafter DSP) section in the image signal processing device of an embodiment of the present invention.

[0009] In FIG. 1, 1 is a television camera for capturing the entire conference scene; 2 is a monitor television that displays the video; 3 is an operation device for setting each participant's close-up screen range within the full-view screen and for recalling each participant's close-up screen; 4 is an image signal processing device that extracts any screen range designated in advance by the operation device 3 as a close-up frame, moves it to another screen position, enlarges it, and synthesizes it onto the original screen; and 5 is a speaker determination device that automatically determines the speaker by voice detection and recalls the set close-up frame. The video signal output of the television camera 1 is input to the image signal processing device 4 and, after image signal processing, is output to the monitor television 2 and to the transmission line. The operation signal of the operation device 3 and the preset recall signal of the speaker determination device 5 are both input to the image signal processing device 4. The image signal processing device 4 comprises a video signal decoder 401, a pulse generator 402, an A/D converter 403, a line memory 404, a video digital signal processor 405 and its external RAM 406, a character generator ROM 407, a D/A converter 408, a video signal encoder 409, a keyboard interface 410, and a CPU 411.

[0010] The operation of the speaker screen enlargement and synthesis device configured as described above is explained with reference to FIGS. 1 and 2.

[0011] In FIG. 2, (a) is the full-view screen captured by the television camera 1, and represents the input video signal from the television camera 1 to the image signal processing device 4 in FIG. 1. (b) is a screen showing the frames designated to close up on each participant in the full-view screen; the close-up frames designated with the operation device 3 in FIG. 1 are drawn as lines. (c) is a screen in which the designated close-up frame screen is enlarged and synthesized onto the original screen, and is the output video signal from the image signal processing device 4 to the monitor television 2 in FIG. 1. In FIG. 2(a) the conference participants are A, B, and C, and in (b) the screen frames designated to close up on A, B, and C individually are P1, P2, and P3, respectively. These preset frames can be designated at any size and position, and are stored. For example, when A speaks during the conference, the speaker determination device 5 in FIG. 1 recalls P1 from among the preset frames P1 to P3, and when B speaks, P2 is recalled. The screen cut out by that preset frame is enlarged, inset at the designated position in the original full-view screen, and displayed as shown in FIG. 2(c). This is the operation of an embodiment of the present invention.
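As an illustration of the sequence just described (recall a stored preset frame, cut out its pixels, enlarge them, and inset them into the full view), the following Python sketch works on a single-channel image held as a list of rows. The names PresetFrame, crop, enlarge_by_replication, composite, and show_speaker are hypothetical, and simple pixel replication is used for enlargement only as a placeholder; it is not the interpolation used by the embodiment.

```python
from dataclasses import dataclass
from typing import Dict, List

Image = List[List[int]]  # rows of pixel values, one channel for brevity


@dataclass(frozen=True)
class PresetFrame:
    """Stored close-up frame: position and size within the full view."""
    top: int     # first line (row) inside the frame
    left: int    # first sample (column) inside the frame
    height: int
    width: int


def crop(image: Image, frame: PresetFrame) -> Image:
    """Cut the pixels of the preset frame out of the full-view image."""
    rows = image[frame.top:frame.top + frame.height]
    return [row[frame.left:frame.left + frame.width] for row in rows]


def enlarge_by_replication(image: Image, scale: int) -> Image:
    """Enlarge by repeating every pixel `scale` times in both directions."""
    out: Image = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out


def composite(full: Image, inset: Image, top: int, left: int) -> Image:
    """Return a copy of `full` with `inset` pasted at (top, left)."""
    if top + len(inset) > len(full) or left + len(inset[0]) > len(full[0]):
        raise ValueError("inset does not fit inside the full-view image")
    out = [list(row) for row in full]
    for r, row in enumerate(inset):
        out[top + r][left:left + len(row)] = row
    return out


def show_speaker(full: Image, presets: Dict[str, PresetFrame], speaker: str,
                 scale: int, top: int, left: int) -> Image:
    """Recall the speaker's frame, enlarge it, and inset it in the full view."""
    close_up = enlarge_by_replication(crop(full, presets[speaker]), scale)
    return composite(full, close_up, top, left)
```

For example, with presets = {"A": PresetFrame(0, 0, 1, 2)}, calling show_speaker(full, presets, "A", scale=2, top=2, left=2) pastes a 2x4 enlargement of A's frame into rows 2 and 3 of the output while leaving the input image unchanged. No mechanical movement is involved, so switching speakers only changes which stored frame is read.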

[0012] Next, the operation of the image signal processing device 4 is described. The NTSC composite video signal input from the television camera 1 is demodulated by the video signal decoder 401 into RGB signals, digitized by the A/D converter 403, converted to double-speed scanning (interlaced to non-interlaced scan conversion) by the following line memory 404, and taken into the video DSP 405 as sampling data. The video signal decoder 401 also performs sync separation on the video input signal, generating a horizontal sync signal, a vertical sync signal, a burst signal, and so on, which are output to the pulse generator 402. The pulse generator 402 is a circuit that generates the timing required by this device from the separated sync signals. It supplies 4fsc (fsc is the color subcarrier frequency), which serves as the sampling clock of the A/D converter 403 and the write clock of the line memory 404; 8fsc, which serves as the read clock of the line memory 404 and the sampling clock of the video DSP 405; and the control signals of the line memory 404, among others. The RGB signals, digitally sampled with non-interlaced scanning, are stored frame by frame in the external RAM 406 by the video DSP 405, processed by the digital image signal processing described later, and optionally superimposed with characters from the character generator ROM 407. They are then converted back to analog by the D/A converter 408, modulated by the video signal encoder 409, and output as a screen image such as FIG. 2(c). The clock of the D/A converter 408 and the sync signal of the video signal encoder 409 are supplied from the pulse generator 402. Control data from the operation device 3 is processed by the CPU 411 via the keyboard interface 410. In response to a preset frame recall signal or a preset frame setting signal from the speaker determination device 5 or the operation device 3, the CPU 411 sends the video DSP 405 the memory control data, such as memory addresses and character information, needed to compose the speaker screen of FIG. 2 or the screen shown during preset frame setting. The video DSP 405 performs the image signal processing described next in accordance with this control data from the CPU 411.
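The clock relationships in this paragraph can be made concrete with a short Python calculation. The source gives only the multiples 4fsc and 8fsc; the numeric subcarrier value (315/88 MHz) and the relation fsc = 227.5 x line rate are taken from the standard NTSC definition, and the interpretation that reading at twice the write rate empties a line in half the time is an assumption about how the line memory achieves double-speed scanning.

```python
from fractions import Fraction

# Standard NTSC values (not stated in the source text).
FSC_HZ = Fraction(315_000_000, 88)      # color subcarrier, about 3.579545 MHz
LINE_HZ = FSC_HZ / Fraction(455, 2)     # line rate, since fsc = 227.5 x fH

write_clock = 4 * FSC_HZ                # A/D sampling and line memory write
read_clock = 8 * FSC_HZ                 # line memory read and DSP sampling

# Samples written into the line memory during one input line.
samples_per_line = write_clock / LINE_HZ

# Time to write one line versus time to read the same line back out.
write_time = samples_per_line / write_clock
read_time = samples_per_line / read_clock

print(f"4fsc = {float(write_clock) / 1e6:.3f} MHz")
print(f"8fsc = {float(read_clock) / 1e6:.3f} MHz")
print(f"samples per line = {samples_per_line}")
print(f"write {float(write_time) * 1e6:.2f} us, read {float(read_time) * 1e6:.2f} us")
```

Because the read time is exactly half the write time, two output lines fit into each input line period, which is consistent with the interlaced to non-interlaced conversion described above.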

[0013] FIG. 3 illustrates the pixel interpolation operation performed when a preset frame screen is enlarged. First, the operation of the video DSP 405 when a set preset frame is recalled is described with reference to FIG. 3. The pixels within the selected preset frame are first extracted as original data. This is done by reading, from that area, the sampling data stored at the memory addresses corresponding to the line numbers (row numbers) of the top and bottom edges and the sampling data numbers (column numbers) of the left and right edges of the preset frame, which the CPU 411 supplied when the preset frame was set. Next, the extracted pixel data within the preset frame is expanded so that it is enlarged on the screen. For simplicity, consider the case of 2x expansion. An expansion method that simply reuses the data of the previous sample and the previous line as the interpolated values would make the image quality coarse only inside the preset frame when displayed. The interpolated data therefore uses, as shown in FIG. 3, the arithmetic mean of the original data of the preceding and following samples and of the preceding and following lines. The pixel data of the preset frame expanded in this way is written to the composite screen memory area in the RAM 406 that holds the sample data of the same frame. By reading out and reproducing the data of that frame, the enlarged speaker screen extracted from the preset frame is displayed at the location corresponding to a designated unneeded part of the full-view screen (bottom center in the example of FIG. 2). The video signal output that has undergone this image signal processing is displayed on the monitor and is also transmitted to the remote conference terminal in communication during the video conference.
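The 2x expansion with averaged interpolation can be sketched in Python as follows. FIG. 3 is not reproduced in this text, so the exact pixel layout is an assumption: original pixels are placed at even positions, a pixel between two horizontal or vertical neighbors is their mean, and a diagonal pixel is the mean of its four surrounding originals. Edge handling (repeating the last sample or line) and integer floor division are also assumptions of this sketch, and the name expand_2x is hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values, one channel for brevity


def expand_2x(src: Image) -> Image:
    """Double width and height, filling new pixels with neighbor averages."""
    rows, cols = len(src), len(src[0])
    out = [[0] * (2 * cols) for _ in range(2 * rows)]
    for r in range(rows):
        r_next = min(r + 1, rows - 1)      # repeat the last line at the edge
        for c in range(cols):
            c_next = min(c + 1, cols - 1)  # repeat the last sample at the edge
            here = src[r][c]
            right = src[r][c_next]
            below = src[r_next][c]
            diag = src[r_next][c_next]
            out[2 * r][2 * c] = here                          # original pixel
            out[2 * r][2 * c + 1] = (here + right) // 2       # between samples
            out[2 * r + 1][2 * c] = (here + below) // 2       # between lines
            out[2 * r + 1][2 * c + 1] = (here + right + below + diag) // 4
    return out


for row in expand_2x([[0, 100], [200, 100]]):
    print(row)
```

Compared with repeating the previous sample and line, the averaged values soften the blocky steps that would otherwise appear only inside the enlarged frame.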

[0014] Since part of its output video is enlarged, the television camera 1 is preferably a high-resolution one.

[0015]

[Effects of the Invention] As described above, the present invention comprises, connected in a circuit so as to perform a predetermined operation, a video signal decoder, a pulse generator, an A/D converter, a line memory, a video digital signal processor and its external RAM, a character generator ROM, a D/A converter, a video signal encoder, a keyboard interface, and a CPU. It has a function of setting and storing screen frames that close up on the individual image of each conference participant within the video input of the whole video conference scene captured by the television camera, and a function of recalling the data within the set screen frame in response to input from the speaker determination device, enlarging the image within that screen frame, synthesizing it onto the original full screen, and outputting the video. As a result, without using a preset-type motorized swivel base or zoom lens as in the conventional case, that is, without cumbersome camera operation, multiple speaker screens can be set from the video signal of a single fixed television camera by the simple operation of enclosing each in a frame. Moreover, because the speaker appears on the screen immediately after the image is recalled, an excellent image signal processing device for video conferencing and speaker screen enlargement and synthesis device can be realized, free of the poor response caused in the conventional case by remote controller operation and by the travel time of the swivel base and lens after voice detection.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 2: (a) is a diagram showing an image capturing all of the conference participants; (b) is a diagram showing the close-up screen frame of each conference participant; (c) is a diagram showing a screen in which the speaker's close-up image is enlarged and synthesized onto the image of all the conference participants.

FIG. 3 is a diagram illustrating the pixel interpolation operation when a preset frame screen is enlarged.

FIG. 4 is a block diagram showing the configuration of the camera control device of a conventional video conference system.

[Explanation of symbols]

1 Television camera
2 Monitor television
3 Operation device
4 Image signal processing device
5 Speaker determination device
401 Video signal decoder
402 Pulse generator
403 A/D converter
404 Line memory
405 Video digital signal processor
406 RAM
407 Character generator ROM
408 D/A converter
409 Video signal encoder
410 Keyboard interface
411 CPU

Claims

[Claims]

1. An image signal processing device for video conferencing, comprising a video signal decoder, a pulse generator, an A/D converter, a line memory, a video digital signal processor and its external RAM, a character generator ROM, a D/A converter, a video signal encoder, a keyboard interface, and a CPU, connected so as to perform a predetermined operation, and having a function of setting and storing screen frames that close up on the individual image of each conference participant within the video input of the whole video conference scene captured by a television camera, and a function of recalling the data within the set screen frame in response to input from a speaker determination device, enlarging the image within that screen frame, synthesizing it onto the original full screen, and outputting the video.

2. A speaker screen enlargement and synthesis device for video conferencing, comprising the image signal processing device for video conferencing according to claim 1, a television camera, a monitor television, an operation device, and a speaker determination device, wherein screen frames that close up on the individual image of each conference participant are set and stored in advance with the operation device from the video input of the whole video conference scene captured by the television camera, and the image within the screen frame of the speaker designated by the speaker determination device is enlarged, synthesized onto the original full-view video, and output.