JPH08251562A

JPH08251562A - Video conversation system

Info

Publication number: JPH08251562A
Application number: JP7054431A
Authority: JP
Inventors: Noritoshi Shibuya; 文紀渋谷; Hiroya Kusaka; 博也日下; Masaaki Nakayama; 正明中山
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1995-03-14
Filing date: 1995-03-14
Publication date: 1996-09-27

Abstract

PURPOSE: To allow users who communicate while viewing each other's images on monitors, as in a videophone or video conference, to converse in a natural state in which they look each other in the eye. CONSTITUTION: A memory 103 stores a frontal image of a user 100 (looking into the image pickup device), and a memory 104 continually stores the image of the current user (who is using the system). A motion detection means 105 detects the positional deviation of the image in memory 104 relative to the image in memory 103. An image position moving means 106 moves the image in memory 103 so as to minimize this deviation and writes the moved image into a memory 107. A motion detection means 108 detects motion vectors of the image in memory 104 relative to the image in memory 107, pixel by pixel or area by area. On the basis of these motion vectors, a control means 109 corrects the image in memory 104 so that it is combined with the image in memory 107, and outputs the resulting image.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a video dialogue system with which users communicate while viewing the other party's image on a monitor, as in a videophone or a video conference.

[0002]

2. Description of the Related Art

A video dialogue system transmits video and audio in both directions. It sends and receives video using an image pickup device, which captures an image of the local user or the other party, and a monitor, which displays the captured image. When users converse while looking at each other's faces on their monitors, the conversation feels natural only if their lines of sight meet.

[0003] An example of a conventional video dialogue system is disclosed in Japanese Utility Model Laid-Open No. 62-21687.

[0004] A conventional video dialogue system is described below. FIG. 13 is a block diagram of this conventional system. In FIG. 13, 1301 is a light-shielding case, 1302 is an image pickup device that photographs the user, 1303 is a monitor showing the other user, 1304 is a half mirror, and 1305 is the user.

[0005] The operation of the video dialogue system configured as described above is explained below.

[0006] First, the light-shielding case 1301 blocks external light from entering the system. The user 1305 views the other user, shown on the monitor 1303, as a reflection in the half mirror 1304. Because the user's line of sight is directed at the half mirror 1304, the optical axis of the image pickup device 1302 is aligned with that line of sight on the far side of the half mirror 1304. The device photographs the user, and the image is transmitted to the other user.

[0007]

Problems to be Solved by the Invention

However, the conventional configuration described above uses a half mirror. As a result, the size of the monitor is limited, and the system as a whole occupies a large volume.

[0008] The present invention solves these conventional problems. Its object is to provide a video dialogue system that is compact and allows the monitor size to be chosen freely. It also lets users converse naturally, with their lines of sight meeting, even while they are looking at the monitor.

[0009]

Means for Solving the Problems

To achieve this object, the video dialogue system of the present invention comprises the following components:
an image pickup device that photographs a person looking at a monitor;
a first memory that can store a frontal image of the person's face;
a second memory that stores the image from the image pickup device;
first motion detection means for detecting a motion vector of the image in the second memory relative to the image in the first memory;
image position moving means for moving the position of the image in the first memory so that the magnitude (scalar quantity) of this motion vector is minimized;
a third memory that stores the moved image from the first memory;
second motion detection means for detecting a motion vector of the image in the second memory relative to the image in the third memory; and
control means that controls the image in the second memory on the basis of this motion vector and outputs an image in which the images in the second and third memories are combined.

[0010]

Operation

With the configuration described above, the present invention first detects the motion vector of the current user's image, stored in the second memory, relative to the front-facing user's image stored in the first memory. It then moves the position of the image in the first memory so that this motion vector becomes as small as possible. This brings the image close to the image in the second memory, and the result is stored in the third memory.

[0011] Next, the motion vector of the image in the second memory relative to the image in the third memory is detected. The image in the second memory is then corrected so that this motion vector is scaled by k (0 < k < 1). With this processing, users who talk while viewing each other's images on a monitor, as in a videophone or video conference, can converse naturally with their lines of sight meeting.
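The scaling described in [0011] can be written as a single relation. In this hedged formalization, the symbols p, a, and p' are introduced only for illustration and do not appear in the original:
- p is the position of a pixel in the second memory;
- a is that pixel's motion vector toward the matching pixel in the third memory;
- p' is the corrected position.

$$
p' = p + k\,a, \qquad 0 < k < 1
$$

With k close to 0, the camera image is left almost unchanged. With k close to 1, each pixel moves almost all the way to the position of its counterpart in the frontal image. Intermediate values give an image partway between the two.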

[0012]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

A first embodiment of the present invention is described below with reference to the drawings.

[0013] FIG. 1 is a block diagram of the video dialogue system according to the first embodiment of the present invention. In FIG. 1:
100 is the user;
101 is an image pickup device that photographs the user;
1010 is analog-to-digital conversion means (marked A/D in the figure) that converts the output of the image pickup device 101 into a digital signal;
102 is a monitor showing the conversation partner;
103 is a memory that stores a front-facing image of the user 100 (looking into the camera);
104 is a memory that stores the image of the current user (while the system is in use);
105 is motion detection means that detects the positional deviation of the image in memory 104 relative to the image in memory 103;
106 is image position moving means that moves the position of the image in memory 103 so as to minimize this deviation;
107 is a memory that stores the moved image from memory 103;
108 is motion detection means that detects motion vectors of the image in memory 104 relative to the image in memory 107, either per pixel or per area of a given unit size; and
109 is control means that corrects the image in memory 104 on the basis of these motion vectors so that it is combined with the image in memory 107, and outputs the result.

[0014] The operation of the video dialogue system of this embodiment, configured as described above, is explained below. The user 100 converses while watching the other party's video on the monitor 102, and the image pickup device 101 photographs the user 100. The analog-to-digital conversion means 1010 converts the signal from the image pickup device 101 into a digital signal.

[0015] Before the system is used, a frontal image of the user (looking into the camera) is stored in memory 103. During use, the image from the image pickup device 101 is stored in memory 104 continually.

The motion detection means 105 divides the images in memories 103 and 104 into a plurality of areas and compares the two images area by area. For each area, it detects the motion vector of the image in memory 104 relative to the image in memory 103, and it outputs the average of these vectors. The image position moving means 106 moves the image in memory 103 so that the motion vector output by the motion detection means 105 is minimized. This brings the image in memory 103 close to the image in memory 104, and memory 107 stores the moved image.

The motion detection means 108 detects, pixel by pixel, the motion vector of the image in memory 104 relative to the image in memory 107. The control means 109 corrects the image in memory 104 so that these motion vectors are reduced to within a certain range. The output is therefore an image that looks like a combination of the images in memories 104 and 107.

[0016] A detailed explanation is given below with reference to FIGS. 2 to 6.

FIG. 2(a) shows the image stored in memory 103 (the user's frontal image). FIG. 2(b) shows the image stored in memory 104 (the image from the image pickup device 101). Because the user is looking at the monitor 102, the line of sight is not directed at the image pickup device 101. FIG. 3(a) schematically shows the image in memory 103 (FIG. 2(a)) superimposed on the image in memory 104 (FIG. 2(b)).

The positional deviation between the images in this figure is detected as a motion vector by the motion detection means 105. Each image is first divided into a plurality of block areas (shown in FIG. 3), and each area of one image is compared with the other image. When a matching or closely similar area is found, the difference between the positions of the two areas is detected as a motion vector. This processing is performed for all areas, and the detected motion vectors are output to the image position moving means 106. FIG. 3(b) shows only these motion vectors.

The image position moving means 106 averages the motion vectors detected by the motion detection means 105 and moves the image in memory 103 so as to minimize this average. To perform the move, the image in memory 103 is written into memory 107 with the position of its start-address data shifted by the averaged motion vector. As a result, the image position in memory 107 overlaps the image position in memory 104.
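The area-wise detection and global shift performed by means 105 and 106 can be sketched as follows. This is a minimal illustration, not the method of the cited report, and it makes several assumptions:
- images are grayscale and stored as lists of rows;
- matching uses a sum-of-absolute-differences (SAD) full search;
- the names `block_motion_vectors`, `average_vector`, `shift_image`, `block`, and `search` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]   # grayscale image, indexed image[y][x]
Vector = Tuple[int, int]  # (dx, dy) in pixels


def block_sad(a: Image, b: Image, ax: int, ay: int, bx: int, by: int, size: int) -> int:
    """Sum of absolute differences between the size x size block of a at (ax, ay)
    and the block of b at (bx, by)."""
    return sum(abs(a[ay + j][ax + i] - b[by + j][bx + i])
               for j in range(size) for i in range(size))


def block_motion_vectors(front: Image, current: Image, block: int, search: int) -> List[Vector]:
    """For each block of the frontal image (memory 103), find where it best matches
    in the current image (memory 104) within +/- search pixels."""
    h, w = len(front), len(front[0])
    vectors: List[Vector] = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            best_key, best_vec = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    x, y = bx + dx, by + dy
                    if 0 <= x <= w - block and 0 <= y <= h - block:
                        # Lowest SAD wins; ties go to the shorter vector.
                        key = (block_sad(front, current, bx, by, x, y, block),
                               dx * dx + dy * dy)
                        if best_key is None or key < best_key:
                            best_key, best_vec = key, (dx, dy)
            vectors.append(best_vec)
    return vectors


def average_vector(vectors: List[Vector]) -> Vector:
    """Average of the per-block vectors, rounded to whole pixels."""
    n = len(vectors)
    return (round(sum(v[0] for v in vectors) / n), round(sum(v[1] for v in vectors) / n))


def shift_image(img: Image, dx: int, dy: int, fill: int = 0) -> Image:
    """Copy img into a new frame (memory 107) moved by (dx, dy); uncovered pixels get fill."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out
```

A typical call would be `shift_image(front, *average_vector(block_motion_vectors(front, current, 8, 4)))`. The source performs the move by offsetting the start address while copying into memory 107. The explicit two-dimensional copy here is meant to have the same effect, with the uncovered border filled by a constant (an assumption).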

[0017] FIG. 3(c) is a schematic diagram of this state. FIG. 4 shows how the motion detection means 108 detects, for each pixel, the motion vector of the image in memory 104 relative to the image in memory 107. The detection method is the same as that of the motion detection means 105, except that each divided area is one pixel in size. The arrows in FIG. 4 indicate motion vectors; some arrows are omitted for legibility.

The motion detection means 105, the image position moving means 106, and the motion detection means 108 are generally known in the field of image processing. They are reported in detail in, for example, the Technical Report of the Institute of Television Engineers of Japan, Vol. 11, No. 3, pp. 43-48, PPOE'87-12 (May 1987), and can be implemented by such methods. Further detailed description is therefore omitted in this embodiment.
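The per-pixel detection of means 108 can be sketched in the same spirit, with a few assumptions not taken from the source:
- matching a single pixel value alone is ambiguous, so the sketch compares a small (2r+1) x (2r+1) window centred on each pixel;
- following [0018], each vector points from a pixel of the current image (memory 104) to its best match in the aligned frontal image (memory 107);
- `dense_motion_vectors`, `window_sad`, `radius`, and `search` are hypothetical names.

```python
from typing import List, Tuple

Image = List[List[int]]   # grayscale image, indexed image[y][x]
Vector = Tuple[int, int]  # (dx, dy) in pixels


def window_sad(a: Image, b: Image, ax: int, ay: int, bx: int, by: int, r: int) -> int:
    """SAD between the (2r+1) x (2r+1) windows centred on (ax, ay) in a and (bx, by) in b."""
    return sum(abs(a[ay + j][ax + i] - b[by + j][bx + i])
               for j in range(-r, r + 1) for i in range(-r, r + 1))


def dense_motion_vectors(current: Image, aligned: Image,
                         radius: int = 1, search: int = 2) -> List[List[Vector]]:
    """One (dx, dy) per pixel of current (memory 104) pointing to its best match
    in aligned (memory 107). Border pixels whose window does not fit get (0, 0)."""
    h, w = len(current), len(current[0])
    field: List[List[Vector]] = [[(0, 0)] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            best_key, best_vec = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    qx, qy = x + dx, y + dy
                    if radius <= qx < w - radius and radius <= qy < h - radius:
                        # Lowest SAD wins; ties go to the shorter vector.
                        key = (window_sad(current, aligned, x, y, qx, qy, radius),
                               dx * dx + dy * dy)
                        if best_key is None or key < best_key:
                            best_key, best_vec = key, (dx, dy)
            field[y][x] = best_vec
    return field
```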

[0018] FIG. 5 shows the processing performed by the control means 109. Arrow a in the figure represents the motion vector, detected by the motion detection means 108, from a given pixel in memory 104 to the corresponding pixel in memory 107. The control means 109 moves the pixel of memory 104 by the motion vector a multiplied by k (0 < k < 1). This corrects the pixels of memory 104 so that they are combined with the pixels of memory 107, and the result is output.

FIG. 6(a) is a block diagram of the control means. In the figure:
s1 is the image data of memory 104, input to the control means 109 pixel by pixel or, when the screen is divided into areas of a given unit size, area by area;
s2 is the address of the data s1 in memory 104;
s3 is the motion vector corresponding to the data s1, detected by the motion detection means 108;
601 is a buffer that aligns the input timing of the data s1, the address s2, and the motion vector s3;
602 is write address control means that supplies a write address, based on the motion vector s3 and the address s2, when the data s1 is written into the memory 603;
603 is a memory that writes the data s1 at the address supplied by the write address control means 602 and, once all the data of memory 104 has been written, outputs it as the corrected image data; and
604 is a multiplier that multiplies the motion vector s3 by k (0 < k < 1).

The detailed operation is explained below with reference to FIG. 6(b), a schematic view of how data in the screen is moved by the control means 109. In the figure, A1 is the position of the data s1 in memory 104 (its address in memory 104), and ka is the motion vector s3 multiplied by k in the multiplier 604.
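The address arithmetic of the multiplier 604 and the write address control means 602 can be illustrated with a small function. The sketch makes these assumptions:
- addressing is row-major and linear (address = y * width + x);
- k*a is rounded to whole pixels;
- the result is clamped to the frame edges;
- `write_address` is a hypothetical name.

```python
from typing import Tuple


def write_address(s2: int, s3: Tuple[int, int], k: float, width: int, height: int) -> int:
    """Return the write address A2 in memory 603 for data whose address in memory 104 is s2 (A1)."""
    if not 0.0 < k < 1.0:
        raise ValueError("k must satisfy 0 < k < 1")
    y1, x1 = divmod(s2, width)                     # A1 as (x1, y1)
    kax, kay = k * s3[0], k * s3[1]                # multiplier 604: ka = k * s3
    x2 = min(max(x1 + round(kax), 0), width - 1)   # A2 = A1 + ka, clamped to the frame
    y2 = min(max(y1 + round(kay), 0), height - 1)
    return y2 * width + x2
```

For example, in an 8 x 8 frame, data at address 19 (x = 3, y = 2) with s3 = (4, -2) and k = 0.5 is written to address 13 (x = 5, y = 1).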

[0019] A2 is the position (the address in memory 603) reached by moving the data s1 according to the motion vector ka. This position is the address at which the data s1 is written into memory 603.

[0020] The write address control means 602 computes the address of position A2 in FIG. 6(b) from the address s2 (A1 in FIG. 6(b)) and the motion vector s3 (ka in FIG. 6(b)). It outputs this address to memory 603, which writes the data s1 there. The data s1 is thus moved from A1 to A2.

FIG. 7 shows how the control means 109 corrects the image in memory 104. FIG. 7(a) is an enlarged view of the area inside the thick circle in FIG. 4 before correction; the arrows indicate motion vectors. FIG. 7(b) shows the result after these motion vectors are multiplied by k (0 < k < 1) and the image is corrected by the control unit. In FIG. 7(b), the wavy lines show the images in memories 104 and 107 before correction, and the solid lines show the image after correction.
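Applying this to every pixel gives the whole correction of FIG. 7. The sketch below is illustrative only and relies on two policies the source does not specify:
- memory 603 is pre-filled with a copy of memory 104, so positions that no pixel is written to keep their original value;
- pixels are processed in raster order, so a later write overwrites an earlier one at the same address.

`correct_image` is a hypothetical name. `field` holds one (dx, dy) vector per pixel, as produced by per-pixel matching.

```python
from typing import List, Tuple

Image = List[List[int]]
Field = List[List[Tuple[int, int]]]


def correct_image(current: Image, field: Field, k: float) -> Image:
    """Forward-warp each pixel of memory 104 by k times its motion vector into memory 603."""
    h, w = len(current), len(current[0])
    out = [row[:] for row in current]   # memory 603, initialised with memory 104 (assumption)
    for y in range(h):                  # raster order, as delivered through buffer 601
        for x in range(w):
            dx, dy = field[y][x]
            x2 = min(max(x + round(k * dx), 0), w - 1)   # A2 = A1 + k*a, clamped to the frame
            y2 = min(max(y + round(k * dy), 0), h - 1)
            out[y2][x2] = current[y][x]
    return out
```

A simple forward write like this can leave holes and overlaps where neighbouring vectors diverge or converge. That is why the fill and overwrite policies above have to be chosen explicitly.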

[0021] With the configuration described above, users who talk while viewing each other's images on a monitor, as in a videophone or video conference, can converse naturally with their lines of sight meeting.

[0022] This embodiment states that the motion detection means 105, the image position moving means 106, and the motion detection means 108 can be implemented by the method reported in the Technical Report of the Institute of Television Engineers of Japan, Vol. 11, No. 3, pp. 43-48, PPOE'87-12 (May 1987). The implementation is, however, not limited to that method.

[0023] In this embodiment, the image pickup device was described as outputting an analog video signal, but the invention is not limited to this. For example, an image pickup device that outputs a digital signal may be used, in which case the analog-to-digital conversion means can be omitted.

[0024] In this embodiment, the output of the video dialogue system is a digital signal, but it is not limited to this. Digital-to-analog conversion, compression of the image information, or similar processing can clearly be applied according to the communication method used with the other party.

[0025] Although this embodiment provides two motion detection means, a configuration that combines them into one is clearly also possible.

[0026] FIG. 8 is a block diagram of a video dialogue system according to a second embodiment of the present invention. In the figure:
100 is the user;
102 is a monitor;
101 is an image pickup device that photographs the user 100;
1010 is analog-to-digital conversion means (marked A/D in the figure) that converts the output of the image pickup device 101 into a digital signal;
103 is a memory in which the user's frontal image (looking into the camera) is stored before the system is used;
104 is a memory that continually stores the image from the image pickup device 101;
105 is motion detection means that detects the positional deviation of the image in memory 104 (the image from the image pickup device 101) relative to the image in memory 103;
106 is image position moving means that moves the image in memory 103 so as to minimize this deviation;
107 is a memory that stores the moved image from memory 103;
108 is motion detection means that detects the motion vectors of the image in memory 104 relative to the image in memory 107; and
109 is control means that controls the image in memory 104 so that these motion vectors are multiplied by k (0 < k < 1), correcting it so that it is combined with the image in memory 107, and outputs the result.

All of the above is the same as in the first embodiment. The difference is that a selector 801 is added. When the magnitude of the motion vector detected by the motion detection means 105 is very large, the selector 801 judges that it is outside the correction range. When the magnitude is very small, it judges that no correction is needed. In either case, it outputs the image in memory 104 unchanged (pass-through).
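The decision made by the selector 801 amounts to a two-threshold test on the magnitude of the vector from the motion detection means 105. The threshold values and the use of the Euclidean norm as the magnitude are assumptions for illustration; the source says only "very large" and "very small". `select_output`, `low`, and `high` are hypothetical names.

```python
import math
from typing import Tuple, TypeVar

T = TypeVar("T")


def select_output(vector: Tuple[float, float], corrected: T, through: T,
                  low: float, high: float) -> T:
    """Return the pass-through image when the motion vector is negligible (below low)
    or too large to correct (above high); otherwise return the corrected image."""
    magnitude = math.hypot(vector[0], vector[1])
    if magnitude < low or magnitude > high:
        return through
    return corrected
```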

[0027] As described above, in this embodiment the selector chooses between the corrected image and the pass-through image. The choice depends on the magnitude of the motion vector of the in-use user's image relative to the user's frontal image. This reduces unnatural output caused by excessive correction. Users who talk while viewing each other's images on a monitor, as in a videophone or video conference, can therefore converse naturally with their lines of sight meeting.

[0028] FIG. 9 is a block diagram of a third embodiment of the present invention. In the figure:
100 is a user of the video dialogue system;
102 is a monitor that displays the other party's video;
901 and 902 are image pickup devices installed at the sides of the monitor 102 to photograph the user 100;
1010 and 1011 are analog-to-digital conversion means (marked A/D in the figure) that convert the outputs of the image pickup devices 901 and 902 into digital signals; and
904 is image synthesizing means that takes the digitized video signals of the user 100 and, by image processing, synthesizes a video similar to one obtained by photographing the user 100 from the front.

[0029] FIG. 10 is a block diagram showing a concrete configuration example of the image synthesizing means shown in FIG. 9. In FIG. 10:
1000 is feature extraction means that extracts facial components, in particular the eyes, nose, and mouth, from the video of the user 100 captured by the image pickup devices 901 and 902;
1001 is three-dimensional position estimation means that estimates the three-dimensional position of each facial part of the user 100 (eyes, nose, mouth, and so on) extracted by the feature extraction means 1000; and
1002 is front image synthesizing means that synthesizes a frontal image of the face of the user 100 on the basis of the estimation result of the three-dimensional position estimation means 1001.

[0030] The operation of the video dialogue system of this embodiment, configured as described above, is explained below. The image synthesizing means 904 is generally known in the field of image processing. It is reported in detail in, for example, IEICE Technical Report HC91-43, pp. 9-16, or National Technical Report Vol. 40, No. 6, pp. 106-113. Detailed description is therefore omitted in this embodiment.

[0031] The two videos of the user 100 captured by the image pickup devices 901 and 902 are sent to the feature extraction means 1000. This means first extracts facial features from the two videos, for example facial components such as the eyes, nose, and mouth. On the basis of this extraction result, the three-dimensional position estimation means 1001 estimates the three-dimensional positions of these facial components. Using this estimate, the front image synthesizing means 1002 synthesizes a frontal image of the user 100, which is then transmitted to the other party.
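The cited reports describe the three-dimensional estimation and frontal synthesis in detail. The sketch below only illustrates the underlying geometry for the two-camera layout of FIG. 9, under these assumptions:
- the cameras are ideal, rectified pinhole cameras with focal length f (in pixels);
- they sit at X = -B/2 and X = +B/2 on either side of the monitor;
- image coordinates are measured from each camera's principal point;
- a virtual camera at the midpoint (X = 0) stands in for a camera behind the monitor.

The function names and parameters are hypothetical. A real front image synthesizer would also need to map the image texture, not just move feature points.

```python
from typing import Tuple

Point3 = Tuple[float, float, float]


def triangulate(xl: float, xr: float, y: float, f: float, baseline: float) -> Point3:
    """Recover (X, Y, Z) of a feature seen at xl in the left image and xr in the right
    image (same row y), with X measured from the midpoint between the cameras."""
    disparity = xl - xr
    if disparity <= 0:
        raise ValueError("a point in front of both cameras must have positive disparity")
    z = f * baseline / disparity          # depth from disparity: Z = f * B / d
    x = xl * z / f - baseline / 2         # the left camera sits at X = -baseline / 2
    return (x, y * z / f, z)


def project_frontal(p: Point3, f: float) -> Tuple[float, float]:
    """Project a 3-D point into the virtual camera at the midpoint (X = 0)."""
    x, y, z = p
    return (f * x / z, f * y / z)
```

In use, each extracted feature (an eye corner, the nose tip, and so on) would be triangulated from its pair of image positions and then reprojected with `project_frontal` to find where it should appear in the frontal view.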

[0032] As described above, in this embodiment a frontal image of the user is synthesized from the videos captured by two image pickup devices. Users who talk while viewing each other's images on a monitor, as in a videophone or video conference, can therefore converse naturally with their lines of sight meeting.

[0033] The conventional video dialogue system described above uses a half mirror. In contrast, the configuration of this embodiment allows the image synthesizing means 904 to be built from electronic components such as ICs, so the system as a whole can be made considerably smaller and lighter.

[0034] This embodiment has been described with two image pickup devices, but the invention is not limited to this. A configuration using a single image pickup device is possible, for example, which simplifies the overall system and allows a further reduction in size and weight. Conversely, a configuration using three or more image pickup devices allows the user's frontal image to be synthesized with even higher accuracy.

[0035] In this embodiment, the image pickup devices were described as outputting analog video signals, but the invention is not limited to this. For example, image pickup devices that output digital signals may be used, with the image synthesizing means 904 receiving the video signals through a digital interface.

[0036] In this embodiment, the output of the image synthesizing means 904 is a digital signal, but it is not limited to this. Digital-to-analog conversion, compression of the image information, or similar processing can clearly be applied according to the communication method used with the other party.

[0037] This embodiment has been described with the image pickup devices arranged on both sides of the monitor, but the invention is not limited to this. For example, the image pickup devices may be arranged above and below the monitor, or on a diagonal of the monitor.

[0038] In this embodiment, the image synthesizing means consists of three means: the feature extraction means 1000, the three-dimensional position estimation means 1001, and the front image synthesizing means 1002. It is not, however, limited to this configuration.

[0039] FIG. 11 is a block diagram of a fourth embodiment of the present invention. This embodiment differs from the third embodiment only in that it adds an image determination means 1101 and a selector 1102.

The video signals of the user 100 are digitized by the analog-to-digital conversion means (marked A/D in the figure) 1010 and 1011. The image determination means 1101 detects the area of the skin-colored portions in these signals. Based on the result, it sends one of the two video signals to the selector 1102, described below. Over a separate output path, it also sends both video signals unchanged to the image synthesizing means 904. The selector 1102 responds to a control signal C1 from the image determination means 1101. It selects either the output of the image synthesizing means 904 or the output of the image determination means 1101 as the final output image.

[0040] Therefore, only the parts that differ from the third embodiment are described below. FIG. 12 is a block diagram showing a specific configuration example of the image determination means shown in FIG. 11. FIG. 12 contains the following elements:

- 1202 and 1203 are skin color detecting means. They detect the flesh-colored portion in the images captured by the image pickup devices 901 and 902. The image captured by the image pickup device 901 is denoted i1, and the image captured by the image pickup device 902 is denoted i2.
- 1201 is a comparison means. It compares the detection results from the skin color detecting means 1202 and 1203 and, based on the comparison result, controls the selector 1204 and the selector 1102.
- 1204 is a selector. It selects and outputs one of the two images i1 and i2 according to a control signal sc from the comparison means 1201.

Skin color detecting means such as 1202 and 1203 are generally known and are described in detail in prior patents (for example, Japanese Patent Laid-Open No. 6-105323). A detailed description is therefore omitted in this embodiment.

[0041] The operation of the video interactive system of the present embodiment, configured as above, is described below.

[0042] The skin color detecting means 1202 detects the flesh-colored portion of the image i1 and counts its pixels (the count is denoted sa). Similarly, the skin color detecting means 1203 counts the pixels of the flesh-colored portion of the image i2 (the count is denoted sb).
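The patent leaves the details of skin color detection to prior art (Japanese Patent Laid-Open No. 6-105323). As a stand-in, the sketch below counts skin pixels using a simple chrominance box in YCbCr space.

- The BT.601 full-range RGB-to-CbCr conversion is standard.
- The Cb range (77 to 127) and Cr range (133 to 173) are an assumed heuristic often used in face detection. They are not values from this document.

The hypothetical function `count_skin_pixels` plays the role of producing sa or sb.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def rgb_to_cbcr(r: int, g: int, b: int) -> Tuple[float, float]:
    """ITU-R BT.601 full-range (JPEG-style) chrominance components."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(pixel: Pixel, cb_range=(77, 127), cr_range=(133, 173)) -> bool:
    """Assumed skin test: the pixel's (Cb, Cr) falls inside a fixed box."""
    cb, cr = rgb_to_cbcr(*pixel)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def count_skin_pixels(image: List[List[Pixel]]) -> int:
    """Number of skin-colored pixels, i.e. sa for i1 or sb for i2."""
    return sum(1 for row in image for px in row if is_skin(px))
```

Testing only chrominance makes the decision somewhat less sensitive to brightness than a plain RGB threshold would be. A practical detector would likely add noise cleanup or restrict the count to a face region.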

[0043] The flesh-colored pixels are counted in order to measure the area of the face of the user 100 in each image. The counts sa and sb are sent to the comparison means 1201, which calculates the ratio sa/sb.

According to this ratio, the comparison means 1201 controls the selector 1204 to select and output one of the images i1 and i2. Specifically, thresholds k1 (k1 > 1) and k2 (k2 < 1) are set:

- If sa/sb > k1, more of the face of the user 100 appears in image i1 than in image i2, and the selector 1204 is controlled to select i1.
- If sa/sb < k2, more of the face appears in image i2 than in image i1, and the selector 1204 is controlled to select and output i2.

The comparison means 1201 also controls the selector 1102:

- If k2 ≤ sa/sb ≤ k1, the areas of the flesh-colored portions in images i1 and i2 differ little. It can then be judged that the user 100 is gazing at the monitor, and the selector 1102 takes the output of the image synthesizing means 904 as the final output.
- If sa/sb > k1 or sa/sb < k2, the areas differ greatly. It can then be judged that the user 100 is facing one of the image pickup devices rather than the monitor, and the selector 1102 takes the output of the image determination means 1101 as the final output.

Because the selectors choose the image according to the skin color detection result, the original captured image can be sent to the other party unchanged, instead of a composite of the two images, whenever the user 100 is judged to be facing an image pickup device rather than the monitor. This gives a more realistic video dialogue than continuously sending only the composite image.
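The decision made by the comparison means 1201 can be written as a small function of the two counts. The sketch below makes three assumptions of its own:

- The example thresholds are k1 = 1.5 and k2 = 1/1.5. The patent only requires k1 > 1 and k2 < 1.
- When sb is zero and sa is positive, the ratio is treated as infinite.
- When both counts are zero, the ratio is treated as balanced. The text does not cover this case.

The names `FinalOutput` and `choose_output` are hypothetical.

```python
import math
from enum import Enum


class FinalOutput(Enum):
    I1 = "i1"                    # selector 1204 picks i1; selector 1102 passes means 1101
    I2 = "i2"                    # selector 1204 picks i2; selector 1102 passes means 1101
    SYNTHESIZED = "synthesized"  # selector 1102 passes image synthesizing means 904


def choose_output(sa: int, sb: int, k1: float = 1.5, k2: float = 1 / 1.5) -> FinalOutput:
    """Map the skin pixel counts sa (from i1) and sb (from i2) to the final output."""
    if not (k1 > 1 > k2 > 0):
        raise ValueError("thresholds must satisfy k1 > 1 > k2 > 0")
    if sb == 0:
        # Assumption: skin only in i1 gives an infinite ratio; no skin at all is "balanced".
        ratio = math.inf if sa > 0 else 1.0
    else:
        ratio = sa / sb
    if ratio > k1:
        return FinalOutput.I1
    if ratio < k2:
        return FinalOutput.I2
    return FinalOutput.SYNTHESIZED  # k2 <= ratio <= k1: user is looking at the monitor
```

A result of `I1` or `I2` corresponds to the selector 1102 passing the output of the image determination means 1101. A result of `SYNTHESIZED` corresponds to it passing the output of the image synthesizing means 904.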

[0044] As described above, the present embodiment synthesizes a front image of the user from the images captured by the two image pickup devices. When users talk while viewing each other's images on monitors, as in a videophone or video conference, they can therefore converse naturally with their lines of sight meeting.

[0045] Furthermore, compare this with the conventional video interactive system that uses a half mirror. In the present embodiment, the image synthesizing means 904 can be built from electronic components such as ICs, so the entire system can be made substantially smaller and lighter.

[0046] In addition, the user 100 may be judged to be facing an image pickup device rather than the monitor. In that case, the image captured by the device the user is facing can be transmitted to the other party unchanged, instead of a composite of the two images. This achieves a more realistic video dialogue than continuously transmitting only the composite image.

[0047] The present embodiment has been described for a configuration using two image pickup devices, but the present invention is not limited to this. A configuration using a single image pickup device is also conceivable. In that case the overall system configuration is simplified, and the system can be made even smaller and lighter. A configuration using three or more image pickup devices, on the other hand, allows the front image of the user to be synthesized with even higher accuracy.

[0048] In the present embodiment, the image pickup devices have been described as outputting analog video signals, but the present invention is not limited to this. For example, image pickup devices that output digital signals may be used, with the image synthesizing means 904 receiving the video signals through a digital interface.

[0049] In the present embodiment, the output of the image synthesizing means 904 is a digital signal, but it is not limited to this. Depending on the communication system used with the other party, the output can obviously be converted from digital to analog, or the image information can be compressed.

[0050] The present embodiment has been described using a configuration in which the image pickup devices are arranged on both side surfaces of the monitor, but the present invention is not limited to this. For example, the image pickup devices may be arranged above and below the monitor, or along a diagonal of the monitor.

[0051] The present embodiment uses two skin color detecting means, but the present invention is not limited to this. A configuration in which they are combined into a single skin color detecting means is also conceivable.

[0052]

[Effects of the Invention] As described above, the present invention comprises the following elements:

- an image pickup device that photographs a person looking at a monitor;
- a first memory that can store an image of the front of the person's face;
- a second memory that stores the image from the image pickup device;
- first motion detecting means that detects the motion vector of the image in the second memory relative to the image in the first memory;
- image position moving means that moves the image position of the first memory so that the scalar magnitude of this motion vector is minimized;
- a third memory that stores the moved image of the first memory;
- second motion detecting means that detects the motion vector of the image in the second memory relative to the image in the third memory;
- control means that controls the image in the second memory based on this motion vector and outputs an image combining the images in the second and third memories.

As a result, when users talk while viewing each other's images on monitors, as in a videophone or video conference, they can converse naturally with their lines of sight meeting.

[Brief description of drawings]

FIG. 1 is a configuration diagram of the video interactive system according to the first embodiment of the present invention.

FIG. 2 is a schematic diagram of the images in the memories 103 and 104 of the video interactive system according to the first embodiment of the present invention.

FIG. 3 is a schematic diagram of the processing performed by the motion detecting means 105 of the video interactive system according to the first embodiment of the present invention.

FIG. 4 is a schematic diagram of the processing performed by the motion detecting means 108 of the video interactive system according to the first embodiment of the present invention.

FIG. 5 is a schematic diagram of the processing performed by the control means of the video interactive system according to the first embodiment of the present invention.

FIG. 6 (a) is a block diagram of the control means of the video interactive system according to the first embodiment of the present invention. FIG. 6 (b) is a schematic diagram showing how that control means moves data within the screen.

FIG. 7 is a schematic diagram of the processing performed by the control means of the video interactive system according to the first embodiment of the present invention.

FIG. 8 is a configuration diagram of the video interactive system according to the second embodiment of the present invention.

FIG. 9 is a configuration diagram of the video interactive system according to the third embodiment of the present invention.

FIG. 10 is a block diagram of the image synthesizing means of the video interactive system according to the third embodiment of the present invention.

FIG. 11 is a configuration diagram of the video interactive system according to the fourth embodiment of the present invention.

FIG. 12 is a block diagram of the image determination means of the video interactive system according to the fourth embodiment of the present invention.

FIG. 13 is a configuration diagram of a conventional video interactive system.

[Explanation of symbols]

100 System user
101, 901, 902 Image pickup device
102 Monitor
103, 104, 107 Memory
105 Motion detecting means
106 Image position moving means
108 Motion detecting means
109 Control means
801, 1102 Selector
904 Image synthesizing means
1101 Image determination means

Claims


1. A video interactive system comprising:

- a monitor that displays an image of the other party;
- an image pickup device that photographs a person talking with the other party while looking at the monitor;
- a first memory capable of storing a front image of the person's face;
- a second memory that stores an image from the image pickup device;
- first motion detecting means for detecting a motion vector of the image in the second memory with respect to the image in the first memory;
- image position moving means for moving the image position of the first memory based on the motion vector detected by the first motion detecting means;
- a third memory that stores the image of the first memory after the image position moving means has moved it;
- second motion detecting means for detecting a motion vector of the image in the third memory with respect to the image in the second memory; and
- control means for controlling the image in the second memory based on the motion vector detected by the second motion detecting means, and for outputting an image obtained by combining the image in the second memory and the image in the third memory.

2. A video interactive system comprising:

- a monitor that displays an image of the other party;
- an image pickup device that photographs a person talking with the other party while looking at the monitor;
- a first memory capable of storing a front image of the person's face;
- a second memory that stores an image from the image pickup device;
- first motion detecting means for detecting a motion vector of the image in the second memory with respect to the image in the first memory;
- image position moving means for moving the image position of the first memory based on the motion vector detected by the first motion detecting means;
- a third memory that stores the image of the first memory after the image position moving means has moved it;
- second motion detecting means for detecting a motion vector of the image in the third memory with respect to the image in the second memory;
- control means for controlling the image in the second memory based on the motion vector detected by the first motion detecting means, and for outputting an image obtained by combining the image in the second memory and the image in the third memory; and
- a selector that outputs the through image of the second memory if the value of the motion vector obtained by the first motion detecting means is outside a certain range, and outputs the image from the control means if that value is within the range.
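Claim 2 chooses between the raw (through) image and the composite, depending on whether the motion vector from the first motion detecting means lies within a certain range. The claim does not define the "value" of the vector. This sketch makes two assumptions:

- the value is the length of the mean of the per-area vectors;
- the range is a single upper bound, given by the hypothetical parameter `max_shift` in pixels.

`select_output` and `mean_vector` are illustrative names.

```python
import math
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")
Vector = Tuple[float, float]  # (dy, dx)


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Average of the per-area motion vectors."""
    if not vectors:
        raise ValueError("need at least one motion vector")
    n = len(vectors)
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)


def select_output(vectors: Sequence[Vector], through_image: T,
                  composite_image: T, max_shift: float) -> T:
    """Pass the through image when the user has moved too far from the
    stored frontal pose; otherwise pass the control means' composite."""
    dy, dx = mean_vector(vectors)
    if math.hypot(dy, dx) > max_shift:
        return through_image
    return composite_image
```

The idea behind this rule: a large mean shift suggests that the live image no longer resembles the stored frontal image well enough for compositing to be convincing.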

3. A video interactive system comprising:

- a monitor that displays an image of the other party;
- one or more image pickup devices that photograph a person talking with the other party while looking at the monitor; and
- image synthesizing means that, by image synthesis processing based on the image signal of the person obtained from the image pickup devices, synthesizes an image similar to one obtained when the person is photographed from the front.

4. A video interactive system comprising:

- a monitor that displays an image of the other party;
- one or more image pickup devices that photograph a person talking with the other party while looking at the monitor;
- image determination means that selects one image signal based on a feature quantity of the image signals of the person output from the image pickup devices;
- image synthesizing means that, by image synthesis processing based on the image signals of the person output from the image pickup devices, synthesizes an image similar to one obtained when the person is photographed from the front; and
- a selector that selects either the image selected by the image determination means or the image synthesized by the image synthesizing means.

5. The video interactive system according to claim 1, wherein the first motion detecting means and the second motion detecting means each:

- divide the input images into individual pixels or into areas of a certain size;
- compare each area of one image with the areas of the other image; and
- when a matching or approximately matching area is found, detect the difference between the positions of the two areas and output it as a motion vector.
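The area comparison in claim 5 is essentially block matching. The sketch below makes these assumptions:

- images are grayscale and stored as nested lists;
- areas are square blocks, compared by the sum of absolute differences (SAD);
- the search is exhaustive over a square window;
- the block size and search radius are arbitrary, and the claim also allows per-pixel comparison instead of blocks.

Each vector is reported as (dy, dx): the position of a block in `cur` minus the position of its best match in `ref`. The name `block_motion_vectors` is hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]


def block_motion_vectors(ref: Image, cur: Image, block: int = 4,
                         search: int = 3) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Map each block origin (by, bx) in `cur` to its motion vector (dy, dx)."""
    h, w = len(cur), len(cur[0])
    vectors = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            best_sad, best_vec = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y0, x0 = by + dy, bx + dx  # candidate block origin in ref
                    if y0 < 0 or x0 < 0 or y0 + block > h or x0 + block > w:
                        continue
                    sad = sum(abs(cur[by + i][bx + j] - ref[y0 + i][x0 + j])
                              for i in range(block) for j in range(block))
                    if best_sad is None or sad < best_sad:
                        # motion = position in cur minus matched position in ref
                        best_sad, best_vec = sad, (-dy, -dx)
            vectors[(by, bx)] = best_vec
    return vectors
```

For the first motion detecting means, `ref` would be the frontal image in the first memory and `cur` the live image in the second memory. On a textured image shifted one pixel to the right, interior blocks report (0, 1).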

6. The video interactive system, wherein the image position moving means:

- averages the motion vectors obtained by the first motion detecting means;
- converts the address of each datum in the first memory into a position according to the averaged motion vector; and
- writes the data of the first memory to the converted addresses of the third memory, thereby moving the image of the first memory.
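Claim 6 averages the motion vectors and rewrites each datum of the first memory at a shifted address in the third memory. A minimal sketch, assuming:

- vectors are (dy, dx) pairs, as produced by a block matcher;
- the average is rounded to whole pixels (Python's `round` uses round-half-to-even);
- addresses that receive no data are set to a hypothetical `fill` value.

`shift_by_mean_vector` is an illustrative name.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]


def shift_by_mean_vector(src: Image, vectors: Sequence[Tuple[float, float]],
                         fill: int = 0) -> Image:
    """Return the third-memory image: `src` moved by the rounded mean vector."""
    if not vectors:
        raise ValueError("need at least one motion vector")
    n = len(vectors)
    dy = round(sum(v[0] for v in vectors) / n)
    dx = round(sum(v[1] for v in vectors) / n)
    h, w = len(src), len(src[0])
    dst = [[fill] * w for _ in range(h)]  # third memory, initially empty
    for y in range(h):
        for x in range(w):
            ty, tx = y + dy, x + dx  # converted write address
            if 0 <= ty < h and 0 <= tx < w:
                dst[ty][tx] = src[y][x]
    return dst
```

A single global shift cannot follow local changes such as a moving mouth. That is consistent with the claims leaving the finer, per-area correction to the second motion detecting means and the control means.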

7. The video interactive system according to claim 1, wherein the control means comprises:

- a buffer that adjusts the timing between the image in the second memory and the motion vectors output from the second motion detecting means;
- write address control means that sets the write address used when writing data to memory; and
- a fourth memory into which the image of the second memory output from the buffer can be written;

and wherein the write address control means sets, based on the motion vectors output from the second motion detecting means, the addresses at which the image of the second memory output from the buffer is written into the fourth memory.
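Claim 7 moves pixels of the second memory into a fourth memory at write addresses derived from the second motion vectors. The sketch below leaves out the timing buffer, which has no software counterpart here. It also assumes two things the claim does not state:

- The fourth memory is first initialized with the third-memory image.
- Each block's vector (dy, dx) is the block's position in the live image minus its matched position in the third-memory image. Subtracting the vector therefore places the live pixels at their frontal-image positions.

`compose` and its parameters are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]


def compose(current: Image, frontal: Image,
            block_vectors: Dict[Tuple[int, int], Tuple[int, int]],
            block: int) -> Image:
    """Write live pixels into a copy of the frontal image at corrected addresses."""
    h, w = len(current), len(current[0])
    fourth = [row[:] for row in frontal]  # assumed initial contents of the fourth memory
    for (by, bx), (vy, vx) in block_vectors.items():
        for y in range(by, min(by + block, h)):
            for x in range(bx, min(bx + block, w)):
                ty, tx = y - vy, x - vx  # write address derived from the motion vector
                if 0 <= ty < h and 0 <= tx < w:
                    fourth[ty][tx] = current[y][x]
    return fourth
```

Blocks are processed in dictionary order, so a later block overwrites an earlier one wherever their write addresses overlap. A hardware write address controller would need an equivalent rule.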

8. The video interactive system, wherein the image synthesizing means comprises feature extracting means for extracting facial components such as the eyes, nose and mouth from the image of the person photographed by the image pickup device.

9. The video interactive system according to claim 3 or 4, wherein the image synthesizing means comprises three-dimensional position estimating means. This means estimates, from the image of the person photographed by the image pickup device, the three-dimensional position of each part of the person, in particular the eyes, nose, mouth and the like.

10. The video interactive system according to claim 3 or 4, wherein the image synthesizing means comprises:

- feature extracting means for extracting facial components such as the eyes, nose and mouth from the image of the person photographed by the image pickup device;
- three-dimensional position estimating means for estimating the three-dimensional position of each part of the person, such as the eyes, nose and mouth, extracted by the feature extracting means; and
- front image synthesizing means for synthesizing a front image of the person based on the estimation result of the three-dimensional position estimating means.

11. The video interactive system according to claim 4, wherein the image determination means comprises:

- skin color detecting means for detecting the skin-colored portion of each image;
- comparison means for comparing the sizes of the skin-colored portions of the images detected by the skin color detecting means; and
- a selector controlled by the comparison result of the comparison means;

and wherein the image from each image pickup device is input to the selector, which selects, based on the result of the comparison means, the image from the image pickup device with the largest skin-colored portion.
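Claim 11 extends the selection to any number of image pickup devices: the image with the largest skin-colored area is selected. A minimal sketch, assuming:

- `count_skin` is any per-image skin pixel counter, a hypothetical stand-in for the skin color detecting means;
- ties go to the first camera.

`select_most_skin` is an illustrative name.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_most_skin(images: Sequence[T],
                     count_skin: Callable[[T], int]) -> Tuple[int, T]:
    """Return (camera index, image) for the image with the most skin pixels."""
    if not images:
        raise ValueError("need at least one image")
    counts: List[int] = [count_skin(img) for img in images]
    best = max(range(len(images)), key=lambda i: counts[i])  # first index wins ties
    return best, images[best]
```

With two cameras this reduces to comparing sa and sb. Unlike the ratio test of the fourth embodiment, however, it always picks a camera and has no "balanced" band that would fall back to the synthesized image.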