JPH0888853A

JPH0888853A - Medium processing system

Info

Publication number: JPH0888853A
Application number: JP22228294A
Authority: JP
Inventors: Toshiaki Watanabe; 敏明渡邊; Kimio Miseki; 公生三関; Takashi Ida; 孝井田; Yoshihiro Kikuchi; 義浩菊池; Masahiro Oshikiri; 正浩押切; Susumu Kanba; 進神庭
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1994-09-16
Filing date: 1994-09-16
Publication date: 1996-04-02

Abstract

PURPOSE: To acquire the essence of information efficiently by having each input device limit the target information before acquiring it, and by integrating the acquired information based on its attributes. CONSTITUTION: One or more limitation input devices 3a-3n are provided for an arbitrary information source 1 serving as the information target, and each limitation input device 3 acquires limited information I1. The acquired information I3 is processed as appropriate in a processor 5, supplied to an output device 7 as processed information I5, and output from the output device 7 as video and sound. The limitation input device 3 consists of a plurality of cameras, for example a fixed camera, a camera moving horizontally, a camera moving vertically, a camera rotating about the axis connecting the camera and the subject, and a camera repeating a movement within a fixed range. The processor 5 includes a movement amount setting circuit and an encoder.

Description

Detailed Description of the Invention

【０００１】[0001]

[Field of Industrial Application] The present invention relates to a media processing system in an information processing system that performs processing such as communication, broadcasting, or storage of information including image information.

【０００２】[0002]

[Prior Art] Conventionally, to compress and encode moving image data efficiently despite its very large information content, methods have been used that reduce redundancy by exploiting correlation along the time axis or the spatial axes. One method used to reduce redundancy along the time axis is motion compensation (hereinafter MC). MC measures, between the current frame and the previous frame, the amount of motion of a moving object (how far the object has moved from its position in the previous frame to reach its position in the current frame), generates a vector representing the direction and magnitude of that motion (a motion vector), and then encodes the motion vector for transmission or storage. On the reproducing side, the current frame is created from this motion vector information by moving the moving object of the previous frame by that amount and fitting it into its position in the current frame.
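The reproducing-side step described above can be sketched as follows. This is a minimal illustration, not the method of any particular codec: the function name `motion_compensate`, the rectangular block describing the moving object, and the choice to fill the uncovered area with 0 are all assumptions made for the example.

```python
def motion_compensate(prev_frame, block, vector):
    """Build a predicted current frame by moving one block of the previous frame.

    prev_frame: 2-D list of pixel values.
    block: (top, left, height, width) of the moving object in prev_frame (hypothetical layout).
    vector: (dy, dx) motion vector received from the encoding side.
    """
    top, left, height, width = block
    dy, dx = vector
    rows, cols = len(prev_frame), len(prev_frame[0])

    # Start from a copy of the previous frame (a static background is assumed).
    cur = [row[:] for row in prev_frame]

    # Clear the old position of the object. Filling with 0 is a simplification;
    # a real decoder would need the uncovered background from elsewhere.
    for y in range(top, top + height):
        for x in range(left, left + width):
            cur[y][x] = 0

    # Fit the object into its displaced position in the current frame.
    for y in range(height):
        for x in range(width):
            ny, nx = top + y + dy, left + x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                cur[ny][nx] = prev_frame[top + y][left + x]
    return cur
```

Only the vector `(dy, dx)` has to be transmitted for the moved block, which is where the reduction in temporal redundancy comes from.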

[0003] FIG. 6 shows an example of a conventional motion information acquisition device. There is a single camera, and information from this camera 181 is taken into a processing device 182. In the processing device 182, motion information is detected by the conventional MC method from the previous frame and the newly captured current frame, and the result is transmitted to the decoding side.

[0004] However, because the magnitude of the motion of a moving object or moving region is not known in advance, motion detection has conventionally been performed over a wide range chosen to cover every possible amount of motion. This increases the amount of computation, and because codes must be assigned to motion vector information spanning the entire detection range, the code amount of the motion vectors also increases. Furthermore, when moving objects or moving parts with different motions exist within one picture, the motion information of all of them must be encoded, which increases the code amount still further.
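How the cost grows with the detection range can be made concrete with a small calculation. The sketch below assumes an exhaustive search over a square window of ±R pixels and a fixed-length code per vector; both are assumptions for illustration (practical coders typically use variable-length codes), and `search_cost` is a hypothetical helper name.

```python
import math


def search_cost(search_range):
    """Return (candidate_count, fixed_length_bits) for a +/-search_range window.

    candidate_count: number of displacements an exhaustive search must test.
    fixed_length_bits: bits needed to give every displacement a distinct
    fixed-length code (an assumed coding scheme).
    """
    candidates = (2 * search_range + 1) ** 2
    bits = math.ceil(math.log2(candidates))
    return candidates, bits


for r in (1, 7, 15):
    print(r, search_cost(r))
```

Doubling the range roughly quadruples the number of candidates to test, while the fixed-length code grows by about two bits per vector.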

[0005] In conventional media processing systems, as shown in FIG. 12, image information has been acquired as an image from a single viewpoint, usually obtained from one camera serving as the input device 271. This method has been widely used because it is the simplest and most natural way to obtain an image that captures the subject and its background from the objective viewpoint of the camera.

[0006] There is, however, another important kind of image information: the image from the subjective viewpoint of the camera user. Normally the camera user shoots the subject while watching the camera's monitor, so the objective-viewpoint image and the subjective-viewpoint image share the same viewpoint and cannot be distinguished. When the camera user makes himself or herself the subject, however, the objective-viewpoint image is the image of the user taken by the camera, and the subjective-viewpoint image is the image of the surroundings as seen from the user's side. In future media processing systems, individuals are expected to originate information more often. Two demands are therefore expected to grow: the camera user's wish to convey his or her own image (the objective-viewpoint image) to the viewer, and the wish to convey the image seen from his or her own viewpoint (the subjective-viewpoint image) so as to share the emotional experience. Conventional media processing systems cannot satisfy these two future demands at the same time.

[0007] In addition, conventional systems lack a method for combining the user image and the image from the user's viewpoint into a single image and displaying it in a natural form.

[0008] On the other hand, depending on the circumstances of the user originating the information, the user may want to conceal his or her own environment image while finding a monotonous environment image undesirable, and the demand to have the sender alter that image in a visually effective way is expected to grow. One conventional way to conceal the environment image is to send no environment image at all. With this method, however, the receiving side displays an image whose background never changes, so the system can offer the user only image information that lacks a sense of presence.

[0009] Conventionally, when a scene such as a landscape is captured with a camera, data for pixels arranged at spatially equal intervals and uniform resolution are output as the image data of that scene. In general, limits on CCD microfabrication technology, the amount of computation, and the amount of data prevent sufficient resolution from being obtained over the entire picture.

[0010] For this reason, studies are under way on obtaining one high-resolution image from a plurality of low-resolution images taken with low-resolution cameras (for example, Miyaji, Hanamura, and Tominaga, "Study on high resolution image acquisition from multiple low resolution images," 1993 IEICE Spring Conference, D-355). The low-resolution images may be taken with one camera at different times, or with a plurality of cameras simultaneously. FIG. 23 shows a block diagram for the case where a plurality of cameras is used. First, the entire scene is photographed simultaneously by a first camera 371 and a second camera 372. Both are low-resolution cameras, but as shown in FIG. 24 their directions are adjusted so that the sampling points Sa of one fall between the sampling points Sb of the other. A data synthesizer 375 then combines the two low-resolution image signals 373 and 374 as shown in FIG. 24 to synthesize and output a high-resolution image signal 376.
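The combining step performed by the data synthesizer can be sketched as a simple interleaving of samples. The text only states that the sampling points of one camera fall between those of the other; the sketch below additionally assumes that the offset is horizontal and exactly half a sample, and `interleave_rows` is a hypothetical name.

```python
def interleave_rows(image_a, image_b):
    """Merge two equally sized low-resolution images into one image with
    doubled horizontal sample density.

    image_a holds samples taken at points Sa, image_b at points Sb, where each
    Sb is assumed to lie midway between neighbouring Sa points on the same row.
    """
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of rows")
    merged = []
    for row_a, row_b in zip(image_a, image_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        out = []
        for sample_a, sample_b in zip(row_a, row_b):
            # Alternate Sa, Sb, Sa, Sb, ... along the row.
            out.extend((sample_a, sample_b))
        merged.append(out)
    return merged
```

Every row ends up with twice as many samples, uniformly across the picture, whether or not a given region is of interest.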

[0011] With this method, however, using two cameras merely doubles the pixel density uniformly across every part of the image, and the resolution cannot be improved further unless more cameras are added. In other words, many cameras are needed to view one important part of the picture in great detail.

[0012] In conventional videophones, even when the state of the subject, such as its position, orientation, and shape, changes from moment to moment, the state of the camera, such as its position, orientation, and reception sensitivity, is fixed. Left unattended, the subject may move out of the picture or be shot from behind. The same applies to sound input: if the user moves out of the region where the microphone's directivity is high, good sound input is no longer possible. Likewise, when an image is presented to the user on a display, the display may become hard to see when the user moves.

[0013] Research and development of systems that transmit or store media information such as images and sound is currently very active. In conventional media transmission/storage devices, each medium either has a single input means, or one of the pieces of information input from a plurality of input means is selected to form a single piece of information, which is then processed (for example by compression encoding) and transmitted or stored as a single piece of information.

[0014] FIG. 31 shows a block diagram of a conventional device that inputs images and sound, compresses them to a small amount of information, and transmits or stores them. FIG. 32 shows a block diagram of the corresponding conventional device that reproduces images and sound from the transmitted or stored information.

[0015] In the input and transmission/storage device shown in FIG. 31, the images and sound to be transmitted or stored are input from one or more cameras 501a and one or more microphones 501b, respectively. When there are several cameras or microphones, the image selector/synthesizer 502a and the sound selector/synthesizer 502b select or synthesize them into one image and one sound. The processing in the image selector/synthesizer 502a and the sound selector/synthesizer 502b either selects one of the images or sounds from the plurality of input means, or applies processing such as filtering to each and adds them together; in either case the output is a single image and a single sound. When there is only one input means, no selection or synthesis is performed, and the input image and sound are used as they are. The image and sound are compression-encoded to a small amount of information by the encoders 503a and 503b, respectively. The compression-encoded code strings are multiplexed by a multiplexer 504 and then transmitted or stored.

[0016] In the reproducing device of FIG. 32, the code string transmitted or stored by the transmission/storage device is separated by a demultiplexer 506 into an image code string and a sound code string, from which the image decoder 507a and the sound decoder 507b reproduce the reproduced image and the reproduced sound, respectively.
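The multiplexing and demultiplexing stages of FIGS. 31 and 32 can be illustrated with a toy container format. The framing used here (a one-byte tag followed by a four-byte big-endian length) is purely an assumption for the example and is not taken from the source or from any standard; `multiplex` and `demultiplex` are hypothetical names.

```python
import struct


def multiplex(image_code: bytes, sound_code: bytes) -> bytes:
    """Combine an image code string and a sound code string into one stream."""
    out = b""
    for tag, payload in ((b"V", image_code), (b"A", sound_code)):
        # Assumed framing: 1-byte tag, 4-byte big-endian length, then payload.
        out += tag + struct.pack(">I", len(payload)) + payload
    return out


def demultiplex(stream: bytes) -> dict:
    """Split a stream produced by multiplex() back into its code strings."""
    streams = {}
    pos = 0
    while pos < len(stream):
        tag = stream[pos:pos + 1]
        (length,) = struct.unpack(">I", stream[pos + 1:pos + 5])
        streams[tag] = stream[pos + 5:pos + 5 + length]
        pos += 5 + length
    return streams
```

Note that the demultiplexer recovers only one image code string and one sound code string, because the elements were already merged before encoding.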

[0017] In the device of FIG. 31, the coding scheme used in the image encoder 503a may be, for example, a scheme combining motion compensation with the discrete cosine transform, as in the ISO MPEG1 and MPEG2 image coding schemes and ITU-T H.261 and H.262; a scheme combining motion compensation with subband coding, pyramid coding, or the like; or model-based coding, intended mainly for coding face images. Of these, the schemes that combine motion compensation with the discrete cosine transform, subband coding, pyramid coding, and so on achieve reasonable coding efficiency for any image input, but for a region such as a human face they fall short of the coding efficiency of a model-based coding scheme specialized for it.

[0018] Model-based coding, on the other hand, codes efficiently those parts of the image that fit the model it assumes, such as a human face, but its coding efficiency for everything else, such as the background, is extremely low. It is therefore common to separate the face from the rest of the image, coding the face with model-based coding and the rest with motion compensation plus the discrete cosine transform.

[0019] Consequently, when the parts other than the face contain a large amount of information, the overall coding efficiency is not necessarily high. Moreover, it is generally very difficult to separate the face accurately from the rest of an input image; part of the background often ends up in the region cut out as the face, or vice versa, which lowers the coding efficiency.

[0020] The coding scheme used in the sound encoder 503b may be, for example, the ISO MPEG1 or MPEG2 audio coding scheme, or a scheme such as CELP. These conventional schemes aim primarily at good coding quality for a specific field, such as music or human speech, and their coding efficiency for sounds outside the intended field is not necessarily good.

[0021] For example, CELP is intended mainly for coding speech with high efficiency, and does not give good coding quality for sounds other than speech, such as music or background sound. Therefore, in a conventional device such as the media input device of FIG. 31, which synthesizes a single sound before encoding, the encoder input contains a mixture of sounds such as human voices and background sound. The portion that suits the coding scheme used in the encoder 503b (speech) can be coded with high efficiency, but the other sounds are coded inefficiently.

[0022] In the media reproducing device shown in FIG. 32, the image decoder 507a and the sound decoder 507b decode according to the coding schemes used in the image encoder 503a and the sound encoder 503b of FIG. 31, respectively. The image and sound reproduced in this way are each a single image and a single sound. Therefore, to separate the reproduced image and sound into individual elements (for example, a person's face and the background, or a person's voice and the background sound), or to replace some of those elements with others (for example, substituting a different image for the background, or changing the voice into another person's), separation processing must be applied to the reproduced image and sound.

[0023] It is difficult to perform such separation accurately on an image or sound in which all the elements have been combined into one. Moreover, a reproduced image or sound that was compression-encoded for transmission or storage contains coding distortion, so the separation accuracy is even lower than when separation is applied directly to the input image or sound.

[0024] As described above, a conventional media input and transmission/storage device either has only a single input means for each medium, or selects or synthesizes from a plurality of input means to create one piece of information per medium. The information thus created is a mixture of several constituent elements. Consequently, if a compression coding scheme specialized for one constituent element of the input is used, that element is coded with high efficiency but the other elements are coded with low efficiency.

[0025] To avoid this, the input must be decomposed into its constituent elements and a compression coding scheme suited to each must be used, but accurate decomposition is difficult, and high coding efficiency could not be achieved overall. In addition, a large amount of information had to be spent even on parts such as the background, where there is little need to transmit or store the input faithfully and conveying the atmosphere would suffice, so the coding efficiency could not be made very high.

[0026] A conventional media reproducing device also has the problem that it is difficult to decompose the reproduced media into constituent elements or to replace part of them.

[0027] Furthermore, when moving image data with a very large information content is transmitted or stored, methods have conventionally been used that reduce redundancy efficiently by exploiting correlation along the time axis or the spatial axes. In particular, when the target image is known to some extent in advance, as with a face image, there is a method in which the encoder and decoder share a common model (a wireframe model), the movements of the eyes and mouth are extracted from a single input image, and the amount of change is transmitted.

[0028] This conventional method is described with reference to the block diagram of FIG. 44. First, a face image serving as the information source is captured through a camera 631. Using the face image obtained as input, the change amount detecting means 632 compares it against a three-dimensional wireframe model, detects the parts where motion has occurred and how much each part has moved, and outputs these as motion information. The natural image synthesizing means 633 decodes the moving parts and the amounts of motion from the motion information, deforms the wireframe model accordingly, maps the corresponding parts of a natural image onto each region of the wireframe model so that it looks natural, and thereby generates a composite image, which is displayed on a display device 634.

[0029] Because this method uses a common wireframe model, it can produce moving images with a small amount of information. However, because the decoded image is generated by deforming a natural image, some unnaturalness is unavoidable.

[0030] Furthermore, in conventional information processing centered on face images, such as videophones, in almost all cases only the image taken by a camera positioned in front of the person being photographed is processed. This approach has the advantage of keeping the device simple, but it is also a cause of distortion.

[0031] As a method of detecting the motion of a face, the block matching method is generally used: the image is subdivided into blocks, temporally adjacent images are searched for pairs of blocks with a small error between them, each pair is regarded as the block before and after movement, and the motion vector tracing the movement and the error between the blocks are obtained.
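A minimal sketch of block matching follows. The source does not specify the error measure or the search strategy; the sum of absolute differences (SAD) and an exhaustive search over a square window are assumptions chosen for the example, and the function names are hypothetical.

```python
def sad(prev, cur, py, px, cy, cx, size):
    """Sum of absolute differences between a block in prev and a block in cur."""
    return sum(
        abs(prev[py + y][px + x] - cur[cy + y][cx + x])
        for y in range(size)
        for x in range(size)
    )


def block_match(prev, cur, cy, cx, size, search_range):
    """Find where the block at (cy, cx) in the current frame came from.

    Returns ((dy, dx), error): the block is taken to have moved by (dy, dx)
    from position (cy - dy, cx - dx) in the previous frame, and error is the
    SAD between the two blocks.
    """
    rows, cols = len(prev), len(prev[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            py, px = cy - dy, cx - dx
            # Skip candidates that fall outside the previous frame.
            if py < 0 or px < 0 or py + size > rows or px + size > cols:
                continue
            err = sad(prev, cur, py, px, cy, cx, size)
            if best is None or err < best[0]:
                best = (err, (dy, dx))
    return best[1], best[0]
```

Each block is matched independently, so the result is one vector per block with no notion of a shared, face-wide motion.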

[0032] This is explained with reference to FIG. 50. Suppose first that there is a face image as shown in FIG. 50(a). Suppose that, after a certain time has elapsed, the whole face has translated sideways and the mouth has widened, as shown in FIG. 50(b). Consider blocks 675 and 676 in the face image of FIG. 50(b). They correspond to blocks 671 and 679, respectively, in the face image of FIG. 50(a), so motion vectors V_a and V_b are obtained as shown in FIG. 50(c).

[0033] Since most of the face is moving in the direction of motion vector V_a, it is advantageous to take V_a as the motion of the whole face and represent each motion vector as that global motion plus the local motion of each part. When a face image is encoded, for example, this keeps the variance of the motion vector magnitudes small, which benefits coding efficiency. Representing the motion of the face as the motion of the whole face plus the local motion of each part is also more natural and rational when coding with a face model or when applying the result to a man-machine interface.
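The representation described here, one motion for the whole face plus a local motion per part, can be sketched as below. Taking the most frequent block vector as the whole-face motion is an assumption made for the example (the text only says that most of the face moves along V_a), and `split_global_local` is a hypothetical name.

```python
from collections import Counter


def split_global_local(vectors):
    """Split per-block motion vectors into a global vector and local residuals.

    vectors: list of (dy, dx) tuples, one per block.
    Returns (global_vector, local_vectors) such that, for every block,
    vector == global_vector + local_vector component-wise.
    """
    # Assumed rule: the most common vector stands in for the whole-face motion.
    global_vector = Counter(vectors).most_common(1)[0][0]
    gy, gx = global_vector
    local_vectors = [(dy - gy, dx - gx) for dy, dx in vectors]
    return global_vector, local_vectors


# Five blocks follow the face (V_a); one mouth block also moves locally (V_b).
v_a, v_b = (0, 4), (2, 4)
global_vector, local_vectors = split_global_local([v_a] * 5 + [v_b])
print(global_vector, local_vectors)
```

In this toy case every block that simply follows the face gets a zero local vector, and only the mouth block carries a nonzero one.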

[0034] However, it is difficult to obtain motion vectors of this form by block matching. One reason is that only one image is obtained per sampling. It is difficult to grasp everything from the motion of the whole face to local motions in detail from the information in a single image, so image information from multiple, varied perspectives must be obtained.

【００３５】[0035]

[Problems to be Solved by the Invention] As described above, an MC method that adequately covers the expected range of motion has conventionally been required, which increases both the amount of computation needed for motion detection and the code amount of the motion vector information itself. Furthermore, when moving objects or moving parts with different motions exist within one picture, the motion information of all of them must be encoded, which increases the code amount still further.

[0036] Conventional media processing systems also cannot display the user image and the environment image seen from the user's side together on one screen as a natural image with a sense of presence, nor can they, while concealing the environment image, alter the user's environment image from the user's side in a visually effective way.

[0037] An object of the present invention is to provide a media processing system with which the user can convey his or her own image (the objective-viewpoint image) to the viewer and at the same time convey the image seen from his or her own viewpoint (the subjective-viewpoint image) to share the emotional experience, and which can use both images to display the user and the surrounding situation as a single natural image with a sense of presence. A further object is to provide a media processing system that, when used for information transmission, can alter the environment image of the user who is the sender in a visually effective way while concealing it.

[0038] Conventional media processing systems also have the problems that image data can be obtained only at low resolution even for important parts, and that the subject cannot be imaged in an optimum state.

[0039] In conventional media processing systems, there is either one input means per medium, or the inputs from a plurality of input means are selected or synthesized into one before processing. It is therefore difficult to decompose the input information into constituent elements and apply a compression coding suited to each. In addition, in a media reproducing device using the conventional technology, the reproduced information is a single item per medium, so it is difficult to decompose it into constituent elements or to replace part of it.

[0040] In conventional media processing systems, there is a method of transmitting information using a model shared by the encoder and decoder in order to raise coding efficiency, but because the composite image is generated by deforming and processing a natural image, the composite image inevitably looks unnatural.

[0041] The present invention has been made in view of the above circumstances. Its objects are to generate a composite image with natural motion by temporally updating, on the basis of a high-definition artificial image, the regions of that artificial image corresponding to regions where motion has been detected, and to generate a composite image using a line drawing in place of the artificial image.

【００４２】また、従来のメディア処理システムでは、
複数の手段により、特に顔全体及び各部の動きに関する
情報を個別にかつ正確に得て、効率の良い符号化や的確
に処理を行うマンマシンインタフェースなどのようなメ
ディア処理装置のための、情報の入力手段及びその処理
方法を提供することを目的とする。Further, with regard to the conventional media processing system, an object is to provide information input means, and a processing method therefor, for media processing devices such as efficient encoders and man-machine interfaces that perform accurate processing, in which a plurality of means individually and accurately obtain information, in particular about the movement of the whole face and of each of its parts.

【００４３】本発明は、上記課題に鑑みてなされたもの
で、多元的に情報をとらえることができ、情報の特性に
合った処理で情報のエッセンスを効率的に獲得すること
を可能とするメディア処理システムを提供することを目
的とする。The present invention has been made in view of the above problems, and its object is to provide a media processing system that can capture information from multiple perspectives and efficiently acquire the essence of the information through processing suited to its characteristics.

【００４４】[0044]

【課題を解決するための手段】上記目的を達成するため
本願第１の発明は、対象とする情報をそれぞれが限定し
て情報を獲得する複数の入力手段と、この獲得された獲
得情報の属性に基づいて獲得情報の統合化を行う処理手
段と、この処理手段で統合化されて得られる処理情報を
出力する出力手段とを有することを要旨とする。[Means for Solving the Problems] To achieve the above object, the gist of the first invention of the present application is that it has a plurality of input means, each of which acquires information by limiting the target information; processing means for integrating the acquired information on the basis of its attributes; and output means for outputting the processed information obtained through the integration by the processing means.

【００４５】また、本願第２の発明は、対象とする情報
をそれぞれが限定して情報を獲得する複数の入力手段
と、この獲得された獲得情報から重要度の低い情報を除
いて当該獲得情報の統合化を行う処理手段と、この処理
手段で統合化されて得られる処理情報を出力する出力手
段とを有することを要旨とする。The gist of the second invention of the present application is that it has a plurality of input means, each of which acquires information by limiting the target information; processing means for integrating the acquired information after removing information of low importance from it; and output means for outputting the processed information obtained through the integration by the processing means.
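The first and second inventions above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not appear in the specification: the `Acquired` record, the attribute labels, the numeric `importance` score and the threshold. The sketch only shows the flow of limited inputs being grouped by attribute, optionally after low-importance items are dropped.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Acquired:
    source: str        # hypothetical label of the limited input device
    attribute: str     # hypothetical attribute, e.g. "face" or "background"
    payload: str       # stand-in for image or sound data
    importance: float  # hypothetical importance score in [0, 1]


def integrate(items, min_importance=0.0):
    """Group acquired information by attribute, skipping low-importance items."""
    grouped = defaultdict(list)
    for item in items:
        if item.importance >= min_importance:
            grouped[item.attribute].append(item.payload)
    return dict(grouped)


def output(processed):
    """Render the integrated information as one line per attribute."""
    return [f"{attr}: {' + '.join(parts)}" for attr, parts in sorted(processed.items())]
```

With `min_importance=0.0` the sketch corresponds to attribute-based integration alone; raising the threshold corresponds to removing less important information before integration.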

【００４６】また、本願第３の発明は、情報対象に係る
画像情報をそれぞれが限定して獲得する複数の画像情報
入力手段と、この獲得された獲得画像情報に含まれる情
報対象の経時的変化を検出する検出手段と、この検出手
段で得られる経時的変化情報を出力する出力手段とを有
することを要旨とする。The gist of the third invention of the present application is that it has a plurality of image information input means, each of which acquires, in a limited manner, image information relating to an information object; detection means for detecting the temporal change of the information object contained in the acquired image information; and output means for outputting the temporal change information obtained by the detection means.

【００４７】本発明では、複数のカメラを備え、それぞ
れに別の動きを与えることによって、実際に符号化すべ
き画面内の各動物体、あるいは動部分の動きを最もよく
表現する動きのカメラからの情報を適応的に利用する手
法を実現する。つまり、ＭＣによる動きベクトルの発生
情報量が最も少なくなるような動きをしているカメラか
らの情報を用いて、各動物体、あるいは動部分の動きベ
クトルを表現する。In the present invention, a plurality of cameras are provided and each is given a different motion, thereby realizing a method that adaptively uses the information from the camera whose motion best represents the motion of each moving object, or moving part, in the picture actually being encoded. That is, the motion vector of each moving object or moving part is expressed using the information from the camera whose motion minimizes the amount of information generated for motion vectors by motion compensation (MC).
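The camera selection described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method of the specification: each camera is assumed to have a known per-frame motion, the residual motion vector is assumed to be the object motion minus the camera motion, and the sum of absolute components is used as a crude stand-in for the number of bits needed to code the vector.

```python
def residual_vector(object_motion, camera_motion):
    """Motion left in the picture after the camera's own motion is taken out."""
    return (object_motion[0] - camera_motion[0], object_motion[1] - camera_motion[1])


def vector_cost(vector):
    """Crude proxy for coded bits: larger components cost more (an assumption)."""
    return abs(vector[0]) + abs(vector[1])


def pick_camera(object_motion, cameras):
    """Return the name of the camera whose motion leaves the cheapest residual."""
    return min(
        cameras,
        key=lambda name: vector_cost(residual_vector(object_motion, cameras[name])),
    )


# Hypothetical camera set: name -> (dx, dy) in pixels per frame.
CAMERAS = {"fixed": (0, 0), "pan_right": (4, 0), "tilt_up": (0, -3)}
```

For an object moving 5 pixels to the right per frame, `pick_camera((5, 0), CAMERAS)` selects `"pan_right"`, since only a residual of (1, 0) remains to be coded.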

【００４８】さらに実際の「動」物体、あるいは「動」
部分の動きに合わせて、外部から操作してカメラを動か
すことによって、例えばカメラを覗いていて、ターゲッ
トとなる物体や部分の動きに合わせて人間がカメラを動
かす等して、複雑な動きにも対応した動きベクトルの発
生を可能にする。なお複数の動きがあるときは、複数の
カメラを１台ずつ異なる動きに割り当てて操作すること
も可能である。Furthermore, by operating the camera from outside in accordance with the movement of the actual moving object or moving part, for example by having a person look through the camera and move it to follow the target object or part, motion vectors that handle even complex movements can be generated. When there are several movements, several cameras can be operated with each one assigned to a different movement.

【００４９】また、実際に測定された各動物体、あるい
は動部分の動きをもとに、カメラの動きを制御すること
により、つまり実際に符号化された動き情報を打ち消す
ような方向にカメラを自動的に動かすことにより、動き
ベクトルの発生情報量を減少させることも可能である。It is also possible to reduce the amount of information generated for motion vectors by controlling the camera's movement on the basis of the actually measured movement of each moving object or moving part, that is, by automatically moving the camera in a direction that cancels the motion information actually encoded.
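The automatic control described above amounts to a feedback loop in which the measured residual motion drives the camera. The following one-dimensional sketch assumes a constant object velocity and a simple proportional update with a hypothetical `gain`; the specification does not state a particular control law.

```python
def track(object_velocity, steps, gain=0.5):
    """Return the residual motion seen in the picture at each step.

    The camera velocity is nudged toward the object velocity so that the
    motion that still has to be coded shrinks over time.
    """
    camera_velocity = 0.0
    residuals = []
    for _ in range(steps):
        residual = object_velocity - camera_velocity  # motion left to encode
        residuals.append(residual)
        camera_velocity += gain * residual            # move so as to cancel it
    return residuals
```

With `track(8.0, 4)`, the residual halves at every step (8, 4, 2, 1), which illustrates how the motion vector magnitude, and hence the information generated for it, decreases as the camera follows the object.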

【００５０】また、本願第４の発明は、情報対象に係る
画像情報を含む情報をそれぞれが限定して獲得する１又
は複数の第１の情報入力手段と、この第１の情報入力手
段を操作するものを情報対象として画像情報を含む情報
を獲得する第２の情報入力手段と、前記第１の情報入力
手段で獲得された第１の情報と第２の情報入力手段で獲
得された第２の情報とを合成する合成手段とを有するこ
とを要旨とする。The gist of the fourth invention of the present application is that it has one or a plurality of first information input means, each of which acquires, in a limited manner, information including image information relating to an information object; second information input means for acquiring information including image information, taking whoever operates the first information input means as its information object; and synthesizing means for synthesizing the first information acquired by the first information input means with the second information acquired by the second information input means.

【００５１】望ましくは、情報対象を限定して異なる限
定された情報を獲得する複数の入力手段と、該獲得情報
の処理を行う手段と、処理情報を出力する手段とを有す
るメディア処理装置において、装置の前面の使用者の画
像情報を得るために備えられた第１の入力手段と、装置
の前面と異なる向きに備えられた固定または可動の第２
の入力手段を備え、使用者画像の背景の全部または一部
に第２の入力手段から得られる環境画像を表示する手段
とを有すると良い。Desirably, a media processing device having a plurality of input means that limit the information object and acquire differently limited information, means for processing the acquired information, and means for outputting the processed information comprises: first input means provided to obtain image information of the user in front of the device; second input means, fixed or movable, facing in a direction different from the front of the device; and means for displaying the environment image obtained from the second input means as all or part of the background of the user image.

【００５２】さらに、情報対象を限定して異なる限定さ
れた情報を獲得する複数の入力手段と、該獲得情報の処
理を行う手段と、処理情報を出力する手段とを有するメ
ディア処理システムにおいて、使用者の画像情報を得る
手段と、コード化された環境音情報または音声情報を得
る手段と、該コードから環境画像情報を得る手段と、使
用者画像の背景の全部または一部に環境画像情報を用い
て得られる環境画像を表示する手段とを有すると良い。Further, a media processing system having a plurality of input means that limit the information object and acquire differently limited information, means for processing the acquired information, and means for outputting the processed information preferably has means for obtaining image information of the user, means for obtaining coded environmental sound information or voice information, means for obtaining environment image information from the code, and means for displaying an environment image obtained using the environment image information as all or part of the background of the user image.

【００５３】本発明は、情報対象を限定して異なる限定
された情報を獲得する複数の入力手段と、該獲得情報の
処理を行う手段と、処理情報を出力する手段とを有する
メディア処理システムにおいて、使用者の画像情報を得
るための第１の入力手段と、使用者側から見える環境画
像を得るための第２入力手段と、使用者画像の背景の全
部または一部に第２の入力手段から得られる環境画像を
表示する手段とを有することを要旨とする。The gist of the present invention is that a media processing system having a plurality of input means that limit the information object and acquire differently limited information, means for processing the acquired information, and means for outputting the processed information has first input means for obtaining image information of the user, second input means for obtaining an environment image as seen from the user's side, and means for displaying the environment image obtained from the second input means as all or part of the background of the user image.

【００５４】さらに、このメディア処理システムは、使
用者画像が表示画面を占める大きさまたは範囲を設定す
る手段を有することを要旨とする。また、さらに、この
メディア処理システムは、第２の入力手段から得られる
環境画像を表示する際に、使用者画像だけを鏡像反転し
て表示する手段を有することを要旨とする。さらに、こ
のメディア処理システムは、第１の入力手段からの画像
または第２の入力手段からの画像だけも表示できる手段
を有することを要旨とする。また本発明のメディア処理
装置は、情報対象を限定して異なる限定された情報を獲
得する複数の入力手段と、該獲得情報の処理を行う手段
と、処理情報を出力する手段とを有するメディア処理装
置において、装置の前面の使用者の画像情報を得るため
に備えられた第１の入力手段と、装置の前面と異なる向
きに備えられた固定または可動の第２の入力手段を備
え、使用者画像の背景の全部または一部に第２の入力手
段から得られる環境画像を表示する手段とを有すること
を要旨とする。Furthermore, the gist is that this media processing system has means for setting the size or range that the user image occupies on the display screen. The gist is also that this media processing system has means for displaying only the user image mirror-inverted when the environment image obtained from the second input means is displayed. The gist is further that this media processing system has means capable of displaying the image from the first input means alone or the image from the second input means alone. The gist of the media processing device of the present invention is that a media processing device having a plurality of input means that limit the information object and acquire differently limited information, means for processing the acquired information, and means for outputting the processed information comprises first input means provided to obtain image information of the user in front of the device, second input means, fixed or movable, facing in a direction different from the front of the device, and means for displaying the environment image obtained from the second input means as all or part of the background of the user image.

【００５５】さらにこのメディア処理装置は、使用者画
像が表示画面に占める大きさまたは範囲を設定する手段
を有することを要旨とする。また、さらにこのメディア
処理装置は、第２の入力手段から得られる環境画像を表
示する際に、使用者画像だけを鏡像反転して表示する手
段を有することを要旨とする。さらに、このメディア処
理装置は、第１の入力手段からの画像または第２の入力
手段からの画像だけも表示できる手段を有することを要
旨とする。Further, the gist is that this media processing device has means for setting the size or range that the user image occupies on the display screen. The gist is also that this media processing device has means for displaying only the user image mirror-inverted when the environment image obtained from the second input means is displayed. The gist is further that this media processing device has means capable of displaying the image from the first input means alone or the image from the second input means alone.

【００５６】また本発明のメディア処理システムは、情
報対象を限定して異なる限定された情報を獲得する複数
の入力手段と、該獲得情報の処理を行う手段と、処理情
報を出力する手段とを有するメディア処理システムにお
いて、使用者の画像情報を得る手段と、コード化された
音声または環境音情報を得る手段と、該コードから環境
画像情報を得る手段と、使用者画像の背景の全部または
一部に環境画像情報を用いて得られる環境画像を表示す
る手段とを有することを要旨とする。The gist of the media processing system of the present invention is also that a media processing system having a plurality of input means that limit the information object and acquire differently limited information, means for processing the acquired information, and means for outputting the processed information has means for obtaining image information of the user, means for obtaining coded voice or environmental sound information, means for obtaining environment image information from the code, and means for displaying an environment image obtained using the environment image information as all or part of the background of the user image.

【００５７】コード化された音情報を得る手段とコード
化から環境画像情報を得る手段として、音から抽出した
ピッチ周波数をコード情報とし、コードを基に環境画像
を変化させる手段を有することを要旨とする。また、コ
ード化された音情報を得る手段とコード化から環境画像
情報を得る手段として、音から抽出したＬＰＣ係数を基
に環境画像を変化させる手段を有することを要旨とす
る。As the means for obtaining coded sound information and the means for obtaining environment image information from the code, the gist is to have means that uses the pitch frequency extracted from the sound as the code information and changes the environment image on the basis of the code. Alternatively, as these means, the gist is to have means that changes the environment image on the basis of LPC coefficients extracted from the sound.
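The pitch-based variant above can be sketched as follows. This is only an illustration under assumptions: the pitch is estimated with a plain autocorrelation peak search, the code is taken to be the pitch quantized in hypothetical 50 Hz steps, and the mapping from code to scene (`environment_for`) is invented for the example. The specification does not state how the pitch is extracted or how a code selects an image.

```python
import math


def estimate_pitch(samples, sample_rate, f_min=50.0, f_max=400.0):
    """Return the pitch in Hz at the autocorrelation peak within [f_min, f_max]."""
    lag_lo = int(sample_rate / f_max)
    lag_hi = int(sample_rate / f_min)
    best_lag, best_score = lag_lo, float("-inf")
    for lag in range(lag_lo, lag_hi + 1):
        score = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def pitch_to_code(pitch_hz, step_hz=50.0):
    """Quantize the pitch into an integer code (step size is an assumption)."""
    return int(pitch_hz // step_hz)


def environment_for(code, scenes):
    """Pick an environment image label from the code (hypothetical mapping)."""
    return scenes[code % len(scenes)]
```

For a 200 Hz test tone sampled at 8 kHz, the autocorrelation peaks at a lag of 40 samples, so the estimated pitch is 200 Hz and the resulting code is 4. Only the code, not the sound itself, would then need to be used to choose or alter the background.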

【００５８】また、本願第５の発明は、情報対象に係る
画像情報をそれぞれが限定して獲得する１又は複数の第
１の画像情報入力手段と、この第１の画像情報入力手段
で獲得された第１の画像情報に含まれる任意の領域に係
る画像情報を獲得する第２の画像情報入力手段と、前記
第１の画像情報入力手段で獲得された第１の画素値デー
タの間に、前記第２の画像情報入力手段で獲得された第
２の画素値データを挿入する手段処理とを有することを
要旨とする。The gist of the fifth invention of the present application is that it has one or a plurality of first image information input means, each of which acquires, in a limited manner, image information relating to an information object; second image information input means for acquiring image information relating to an arbitrary region included in the first image information acquired by the first image information input means; and means for inserting the second pixel value data acquired by the second image information input means between the first pixel value data acquired by the first image information input means.

【００５９】本発明においては、情報対象を限定して異
なる限定された情報を獲得する複数の入力手段と、該獲
得情報の処理を行う手段と、処理情報を出力する手段と
を有するメディア処理システムにおいて、入力手段とし
て少なくとも２つの撮像手段を有し、第１の撮像手段の
撮影範囲に第２の撮像手段の撮影範囲が含まれ、処理を
行う手段においては、第１の撮像手段からの第１の画像
データと、第２の撮像手段からの第２の画像データを合
わせて１つの統合画像データとして処理されることを要
旨とする。In the present invention, the gist is that a media processing system having a plurality of input means that limit the information object and acquire differently limited information, means for processing the acquired information, and means for outputting the processed information has at least two image pickup means as input means, that the imaging range of the first image pickup means includes the imaging range of the second image pickup means, and that the processing means processes the first image data from the first image pickup means and the second image data from the second image pickup means together as one piece of integrated image data.

【００６０】また、情報対象を限定して異なる限定され
た情報を獲得する複数の入力手段と、該獲得情報の処理
を行う手段と、処理情報を出力する手段とを有するメデ
ィア処理システムにおいて、入力手段として少なくとも
１つの撮像手段と、撮像手段からの画像データによって
入力状態修正信号を発生する修正信号発生手段と、その
入力状態修正信号によって、撮像手段あるいはその他の
入力手段の状態を修正する手段を有することを要旨とす
る。Further, the gist is that a media processing system having a plurality of input means that limit the information object and acquire differently limited information, means for processing the acquired information, and means for outputting the processed information has, as input means, at least one image pickup means, correction signal generating means for generating an input state correction signal from the image data supplied by the image pickup means, and means for correcting the state of the image pickup means or of other input means in accordance with that input state correction signal.

【００６１】また、修正信号発生手段においては、上記
画像データから顔領域を検出し、その顔領域が画面内で
所定の位置や大きさになるように、撮像手段の向きある
いは倍率を修正する修正信号を発生することを要旨とす
る。また、入力手段として、音声入力手段を有し、修正
信号発生手段においては、上記画像データから顔領域を
検出し、その顔領域の口に相当する部分において、上記
音声入力手段の指向性が高まるように修正する修正信号
を発生することを要旨とする。また、発光手段を有し、
処理を行う手段または、修正信号発生手段においては、
上記画像データの中から、上記発光手段から放射された
光の反射光を検出し、その検出結果に基づいて所定の領
域を検知することを要旨とする。Further, the gist is that the correction signal generating means detects a face region from the image data and generates a correction signal that corrects the orientation or magnification of the image pickup means so that the face region has a predetermined position and size within the picture. Further, the gist is that voice input means is provided as input means, and the correction signal generating means detects a face region from the image data and generates a correction signal that raises the directivity of the voice input means toward the part of the face region corresponding to the mouth. Further, the gist is that light emitting means is provided, and the processing means or the correction signal generating means detects, in the image data, the reflection of the light emitted by the light emitting means and detects a predetermined region on the basis of the detection result.

【００６２】また、画像表示手段を有し、修正信号発生
手段においては、上記画像データから顔領域を検出し、
その顔領域の目の方向に上記画像表示手段の向きを修正
する修正信号を発生することを特徴とするものである。
また、撮像手段はメディア処理システムから分離されて
おり、撮像手段を支える支持手段と、撮像手段から入力
される画面の水平保持手段を有することを特徴とするも
のである。Further, the system is characterized in that it has image display means, and the correction signal generating means detects a face region from the image data and generates a correction signal that turns the image display means toward the eyes of that face region. Further, it is characterized in that the image pickup means is separate from the media processing system and has support means for supporting the image pickup means and means for keeping level the picture input from the image pickup means.

【００６３】また、本願第６の発明は、情報対象に係る
情報を獲得する第１の入力手段と、情報対象の一部を置
き換えるための候補を蓄積しておく候補蓄積手段と、情
報対象入力における入力条件を得る入力条件獲得手段
と、情報対象に係る情報を獲得する第２の入力手段と、
前記入力条件獲得手段で獲得した入力条件と前記第１の
入力手段からの入力情報とをもとに、少なくとも前記候
補蓄積手段と第２の入力手段の一方からの情報から置き
換えのもととなる情報を選択し、当該情報対象入力の一
部を置き換え情報に置き換える置き換え手段とを有する
ことを要旨とする。The gist of the sixth invention of the present application is that it has first input means for acquiring information relating to an information object; candidate accumulating means for accumulating candidates for replacing part of the information object; input condition acquiring means for obtaining the input conditions under which the information object is input; second input means for acquiring information relating to the information object; and replacement means that, on the basis of the input conditions acquired by the input condition acquiring means and the input information from the first input means, selects the information on which the replacement is to be based from the information supplied by at least one of the candidate accumulating means and the second input means, and replaces part of the information object input with the replacement information.

【００６４】上記の課題を解決するため、本発明では、
情報対象を限定して異なる限定された情報を獲得する複
数の入力手段と、該獲得情報の処理を行う手段と、処理
情報を出力する手段とを有するメディア入力装置であっ
て、情報対象の一部を置き換えるための候補を蓄積して
おく手段と、情報対象入力における入力条件を得る手段
と、前記入力手段とは異なる他の入力手段から入力され
た情報を得る手段を有し、前記入力条件と前記入力手段
からの入力情報をもとに前記候補蓄積手段および／また
は前記他の入力手段からの情報から置き換えのもととな
る情報を選択し、前記入力条件と前記入力情報をもとに
前記選択された置き換え情報を変形し、前記情報対象入
力の一部を前記選択変形された置き換え情報に置き換え
ることを要旨とする。To solve the above problems, the gist of the present invention is a media input device having a plurality of input means that limit the information object and acquire differently limited information, means for processing the acquired information, and means for outputting the processed information, the device having means for accumulating candidates for replacing part of the information object, means for obtaining the input conditions under which the information object is input, and means for obtaining information input from other input means different from said input means, wherein the information on which the replacement is to be based is selected from the information from the candidate accumulating means and/or the other input means on the basis of the input conditions and the input information from the input means, the selected replacement information is deformed on the basis of the input conditions and the input information, and part of the information object input is replaced with the selected and deformed replacement information.

【００６５】さらには、前記メディア入力装置からの情
報対象入力条件、情報対象置き換え情報をもとに前記置
き換えられた情報を再生する手段と、置き換えのもとに
なる情報の候補を蓄積しておく手段と、前記情報対象入
力情報をもとにメディア入力手段で入力された情報を再
生する手段と、前記再生された置き換え情報と前記再生
情報を合成する手段とを有することを要旨とする。Furthermore, the gist is to have means for reproducing the replaced information on the basis of the information object input conditions and the information object replacement information from the media input device, means for accumulating candidates for the information on which the replacement is based, means for reproducing the information input by the media input means on the basis of the information object input information, and means for combining the reproduced replacement information with the reproduced information.

【００６６】また、少なくとも１つ以上の前記メディア
入力装置と、前記メディア再生装置と、前記置き換えの
もとになる情報を提供する手段とからなり、前記メディ
ア入力手段における置き換えのもとになる情報が入力装
置内の蓄積情報および／または前記置き換え情報提供手
段および／または他のメディア入力手段から提供され、
前記メディア再生装置における置き換え情報再生のもと
となる情報が再生装置内の蓄積情報および／または前記
置き換え情報提供手段および／または他のメディア入力
手段から提供されることを要旨とする。Further, the gist is a configuration comprising at least one such media input device, the media reproducing device, and means for providing the information on which the replacement is based, wherein the information on which replacement in the media input means is based is provided from the information accumulated in the input device and/or the replacement information providing means and/or other media input means, and the information on which reproduction of the replacement information in the media reproducing device is based is provided from the information accumulated in the reproducing device and/or the replacement information providing means and/or other media input means.

【００６７】また、本願第７の発明は、情報対象の画像
情報をそれぞれが限定して獲得する複数の画像情報入力
手段と、情報対象の音響情報を獲得する音響情報入力手
段と、前記複数の画像情報入力手段と音響情報入力手段
でそれぞれ獲得される獲得情報から各々の情報対象の状
態を獲得する状態獲得手段と、この状態獲得手段で獲得
された情報対象の状態に対応して人工画の各部位を補正
する補正手段とを有することを要旨とする。The gist of the seventh invention of the present application is that it has a plurality of image information input means, each of which acquires, in a limited manner, image information of an information object; acoustic information input means for acquiring acoustic information of the information object; state acquiring means for obtaining the state of each information object from the information acquired by the plurality of image information input means and by the acoustic information input means; and correction means for correcting each part of an artificial image in accordance with the state of the information object obtained by the state acquiring means.

【００６８】本発明は、少なくとも２つ以上のカメラに
より構成され、前記カメラが情報源の一部を撮影し所望
の限定情報を獲得する手段および各獲得情報から各々の
状態を得る手段、および人工画の各部位を前記状態に従
い動かす手段を有することを要旨とする。また本発明
は、少なくとも２つ以上のカメラと１つの音響マイクに
より構成され、前記カメラが情報源の一部を撮影し所望
の限定情報を獲得する手段、および各獲得情報から各々
の状態を得る手段、および前記音響マイクより得られる
音声情報から音声状態を得る手段、および前記音声状態
を用いて前記状態を補正する手段、および人工画の各部
位を前記状態に従い動かす手段を有することを要旨とす
る。The gist of the present invention is a configuration of at least two cameras, having means by which the cameras each photograph part of an information source to acquire the desired limited information, means for obtaining a state from each piece of acquired information, and means for moving each part of an artificial image in accordance with those states. The gist of the present invention is also a configuration of at least two cameras and one acoustic microphone, having means by which the cameras each photograph part of an information source to acquire the desired limited information, means for obtaining a state from each piece of acquired information, means for obtaining a voice state from the voice information obtained by the acoustic microphone, means for correcting the states using the voice state, and means for moving each part of an artificial image in accordance with the states.

【００６９】さらに本発明は、前記状態補正手段におい
て、カメラからの獲得情報の状態と音声情報の状態のう
ち、背景雑音レベルが低いとき音響マイクからの獲得情
報の状態を出力し、背景雑音レベルが高いときカメラか
らの獲得情報の状態を出力することを要旨とする。さら
に本発明は、少なくとも２つ以上のカメラにより構成さ
れ、前記カメラが情報源の一部を撮影し所望の限定情報
を獲得する手段、および各獲得情報から各々の線画を得
る手段、および該線画を統合して表示する手段を有する
ことを要旨とする。Further, the gist of the present invention is that the state correcting means, choosing between the state of the information acquired from the cameras and the state of the voice information, outputs the state of the information acquired from the acoustic microphone when the background noise level is low and outputs the state of the information acquired from the cameras when the background noise level is high. Further, the gist of the present invention is a configuration of at least two cameras, having means by which the cameras each photograph part of an information source to acquire the desired limited information, means for obtaining a line drawing from each piece of acquired information, and means for integrating and displaying the line drawings.
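The noise-dependent choice between the microphone-derived state and the camera-derived state can be sketched as below. The noise scale, the threshold value and the dictionary form of a "state" (here a hypothetical mouth opening) are assumptions for illustration; the specification states only that the microphone state is used at low background noise and the camera state at high background noise.

```python
def select_state(camera_state, audio_state, noise_level, threshold=0.5):
    """Use the microphone-derived state in quiet conditions, else the camera's.

    noise_level and threshold share an arbitrary scale (an assumption).
    """
    return audio_state if noise_level < threshold else camera_state


# Hypothetical states estimated for the mouth region of the artificial image.
CAMERA_STATE = {"mouth_open": 0.3}
AUDIO_STATE = {"mouth_open": 0.7}
```

`select_state(CAMERA_STATE, AUDIO_STATE, noise_level=0.1)` returns the audio-derived state, while a noise level of 0.9 falls back to the camera-derived state.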

【００７０】さらに本発明は、少なくとも２つ以上のカ
メラと１つの音響マイクにより構成され、前記カメラが
情報源の一部を撮影し所望の限定情報を獲得する手段、
および各獲得情報から各々の線画を得る手段、および前
記音響マイクより得られる音声情報から音声状態を得る
手段、および前記音声状態を用いて前記線画を補正する
手段、および該線画を統合して表示する手段を有するこ
とを要旨とする。さらに本発明は、前記線画補正手段に
おいて、カメラからの獲得情報の状態と音声情報の状態
のうち、背景雑音レベルが低いとき音響マイクからの獲
得情報の線画を出力し、背景雑音レベルが高いときカメ
ラからの獲得情報の線画を出力することを要旨とする。Further, the gist of the present invention is a configuration of at least two cameras and one acoustic microphone, having means by which the cameras each photograph part of an information source to acquire the desired limited information, means for obtaining a line drawing from each piece of acquired information, means for obtaining a voice state from the voice information obtained by the acoustic microphone, means for correcting the line drawings using the voice state, and means for integrating and displaying the line drawings. Further, the gist of the present invention is that the line drawing correcting means, choosing between the state of the information acquired from the cameras and the state of the voice information, outputs the line drawing based on the information acquired from the acoustic microphone when the background noise level is low and outputs the line drawing based on the information acquired from the cameras when the background noise level is high.

【００７１】さらに、本願第８の発明は、情報対象に係
る画像情報を獲得する第１の画像入力手段と、この第１
の画像入力手段が獲得する画像領域の一部または全領域
の画像情報を獲得する第２の画像入力手段と、前記第１
の画像入力手段と第２の画像入力手段が獲得した画像情
報の少なくとも一方の経時的変化分を抽出する抽出手段
と、この抽出手段で抽出された経時的変化分に応じて前
記第１の画像入力手段で獲得された画像情報の一部を前
記第２の画像入力手段で獲得された画像情報に置き換え
る置き換え手段と、この置き換え手段による画像情報の
置き換えを、前記第１の画像入力手段で獲得された画像
情報に含まれる情報対象の全体的な動きに伴う当該情報
対象の一部に動きに対応するべく制御する制御手段とを
有することを要旨とする。Further, the gist of the eighth invention of the present application is that it has first image input means for acquiring image information relating to an information object; second image input means for acquiring image information of part or all of the image region acquired by the first image input means; extraction means for extracting the temporal change in at least one of the pieces of image information acquired by the first image input means and the second image input means; replacement means for replacing, in accordance with the temporal change extracted by the extraction means, part of the image information acquired by the first image input means with the image information acquired by the second image input means; and control means for controlling the replacement of image information by the replacement means so that it follows the movement of a part of the information object that accompanies the overall movement of the information object contained in the image information acquired by the first image input means.

【００７２】本発明は背景を含む顔全体の画像を入力し
顔全体の動きや目や口などの顔面内の各局部的な動きを
検出・分析した情報を得て、それを符号化する装置或い
は被撮影者の行動を分析するマンマシンインタフェース
等のメディア処理装置に用いられ、上記入力手段として
複数の画像入力手段を有し、該画像入力手段は一つの主
な画像入力手段と該一つの主な画像入力手段に従属する
他の入力手段が存在し、該主な画像入力手段は上記顔全
体の画像を入力し、該従属する入力手段は目や口の周
辺、背景など該主な入力手段が得る画像の一部または全
部の画像を入力し、該処理を行う手段は顔の画像の時間
的変化分の抽出や、該主な入力手段が得た顔全体画像の
一部、すなわち目や口の周辺部分等を該従属する入力手
段が得た画像で置き換えすることを行い、該処理を行う
手段はまた顔の全体的な動きを抽出し、顔全体の動きに
伴う目や口の周辺等の動きに該従属する入力手段が追随
できるように該従属する入力手段を制御する手段を有す
ることを要旨とする。The present invention is used in media processing devices such as a device that takes in an image of the whole face including the background, obtains information by detecting and analyzing the movement of the whole face and the local movements within the face such as those of the eyes and mouth, and encodes that information, or a man-machine interface that analyzes the behavior of the person being photographed. The gist is that a plurality of image input means are provided as the input means, consisting of one main image input means and other input means subordinate to it; the main image input means takes in the image of the whole face; the subordinate input means take in images of part or all of the image obtained by the main input means, such as the surroundings of the eyes and mouth or the background; the processing means extracts the temporal change of the face image and replaces part of the whole-face image obtained by the main input means, such as the areas around the eyes and mouth, with the images obtained by the subordinate input means; and the processing means also extracts the overall movement of the face and has means for controlling the subordinate input means so that they can follow the movement of the areas around the eyes, mouth and so on that accompanies the movement of the whole face.
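The replacement of part of the whole-face image by a subordinate camera's image, with the replaced region following the overall face movement, can be sketched with small integer grids. The grid representation, the `(top, left)` origin convention and the assumption that the patch is already scaled to the main image's resolution are all illustrative choices, not details from the specification.

```python
def paste(main, patch, top, left):
    """Return a copy of `main` with `patch` written at row `top`, column `left`."""
    out = [row[:] for row in main]
    for r, patch_row in enumerate(patch):
        out[top + r][left:left + len(patch_row)] = patch_row
    return out


def follow(region_origin, face_motion):
    """Shift the replaced region by the face's overall (rows, columns) motion."""
    return (region_origin[0] + face_motion[0], region_origin[1] + face_motion[1])


MAIN = [[0, 0, 0, 0] for _ in range(3)]  # stand-in for the whole-face image
MOUTH = [[9, 9]]                         # stand-in for the subordinate camera's image
```

Pasting `MOUTH` at `(1, 1)` replaces the middle of the second row; after the face moves down by one row, `follow((1, 1), (1, 0))` gives `(2, 1)`, and the same patch is pasted one row lower, which corresponds to the subordinate input tracking the face.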

【００７３】また、入力手段として頭部に装着する入力
装置を用い、その入力装置は顔全体の位置検出や動きの
検出、または目や口の周辺の画像を得ることを要旨とす
る。また、赤外域における波長の光の画像を得る装置を
入力手段として有し、背景も含む顔全体の画像から顔の
領域を抽出すること、または顔全体の画像から瞳孔また
は口腔、鼻腔の位置を検出することを要旨とする。Further, the gist is that an input device worn on the head is used as input means, and that this input device detects the position or movement of the whole face, or obtains images of the areas around the eyes and mouth. Further, the gist is that a device for obtaining an image of light at infrared wavelengths is provided as input means, and that the face region is extracted from the image of the whole face including the background, or the positions of the pupils, the oral cavity or the nasal cavities are detected from the image of the whole face.

【００７４】[0074]

【作用】本願第１の発明のメディア処理システムは、複
数の入力手段が、対象とする情報をそれぞれが限定して
情報を獲得すると、この獲得された獲得情報の属性に基
づいて処理手段で獲得情報の統合化が行われ、出力手段
から出力される。[Operation] In the media processing system of the first invention of the present application, when the plurality of input means each acquire information by limiting the target information, the processing means integrates the acquired information on the basis of its attributes, and the result is output from the output means.

【００７５】本願第２の発明のメディア処理システム
は、複数の入力手段が、対象とする情報をそれぞれが限
定して情報を獲得すると、この獲得された獲得情報から
重要度の低い情報を除いて処理手段で獲得情報の統合化
が行われ、出力手段から出力される。In the media processing system of the second invention of the present application, when the plurality of input means each acquire information by limiting the target information, the processing means integrates the acquired information after removing information of low importance from it, and the result is output from the output means.

【００７６】本願第３の発明のメディア処理システム
は、複数の画像情報入力手段で、情報対象に係る画像情
報をそれぞれが限定して獲得され、この獲得画像情報に
含まれる情報対象の経時的変化が検出手段で検出され、
この経時的変化情報が出力される。In the media processing system of the third invention of the present application, the plurality of image information input means each acquire, in a limited manner, image information relating to the information object; the detection means detects the temporal change of the information object contained in the acquired image information; and this temporal change information is output.

【００７７】本発明によれば、実際に符号化すべき画面
内の各動物体、あるいは動部分の動きを最もよく表現す
る動きのカメラからの情報を適応的に利用することが可
能になるため、実際の動きを全て動きベクトルとして表
現する必要が無くなり、ＭＣによる動きベクトルの発生
情報量を少なくすることが可能である。さらにカメラの
動きによって動き量検出範囲が自動的に拡大されるの
で、実際の動きベクトルを計算する際には検出範囲を限
定することが可能になり、動きベクトルに割り当てる符
号も減少して、動きベクトル情報を少なくおさえること
も可能となる。また求められた各動物体、あるいは動部
分の実際の動きをもとに、カメラを動かしたり、あるい
は自動的に動きを制御することにより、動きベクトルの
発生情報量をさらに減少させることも可能となる。According to the present invention, the information from the camera whose motion best represents the motion of each moving object, or moving part, in the picture actually being encoded can be used adaptively. It is therefore unnecessary to express all of the actual motion as motion vectors, and the amount of information generated for motion vectors by MC can be reduced. Moreover, because camera movement automatically widens the range over which motion can be detected, the search range can be restricted when the actual motion vectors are calculated; fewer codes then need to be assigned to motion vectors, which also keeps the motion vector information small. The amount of information generated for motion vectors can be reduced further by moving the camera, or controlling its movement automatically, on the basis of the actual motion obtained for each moving object or moving part.

【００７８】本願第４の発明のメディア処理システム
は、１又は複数の第１の情報入力手段により情報対象に
係る画像情報を含む情報をそれぞれが限定して獲得され
る。この第１の情報入力手段を操作するもの、例えば操
作者を第２の情報入力手段で情報対象として画像情報を
含む情報を獲得する。さらに前記第１の情報入力手段で
獲得された第１の情報と第２の情報入力手段で獲得され
た第２の情報とを合成手段により合成する。In the media processing system of the fourth invention of the present application, the one or more first information input means each acquire, in a limited manner, information including image information relating to the information object. The second information input means acquires information including image information, taking as its information object whoever operates the first information input means, for example the operator. The synthesizing means then synthesizes the first information acquired by the first information input means with the second information acquired by the second information input means.

【００７９】本発明によるメディア処理システムは、シ
ステムの使用者が情報発信する際に、使用者の画像（客
観視点の画像）と使用者が見ている環境画像（主観視点
の画像）を自然で臨場感のある１つの画像にするため
に、第１の入力手段で使用者の画像情報を得て、第２の
入力手段で使用者側から見える環境画像を得る。In the media processing system according to the present invention, when the user of the system transmits information, the image of the user (an objective-viewpoint image) and the image of the environment the user is looking at (a subjective-viewpoint image) are to be merged into a single natural, realistic image. To this end, the first input means obtains image information of the user, and the second input means obtains the environment image as seen from the user's side.

【００８０】次に使用者画像の背景の部分に、第２の入
力手段から得られる環境画像を表示することにより、使
用者画像とこの環境画像を自然に合成することができ
る。Next, by displaying the environment image obtained from the second input means on the background portion of the user image, the user image and this environment image can be naturally combined.

【００８１】さらに、使用者画像の表示画面を占める大
きさまたは範囲を設定する手段を有することにより、使
用者画像と環境画像のどちらを中心に表示したいかを自
由に使い分けることができるようになる。Furthermore, by providing means for setting the size or extent that the user image occupies on the display screen, the user can freely choose whether the display centers on the user image or on the environment image.

【００８２】さらに、第２の入力手段から得られる環境
画像を表示する際、使用者画像だけを鏡像反転して表示
する手段を有することにより、使用者が実環境を見て指
し示す物の方向と、表示される画面の中で使用者の画像
が指し示す物の方向を一致させることができる。Furthermore, by providing means for mirror-inverting only the user image when the environment image obtained from the second input means is displayed, the direction of an object that the user points to while looking at the real environment can be made to match the direction of the object that the user's image points to on the displayed screen.

【００８３】また、本発明のメディア処理装置は、装置
の前面の使用者の画像情報を得るために備えられた第１
の入力手段と、装置の前面と異なる向きに備えられた固
定または可動の第２の入力手段を備えるため、使用者が
装置の前面を自分側に向けて使用することにより、どの
ような使い方においても妨害無く使用者側からみた環境
画像を獲得できる。Further, the media processing device of the present invention has a first input means provided to obtain image information of the user in front of the device, and a fixed or movable second input means oriented in a direction different from the front of the device. Therefore, when the user operates the device with its front facing the user, the environment image as seen from the user's side can be acquired without obstruction, however the device is used.

【００８４】さらに、本発明のメディア処理システム
は、コード化された環境音情報または音声情報を得て、
このコードを予め設定した方法により画像情報に変換
し、これを環境画像として使用者画像の背景に表示させ
ることにより、使用者側からの音の変化に合ったタイミ
ングで環境画像を変化させることができる。また、コー
ド情報として音声や環境音から抽出したピッチ周波数を
用いると、ピッチ周波数を基に環境画像情報を周期化さ
せるという変化をもたらすことができる。別のコード情
報として、抽出したＬＰＣ係数情報を基に背景画像にフ
ィルタをかけて画質を変化させると、環境画像に音から
得られる変化に対応する画質の変化を与えることが可能
となる。したがって、送り手側の音の変化の雰囲気を伝
える秘匿化された背景画像を表示するメディア処理シス
テムを提供できる。Further, the media processing system of the present invention obtains coded environmental sound information or voice information, converts this code into image information by a preset method, and displays the result as the environment image in the background of the user image, so that the environment image can be changed at a timing that matches changes in the sound on the user side. When the pitch frequency extracted from the voice or environmental sound is used as the code information, the environment image information can be made periodic on the basis of the pitch frequency. As another kind of code information, when the background image is filtered on the basis of extracted LPC coefficient information to change its image quality, the environment image can be given a change in image quality corresponding to the change in the sound. It is therefore possible to provide a media processing system that displays a concealed background image conveying the atmosphere of the changes in sound on the sender side.

【００８５】本願第５の発明のメディア処理システム
は、１又は複数の第１の画像情報入力手段が情報対象に
係る画像情報をそれぞれが限定して獲得する。この第１
の画像情報入力手段で獲得された第１の画像情報に含ま
れる任意の領域に係る画像情報を第２の画像情報入力手
段が獲得する。さらに、前記第１の画像情報入力手段で
獲得された第１の画素値データの間に、前記第２の画像
情報入力手段で獲得された第２の画素値データを挿入す
る。In the media processing system according to the fifth invention of the present application, one or more first image information input means each acquire, in a limited manner, image information relating to the information object. The second image information input means acquires image information relating to an arbitrary region contained in the first image information acquired by the first image information input means. Further, the second pixel value data acquired by the second image information input means is inserted between the first pixel value data acquired by the first image information input means.

【００８６】このように構成されたものにおいては、第
１の撮像手段で画面全体を撮像し、第２の撮像手段で重
要な部分を撮像すると、処理を行う手段においては、重
要な部分が他の部分よりも解像度が高い統合画像データ
として処理することが可能となる。また、修正信号発生
手段においては、撮像手段からの画像データによって入
力状態修正信号が発生され、その入力状態修正信号によ
って、撮像手段あるいはその他の入力手段の状態を最適
な状態に時々刻々修正することが可能となる。また、修
正信号発生手段においては、画像データから顔領域が検
出され、その顔領域が画面内で所定の位置や大きさにな
るように撮像手段の向きあるいは倍率を修正する修正信
号が発生され、撮像範囲からはみ出すことなく顔を撮像
することが可能となる。また、修正信号発生手段におい
ては、画像データから顔領域が検出され、その顔領域の
口に相当する部分において音声入力手段の指向性が高ま
るように修正する修正信号を発生することにより、音声
を良好に入力することが可能となる。また、処理を行う
手段または、修正信号発生手段において、画像データの
中から、発光手段から放射された光の反射光が検出さ
れ、その検出結果に基づいて眼鏡の位置、目の位置など
を検知することが可能となる。With this configuration, when the first imaging means captures the entire screen and the second imaging means captures the important part, the processing means can handle the result as integrated image data in which the important part has a higher resolution than the other parts. Further, the correction signal generating means generates an input state correction signal from the image data supplied by the imaging means, and this signal allows the state of the imaging means or of other input means to be corrected moment by moment to the optimum state. Further, the correction signal generating means detects a face region from the image data and generates a correction signal that corrects the orientation or magnification of the imaging means so that the face region takes a predetermined position and size in the screen; the face can thus be captured without leaving the imaging range. Further, the correction signal generating means detects a face region from the image data and generates a correction signal that raises the directivity of the voice input means toward the part of the face region corresponding to the mouth, so that voice can be input well. Further, the processing means or the correction signal generating means detects, in the image data, the reflection of light emitted from the light emitting means, and the position of eyeglasses, the position of the eyes, and the like can be determined from the detection result.
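The face-based correction described above can be illustrated with a minimal sketch. It assumes the face region is already available as a bounding box; the names `Box` and `correction_signal`, the normalized pan/tilt units, and the default target size ratio of 0.4 are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Detected face region in pixels (hypothetical representation)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def correction_signal(face, frame_w, frame_h, target_ratio=0.4):
    """Return (pan, tilt, zoom) that would bring the face to the frame
    center with a height of target_ratio * frame_h.

    pan and tilt are offsets normalized by the frame size
    (positive pan: face is right of center, positive tilt: below center);
    zoom is a magnification factor (greater than 1 means zoom in).
    """
    face_cx = face.x + face.w / 2
    face_cy = face.y + face.h / 2
    pan = (face_cx - frame_w / 2) / frame_w
    tilt = (face_cy - frame_h / 2) / frame_h
    zoom = (target_ratio * frame_h) / face.h
    return pan, tilt, zoom
```

A controller could apply such a signal every frame, which matches the "moment by moment" correction in the text; how the signal drives an actual camera mount is left open here.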

【００８７】また、修正信号発生手段において、画像デ
ータから顔領域が検出され、その顔領域の目の方向に画
像表示手段の向きを修正する修正信号が発生されること
により、使用者が動いても常に画像表示手段の向きは使
用者が見やすい方向に保たれる。また、撮像手段を支え
る支持手段と、外部状況によっては撮像手段を傾けて支
持しなければならない場合があるが、その場合でも画面
の水平保持手段によって、画面は常に水平に保たれる。Further, the correction signal generating means detects a face region from the image data and generates a correction signal that turns the image display means toward the eyes in that face region, so that the image display means always faces a direction that is easy for the user to view even when the user moves. Further, depending on the support means that holds the imaging means and on external conditions, the imaging means may have to be supported at a tilt; even then, the screen horizontal holding means keeps the picture level at all times.

【００８８】本願第６の発明のメディア処理システム
は、第１の入力手段と第２の入力手段により、情報対象
に係る情報が獲得される。また、候補蓄積手段に情報対
象の一部を置き換えるための候補が蓄積される。一方、
入力条件獲得手段で獲得した入力条件と前記第１の入力
手段からの入力情報とをもとに、少なくとも前記候補蓄
積手段と第２の入力手段の一方からの情報から置き換え
のもととなる情報を選択し、当該情報対象入力の一部を
置き換え情報に置き換える。In the media processing system according to the sixth invention of the present application, information relating to the information object is acquired by the first input means and the second input means. Candidates for replacing part of the information object are stored in the candidate storage means. Meanwhile, on the basis of the input condition acquired by the input condition acquiring means and the input information from the first input means, the information that serves as the source of the replacement is selected from the information supplied by at least one of the candidate storage means and the second input means, and part of the information-object input is replaced with the replacement information.

【００８９】本発明によるメディア入力装置では、入力
した画像／音等のメディアのうち、入力条件がわかれば
データベースアクセスや他の入力手段からの情報をもと
に置き換えが可能な部分を探し分離する処理が行われ
る。そして、伝送／蓄積の際に、置き換え可能な部分に
ついては画像／音等をそのまま伝送するのではなく、入
力条件情報を伝送／蓄積し、再生の際にはこの情報をも
とにデータベースアクセスや他の入力手段からの情報を
もとに置き換え画像／音を作成し合成する方法が用いら
れる。In the media input device according to the present invention, the input media such as images and sound are searched for parts that could be replaced, once the input conditions are known, using database access or information from other input means, and those parts are separated out. At transmission/storage time, the image or sound of a replaceable part is not transmitted as it is; instead the input condition information is transmitted/stored, and at playback time a replacement image or sound is created from this information, using database access or information from other input means, and is composited.

【００９０】例えば、画像において人物部分のような高
忠実性が要求される部分は入力した画像をそのまま圧縮
符号化し伝送／蓄積するが、背景部分のように雰囲気が
伝えられれば良い部分については入力画像をそのまま用
いるのではなく、入力条件情報を伝送／蓄積する。この
ため、背景部分の伝送／蓄積のための情報量を従来の伝
送／蓄積装置に比べ大幅に削減することができる。ま
た、画像／音をそのまま圧縮符号化する部分について
も、顔、人物といったように入力対象を限定してそれぞ
れの入力手段を用いることができ、全体をひとつの入力
として扱う場合に比べ構成要素に分離しやすく、それぞ
れの構成要素に合った圧縮符号化を行い高い符号化効率
を得ることができる。また、伝送／蓄積される情報は人
物と背景のように構成要素毎に分離されているため、再
生の際にある構成要素を他のものに置き換えるといった
処理を行い易い。For example, a part of the image that requires high fidelity, such as the person, is compression-coded from the input image as it is and then transmitted/stored, whereas for a part that only needs to convey the atmosphere, such as the background, the input condition information is transmitted/stored instead of the input image itself. The amount of information needed to transmit/store the background can therefore be reduced substantially compared with a conventional transmission/storage device. In addition, for the parts whose image or sound is compression-coded as it is, a separate input means can be used for each restricted input target, such as the face or the person. This makes separation into components easier than when the whole is handled as a single input, and a high coding efficiency can be obtained by applying compression coding suited to each component. Further, since the transmitted/stored information is already separated into components such as person and background, processing such as replacing one component with another at playback time is easy.

【００９１】本願第７の発明のメディア処理システム
は、複数の画像情報入力手段で情報対象の画像情報をそ
れぞれが限定して獲得し、音響情報入力手段が情報対象
の音響情報を獲得する。これら情報入力手段で、それぞ
れ獲得される獲得情報から各々の情報対象の状態が状態
獲得手段により獲得される。この状態獲得手段で獲得さ
れた情報対象の状態に対応して人工画の各部位が補正手
段により補正される。In the media processing system of the seventh invention of the present application, the image information of the information object is limited and acquired by the plurality of image information input means, and the acoustic information input means acquires the acoustic information of the information object. The status of each information object is acquired by the status acquisition means from the acquired information acquired by these information input means. The correction means corrects each part of the artificial image in accordance with the state of the information target acquired by the state acquisition means.

【００９２】この結果、本発明によれば、限定された情
報から画像を合成できるので非常に少ない情報で動画像
を表現することができる。また、高精細な人工画を基に
動画像を生成するため、歪のない高品質な動画を生成す
ることができる。また、好みの人工画をデータベースか
ら選択できるため、同一の情報源に対し様々な合成画像
を得ることができる。さらに、合成画像の動きは音声情
報により補正されているため、違和感の無い合成画像を
得ることができる。As a result, according to the present invention, since the image can be combined from the limited information, the moving image can be expressed with very little information. Moreover, since a moving image is generated based on a high-definition artificial image, it is possible to generate a high-quality moving image without distortion. In addition, since a desired artificial image can be selected from the database, various synthetic images can be obtained for the same information source. Furthermore, since the movement of the composite image is corrected by the audio information, it is possible to obtain a composite image with no discomfort.

【００９３】本願第８の発明のメディア処理システム
は、この第１の画像入力手段で獲得される情報対象に係
る画像情報の画像領域の一部または全領域の画像情報が
第２の画像入力手段により獲得される。第１の画像入力
手段と第２の画像入力手段が獲得した画像情報の少なく
とも一方の経時的変化分が抽出手段で抽出される。この
抽出手段で抽出された経時的変化分に応じて前記第１の
画像入力手段で獲得された画像情報の一部を前記第２の
画像入力手段で獲得された画像情報に置き換え手段が置
き換える。この置き換え手段による画像情報の置き換え
は、制御手段により第１の画像入力手段で獲得された画
像情報に含まれる情報対象の全体的な動きに伴う当該情
報対象の一部に動きに対応するべく制御される。In the media processing system according to the eighth invention of the present application, the second image input means acquires image information of part or all of the image region of the image information on the information object acquired by the first image input means. The extraction means extracts the temporal change of at least one of the image information acquired by the first image input means and that acquired by the second image input means. In accordance with the extracted temporal change, the replacement means replaces part of the image information acquired by the first image input means with the image information acquired by the second image input means. This replacement is controlled by the control means so as to follow the motion of the part of the information object that accompanies the overall motion of the information object contained in the image information acquired by the first image input means.

【００９４】本発明では顔全体の画像より顔全体の動き
を検出し、別に顔全体の動きに追従させながら目や口の
周辺の画像を入力する。そのようにして得た目や口の周
辺の画像は顔全体の動きが追従の操作によって相殺され
ているため、それらの局所的な動きを容易に捉えること
ができる。従って顔全体の動きと目や口周辺の局所的な
動きを分離しかつ正確に、またブレのより少ない鮮明な
品質の画像を得ることができる。これにより例えば顔画
像の符号化において顔の動きに関する情報量を抑えるこ
とができる。特に被撮影者の顔をワイヤフレームモデル
などの形でモデル化して符号化を行う場合、顔全体の動
きと各局部の動きを別個に検出・分析できるのでモデル
を自然に動かすことができるようになる。In the present invention, the motion of the whole face is detected from an image of the whole face, and images of the areas around the eyes and mouth are input separately while tracking the motion of the whole face. Because the tracking cancels the whole-face motion in the images of the eye and mouth areas obtained in this way, their local motion can be captured easily. The motion of the whole face and the local motion around the eyes and mouth can therefore be separated accurately, and sharp images with less blur can be obtained. As a result, the amount of information on face motion can be kept down, for example, when coding a face image. In particular, when the subject's face is modeled, for example as a wireframe model, and then coded, the motion of the whole face and the motion of each local part can be detected and analyzed separately, so the model can be made to move naturally.

【００９５】また被撮影者の動きを正確に分析できるた
めマンマシンニンタフェースにおいて適性な処理を一層
確実にすることができる。また、頭部に装着するような
入力装置を用い、その装置で顔の位置検出や動きを検出
し、また目や口の周辺の画像も得ることにより、顔全体
の位置や動き、目や口周辺の局所的な動きの情報をより
正確に得ることができる。また、赤外域における波長の
光の画像を得る装置を入力手段として有し、背景も含む
顔全体の画像から発熱体である顔の領域を抽出するこ
と、または顔全体の画像から自身は発熱しない瞳孔また
は口腔、鼻腔の位置を検出することをより正確に行うこ
とができる。Further, since the subject's motion can be analyzed accurately, appropriate processing in a man-machine interface can be made more reliable. Further, by using an input device worn on the head to detect the position and motion of the face and also to obtain images of the areas around the eyes and mouth, information on the position and motion of the whole face and on the local motion around the eyes and mouth can be obtained more accurately. Further, by providing as input means a device that captures images of light at infrared wavelengths, the face region, which is a heat source, can be extracted more accurately from a whole-face image that includes the background, and the positions of the pupils, the oral cavity, and the nasal cavities, which do not emit heat themselves, can be detected more accurately from the whole-face image.

【００９６】[0096]

【実施例】以下、本発明に係る一実施例を図面を参照し
て説明する。図１は本発明に係るメディア処理システム
の構成を示したブロック図である。図１に示すように、
情報対象としての任意の情報源１に対して、１又は複数
の限定入力装置３ａ，〜，３ｎが設けられ、この限定入
力装置３により限定情報Ｉ1 が獲得され、獲得情報Ｉ3
として処理装置５に入力される。この獲得情報Ｉ3は、
処理装置５において適宜処理され、処理情報Ｉ5 として
出力装置７に入力され、さらに出力装置７から映像、音
等として出力される。DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS An embodiment of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a media processing system according to the present invention. As shown in FIG. 1, one or more limited input devices 3a to 3n are provided for an arbitrary information source 1 serving as the information object. The limited input devices 3 acquire limited information I1, which is input to the processing device 5 as acquired information I3. The acquired information I3 is processed as appropriate in the processing device 5, input to the output device 7 as processed information I5, and output from the output device 7 as video, sound, and the like.

【００９７】次に、本発明に係る第１の実施例について
説明する。まず、図２を参照するに、限定入力装置１１
０は、複数のカメラ１０１，〜，１０５，〜で構成され
ており、これらは例えば固定カメラ１０１、水平方向に
移動するカメラ１０２、垂直方向に移動するカメラ１０
３、カメラと被写体とを結ぶ軸の周りに回転するカメラ
１０４、あるいはある一定の範囲で反復運動をするカメ
ラ１０５などである。これらから取り込まれた各獲得情
報は処理装置１４０に送られ、圧縮などの処理が行われ
た後、伝送路を介して復号側に伝送される。復号側の復
号化器１５０ではこれら伝送されてきた情報をもとに情
報を復号し、再生情報を構成して出力する。Next, a first embodiment according to the present invention will be described. Referring first to FIG. 2, the limited input device 110 is composed of a plurality of cameras 101 to 105 and so on. These are, for example, a fixed camera 101, a camera 102 that moves horizontally, a camera 103 that moves vertically, a camera 104 that rotates about the axis connecting the camera and the subject, and a camera 105 that moves back and forth within a fixed range. The acquired information captured by each of these is sent to the processing device 140, subjected to processing such as compression, and then transmitted to the decoding side over the transmission path. The decoder 150 on the decoding side decodes the transmitted information, constructs the reproduced information, and outputs it.

【００９８】一方、処理装置１４０は、動き量設定回路
１３０、及び符号化器１２０がある。動き量設定回路１
３０からは、限定入力装置１１０の動きを制御する信号
が出力されると同時に、その動き情報が符号化器１２０
に転送される。この例では、これら動きが外部の操作ボ
ード１３１により、マニュアルで独立に設定できるよう
になっているが、カメラを覗いて、動物体の動きに追従
するように人間がボードを操作することも可能である。
また直接カメラを覗いて、そのカメラを人間が動かして
もかまわない。Meanwhile, the processing device 140 includes a motion amount setting circuit 130 and an encoder 120. The motion amount setting circuit 130 outputs a signal that controls the motion of the limited input device 110 and at the same time transfers that motion information to the encoder 120. In this example, these motions can be set manually and independently from an external operation board 131, but a person may also look through the camera and operate the board so as to follow the motion of a moving object. A person may also look through a camera directly and move that camera by hand.

【００９９】図３は符号化器１２０の内部を詳細に示し
た図であり、図４はこの符号化器で実際に行われる動き
補償（以下、単にＭＣと略記）を説明した図である。限
定入力装置１１０の複数のカメラ１０１〜からの画像情
報は入力情報切り替え回路１２１に入力され、また各カ
メラの動き情報、そのものはカメラの動き１、カメラの
動き２、カメラの動き３、…として動き補償回路１２３
に取り込まれる。動き補償回路１２３では、画面内を複
数の領域に分割して、各領域ごとに従来と略同様なＭＣ
を行うが、この時各カメラから取り込まれた情報（複数
の現フレーム情報）とフレームメモリ１２４内の情報
（前フレーム情報）との間でＭＣを行う。そしてＭＣ情
報量が最も少なくなる現フレーム情報、つまり最も少な
い動きで前フレームから作成できる現フレーム情報を取
り込んだカメラの動き（具体的には、図３内のカメラの
動き１、カメラの動き２、カメラの動き３、…のいずれ
か）を初期の動きベクトルとすると共に、動き量情報と
して動き量設定回路１３０へ出力される。ＭＣ部分の詳
細を図４を用いて説明する。フレームメモリ内の前フレ
ームＡにおいて、動物体ａが図４（ａ）の位置にあり、
次のフレーム（符号化すべき現フレーム）Ｂでは図４
（ａ）または（ｂ）の矢印ｂの方向に動いて図４（ｂ）
のａへ移動したとする。この時動きベクトルは図のｂそ
のものになり、従来はこのベクトルを情報として符号化
していた。ここでもし、カメラ自体が図４（ｃ）のｄの
ように動物体に追従して動いたとすると、もともとカメ
ラが動いていなかった場合に図４（ｃ）の現フレームＢ
内のｃの位置にあるはずの動物体は、移動後の現フレー
ムＣにおいては左サイドの位置にとどまっていることに
なる。FIG. 3 shows the inside of the encoder 120 in detail, and FIG. 4 illustrates the motion compensation (hereinafter abbreviated as MC) actually performed by this encoder. The image information from the cameras 101 and so on of the limited input device 110 is input to the input information switching circuit 121, and the motion information of each camera is taken into the motion compensation circuit 123 as camera motion 1, camera motion 2, camera motion 3, and so on. The motion compensation circuit 123 divides the screen into a plurality of regions and performs MC on each region in much the same way as in the conventional art, except that MC is performed between the information captured from each camera (a plurality of current-frame candidates) and the information in the frame memory 124 (the previous frame). The motion of the camera that captured the current frame requiring the least MC information, that is, the current frame that can be produced from the previous frame with the least motion (specifically, one of camera motion 1, camera motion 2, camera motion 3, ... in FIG. 3), is taken as the initial motion vector and is also output to the motion amount setting circuit 130 as motion amount information. The MC operation is described in detail with reference to FIG. 4. Suppose that in the previous frame A in the frame memory the moving object a is at the position shown in FIG. 4(a), and that in the next frame B (the current frame to be encoded) it has moved in the direction of arrow b in FIG. 4(a) or (b) to position a in FIG. 4(b). The motion vector is then b itself, and conventionally this vector was encoded as information. Now suppose the camera itself moves as shown by d in FIG. 4(c), following the moving object. The moving object, which would have been at position c in the current frame B of FIG. 4(c) had the camera not moved, then remains near the left side in the current frame C captured after the camera has moved.

【０１００】このとき、フレームＡ内におけるａの位置
関係が、図４（ｃ）ではフレームＣ内におけるｆの位置
関係であるとすると、フレーム内でのこの動物体の動き
は、図４（ｃ）に矢印ｅで示される。これはカメラの動
きｅが実際の動物体の動きｂに近ければ近いほど小さな
値をとる。そして最終的な動きベクトルは（ｄ＋ｅ）で
表現される。At this time, if the position that a occupies in frame A corresponds to position f in frame C in FIG. 4(c), the motion of the moving object within the frame is given by arrow e in FIG. 4(c). The vector e becomes smaller the closer the camera motion d is to the actual motion b of the moving object. The final motion vector is expressed as (d + e).
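The relation between the object motion b, the camera motion d, and the in-frame residual e can be written out as a small sketch. It assumes all three are 2-D pixel displacements in the same coordinate system; the function names are hypothetical and only illustrate the arithmetic.

```python
def residual_vector(actual_motion, camera_motion):
    """In-frame motion e that remains when a camera moving by d
    observes an object that actually moved by b, i.e. e = b - d."""
    bx, by = actual_motion
    dx, dy = camera_motion
    return (bx - dx, by - dy)


def final_motion_vector(camera_motion, residual):
    """Final motion vector used for MC, i.e. d + e."""
    return (camera_motion[0] + residual[0], camera_motion[1] + residual[1])


# Example: the object moves 12 pixels right and 3 up (b),
# while the camera pans 10 pixels right (d).
b = (12, -3)
d = (10, 0)
e = residual_vector(b, d)          # small residual that must be searched for
mv = final_motion_vector(d, e)     # equals b again
```

Because only e has to be found by searching, a camera motion close to b keeps the required search range small.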

【０１０１】以上の操作は各「動」物体、あるいは
「動」領域についてそれぞれ行われる。つまり、異なる
動きの領域については動きが参照されるカメラも当然異
なるため、各領域ごとに上記ｄに相当するカメラの動き
を求めることになる。The above operation is performed for each "moving" object or "moving" area. That is, since the cameras whose motions are referred to are naturally different for the regions of different motions, the motion of the camera corresponding to the above d is calculated for each region.

【０１０２】再び、図３を参照するに、限定入力装置１
１０からの信号は、上記図４を用いた説明において、ｅ
が最も小さくなるように入力情報切り替え回路１２１で
切り替えられる。つまり限定入力装置１１０からの信号
が順次動き補償回路１２３に取り込まれ、その時のカメ
ラの動きが初期動きベクトルとしてフレームメモリ１２
４内に送られて、ここからさらに小さな動きベクトルｅ
を動き補償回路１２３内で計算する。このｅが最も小さ
くなる時のカメラの動きが本当の初期ベクトルとして採
用される。Referring again to FIG. 3, the signals from the limited input device 110 are switched by the input information switching circuit 121 so that e in the explanation of FIG. 4 above becomes smallest. That is, the signals from the limited input device 110 are taken into the motion compensation circuit 123 one after another, the camera motion at that time is sent to the frame memory 124 as an initial motion vector, and from that starting point the motion compensation circuit 123 computes the smaller motion vector e. The camera motion for which e is smallest is adopted as the true initial vector.
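The camera selection described here can be sketched as follows. This is a one-dimensional illustration under stated assumptions, not the circuit of FIG. 3: frames are single rows of pixel values, the residual e is found by an exhaustive block match using the sum of absolute differences (SAD), and the matching cost stands in for the "amount of MC information". All function names are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equally long blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))


def estimate_shift(prev_row, cur_row, start, size, search):
    """Return (cost, shift) of the best match for the block
    prev_row[start:start+size] inside cur_row, trying shifts in
    [-search, search]. Ties are resolved toward the smaller |shift|."""
    block = prev_row[start:start + size]
    best = None
    for s in range(-search, search + 1):
        pos = start + s
        if pos < 0 or pos + size > len(cur_row):
            continue  # candidate block would fall outside the frame
        cost = sad(block, cur_row[pos:pos + size])
        if best is None or (cost, abs(s)) < (best[0], abs(best[1])):
            best = (cost, s)
    return best


def select_camera(prev_row, camera_rows, camera_motions, start, size, search):
    """Pick the camera whose current frame is predicted best from the
    previous frame and, among equally good ones, with the smallest |e|.
    Returns (camera index, camera motion d, residual e)."""
    best_key, best_index, best_e = None, None, None
    for i, cur in enumerate(camera_rows):
        cost, e = estimate_shift(prev_row, cur, start, size, search)
        key = (cost, abs(e))
        if best_key is None or key < best_key:
            best_key, best_index, best_e = key, i, e
    return best_index, camera_motions[best_index], best_e


# The object [9, 7, 5] sits at index 4 in the previous frame and moves by +5.
prev = [0] * 4 + [9, 7, 5] + [0] * 9
fixed = [0] * 9 + [9, 7, 5] + [0] * 4       # camera motion d = 0, object seen at 9
following = [0] * 5 + [9, 7, 5] + [0] * 8   # d = +4, object seen at 5
opposite = [0] * 13 + [9, 7, 5]             # d = -4, object seen at 13
index, d, e = select_camera(prev, [fixed, following, opposite],
                            [0, 4, -4], start=4, size=3, search=6)
```

In this toy case the camera that follows the object is selected, and d + e reproduces the actual displacement of 5 pixels.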

【０１０３】この時カメラの動きが予め設定されてお
り、復号側でもわかる場合は、カメラの動きを伝送する
必要はないが、操作ボード１３１において随時変更して
いる場合、あるいは実際に人間がカメラを動かしている
場合は、最終的に初期ベクトルとして選択されたカメラ
の動き（図３内のカメラの動き１、カメラの動き２、カ
メラの動き３・・・のいずれか）を復号側へ転送する必
要がある。If the camera motion is set in advance and is known on the decoding side, it need not be transmitted. If, however, it is changed from time to time on the operation board 131, or a person is actually moving the camera, the camera motion finally selected as the initial vector (one of camera motion 1, camera motion 2, camera motion 3, ... in FIG. 3) must be transferred to the decoding side.

【０１０４】実際のＭＣは、（ｄ＋ｅ）を本当の動きベ
クトルとして減算器１２７においてＭＣ誤差信号が取ら
れるが、このとき現フレーム信号として入力情報切り替
え回路１２１から取り込まれる情報は、固定カメラ１０
１から取り込まれた情報（つまり図４（ｃ）のＢの状
態）であり、上記（ｄ＋ｅ）によってｂを作成したこと
になる。なお、このとき送受信間で、動き量に対する同
一の予測を行うことができれば、動き量に対する情報を
送信する必要がないのは言うまでもない。In the actual MC, the MC error signal is obtained in the subtracter 127 using (d + e) as the true motion vector. The information taken in from the input information switching circuit 121 as the current frame signal at this point is the information captured by the fixed camera 101 (that is, state B in FIG. 4(c)), so that b has in effect been produced from (d + e). Needless to say, if the transmitting and receiving sides can make the same prediction of the motion amount, the motion amount information need not be transmitted.

【０１０５】ＭＣ誤差信号は符号化回路１２２で圧縮符
号化され、復号側に送られると同時に、復号化器１２５
で復号された後、加算回路１２６でＭＣに用いた前フレ
ーム信号と加算され、再生信号が作られてフレームメモ
リ１２４内に格納される。The MC error signal is compression-coded by the encoding circuit 122 and sent to the decoding side. At the same time, it is decoded by the decoder 125 and added in the adder circuit 126 to the previous frame signal used for MC, and the resulting reproduced signal is stored in the frame memory 124.
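The local decoding loop in this paragraph (subtract, code, decode, add back, store) can be sketched in a few lines. The sketch assumes a plain uniform quantizer as a stand-in for the encoding circuit 122, whose actual coding method the text does not specify; the function names and the step size are hypothetical.

```python
def encode_frame(current, prediction, step=4):
    """Encoder side: form the MC error, quantize it, and rebuild the
    frame exactly as the decoder will, for storage in the frame memory."""
    residual = [c - p for c, p in zip(current, prediction)]        # subtracter
    levels = [round(r / step) for r in residual]                   # coding (quantization)
    decoded = [q * step for q in levels]                           # local decoder
    reconstructed = [p + r for p, r in zip(prediction, decoded)]   # adder
    return levels, reconstructed


def decode_frame(levels, prediction, step=4):
    """Decoder side: the same reconstruction from the received levels."""
    return [p + q * step for p, q in zip(prediction, levels)]


prediction = [100, 100, 100, 100]   # motion-compensated previous frame
current = [100, 104, 97, 120]       # current frame from the fixed camera
levels, frame_memory = encode_frame(current, prediction)
```

Storing the reconstructed frame rather than the original keeps the encoder's frame memory identical to the decoder's, so quantization errors do not accumulate between the two sides.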

【０１０６】図５は第１の実施例の他の一例の符号化側
を示した図である。複数のカメラ１６１，１６２から取
り込まれた情報を用いて、図２，図３と同様の処理が処
理装置１７１で行われているものとする。この時、実際
の動物体の動きは処理装置内で明らかになるので（前記
の例ではｂ）、動物体の動きが時間方向に相関がある場
合は、この動きに基づいて各カメラの動きを自動的に制
御することが可能になる。一例として、動き制御回路１
７２を図のように設置し、実際に測定された動きをベク
トル量子化と同様な手法でカメラの数にクラス分けし
て、それぞれの動きを各カメラに伝達する。各カメラは
動き制御回路１７２からの指示に従って動くので、動き
に時間相関があれば、図４（ｃ）における小さな動きベ
クトルｅを自動的に小さくすることが可能になる。FIG. 5 shows the encoding side of another example of the first embodiment. Assume that the processing device 171 performs the same processing as in FIGS. 2 and 3 using the information captured from the cameras 161 and 162. The actual motion of the moving object (b in the example above) becomes known inside the processing device, so if that motion is correlated in the time direction, the motion of each camera can be controlled automatically on the basis of it. As one example, a motion control circuit 172 is installed as shown in the figure; it classifies the actually measured motions into as many classes as there are cameras, by a technique similar to vector quantization, and conveys one motion to each camera. Since each camera moves as instructed by the motion control circuit 172, the small motion vector e in FIG. 4(c) can be reduced automatically as long as the motion is temporally correlated.
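One way to picture the classification step is a codebook design loop of the kind used for vector quantization. The sketch below uses plain Lloyd (k-means) iterations as an assumed stand-in, since the text only says "a technique similar to vector quantization"; the function name, the initialization from the first distinct vectors, and the fixed iteration count are hypothetical choices.

```python
def classify_motions(motions, num_cameras, iterations=10):
    """Group measured motion vectors into at most num_cameras classes and
    return one representative motion per class (one per camera)."""
    # Initial codebook: the first num_cameras distinct motion vectors.
    centroids = []
    for m in motions:
        if m not in centroids:
            centroids.append(m)
        if len(centroids) == num_cameras:
            break

    for _ in range(iterations):
        # Assign every motion to its nearest representative.
        groups = [[] for _ in centroids]
        for m in motions:
            nearest = min(
                range(len(centroids)),
                key=lambda k: (m[0] - centroids[k][0]) ** 2
                + (m[1] - centroids[k][1]) ** 2,
            )
            groups[nearest].append(m)
        # Move each representative to the mean of its group.
        centroids = [
            (sum(v[0] for v in g) / len(g), sum(v[1] for v in g) / len(g))
            if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


# Six measured region motions, two cameras to steer.
measured = [(8, 0), (0, -6), (9, 1), (1, -5), (7, -1), (-1, -7)]
camera_commands = classify_motions(measured, num_cameras=2)
```

Each returned vector would be sent to one camera as its motion command for the next frame, which only pays off when the motion is temporally correlated, as the text notes.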

【０１０７】次に、本発明に係る第２の実施例を説明す
る。図７は、本発明のメディア処理システムの原理的構
成を示す図である。この図では、使用者の画像を獲得す
るための限定入力装置２０２と、使用者側から見える環
境画像を獲得するための限定入力装置２０１とから得ら
れる２種類の画像情報を処理部２０３で処理して得られ
る画像を出力装置２０４で表示する例を示している。処
理部２０３は入力された環境画像が入力された使用者画
像の背景部分となるように２つの画像情報の合成処理を
行う。Next, a second embodiment according to the present invention will be described. FIG. 7 is a diagram showing the basic configuration of the media processing system of the present invention. In this figure, the processing unit 203 processes two types of image information obtained from the limited input device 202 for obtaining the image of the user and the limited input device 201 for obtaining the environmental image viewed from the user side. An example of displaying the image obtained by the output device 204 is shown. The processing unit 203 performs a combining process of two pieces of image information so that the input environment image becomes the background portion of the input user image.

【０１０８】合成処理の方法の例としては、使用者画像
に付帯する不要な背景画像を分離除去して得られる純粋
な使用者画像を環境画像に上書きする方法が考えられ
る。使用者画像部分の領域判定法については公知の様々
な方法があるが、ここでは説明を省略する。As an example of the synthesizing method, there is a method of overwriting the environment image with a pure user image obtained by separating and removing an unnecessary background image incidental to the user image. There are various known methods for determining the area of the user image portion, but description thereof will be omitted here.
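The overwrite-style compositing mentioned above can be sketched as a per-pixel selection. The sketch assumes that the user region has already been determined and is supplied as a boolean mask of the same size as the images; the function name and the list-of-rows image representation are hypothetical.

```python
def composite(user_image, user_mask, environment_image):
    """Overwrite the environment image with user pixels wherever the
    mask marks the pixel as belonging to the user."""
    return [
        [u if m else e for u, m, e in zip(u_row, m_row, e_row)]
        for u_row, m_row, e_row in zip(user_image, user_mask, environment_image)
    ]


user = [["U", "U", "x"],
        ["U", "x", "x"]]            # "x" marks unwanted background pixels
mask = [[True, True, False],
        [True, False, False]]
environment = [["a", "b", "c"],
               ["d", "e", "f"]]
result = composite(user, mask, environment)
```

The environment image shows through everywhere the mask is false, so the quality of the composite depends mainly on the region determination that produces the mask.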

【０１０９】次の本発明をメディア処理装置に応用した
第２の実施例の他の一例について説明する。図８は、発
明法のメディア処理装置の一例の構成を表したものであ
る。入力装置２０６は装置の前面の使用者の画像情報を
得るために備えられたカメラである。これに対し入力装
置２０５は、使用者側から見た環境画像を妨害無く獲得
できるように装置の前面と異なる向きに備えれらた固定
または可動のカメラである。ここでは説明の簡単化のた
め入力装置２０５は装置背面に備え付けられているとす
るが、使用者側から見える画像を入力するという目的を
達成できる入力装置であれば装置上部や使用者の体に装
着されていてもかまわない。Next, another example of the second embodiment, in which the present invention is applied to a media processing device, will be described. FIG. 8 shows the configuration of an example of the media processing device according to the present invention. The input device 206 is a camera provided to obtain image information of the user in front of the device. The input device 205, by contrast, is a fixed or movable camera oriented in a direction different from the front of the device so that the environment image seen from the user's side can be acquired without obstruction. For simplicity, the input device 205 is assumed here to be mounted on the back of the device, but any input device that can capture the view seen from the user's side will do, and it may instead be mounted on top of the device or worn on the user's body.

【０１１０】処理部２０７は上記環境画像を自然な形で
上記使用者画像の背景に表示できるようにするための処
理を行う。ここでは処理部２０７において背景分離除去
処理２０８、画像サイズ設定処理２０９、表示範囲設定
処理２１０及び鏡像判定処理２１１が行われる。サイズ
設定処理２０９及び範囲設定処理２１０は使用者画像を
最終的に表示するときの大きさと表示位置を設定するも
のであり、ここでは装置外部から設定できる構成として
いる。The processing unit 207 performs the processing needed to display the environment image naturally as the background of the user image. Here the processing unit 207 performs background separation and removal processing 208, image size setting processing 209, display range setting processing 210, and mirror image determination processing 211. The size setting processing 209 and the range setting processing 210 set the size and position at which the user image is finally displayed, and are configured here so that they can be set from outside the device.

【０１１１】鏡像反転処理２１１は、例えば本発明のメ
ディア処理装置を通信端末として使用する場合、相手側
と環境画像の中の物体について会話をするときに、使用
者が指し示す物の方向と表示画面の中で使用者の画像が
指し示す物の方向を一致させるために特に必要となる。
このことを図９を用いて説明する。図９（ａ）では使用
者がメディア処理装置２２０の使用中に実環境（風景）
を見て、もの（この例では月）を指し示している様子を
表している。このとき装置背面のカメラ２２１が獲得す
る環境画像は例えば図９（ｂ）のようになり、装置前面
のカメラ２２２が獲得する使用者画像は図９（ｃ）のよ
うになる。この２つの画像を合成するときに、使用者が
月の方向を指し示すようにするためには図９（ｃ）の使
用者画像を鏡像反転して図９（ｂ）の環境画像と合成す
ればよい。このようにして得られる合成画像を図９
（ｄ）に示す。The mirror image inversion processing 211 is needed in particular when, for example, the media processing device of the present invention is used as a communication terminal and the user talks with the other party about an object in the environment image: it makes the direction of the object the user points at agree with the direction in which the user's image points on the display screen. This will be described with reference to FIG. 9. FIG. 9(a) shows the user, while using the media processing device 220, looking at the real environment (a landscape) and pointing at an object (the moon in this example). At this time, the environment image captured by the camera 221 on the rear of the device is, for example, as shown in FIG. 9(b), and the user image captured by the camera 222 on the front of the device is as shown in FIG. 9(c). When the two images are combined, for the user to appear to point in the direction of the moon, the user image of FIG. 9(c) should be mirror-inverted and then combined with the environment image of FIG. 9(b). The composite image obtained in this way is shown in FIG. 9(d).
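The mirror inversion and compositing of FIG. 9 can be sketched as follows. This is only a minimal illustration, not the circuitry of the processing unit 207: images are assumed to be equal-sized lists of rows, background-removed pixels of the user image are assumed to be marked with `None`, and the names `mirror_invert` and `composite` are hypothetical.

```python
def mirror_invert(image):
    """Return the left-right mirror image of a row-major image."""
    return [list(reversed(row)) for row in image]


def composite(user_image, environment_image):
    """Overlay the user image on the environment image.

    Pixels equal to None in user_image are treated as removed background
    (an assumption of this sketch), so the environment shows through there.
    """
    return [
        [e if u is None else u for u, e in zip(user_row, env_row)]
        for user_row, env_row in zip(user_image, environment_image)
    ]


# Environment image from the rear camera: the moon "M" is on the right.
environment = [[".", ".", "M"],
               [".", ".", "."]]
# User image from the front camera: the pointing hand "H" appears on the left.
user = [[None, None, None],
        ["H", "U", None]]

# After mirror inversion the hand lies on the same side as the moon.
combined = composite(mirror_invert(user), environment)
```

In this toy case the hand ends up in the right-hand column, below the moon, which is the agreement of directions that the mirror inversion is meant to achieve.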

【０１１２】画像セレクタ２１２は入力装置２０５から
の環境画像、入力装置２０６からの使用者画像、処理部
２０７からの背景のない使用者画像を入力し、画面指定
情報に従って、使用者画像、背景のない使用者画像、使
用者側から見える環境画像、使用者側から見える環境画
像を使用者画像の背景に組み込んだ合成画像のいずれか
を出力装置２１３に出力する。The image selector 212 receives the environment image from the input device 205, the user image from the input device 206, and the background-free user image from the processing unit 207. According to the screen designation information, it outputs to the output device 213 one of the following: the user image, the background-free user image, the environment image as seen from the user's side, or a composite image in which the environment image seen from the user's side is incorporated as the background of the user image.

【０１１３】また、このとき人物を鏡像化し、環境画は
そのままとしたり、あるいは人物はそのままで、環境画
を鏡像化するようにしても良い。At this time, the person may be made into a mirror image and the environment image may be left unchanged, or the person may be left as it is and the environment image may be made into a mirror image.

【０１１４】次に本発明のメディア処理システムに応用
した第２の実施例の、さらに他の一例について説明す
る。図１０は、第２の実施例に係る他の一例のメディア
処理システムの構成を表したものである。ここでは送信
側で入力した情報を符号化して伝送し、受信側で伝送さ
れたきた情報を復号して出力する符号化復号化のシステ
ムについて説明する。Next, another example of the second embodiment applied to the media processing system of the present invention will be described. FIG. 10 shows the configuration of another example of the media processing system according to the second embodiment. Here, an encoding / decoding system will be described in which the information input on the transmitting side is encoded and transmitted, and the information transmitted on the receiving side is decoded and output.

【０１１５】まずシステムの送信側の説明をする。図１
０において入力装置２２１は音声または環境音を入力す
るマイクである。また、入力装置２２２は使用者の画像
を獲得するためのカメラである。音情報コード化部２３
０では入力した音をピッチ分析部２３１またはＬＰＣ分
析部２３２により分析して、そこから得られるピッチ周
波数情報またはＬＰＣ係数情報を符号化等の方法でコー
ド化する。音符号化部２５０は入力された音を符号化し
て出力する。画像符号化部２４０は入力された画像から
使用者の背景部分の画像を分離除去して環境情報の秘匿
化を行うと共に使用画像を符号化して出力する。多重化
部２５１は音コード情報と符号化された画像情報と符号
化された音情報とを多重化して伝送路に出力する。First, the transmitting side of the system will be described. In FIG. 10, the input device 221 is a microphone for inputting voice or environmental sound. The input device 222 is a camera for capturing an image of the user. The sound information coding unit 230 analyzes the input sound with the pitch analysis unit 231 or the LPC analysis unit 232, and converts the resulting pitch frequency information or LPC coefficient information into code by a method such as encoding. The sound encoding unit 250 encodes the input sound and outputs it. The image encoding unit 240 separates and removes the background portion of the user's image from the input image to conceal the environmental information, and encodes and outputs the image to be used. The multiplexing unit 251 multiplexes the sound code information, the encoded image information, and the encoded sound information, and outputs the result to the transmission path.

【０１１６】次にシステムの受信側の説明をする。逆多
重化部２５２は受信データを逆多重化して、音コード情
報と符号化された画像情報と符号化された音情報に情報
を分離する。画像生成部２５３は音コード情報を用いて
予め設定した方法により環境画像情報を生成する。例と
して、コード情報に音声や環境音から抽出したピッチ周
波数を用いると、時間的に変化するピッチ周波数を基に
して時間的に周期化画像を生成できる。また時間的に画
像を周期化する位置を変えるとさらに複雑な環境画像を
生成することができる。さらに別な方法として、ピッチ
周波数と環境画像の色の対応関係を設定することにより
様々に色が変化する環境画像を提供できる。Next, the receiving side of the system will be described. The demultiplexing unit 252 demultiplexes the received data and separates it into sound code information, encoded image information, and encoded sound information. The image generation unit 253 uses the sound code information to generate environment image information by a preset method. For example, when a pitch frequency extracted from voice or environmental sound is used as the code information, an image that is periodic in time can be generated on the basis of the time-varying pitch frequency. Changing over time the position at which the image is made periodic yields a still more complex environment image. As yet another method, setting a correspondence between the pitch frequency and the color of the environment image provides an environment image whose color changes in various ways.
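One possible form of the correspondence between pitch frequency and color is sketched below. The text does not fix any particular mapping, so the linear blue-to-red ramp, the bounds `low_hz` and `high_hz`, and the function names are all assumptions made for illustration.

```python
def pitch_to_color(pitch_hz, low_hz=80.0, high_hz=400.0):
    """Map a pitch frequency to an (R, G, B) color for the environment image.

    low_hz and high_hz are assumed bounds, not values from the text.
    Low pitches map to blue, high pitches to red, linearly in between.
    """
    t = (pitch_hz - low_hz) / (high_hz - low_hz)
    t = min(1.0, max(0.0, t))  # clamp pitches outside the assumed range
    return (round(255 * t), 0, round(255 * (1.0 - t)))


def colors_over_time(pitch_track_hz):
    """Return one background color per analysis frame of the pitch track."""
    return [pitch_to_color(p) for p in pitch_track_hz]
```

Because the receiver only needs the pitch values carried as sound code information, the background color sequence can be regenerated there without transmitting any background pixels.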

【０１１７】また、別の音情報としてコード情報に抽出
したＬＰＣ係数情報を用いると、ＬＰＣ係数を基に構成
されるフィルタをかけて画質を変化させることができる
ので、時間的に様々に画質が変化する環境画像を生成す
ることが可能となる。さらに別な方法として、ＬＰＣ係
数情報と環境画像の色の対応関係を設定することによ
り、様々に色が変化する環境画像を提供できる。When extracted LPC coefficient information is used as the code information, as another kind of sound information, the image quality can be changed by applying a filter constructed from the LPC coefficients, so environment images whose quality varies over time can be generated. As yet another method, setting a correspondence between the LPC coefficient information and the color of the environment image provides an environment image whose color changes in various ways.

【０１１８】このような方法により送信側の環境の変化
とタイミングの合った環境画像を背景に表示できるの
で、秘匿化された背景画像の単調さを大幅に軽減でき
る。また、背景の一部分に絵画の枠に当たる表示窓を表
示するとともに、この窓の中に環境画像を表示すると、
一種の芸術性を有する、いわゆるアートとして音から生
成される環境画像を効果的に表示できる。With such a method, an environment image synchronized with changes in the environment on the transmitting side can be displayed as the background, which greatly reduces the monotony of the concealed background image. In addition, if a display window corresponding to a picture frame is shown in part of the background and the environment image is displayed inside this window, the environment image generated from sound can be displayed effectively as a kind of art with artistic character.

【０１１９】次に本発明をメディア処理システムに応用
した第２の実施例のさらに他の一例について説明する。
図１１は、第２の実施例に係るメディア処理システムの
構成を表したものであり、先に説明した実施例の変形例
である。図１１において図１０と共通の部分は共通の番
号を付けてその説明を省略する。この実施例ではマイク
２２１とは別に音コードを得るための環境音を供給する
音源または音源端子２６０を有する構成になっている。
このように音コードを得るための音源をマイクと分離す
ることにより、好みの音楽を使って変化に富んだ環境画
像を生成でき、しかも音楽によって通話を妨害すること
を防止できる。Next, still another example of the second embodiment in which the present invention is applied to the media processing system will be described.
FIG. 11 shows the configuration of the media processing system according to the second embodiment and is a modification of the example described above. In FIG. 11, parts common to FIG. 10 are given the same reference numerals and their description is omitted. In this example, a sound source or sound source terminal 260 that supplies the environmental sound from which the sound code is obtained is provided separately from the microphone 221.
By separating the sound source for obtaining the sound code from the microphone in this way, it is possible to generate a variety of environmental images using favorite music and prevent the music from interfering with the call.

【０１２０】以下、本発明に係る第３の実施例を図面を
用いて説明する。図１３は、第３の実施例の一例を示す
図である。カメラ３０１で風景の全景、例えば樹木と花
が撮像され、全景画像データが処理装置３０３に送られ
る。同時にカメラ３０２で風景の一部、ここでは花のみ
が撮像され、その花画像データも処理装置に送られる。
全景画像データを図１４（ａ）に示す。この画像は図１
４（ｂ）に示す様な、離散的に配置される画素の集まり
で表現される。（実際は図１４（ｂ）ほど粗くはない
が、ここでは図を見やすくするために２５画素で構成し
た。）ここで、画素の密度が十分でない場合は、樹木の
葉や花の柄といった細かい部分はぼけてしまい全景画像
データには現れない。A third embodiment of the present invention will be described below with reference to the drawings. FIG. 13 is a diagram showing an example of the third embodiment. The camera 301 captures a panoramic view of the landscape, for example, trees and flowers, and the panoramic image data is sent to the processing device 303. At the same time, the camera 302 images only a part of the landscape, here flowers, and the flower image data is also sent to the processing device.
The panoramic image data is shown in FIG. 14(a). This image is represented by a set of discretely arranged pixels, as shown in FIG. 14(b). (In practice the sampling is not as coarse as in FIG. 14(b); 25 pixels are used here to keep the figure legible.) If the pixel density is insufficient, fine detail such as the leaves of the tree and the pattern of the flower is blurred and does not appear in the panoramic image data.

【０１２１】一方、花画像データを図１５（ａ）に示
す。このデータも図１５（ｂ）に示すように全景画像デ
ータと同じ画素数で構成されているが、撮影範囲を花に
限定したので、樹木は写っていないが、花についてはそ
の詳細な形状及び柄まではっきりと現れている。On the other hand, the flower image data is shown in FIG. 15(a). As shown in FIG. 15(b), this data has the same number of pixels as the panoramic image data, but because the shooting range is limited to the flower, the tree does not appear, while the detailed shape and even the pattern of the flower appear clearly.

【０１２２】処理装置３０３では全景画像データと花画
像データが合成され、図１６（ａ）に示す合成画像デー
タが作成される。図１４（ａ）と図１５（ａ）を合成し
たものが図１６（ａ）なので、図１６（ａ）は樹木、花
とも写り、かつ、花についてはその柄まで鮮明なものに
なっている。この画素配置を図１６（ｂ）に示す。合成
画像データはディスプレイ３０４に入力される。ディス
プレイ３０４の画素数が図１６（ｃ）に示す様に十分に
多い場合は樹木の部分については画素内挿して合成画像
データに含まれていない画素のデータを補間し、花の部
分については合成画像データの高密度のデータをそのま
ま用いる。このように風景全体を概観しつつも特に花に
注目したい場合、画素数の少ない、例えば廉価なカメラ
を２つ組み合わせることにより、花の部分については高
い解像度の画像を得ることができる。In the processing device 303, the panoramic image data and the flower image data are combined to create the composite image data shown in FIG. 16(a). Because FIG. 16(a) is the combination of FIG. 14(a) and FIG. 15(a), it shows both the tree and the flower, and the flower is sharp down to its pattern. The pixel arrangement is shown in FIG. 16(b). The composite image data is input to the display 304. When the display 304 has a sufficiently large number of pixels, as shown in FIG. 16(c), the tree portion is filled in by pixel interpolation to supply the pixels not contained in the composite image data, while the flower portion uses the high-density data of the composite image data as is. Thus, when one wants an overview of the whole landscape while paying particular attention to the flower, combining two cameras with few pixels, for example inexpensive ones, yields a high-resolution image of the flower portion.
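The combination of a coarse panoramic image with a dense image of the region of interest can be sketched as below. Pixel replication (nearest neighbor) is used here as a stand-in for the pixel interpolation mentioned in the text, and the position `(top, left)` of the detail region on the enlarged grid is assumed to be known; both choices, and the function names, are assumptions of this sketch.

```python
def upsample(image, factor):
    """Enlarge a row-major image by pixel replication (nearest neighbor)."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def merge_detail(panorama, detail, factor, top, left):
    """Place a high-density detail image into an upsampled panorama.

    (top, left) is the assumed position of the detail region on the
    upsampled grid; how it is determined is outside this sketch.
    """
    canvas = upsample(panorama, factor)
    for r, row in enumerate(detail):
        for c, pixel in enumerate(row):
            canvas[top + r][left + c] = pixel
    return canvas
```

For example, `merge_detail([[1, 2], [3, 4]], [[9, 8], [7, 6]], 2, 2, 0)` keeps the replicated panorama everywhere except the lower-left block, where the dense detail pixels are used unchanged.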

【０１２３】また、ディスプレイ３０４の代わりに画像
符号化器を用いれば、注目している部分については高い
解像度を保持しつつ、そのほかの部分については少ない
情報量で表現することができるので、全体を高い解像度
で伝送するよりも少ない符号量にまで圧縮することがで
きる。If an image encoder is used in place of the display 304, the portion of interest retains its high resolution while the other portions are represented with a small amount of information, so the data can be compressed to a smaller code amount than transmitting the whole at high resolution.

【０１２４】図１７は本第３の実施例をテレビ電話など
に用いた他の例を示すブロック図である。カメラ３１１
から画像データが修正装置３１３と処理装置３１４に送
られる。また、マイク３１２から処理装置３１３に音声
データが送られる。FIG. 17 is a block diagram showing another example, in which the third embodiment is applied to a videophone or the like. Image data is sent from the camera 311 to the correction device 313 and the processing device 314. Voice data is sent from the microphone 312 to the processing device 313.

【０１２５】修正装置３１３では、まず図１８に示す様
に画像データから、顔領域３２１を検出する。ここで、
画像データ３２０の左下角を座標（０，０）、顔領域の
左下角を座標（ｋ，ｌ）とする。また、画像データ３２
０の横幅をＭ、高さをＮ、顔領域の横幅をＫ、高さをＬ
とする。そして顔が画面の中心になるようにカメラの向
きを横方向については（（ｋ＋Ｋ）／２）−（Ｍ／
２）、縦方向については（（ｌ＋Ｌ）／２）−（Ｎ／
２）だけ動かす信号をカメラ３１１に送る。また、顔が
画面のなかで所定の大きさ（高さＪ）になるようにカメ
ラ倍率をＪ／Ｌ倍する信号をカメラ３１１に送る。カメ
ラ３１１はその指示通りに向き、倍率が修正される。In the correction device 313, first, as shown in FIG. 18, the face area 321 is detected from the image data. Here, the lower left corner of the image data 320 is taken as coordinate (0, 0) and the lower left corner of the face area as coordinate (k, l). The width of the image data 320 is M and its height is N; the width of the face area is K and its height is L. A signal is then sent to the camera 311 to move the camera direction by ((k + K) / 2) - (M / 2) horizontally and by ((l + L) / 2) - (N / 2) vertically so that the face comes to the center of the screen. In addition, a signal for multiplying the camera magnification by J / L is sent to the camera 311 so that the face has a predetermined size (height J) on the screen. The camera 311 changes its direction and magnification as instructed.
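The correction amounts sent to the camera 311 can be computed as in the sketch below. The expressions are transcribed exactly as they appear in the text, in image coordinates; how these amounts are converted into an actual pan/tilt or zoom drive signal is not described, and the function name `camera_correction` is hypothetical.

```python
def camera_correction(k, l, K, L, M, N, J):
    """Return (horizontal shift, vertical shift, zoom factor) for camera 311.

    (k, l): lower left corner of the face area; K, L: its width and height.
    M, N: width and height of the image data; J: desired face height.
    The shift expressions are transcribed as written in the text.
    """
    dx = (k + K) / 2 - M / 2
    dy = (l + L) / 2 - N / 2
    zoom = J / L
    return dx, dy, zoom


# Example values chosen only for illustration.
dx, dy, zoom = camera_correction(k=100, l=60, K=80, L=120, M=640, N=480, J=240)
```

With these illustrative values the face height of 120 is half the target height of 240, so the requested magnification is 2.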

【０１２６】マイク３１２についてもその感度が使用者
の口の方向で高くなるように、現在のカメラ３１１とマ
イク３１２の向き及び顔領域の検出結果からマイク３１
２の向きの修正角を求めて向きを変更する。口の位置は
例えば図１８で座標（（ｋ＋Ｋ）／２，（１＋Ｌ）／
４）として決める。Similarly for the microphone 312, a correction angle for the direction of the microphone 312 is obtained from the current directions of the camera 311 and the microphone 312 and from the face area detection result, and the direction is changed so that the sensitivity becomes higher in the direction of the user's mouth. The position of the mouth is determined, for example, as the coordinate ((k + K) / 2, (1 + L) / 4) in FIG. 18.

【０１２７】また、ディスプレイがある場合は、マイク
の場合と同様にして図１８で座標（（ｋ＋Ｋ）／２，
（ｌ＋Ｌ）／２）の方向に向くようにする。If there is a display, it is turned, in the same way as the microphone, toward the direction of the coordinate ((k + K) / 2, (l + L) / 2) in FIG. 18.

【０１２８】図１９は本第３の実施例の領域検出のため
の他の例を示すブロック図である。発光装置３３１から
は所定の色の光が前面に照射され、同時にカメラ３３２
で画像が取り込まれ、その画像データが処理装置３３４
と修正装置３３３に送られる。画像データに人の顔が写
っている場合、その眼鏡あるいは眼球など目の部分で反
射された発光装置３３１から照射された光が写る。従っ
て修正装置３３３では画像データの中から照射光の部分
を見つけその部分を目の領域として決定する。あるいは
検出する信号としては色の代わりに空間的に変化する画
像パターンや時変パターンを用いても良い。また、発光
装置３３１がディスプレイを兼ねても良い。その場合、
たまたま表示する画像の中の色やパターンを適当に選ん
で、検出する信号に用いても良い。FIG. 19 is a block diagram showing another example for area detection in the third embodiment. The light emitting device 331 emits light of a predetermined color forward; at the same time the camera 332 captures an image, and the image data is sent to the processing device 334 and the correction device 333. When a person's face appears in the image data, the light from the light emitting device 331 reflected at the eyes, for example by eyeglasses or the eyeballs, appears in the image. The correction device 333 therefore finds the portion of the image data showing the emitted light and determines that portion to be the eye region. Alternatively, a spatially varying image pattern or a time-varying pattern may be used as the signal to be detected instead of a color. The light emitting device 331 may also serve as a display. In that case, colors or patterns in the image that happens to be displayed may be suitably selected and used as the signal to be detected.
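Finding the portion of the image that shows the emitted light can be sketched as a simple color match. The per-channel tolerance test, the default `tolerance` value, and the use of a bounding box as the reported eye region are assumptions of this sketch; the text only states that the portion showing the emitted light is found.

```python
def find_lit_region(image, target, tolerance=30):
    """Return the bounding box (top, left, bottom, right) of pixels whose
    color is within `tolerance` of `target` on every channel, or None
    when no pixel matches."""
    hits = [
        (r, c)
        for r, row in enumerate(image)
        for c, pixel in enumerate(row)
        if all(abs(p - t) <= tolerance for p, t in zip(pixel, target))
    ]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows), max(cols)
```

Here `image` is a row-major list of (R, G, B) tuples and `target` is the predetermined color of the emitted light.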

【０１２９】図２０は本第３の実施例の撮像手段をメデ
ィア処理システムから分離した実施例を示す外観図であ
る。カメラ３４２は球状の容器３４１に納められてい
る。容器３４１は接続部３４３を介して第１の支持棒３
４４と接続している。また、第１の支持棒３４４は接続
部３４６を介して第２の支持棒３４５に接続している。
接続部３４３は経度方向、接続部３４６は緯度方向に可
変とする。第２の支持棒３４５の略中間部分には可動節
が設けられ、任意の方向に屈曲することができ、通常は
半固定としておく。また、この第２の支持棒３４５には
クリップ３４７が接続されている。クリップ３４７はカ
メラ３４２を使用者の近くの壁、柱、パソコン、などに
取り付けるためのもので、このクリップ３４７に代え
て、若しくは併用して吸盤、磁石、引っかけ金具、ピ
ン、マジックテープなど環境に合わせて用いることがで
きる。FIG. 20 is an external view showing an example in which the imaging means of the third embodiment is separated from the media processing system. The camera 342 is housed in a spherical container 341. The container 341 is connected to a first support rod 344 via a connecting portion 343, and the first support rod 344 is connected to a second support rod 345 via a connecting portion 346. The connecting portion 343 is adjustable in the longitude direction and the connecting portion 346 in the latitude direction. A movable joint is provided at roughly the middle of the second support rod 345; it can be bent in any direction and is normally left semi-fixed. A clip 347 is connected to the second support rod 345. The clip 347 is for attaching the camera 342 to a wall, pillar, personal computer, or the like near the user; in place of the clip 347, or together with it, a suction cup, magnet, hook, pin, hook-and-loop tape, or the like can be used to suit the environment.

【０１３０】容器３４１の中身の断面図の一例を図２１
に示す。カメラ３４２の筐体３５２は磁石などの磁性体
でできており、撮像画面を水平にしたときの底部に重り
３５３を固定する。筐体３５２の周囲には多数の電磁石
３５４が配置される枠体３５５が設けられる。撮像装置
の電源を入れると電磁石３５４にも通電され、筐体３５
２と電磁石３５４の向かい合った面でＮ極同士あるいは
Ｓ極同士が向かい合うようにしておけば筐体３５２は少
し浮上する。このとき電磁石３５４の部分は永久磁石で
も構わない。カメラ３４２には重り３５３が入っている
ので、カメラ３４２の撮像方向に拘らず、重力により撮
像画面は常に水平に保たれる。An example of a cross-sectional view of the inside of the container 341 is shown in FIG. 21. The housing 352 of the camera 342 is made of a magnetic body such as a magnet, and a weight 353 is fixed at what is the bottom when the imaging screen is horizontal. Around the housing 352 is provided a frame body 355 on which a large number of electromagnets 354 are arranged. When the imaging device is switched on, the electromagnets 354 are also energized; if N poles face N poles, or S poles face S poles, on the facing surfaces of the housing 352 and the electromagnets 354, the housing 352 floats slightly. Permanent magnets may be used in place of the electromagnets 354. Because the camera 342 contains the weight 353, gravity always keeps the imaging screen horizontal regardless of the imaging direction of the camera 342.

【０１３１】例えば自動車を運転しながら自分の顔を撮
るためにクリップ３４７をルームミラーに取り付けたと
する。その場合カメラ３４２は一般に傾いてしまう。そ
こで接続部３４３と接続部３４６を手で調節すればカメ
ラ３４２を自分の顔の方に向けることができる。する
と、それに合わせて、電磁気的に浮いたカメラ３４２の
撮像画面は重力によって自動的に水平な画面となるので
別途水平調節をする煩わしさがない。さらに自動車が多
少揺れてもそれはカメラ３４２に伝わらず、揺れのない
良好な画像を入力できる。For example, suppose that a clip 347 is attached to a rearview mirror to take a picture of one's face while driving a car. In that case, the camera 342 generally tilts. Therefore, if the connection portion 343 and the connection portion 346 are manually adjusted, the camera 342 can be directed toward one's face. Then, in accordance therewith, the image pickup screen of the electromagnetically floating camera 342 automatically becomes a horizontal screen due to gravity, so that there is no need to separately perform horizontal adjustment. Further, even if the vehicle shakes a little, it is not transmitted to the camera 342, and a good image without shaking can be input.

【０１３２】さらに本第３の実施例の他の例を図２２に
示す。カメラ３６１、第１の支持棒３６３、第２の支持
棒３６５、第３の支持棒３６７はそれぞれ図のように接
続部３６２，３６４，３６６を介して接続されている。
この実施例では接続部３６６で経度方向、接続部３６２
で緯度方向を調節する。そして、接続部３６４は締め付
けを弱くしておき第１の支持棒３６３とカメラ３６１が
釣り下がるようにしておく。するとやはり重力によりカ
メラ３６１の撮像画面は常に水平に保たれる。FIG. 22 shows yet another example of the third embodiment. The camera 361, the first support rod 363, the second support rod 365, and the third support rod 367 are connected via connecting portions 362, 364, and 366, as shown in the figure. In this example, the longitude direction is adjusted at the connecting portion 366 and the latitude direction at the connecting portion 362. The connecting portion 364 is left loosely tightened so that the first support rod 363 and the camera 361 hang down. Gravity then again keeps the imaging screen of the camera 361 horizontal at all times.

【０１３３】次に本発明に係る第４の実施例の一例とし
てメディア入力装置を示す。本実施例は画像及び音を伝
送あるいは記録蓄積することを目的とする装置である。
本装置は、入力した画像／音のうちデータベースに蓄積
されている画像／音に置き換え可能な部分については、
そのデータベースインデックス等を伝送／蓄積すること
により伝送／蓄積に係わる情報量を大幅に削減している
ところに特徴がある。Next, a media input device will be shown as an example of the fourth embodiment according to the present invention. This embodiment is an apparatus intended to transmit, or record and store, images and sound. Its feature is that, for the parts of the input image/sound that can be replaced with images/sounds stored in a database, it transmits/stores the database index and the like, which greatly reduces the amount of information involved in transmission/storage.

【０１３４】図２５を参照するに、カメラ４０１及びマ
イク４０２からはそれぞれ伝送／蓄積の対象となる画像
及び音声が入力される。位置検出器４０３は本メディア
入力装置がある現在地を検出する手段である。これは地
磁気をもとに検出しても良いし、ＧＰＳ（Ｇｌｏｂａｌ
ＰｏｓｉｔｉｏｎｉｎｇＳｙｓｔｅｍ；世界的位置
決定システム）を用いても良い。温度・湿度センサ４０
４は温度、湿度、風量等の気象条件を検出する手段であ
る。計時器４０５は現在の日時を得る手段である。方位
計４０６はカメラ４０１及びマイク４０２がとらえてい
る画像／音の装置本体からの方向を検出するための方位
センサである。画像／音データベース４０９は、背景部
分のように入力画像／音の中で位置、時刻等の情報をも
とに置き換え易い画像／音が蓄積されている。この画像
／音データベース４０９には世界中のあらゆる地点の画
像／音を蓄積しておいても良いし、入力装置を用いる場
所が限定されている場合にはその限定場所内での画像／
音を蓄積しておいても良い。変形加工器４１０では必要
に応じて画像／音データベース４０９から選択された画
像／音に対して変形加工が行われる。Referring to FIG. 25, the image and the sound to be transmitted/stored are input from the camera 401 and the microphone 402, respectively. The position detector 403 is a means for detecting the current location of the media input device. The location may be detected on the basis of geomagnetism, or GPS (Global Positioning System) may be used. The temperature/humidity sensor 404 is a means for detecting weather conditions such as temperature, humidity, and amount of wind. The timer 405 is a means for obtaining the current date and time. The azimuth meter 406 is an azimuth sensor for detecting the direction, as seen from the apparatus body, of the image/sound captured by the camera 401 and the microphone 402. The image/sound database 409 stores images/sounds, such as background portions, that are easy to substitute within the input image/sound on the basis of information such as position and time. The image/sound database 409 may store images/sounds of every location in the world, or, if the place where the input device is used is limited, only the images/sounds within that limited place. The transformation processor 410 applies transformation processing, as necessary, to the image/sound selected from the image/sound database 409.

【０１３５】画像／音データベース４０９に蓄積されて
いる画像のうちどれを用いるか、それをどのように変形
加工を行うかは背景画像／音決定器４０８で決定され
る。背景画像／音決定器４０８には位置検出器４０３、
温度・湿度センサ４０４、計時器４０５及び方位計４０
６が接続されており、ここから入力された現在地、方
位、気象条件、日時に該当するデータベースのインデッ
クスを決定する。ここで、実際の背景と高精度で一致す
るものがデータベース中にある場合はこれをそのまま用
いるが、そうでない場合はデータベース中の画像／音に
対してどのような変形を行ったらいいかが決定される。The background image/sound determiner 408 decides which of the images stored in the image/sound database 409 is used and how it is transformed. The position detector 403, the temperature/humidity sensor 404, the timer 405, and the azimuth meter 406 are connected to the background image/sound determiner 408, which determines the database index corresponding to the current location, azimuth, weather conditions, and date and time input from them. If the database contains an entry that matches the actual background with high accuracy, it is used as is; otherwise, the determiner decides what transformation should be applied to the image/sound in the database.

【０１３６】例えば、カメラが北北西の方向を向いてい
るにも係わらずデータベース中には北西と北の方向の画
像しか蓄積されていない場合にはこれら２方向の画像か
ら回転を考慮した合成を行って北北西方向の画像を作成
する。また、データベースには太陽の位置や雲等季節、
時刻、気象条件によって変化する画像は含めず、季節、
時刻、現在地、方位、気象条件をもとに太陽の位置や雲
の形状や程度（夜の場合には星の位置や輝き）、照度、
木々のそよぎ具合等を計算し、これに対応した太陽や雲
等の画像を作成してデータベース中の画像と合成しても
良い。また、野鳥が生息する場所の場合にはデータベー
ス中に蓄積されている鳥の鳴き声を背景音として選択
し、これを季節や気象条件をもとに鳴き声の大きさを変
える等の加工を行っても良い。なお、位置検出器４０
３、温度・湿度センサ４０４、計時器４０５及び方位計
４０６で検出される情報には誤差が含まれるため、実際
とは多少異なる背景画像／音が作成されてしまう可能性
がある。これを防ぐためにカメラ４０１及びマイク４０
２も背景画像／音決定器４０８に接続し、これらからの
入力画像／音をもとにデータベースインデックスの決定
及び加工、変形方法の決定に補正を加えればより精度の
高い背景画像／音をつくることができる。For example, if the camera faces north-northwest but the database holds only images for the northwest and north directions, an image for the north-northwest direction is created by combining the images for those two directions with rotation taken into account. The database may also omit images that change with season, time, and weather conditions, such as the position of the sun and clouds; instead, the position of the sun, the shape and amount of cloud (at night, the positions and brightness of stars), the illuminance, the swaying of trees, and so on are calculated from the season, time, current location, azimuth, and weather conditions, and corresponding images of the sun, clouds, and so on are created and combined with the image in the database. In a place inhabited by wild birds, birdsong stored in the database may be selected as background sound and processed, for example by changing its loudness according to the season and weather conditions. Because the information detected by the position detector 403, the temperature/humidity sensor 404, the timer 405, and the azimuth meter 406 contains errors, a background image/sound somewhat different from the actual one may be created. To prevent this, the camera 401 and the microphone 402 may also be connected to the background image/sound determiner 408, and the determination of the database index and of the processing and transformation method may be corrected on the basis of the input image/sound from them, which yields a more accurate background image/sound.
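For the north-northwest example, one ingredient of such a synthesis is deciding which two stored directions bracket the camera azimuth and how strongly each should contribute. The sketch below computes only these weights, linearly in angle; the rotation-aware image synthesis itself is not shown, and the function name and the degree convention (north = 0, increasing clockwise) are assumptions.

```python
def blend_weights(camera_deg, stored_degs):
    """Pick the stored azimuths that bracket the camera azimuth and
    return {azimuth: weight}, with weights summing to 1 (linear in angle)."""
    def clockwise(a, b):
        # Clockwise angular distance from a to b, in [0, 360).
        return (b - a) % 360.0

    before = min(stored_degs, key=lambda d: clockwise(d, camera_deg))
    after = min(stored_degs, key=lambda d: clockwise(camera_deg, d))
    if before == after:
        # Exact match (or a single stored direction): use it unchanged.
        return {before: 1.0}
    span = clockwise(before, after)
    w_after = clockwise(before, camera_deg) / span
    return {before: 1.0 - w_after, after: w_after}
```

With stored images for north (0 degrees) and northwest (315 degrees), a camera azimuth of 337.5 degrees gives equal weights to the two images, while an azimuth of exactly 315 degrees selects the northwest image alone.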

【０１３７】カメラ４０１、マイク４０２及び変形加工
器４１０で作成された背景画像／音信号は背景抽出器４
０７に入力され、画像／音の背景分離・抽出が行われ
る。背景分離・抽出は、例えば入力画像／音のうち作成
された背景信号との残差が小さい部分を背景とし、それ
以外を非背景とすれば良い。あるいは、人物画像や人の
声のような重要な部分を非背景としそれ以外の、あまり
意味のない部分を背景として分離しても良い。The image/sound from the camera 401 and the microphone 402 and the background image/sound signal created by the transformation processor 410 are input to the background extractor 407, where background separation and extraction of the image/sound is performed. For background separation and extraction, for example, the portions of the input image/sound whose residual against the created background signal is small may be treated as background and the rest as non-background. Alternatively, important portions such as a person's image or a human voice may be treated as non-background, and the remaining, less meaningful portions separated as background.
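The residual-based separation can be sketched per pixel as follows. Grayscale integer pixels, the absolute-difference residual, and the `threshold` value are assumptions of this sketch; the text only requires that portions with a small residual against the created background be treated as background.

```python
def separate_background(input_image, background_image, threshold=10):
    """Label each pixel True (background) or False (non-background).

    A pixel counts as background when its absolute residual against the
    generated background image is at most `threshold` (an assumed value).
    """
    return [
        [abs(p - b) <= threshold for p, b in zip(in_row, bg_row)]
        for in_row, bg_row in zip(input_image, background_image)
    ]
```

Pixels labeled `False` would then go to the non-background coding path, while the pixels labeled `True` are covered by the database index and transformation parameters.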

【０１３８】背景抽出器４０７で非背景と判定された領
域は高能率符号化を行い非背景情報として伝送／蓄積す
る。この高能率符号化は例えば画像に対しては動き補償
＋ＤＣＴ、音に対してはＡＤＰＣＭ等の符号化方式を用
いれば良い。あるいは、人物画像や非との声といった重
要と判定されて分離された物についてはこれに適した高
能率符号化方式、例えば、人の顔についてはモデルペー
スト符号化、人の声についてはＣＥＬＰ等を用いて符号
化を行う。一方、背景と判定された部分については背景
データベースのインデックス、背景の変形のパラメータ
が伝送／蓄積の対象となる。変形パラメータは変形方法
を直接的に表すパラメータ（例えば、フィルタリングを
行う場合にはそのフィルタ係数）でも良いが、これでは
情報量が多すぎる場合にはその変形を決定するもととな
った現在地、時刻、気象条件等の情報を符号化して伝送
／蓄積しても良い。これらは多重化器４１１で多重化さ
れて伝送／蓄積される。The regions judged non-background by the background extractor 407 are subjected to high-efficiency coding and transmitted/stored as non-background information. For this high-efficiency coding, a coding scheme such as motion compensation + DCT for images and ADPCM for sound may be used, for example. Alternatively, items that were judged important and separated, such as a person's image or a human voice, are coded with a high-efficiency scheme suited to them, for example model-based coding for a human face and CELP for a human voice. For the portions judged to be background, on the other hand, the index into the background database and the parameters of the background transformation are what is transmitted/stored. The transformation parameters may be parameters that directly express the transformation method (for example, the filter coefficients when filtering is performed); if that amounts to too much information, the information from which the transformation was determined, such as current location, time, and weather conditions, may instead be coded and transmitted/stored. These are multiplexed by the multiplexer 411 and transmitted/stored.

【０１３９】図２６は入力画像からの背景画像の生成及
び背景／非背景分離の例を示した物である。画像／音デ
ータベース４０９から選択し、照度等の変換を行った背
景画像４２０に対して、変形加工器４１０において太陽
画像４２５、雲画像４２３、木画像４２６が合成され
る。太陽画像４２３は季節、時刻、気象条件、現在地、
方位をもとに計算された画像である。雲画像４２５は入
力画像と背景画像４２０との差分として検出しても良
く、この場合は非背景部分と同様に動き補償＋ＤＣＴ等
のジュネリックな符号化方式で符号化を行い、非背景情
報として伝送／蓄積する。あるいは、入力画像との忠実
性がそれほど要求されない場合は気象条件等をもとにし
て作成して合成しても良い。この場合、雲画像は背景に
関する情報をもとに作成可能であるため非背景情報とし
て伝送／蓄積する必要がない。木画像４２６はデータベ
ース中にそのもとが入っている物であるが、気象条件等
を考慮して揺らぎ等の変形が行われる。人物のような重
要な画像を検出する場合は人物部分４２１が分離され、
モデルペースト符号化等これに適した符号化が行われ
る。また、動物４２２のようにデータベースには蓄積さ
れておらず人物ほど重要ではないが伝送／蓄積を行った
ほうがよい画像は人物部分４２１よりは低い精度でジュ
ネリックに符号化される。FIG. 26 shows an example of generating a background image from an input image and of background/non-background separation. In the transformation processor 410, the sun image 425, the cloud image 423, and the tree image 426 are combined with the background image 420, which has been selected from the image/sound database 409 and subjected to conversion of illuminance and the like. The sun image 423 is an image calculated from the season, time, weather conditions, current location, and azimuth. The cloud image 425 may be detected as the difference between the input image and the background image 420; in this case it is coded with a generic coding scheme such as motion compensation + DCT, in the same way as the non-background portions, and transmitted/stored as non-background information. Alternatively, when fidelity to the input image is not strongly required, the cloud image may be created from the weather conditions and the like and then combined. In this case the cloud image can be created from information about the background, so it need not be transmitted/stored as non-background information. The tree image 426 has its source in the database, but is transformed, for example with swaying, in consideration of the weather conditions and the like. When an important image such as a person is to be detected, the person portion 421 is separated and coded with a scheme suited to it, such as model-based coding. An image such as the animal 422, which is not stored in the database and is less important than the person but is still better transmitted/stored, is coded generically with lower accuracy than the person portion 421.

【０１４０】図２７は図２５で伝送／蓄積された情報か
ら画像／音を再生する装置のブロック図である。まず、
逆多重化器４３１で非背景情報、背景データベースイン
デックス、変形パラメータの分離が行われる。非背景情
報は非背景画像／音作成器４３４に入力され、非背景部
分の画像／音が再生される。非背景画像／音作成器４３
４における再生処理は図２５の非背景符号化器４１２に
対応した方法で行われる。これら情報を非背景画像作成
器４３２に入力され背景画像／音が再生される。画像／
音データベース４３３から背景データベースインデック
スに対応する画像／音が取り出され、これに対して変形
パラメータに対応した変形／加工が行われる。この変形
／加工は図２５の変形加工器４１０の処理と同一であ
る。再生された背景画像／音及び非背景画像／音は合成
器４３５で合成され再生画像及び再生音として出力され
る。FIG. 27 is a block diagram of an apparatus that reproduces the image/sound from the information transmitted/stored by the apparatus of FIG. 25. First, the demultiplexer 431 separates the non-background information, the background database index, and the transformation parameters. The non-background information is input to the non-background image/sound creator 434, and the image/sound of the non-background portion is reproduced. The reproduction processing in the non-background image/sound creator 434 is performed by a method corresponding to the non-background encoder 412 in FIG. 25. These pieces of information are input to the non-background image generator 432, and the background image/sound is reproduced. The image/sound corresponding to the background database index is retrieved from the image/sound database 433, and the transformation/processing corresponding to the transformation parameters is applied to it. This transformation/processing is the same as the processing of the transformation processor 410 in FIG. 25. The reproduced background image/sound and non-background image/sound are combined by the combiner 435 and output as the reproduced image and reproduced sound.

【０１４１】図２８は本発明に係る第４の実施例の他の
一例のブロック図である。本装置も入力した画像／音の
伝送／蓄積を行うことを目的とし、背景等をデータベー
ス等で置き換えることにより伝送／蓄積に係わる情報量
を削減している。また、画像／音を入力するカメラ及び
マイクが複数個あり、背景部分の置き換えのもとになる
画像／音として装置内部のデータベースだけでなく外部
のデータベース、街角カメラ／マイクや他の同様の入力
装置からの画像／音も用いる。FIG. 28 is a block diagram of another example of the fourth embodiment according to the present invention. This device also aims at transmitting / accumulating input images / sounds, and reduces the amount of information relating to transmission / accumulation by replacing the background with a database or the like. Also, there are multiple cameras and microphones for inputting images / sounds, and not only the internal database of the device but also external databases, street corner cameras / microphones and other similar inputs as images / sounds that are the basis of the background replacement. Images / sound from the device are also used.

【０１４２】カメラ４４１〜４４３及びマイク４４４〜
４４６からはそれぞれ伝送／蓄積の対象となる画像及び
音声が入力される。カメラ４４１〜４４３及びマイク４
４４〜４４６はそれぞれ全体の画像／音を入力するもの
及び中心的な人物、物体及び音声等を入力するものとい
うような役割分担がなされている。また、人物カメラ４
４１は主たる人物の全体をとらえることを目的として撮
影方向、範囲、焦点距離等を調整し、人物全体像を入力
する。また、顔カメラ４４２はその人物のうち顔部分を
とらえて入力する。全体カメラ４４３からは全景を入力
する。Images and sounds to be transmitted/stored are input from the cameras 441 to 443 and the microphones 444 to 446, respectively. The cameras 441 to 443 and the microphones 444 to 446 divide their roles, some capturing the overall image/sound and others capturing the central person, object, voice, and so on. The person camera 441 adjusts its shooting direction, range, focal length, and so on so as to capture the whole of the main person, and inputs a full-figure image of the person. The face camera 442 captures and inputs the face portion of that person. The whole-view camera 443 inputs the entire view.

【０１４３】このように人物カメラ４４１及び顔カメラ
４４２を全景カメラ４４３とは別に設けているため、よ
り重要な情報である人物像、顔画像は他の部分より高い
解像度を得ることが可能であり、また、後の処理を他の
部分とは異なるものとすることができる。マイク４４４
〜４４６についても役割分担がなされており、音声マイ
ク４４４は中心人物の音声を、背景マイク４４５，４４
６は背景音をとらえることを目的として設置されてい
る。音声マイク４４４は中心人物が持つハンドマイクや
身体や服の一部に装着するものとして装置本体に電波や
赤外線を使って音声を伝送するようにしても良く、指向
性を有するマイクを装置本体に装備して用いても良い。
また、マイク４４５，４４６はそれぞれ左方向と右方向
あるいは前方向と後方向といったように異なる方向の音
をとらえるようにしても良く、それぞれのマイク４４
５，４４６に２本のマイクを内蔵してステレオ音（２チ
ャンネルあるいは４チャンネル）をなすようにしても良
い。Because the person camera 441 and the face camera 442 are provided separately from the panoramic camera 443, the person image and the face image, which carry the more important information, can be obtained at a higher resolution than the other portions, and their subsequent processing can differ from that of the other portions. The microphones 444 to 446 are likewise assigned roles: the voice microphone 444 is installed to capture the voice of the central person, and the background microphones 445 and 446 are installed to capture the background sound. The voice microphone 444 may be a handheld microphone held by the central person, or a microphone attached to the body or clothing, that transmits the voice to the device body by radio waves or infrared rays; alternatively, a directional microphone mounted on the device body may be used. The microphones 445 and 446 may capture sound from different directions, such as left and right or front and rear, and each of the microphones 445 and 446 may contain two microphones so as to produce stereo sound (2 channels or 4 channels).

【０１４４】現在地検出器４４７、気象センサ４４８、
計時器４４９、方位計４５０はそれぞれ図２５に示す位
置検出器４０３、温度・湿度センサ４０４、計時器４０
５及び方位計４０６と同様の働きをするものであり、装
置の現在地、気象条件、現在の日時、及び入力対象の装
置からみた方向を検出する手段である。画像／音データ
ベース４５３は入力画像／音の一部置き換えの素となる
画像／音のデータベースである。The current location detector 447, the weather sensor 448, the timer 449, and the compass 450 function in the same way as the position detector 403, the temperature/humidity sensor 404, the timer 405, and the compass 406 shown in FIG. 25, respectively. They are means for detecting the current location of the device, the weather conditions, the current date and time, and the direction of the input target as seen from the device. The image/sound database 453 is a database of images/sound that serve as source material for replacing part of the input image/sound.

【０１４５】位置検出器４４７、温度・湿度センサ４４
８、計時器４４９及び方位計４５０及び全体カメラ４４
３、背景マイク４４５，４４６は背景画像／音作成器４
５２に接続されており、ここで背景画像／音が作成され
るが、図２５に示す実施例と異なるのは背景画像／音の
素として装置内のデータベース４５３だけでなく、必要
に応じて他のデータベース４６３ａ、街角カメラ／マイ
ク４６４ａ及び他の同様の入力装置４６２からの画像／
音を用いているところである。外部データベース、街角
カメラ／マイク、他の入力装置はそれぞれ複数づつあっ
てもよく、それぞれ有線あるいは無線のネットワーク４
６１を介して接続されている。The position detector 447, the temperature/humidity sensor 448, the timer 449, the compass 450, the whole camera 443, and the background microphones 445 and 446 are connected to the background image/sound creator 452, where the background image/sound is created. The difference from the embodiment shown in FIG. 25 is that, as source material for the background image/sound, images/sound from other databases 463a, street-corner cameras/microphones 464a, and other similar input devices 462 are used as needed, in addition to the database 453 inside the device. There may be a plurality of each of the external databases, street-corner cameras/microphones, and other input devices, and each is connected via a wired or wireless network 461.

【０１４６】背景画像／音作成器４５２でどのような背
景画像作成が行われるかを入力画像の図２９を用いて説
明する。図２９を全景カメラ４４３からの入力とすると
ここで背景画像となるのは人物画像４７１ａ以外の部分
である。このうち、空、建物のように現在地、カメラが
向いている方向、日時、気象条件をもとにしてデータベ
ースアクセスにより画像を得ることが可能な部分につい
てはデータベース画像による置き換えを行う。ここで、
もし装置内部のデータベース４５３に対象の画像が見つ
からない場合には外部のデータベース４６３ａの画像を
用いる。The background image creation performed by the background image/sound creator 452 is described using the input image of FIG. 29. If FIG. 29 is taken to be the input from the panoramic camera 443, the background image is the portion other than the person image 471a. Of this, portions such as the sky and buildings, for which an image can be obtained by database access based on the current location, the direction in which the camera is facing, the date and time, and the weather conditions, are replaced with database images. If the target image is not found in the database 453 inside the device, an image from the external database 463a is used.

【０１４７】データベースから取り出された画像は気象
条件、時刻等の条件によって補正が行われる。例えば、
日中の快晴時には空を青空にして全体の照度を明るく
し、建物の影をそれに適した方向、濃度でつけ、曇や雨
の場合には全体の照度を暗くして影も薄くする。太陽画
像は気象条件、季節、時刻によって計算し合成する。以
上の補正や合成の方法を決定するに際してはカメラ４４
３からの入力も参照してより高精度の背景画が作成され
るようにしても良い。Images retrieved from the database are corrected according to conditions such as the weather and the time. For example, in clear daytime weather the sky is made blue, the overall illuminance is raised, and building shadows are added with a suitable direction and density; in cloudy or rainy weather the overall illuminance is lowered and the shadows are made fainter. The image of the sun is computed from the weather conditions, season, and time, and is then composited. When deciding how to perform these corrections and this composition, the input from the camera 443 may also be referred to so that a more accurate background image is created.
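
The condition-dependent illuminance correction described above can be sketched as follows. This is only an illustration under stated assumptions, not the method of the embodiment: the gain table `WEATHER_GAIN`, the night-time factor, and the function name `correct_illuminance` are hypothetical, and the database image is represented as a plain list of 8-bit grey levels.

```python
# Hypothetical gains. The text only says "brighter when clear, darker when
# cloudy or rainy"; the numbers below are assumptions for illustration.
WEATHER_GAIN = {"clear": 1.2, "cloudy": 0.8, "rain": 0.6}


def correct_illuminance(pixels, weather, is_daytime):
    """Scale the 8-bit grey levels of a database image to suit weather and time."""
    gain = WEATHER_GAIN[weather]
    if not is_daytime:
        gain *= 0.4  # assumed additional darkening at night
    # Clamp every corrected value to the valid 0..255 range.
    return [[min(255, max(0, round(p * gain))) for p in row] for row in pixels]


db_image = [[100, 200], [50, 250]]
clear_day = correct_illuminance(db_image, "clear", True)
rainy_day = correct_illuminance(db_image, "rain", True)
```

Shadow generation and composition of the sun image are not covered by this sketch; they would require the geometry of the scene, which the text does not specify.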

【０１４８】背景中の動く車のようなものはデータベー
ス中には無いものである。このような部分を精度良くあ
らわす場合は街角カメラ４６４からの入力画像を用いれ
ば良い。街角カメラがとらえている範囲が領域４７４の
破線で囲んだ部分である。この部分は街角カメラ／マイ
ク装置４６４で必要により高能率符号化が行われ、ネッ
トワークを介して伝送されてくる。高能率符号化が行わ
れた場合にはそれを復号して画像として再生する処理が
行われた後、領域４７４の部分をその画像で置き換え
る。あるいは、車のような部分にはそれほどの精度は必
要とされず大体の交通量がわかれば良い程度であれば、
街角カメラからの画像で置き換えることはせず、車種や
交通量パラメータのみを伝送／蓄積して再生時にこれに
応じて適量の車画像を発生させて合成するようにすれば
この部分に係わる情報量は大幅に削減することができ
る。Objects such as moving cars in the background are not in the database. To represent such a portion accurately, the input image from the street-corner camera 464 may be used. The range captured by the street-corner camera is the portion enclosed by the broken line of the region 474. This portion is high-efficiency coded as necessary by the street-corner camera/microphone device 464 and transmitted over the network. If high-efficiency coding has been applied, the data is decoded and reproduced as an image, and the region 474 is then replaced with that image. Alternatively, if a portion such as the cars does not require much accuracy and it suffices to know the approximate traffic volume, the portion need not be replaced with the image from the street-corner camera. If only the vehicle type and traffic volume parameters are transmitted/stored, and a suitable number of car images are generated and composited accordingly at reproduction time, the amount of information for this portion can be reduced substantially.

【０１４９】背景中の人物４７１ｂはデータベースには
含まれておらず、街角カメラでもとらえられていない
が、必要に応じて高精度の画像を得たいとする。この場
合、他の入力装置４６２がこの人物を高精度でとらえて
いればそこからの画像を入力して用いればよい。他の入
力装置で人物４７１ｂ部分の画像の伝送に際して高能率
符号化が行われている場合にはこれを復号する処理を行
い再生画像を背景に合成する。なお、街角カメラや他の
入力装置からの入力画像は、全体カメラ４４３がとらえ
ている画像とは撮像方向が異なったり、撮影範囲が必要
とするものより広くなっていることがある。このような
場合は撮像方向の変換と必要な部分の切り出し処理を行
って背景への合成を行う。撮影方向の変換は例えば画像
を３次元モデルにモデル化し、本装置と街角カメラや他
の装置それぞれの現在地、撮影方向を考慮して視点を変
えた時の画像を得るようにすれば良い。The person 471b in the background is not included in the database and is not captured by the street-corner camera, but suppose that a high-accuracy image of this person is wanted as needed. In this case, if another input device 462 captures this person with high accuracy, the image from that device may be input and used. If the other input device applies high-efficiency coding when transmitting the image of the person 471b, the data is decoded and the reproduced image is composited into the background. An input image from a street-corner camera or another input device may have a shooting direction different from that of the image captured by the whole camera 443, or a shooting range wider than required. In such a case, the shooting direction is converted and the required portion is cut out before compositing into the background. The shooting direction may be converted, for example, by modeling the image as a three-dimensional model and obtaining the image seen from a changed viewpoint, taking into account the current locations and shooting directions of this device and of the street-corner camera or other device.

【０１５０】背景画像／音作成器４５２で作成された背
景画像／音は非背景画像／音抽出器４５１に送られ、背
景／非背景分離が行われる。ここでの画像に対する処理
を図２９の画像を例として説明する。中心となる人物４
７１ａについては人物カメラ４４１、顔カメラ４４２で
も撮像されている。これらからの入力を図２９中の破線
で囲んだ領域４７２及び領域４７３部分とする。The background image/sound created by the background image/sound creator 452 is sent to the non-background image/sound extractor 451, where background/non-background separation is performed. The image processing here is described using the image of FIG. 29 as an example. The central person 471a is also imaged by the person camera 441 and the face camera 442. Let the inputs from these cameras be the regions 472 and 473 enclosed by broken lines in FIG. 29.

【０１５１】この中には人物、顔以外にこの周囲の背景
部分も含まれているため、この背景部分は取り除かれ
る。この分離は、例えば、カメラからの入力と、生成さ
れた背景画像の領域４７２及び領域４７３内の部分が高
い精度でマッチングする部分を背景として取り除くよう
にすれば良い。このようにして切り出された人物、顔画
像と全景カメラ４４３からの入力マッチングを取ること
により中心人物４７１ａとそれ以外の部分に分離するこ
とができる。人物以外と判定された部分は全景カメラ４
４３からの入力と作成された背景画像の間でマッチング
が取られ、高い精度でマッチングした部分がまず背景と
判定される。These regions contain, besides the person and the face, the surrounding background, so this background is removed. The separation may be done, for example, by removing as background those portions where the camera input and the part of the generated background image inside the regions 472 and 473 match with high accuracy. By matching the person and face images cut out in this way against the input from the panoramic camera 443, the image can be separated into the central person 471a and the remaining portion. For the portion determined not to be the person, matching is performed between the input from the panoramic camera 443 and the created background image, and the portions that match with high accuracy are first determined to be background.
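
The matching-based separation described above can be illustrated with a per-pixel comparison. This is a minimal sketch under assumptions: the text does not specify the matching measure, so an absolute grey-level difference with a hypothetical `threshold` is used here, and the function name `split_background` is invented for the example.

```python
def split_background(camera, background, threshold=10):
    """Return a mask that is True where the camera pixel is kept as non-background.

    A pixel is treated as background (False) when it matches the generated
    background image to within `threshold` grey levels.
    """
    return [
        [abs(c - b) > threshold for c, b in zip(cam_row, bg_row)]
        for cam_row, bg_row in zip(camera, background)
    ]


# Camera input and generated background for the same region (8-bit grey levels).
camera = [[100, 102, 30], [99, 25, 28]]
background = [[101, 100, 100], [100, 100, 100]]
mask = split_background(camera, background)
```

A practical implementation would more likely match blocks of pixels rather than single pixels so that isolated noise does not flip the decision; that choice is left open here.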

【０１５２】次に、マッチングで誤差が大きかった部分
についてはその部分が重要な情報か否かが判定される。
例えば、図２９中に示す飛行船はデータベースに存在せ
ず、街角カメラや他の入力装置でもとらえられ無かった
ため背景画像には入っておらず、かつ、状況を表すには
好適な画像であるため、非背景と判定した後に符号化処
理を行う。重要でないノイズ成分等は非背景とは判定せ
ず、符号化処理も行わない。Next, for each portion where the matching error was large, it is determined whether that portion is important information. For example, the airship shown in FIG. 29 does not exist in the database and was not captured by a street-corner camera or other input device, so it is not in the background image. Because it is also an image well suited to conveying the situation, it is determined to be non-background and is then encoded. Unimportant noise components and the like are not determined to be non-background and are not encoded.

【０１５３】一方、音については、音声マイク４４４か
らの入力を音声信号として、背景マイク４４５，４４６
からの入力を背景音とし、背景音についてはデータベー
スや街角マイクの音による置き換えを行えば良い。ただ
し、音声マイクからの入力にも背景音が含まれている場
合があり、後の符号化等の処理に不都合な場合がある。
この場合は、音声マイク入力中に含まれる背景音の除去
処理を非背景画像／音抽出器４５１で行う。この方法は
例えば有音／無音判定を行って無音時には音を消去して
符号化等の処理を行わないようにしても良いし、背景マ
イクからの入力を参照信号とする学習同定法、ＬＭＳア
ルゴリズム、ＲＭＳアルゴリズム等を用いた適応ノイズ
キャンセラにより背景音除去を行っても良い。As for sound, the input from the voice microphone 444 is treated as the voice signal and the input from the background microphones 445 and 446 as the background sound, and the background sound may be replaced with sound from a database or a street-corner microphone. However, the input from the voice microphone may also contain background sound, which can be inconvenient for later processing such as encoding. In this case, the non-background image/sound extractor 451 removes the background sound contained in the voice microphone input. For example, voiced/silent determination may be performed and, during silence, the sound erased so that processing such as encoding is not performed. Alternatively, the background sound may be removed by an adaptive noise canceller that uses the learning identification method, the LMS algorithm, the RMS algorithm, or the like, with the input from the background microphones as the reference signal.
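
An adaptive noise canceller of the kind mentioned above can be sketched as follows. The learning identification method is generally known in English as normalized LMS (NLMS), and that is the update rule used here. The tap count, step size `mu`, and the function name `nlms_cancel` are assumptions for illustration; the text does not give these details.

```python
import random


def nlms_cancel(primary, reference, taps=4, mu=0.5, eps=1e-8):
    """Remove from `primary` the component that is predictable from `reference`.

    primary   : voice-microphone samples (voice plus leaked background sound)
    reference : background-microphone samples
    Returns the error signal, which is the estimate of the clean voice.
    """
    w = [0.0] * taps    # adaptive filter coefficients
    buf = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))  # estimated leaked background
        e = d - y                                   # background-reduced output
        norm = eps + sum(xi * xi for xi in buf)     # input power (normalization)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        out.append(e)
    return out


# Demonstration with no voice present: the voice microphone picks up only a
# scaled copy of the background, so the residual should decay toward zero.
random.seed(0)
noise = [random.uniform(-1, 1) for _ in range(2000)]
leaked = [0.7 * n for n in noise]
residual = nlms_cancel(leaked, noise)
```

When a voice is present, it is uncorrelated with the reference and therefore remains in the output while the leaked background is attenuated.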

【０１５４】非背景画像／音抽出器４５１で非背景と判
定された画像／音は符号化器４５４に送られて符号化さ
れる。中心人物や顔の画像あるいは音声はこれに適した
符号化方式、例えば、画像はモデルペースト符号化、音
声はＣＥＬＰ等を用いて符号化すれば良い。一方、図２
９中の飛行船のように背景としては作成されなかったが
ある程度の精度で伝えたいものはジュネリックな符号化
を用いて中心人物よりは低い精度で符号化する。The image/sound determined to be non-background by the non-background image/sound extractor 451 is sent to the encoder 454 and encoded. The image of the central person or face and the voice may be encoded with coding methods suited to them, for example model-based coding for the image and CELP for the voice. On the other hand, something like the airship in FIG. 29, which was not created as background but should be conveyed with a certain degree of accuracy, is encoded with generic coding at a lower accuracy than the central person.

【０１５５】４５５はネットワーク及び蓄積媒体とのイ
ンタフェースである。ここでは、背景画像／音作成器４
５２からの要求に応じて外部のデータベース、街角カメ
ラ及び他の入力装置との間のアクセスが行われ、得られ
たデータを背景画像／音作成器４５２に送り返す。ま
た、背景画像／音作成器４５２から出力された背景画像
／音に関する情報と符号化器４５４から出力された非背
景画像／音の符号化情報を多重化し、蓄積媒体４５６、
外部蓄積装置４６５、あるいは後述する外部の再生装置
へ送り出す。Reference numeral 455 denotes an interface with the network and the storage medium. In response to requests from the background image/sound creator 452, it accesses the external databases, street-corner cameras, and other input devices, and sends the obtained data back to the background image/sound creator 452. It also multiplexes the information on the background image/sound output from the background image/sound creator 452 with the encoded information of the non-background image/sound output from the encoder 454, and sends the result to the storage medium 456, the external storage device 465, or an external reproduction device described later.

【０１５６】多重化に際しては、再生のために必要な最
小限の情報のみが選択される。例えば、再生装置がネッ
トワーク４６６に接続されており、データベース４６
３、街角カメラ／マイク４６４、他の入力装置４６２に
もアクセス可能である場合は、これらの外部装置から得
られる情報についてはその情報は伝送する必要がない。
例えば図２９の画像について説明すれば、空、建物、太
陽については入力装置４６０の現在地や撮影方向、気象
条件、日時等がわかれば再生装置内部のデータベースや
外部データベース４８２の画像を素に変形、補間、合成
を行って作成することが可能であるため、現在地、撮影
方向、気象条件、日時等の情報のみを伝送すれば良い。
ただし、再生がリアルタイムで行われる場合には日時は
入力装置と同一（あるいは時差がある場合でも補正可
能）であるため伝送の必要がない。さらには入力装置と
再生装置の位置が比較的近い場合には気象条件もそれほ
ど大きな差がないと考えられるため必要によって伝送し
ないようにしても良い。なお、現在地、撮影方向情報は
カメラが固定の場合には伝送の最初に１度だけ伝送する
ようにしても良く、また、カメラの動く範囲が狭い場合
には最初に初期現在地、撮影方向を伝送してその後はそ
の修正情報を伝送するようにしても良い。次に、破線４
７４内の部分、背景内の人物４７１ｂは、リアルタイム
再生を行う場合には、街角カメラ４６３や他の入力装置
４６２から得ることができるので、どの街角カメラ、入
力装置から得た情報をどのように変形、切り出しを行っ
たかを表す情報のみを伝送するようにしても良い。一
方、蓄積系等非リアルタイム再生を行う場合には、街角
カメラ、他の入力装置からのリアルタイムな情報を得る
ことができない。街角カメラ、他の入力装置内に蓄積手
段が付属されており、再生装置での背景画像生成のため
に必要な情報をその蓄積手段中に記録しておくことが可
能な場合には、入力装置４６０は街角カメラ、入力装置
に対して、必要な情報を記録しておき再生装置からの要
求に応じてその情報を提供することを要求する。街角カ
メラ、他の入力装置が情報を蓄積することが不可能な場
合には、外部蓄積装置４８１へ情報を蓄積しておいても
良い。この場合には街角カメラ、他の入力装置から外部
蓄積装置への情報伝送を行うだけで良く、入力装置４６
０から送り出される情報量は増加しない。それも不可能
な場合には、インタフェース４５５での多重化の際に領
域４７４，人物４７１ｂ部分の画像情報も多重化してお
く。なお、中心人物４７１ａや飛行船のような非背景情
報については他の装置から情報を得ることができないた
め、伝送／蓄積する情報中に必ず含めるようにする。In multiplexing, only the minimum information needed for reproduction is selected. For example, if the reproduction device is connected to the network 466 and can also access the database 463, the street-corner camera/microphone 464, and the other input device 462, information obtainable from these external devices need not be transmitted. Taking the image of FIG. 29 as an example, the sky, buildings, and sun can be created by deforming, interpolating, and compositing images from the database inside the reproduction device or the external database 482, provided that the current location and shooting direction of the input device 460, the weather conditions, the date and time, and so on are known. It therefore suffices to transmit only information such as the current location, shooting direction, weather conditions, and date and time. When reproduction is performed in real time, however, the date and time are the same as at the input device (or can be corrected even if there is a time difference), so they need not be transmitted. Furthermore, when the input device and the reproduction device are relatively close to each other, the weather conditions are unlikely to differ much, so they may be omitted as appropriate. When the camera is fixed, the current location and shooting direction may be transmitted only once at the start of transmission; when the camera moves only within a narrow range, the initial location and shooting direction may be transmitted first and only corrections transmitted thereafter. Next, for real-time reproduction, the portion inside the broken line 474 and the person 471b in the background can be obtained from the street-corner camera 463 or the other input device 462, so only information indicating which street-corner camera or input device the information was obtained from, and how it was deformed and cut out, may be transmitted. For non-real-time reproduction, such as in a storage system, on the other hand, real-time information cannot be obtained from the street-corner camera or other input device. If the street-corner camera or other input device has storage means attached and can record in it the information needed to generate the background image at the reproduction device, the input device 460 requests the street-corner camera or input device to record the necessary information and to provide it on request from the reproduction device. If the street-corner camera or other input device cannot store the information, the information may be stored in the external storage device 481. In this case, the information only has to be transmitted from the street-corner camera or other input device to the external storage device, and the amount of information sent out from the input device 460 does not increase. If that is not possible either, the image information of the region 474 and the person 471b is also multiplexed in at the interface 455. Non-background information such as the central person 471a and the airship cannot be obtained from other devices and is therefore always included in the transmitted/stored information.
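
The selection of the minimum information to multiplex can be summarized as a small decision function. This is a sketch of one reading of the rules above, not a definitive implementation: the function name `select_fields`, its flags, and the field labels are hypothetical, and the weather condition is assumed to be omitted only for real-time reproduction at a nearby reproduction device.

```python
def select_fields(realtime, receiver_nearby, camera_fixed, first_packet,
                  external_sources_recorded):
    """Return the set of items the input device multiplexes into its stream."""
    # Central person, airship, etc. cannot be obtained elsewhere: always sent.
    fields = {"non_background"}
    if not realtime:
        # In real-time reproduction the date and time equal those of the sender.
        fields.add("datetime")
    if not (realtime and receiver_nearby):
        # Assumed: weather is dropped only when live and geographically close.
        fields.add("weather")
    if first_packet or not camera_fixed:
        # A fixed camera sends its location and direction once, at the start.
        fields.add("location_and_direction")
    if realtime or external_sources_recorded:
        # Only say which camera/device was used and how its image was cropped.
        fields.add("source_reference")
    else:
        # Nothing recorded the external images: send region 474 / person 471b.
        fields.add("region_images")
    return fields


live = select_fields(realtime=True, receiver_nearby=True, camera_fixed=True,
                     first_packet=False, external_sources_recorded=False)
stored = select_fields(realtime=False, receiver_nearby=False, camera_fixed=False,
                       first_packet=False, external_sources_recorded=False)
```

In the live case only the non-background data and a reference to the external sources are sent; in the stored case with no external recording, every item is included.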

【０１５７】さらに、インタフェース４５５は他の入力
装置における背景の一部画像／音作成の素となる入力画
像を提供するためのインタフェースの役割を持つ。例え
ば、他の入力装置Ａにおいて人物４７１ａの画像を提供
することを要求されている場合、非背景情報のうち人物
画像に関する符号化情報を要求があった入力装置Ａに伝
送する。ここで、画像／音声情報を後述の画像／音声再
生装置にも伝送しており、この情報列中に入力装置Ａが
要求している人物画像も含まれている場合には、入力装
置Ａのために別個に人物画像情報を伝送する必要はな
く、再生装置に対する情報列をネットワークから取り込
みこのうち人物画像情報のみを取り出して用いるように
入力装置Ａに要求しても良く、この場合は本入力装置４
６０からの伝送情報量を新たに増やす必要が無くなる。
また、ネットワーク全体に流れている総情報量や入力装
置４６０に割り当てられている情報量、あるいは入力装
置４６０の許容処理量等から考えて、入力装置Ａからの
情報提供要求に答えることが不可能であると判断される
場合は、この要求を拒否しても良いし、入力装置Ａに伝
送する画像の品質を落とす（すなわち情報提供に係わる
情報量を削減する）ことを要求しても良い。Furthermore, the interface 455 also serves as an interface for providing input images that become source material for creating part of the background image/sound in other input devices. For example, when another input device A requests the image of the person 471a, the encoded information on the person image among the non-background information is transmitted to the requesting input device A. If the image/sound information is also being transmitted to the image/sound reproduction device described later, and that information stream includes the person image requested by the input device A, the person image information need not be transmitted separately for the input device A. Instead, the input device A may be asked to take the information stream addressed to the reproduction device from the network and to extract and use only the person image information; in that case, the amount of information transmitted from the input device 460 does not need to increase. If, considering the total amount of information flowing through the whole network, the amount of information allocated to the input device 460, the permissible processing load of the input device 460, and so on, it is judged impossible to satisfy the information provision request from the input device A, the request may be rejected, or a reduction in the quality of the image transmitted to the input device A (that is, a reduction in the amount of information involved in providing it) may be requested.

【０１５８】図２０は入力された画像／音を再生する装
置のブロック図である。再生に必要な情報はネットワー
ク４６６または蓄積媒体４５６あるいはその両方を介し
て再生装置４９０に提供される。４９１はネットワーク
４６６及び蓄積媒体４５６とのインタフェースであり、
外部データベース４６３、街角カメラ／マイク４６４、
他の入力装置４６２、外部蓄積装置４６５に必要な情報
を提供するよう要求し、蓄積媒体４５６から、あるい
は、ネットワークを介して入力装置４６０、外部データ
ベース４６３、街角カメラ／マイク４６４、他の入力装
置４８５、外部蓄積装置４６５から画像／音情報を入力
し、背景情報と非背景情報に分解し、それぞれ背景画像
／音作成器４９２、非背景画像／音再生器４９４へ出力
する。FIG. 20 is a block diagram of a device that reproduces the input image/sound. The information needed for reproduction is provided to the reproduction device 490 via the network 466, the storage medium 456, or both. Reference numeral 491 denotes an interface with the network 466 and the storage medium 456. It requests the external database 463, the street-corner camera/microphone 464, the other input device 462, and the external storage device 465 to provide the necessary information. It receives image/sound information from the storage medium 456, or via the network from the input device 460, the external database 463, the street-corner camera/microphone 464, the other input device 485, and the external storage device 465, separates it into background information and non-background information, and outputs these to the background image/sound creator 492 and the non-background image/sound reproducer 494, respectively.

【０１５９】非背景画像／音再生器では非背景部分の画
像／音の再生が行われる。非背景情報は高能率符号化さ
れているため、これを復号する処理が行われる。これ
は、図２８の入力装置４６０の符号化器４５４に対応す
る復号処理である。The non-background image/sound reproducer reproduces the image/sound of the non-background portion. Because the non-background information has been high-efficiency coded, it is decoded. This decoding corresponds to the encoder 454 of the input device 460 in FIG. 28.

【０１６０】背景画像／音作成器４９２では背景画像／
音が作成される。図２９の画像の例で説明すると、空、
建物、太陽のように、入力装置４６０で内部データベー
ス４５３の画像を素に季節、日時、気象条件に応じた変
形を行って作成した部分は、再生装置内のデータベース
４９３から該当する画像を選択しそれを変形して背景画
像とする。The background image/sound creator 492 creates the background image/sound. Taking the image of FIG. 29 as an example, for portions such as the sky, buildings, and sun, which the input device 460 created by deforming images from the internal database 453 according to the season, date and time, and weather conditions, the corresponding image is selected from the database 493 in the reproduction device and deformed to form the background image.

【０１６１】ただし、入力装置のデータベース４５３と
再生装置のデータベース４９３が異なり、再生装置内の
データベース４９３中に選択しようとする画像が存在し
ない場合には、再生装置内のデータベース４９３中の類
似の画像を用いるか、外部データベース４６３をアクセ
スして入力装置のデータベース４５３中の画像と同一か
類似の画像を取り出し、それを適宜変形して背景画像を
作成する。例えば、空は現在地が少し異なってもあまり
大きな差異は無い（例えば日本国内ならば空はほぼ同じ
と考えられる）ので、類似の地域の空画像をデータベー
スから選択し、必要に応じてそれを変形して用いれば良
い。However, if the database 453 of the input device and the database 493 of the reproduction device differ, and the image to be selected does not exist in the database 493 in the reproduction device, either a similar image in the database 493 is used, or the external database 463 is accessed to retrieve an image identical or similar to the image in the database 453 of the input device, and that image is deformed as appropriate to create the background image. For example, the sky does not differ much even when the location differs somewhat (within Japan, for instance, the sky can be regarded as nearly the same), so a sky image of a similar region may be selected from the database and deformed as needed.

【０１６２】建物はデータベースに画像が存在するが撮
影方向が少し異なるような場合はこれを補正する変形を
行って画像を作成すれば良い。あるいは、建物が存在す
ることには意味があるが、その建物の形そのものはそれ
ほど重要でないと判断されるときには、適当な建物画像
をデータベースから取り出して当てはめても構わない。
動く車や背景内の人物４７１ｂはそれぞれ街角カメラ及
び他の入力装置から入力された画像である。この部分に
ついてはそれぞれ該当する街角カメラ及び入力装置に情
報を伝送してもらうように要求し、伝送された情報をも
とに該部分の画像を作成する。If an image of a building exists in the database but its shooting direction differs slightly, the image may be created by applying a deformation that corrects for this. Alternatively, when the presence of a building is meaningful but its exact shape is judged not to matter much, a suitable building image may be taken from the database and fitted in. The moving cars and the person 471b in the background are images input from the street-corner camera and the other input device, respectively. For these portions, the corresponding street-corner camera and input device are requested to transmit the information, and the images of these portions are created from the transmitted information.

【０１６３】ただし、再生が非リアルタイム的に行われ
る場合には、入力装置４６０で背景画像作成を行った時
刻と再生時刻に街角カメラや他の入力装置から入力され
ている画像は異なるため、それをそのまま用いることは
できない。この場合、前述のように街角カメラ及び他の
入力装置中に蓄積媒体がありそこに入力装置４６０で背
景作成を行った時刻の画像情報が蓄積されている場合に
は、その画像情報を伝送するよう要求して背景作成を行
う。あるいは、入力装置に接続されている蓄積媒体４５
６や外部蓄積装置４６５に該当する画像情報が蓄積され
ている場合にはその画像情報を伝送してもらえば良い。
あるいは、入力装置４６０からの伝送情報中に該当部分
の画像情報がジュネリックな符号化列として含まれてい
る場合にはそれを素に復号処理を行い画像を得ることが
できる。However, when reproduction is not performed in real time, the images being input from the street-corner camera and other input devices at the reproduction time differ from those at the time the input device 460 created the background image, so they cannot be used as they are. In this case, if, as described above, the street-corner camera or other input device has a storage medium in which the image information from the time the input device 460 created the background is stored, that image information is requested and used to create the background. Alternatively, if the relevant image information is stored in the storage medium 456 connected to the input device or in the external storage device 465, that image information may be transmitted. Or, if the information transmitted from the input device 460 contains the image information of the relevant portion as a generic coded sequence, the image can be obtained by decoding it.

【０１６４】これら、いずれにも画像情報が蓄積されて
いない場合には、データベース中の画像、あるいは、街
角カメラ、他の入力装置から現在入力されている画像か
ら類似の画像を探して背景画像を作成する。例えば、街
角カメラの画像を用いた領域４７４内の部分について
は、同一の街角カメラからの入力画像を用いれば良い。
この際、入力装置４６０での画像入力時と再生時刻での
時刻、季節、気象条件等の相違を補正するため、照度、
通行する車の数等について変形加工を行って背景画像と
して用いても良い。If the image information is stored in none of these, a background image is created by searching for a similar image among the images in the database or the images currently being input from the street-corner camera or other input devices. For example, for the portion within the region 474 that used the street-corner camera image, the current input image from the same street-corner camera may be used. In doing so, to compensate for differences in time of day, season, weather conditions, and so on between the time of image input at the input device 460 and the reproduction time, the image may be modified in illuminance, number of passing cars, and the like before being used as the background image.

【０１６５】非背景画像／音作成器４９４で再生された
非背景画像／音と背景画像作成器４９２で作成された背
景画像／音は合成器４９５で合成される。非背景情報に
はそれぞれ全景中のどの位置にあるものかを示す情報が
含まれておりそれをもとにもとの入力画像と同一の位置
に非背景画像の合成が行われる。The non-background image/sound reproduced by the non-background image/sound creator 494 and the background image/sound created by the background image creator 492 are combined by the combiner 495. Each item of non-background information includes information indicating its position in the whole view, and on that basis the non-background image is composited at the same position as in the original input image.

【０１６６】なお、合成器４９５における合成にあたっ
ては、全景画像／音を忠実に再現するだけでなく、必要
に応じて一部分を高精細に取り出す等の処理が可能であ
る。例えば、背景画像／音は必要なく中心人物の画像／
音のみが必要な場合には中心人物の画像と音声情報のみ
を再生して取り出せば良い。あるいは、その中でも顔画
像のみを取り出して再生することも可能である。また、
音については音声信号と背景音の混合の強度を好みに応
じて変えることも可能であり、例えば、音声をはっきり
と聞きたい場合には音声信号を大きくすれば良い。In compositing at the combiner 495, it is possible not only to reproduce the whole-view image/sound faithfully but also, as needed, to extract a part of it at high definition. For example, when the background image/sound is unnecessary and only the image/sound of the central person is needed, only the image and voice information of the central person may be reproduced and extracted. It is also possible to extract and reproduce only the face image. As for sound, the mixing levels of the voice signal and the background sound can be changed to taste; for example, to hear the voice clearly, the voice signal may be made louder.

【０１６７】また、有音／無音判定情報に基づき有音時
には背景音を小さくし無音時には背景音を大きくすると
いう適応処理を行うことにより中心人物の音声も分かり
やすく、かつ、背景の状況も把握し易くなる。また、背
景部分についても全景でなく中心人物の周囲のみのよう
に一部分を切り出して用いても良い。あるいは、逆にも
っと広範囲の背景情報が必要な場合には、伝送／蓄積さ
れた現在地、時刻、気象条件等を用い、データベースや
街角カメラから該当する画像を探して変形を行って大き
な背景画像を作成することが可能である。また、背景部
分についてさらに高精細な画像／音が必要な場合にその
部分をクローズアップすることも可能である。Also, by adaptive processing that lowers the background sound during voiced periods and raises it during silent periods, based on the voiced/silent determination information, the voice of the central person becomes easy to make out and the background situation also becomes easy to grasp. The background portion, too, need not be the whole view; a part of it, such as only the surroundings of the central person, may be cut out and used. Conversely, when background information covering a wider range is needed, a large background image can be created by using the transmitted/stored current location, time, weather conditions, and so on to find the relevant images in a database or from a street-corner camera and deforming them. When a higher-definition image/sound is needed for part of the background, that part can also be shown in close-up.
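
The adaptive mixing of voice and background sound described above can be sketched as follows. The gain values and the function name `mix` are assumptions for illustration; the text states only that the background is lowered while the speaker is voiced and raised while silent.

```python
def mix(voice, background, voiced, bg_gain_voiced=0.2, bg_gain_silent=1.0):
    """Mix two sample sequences, attenuating the background while voiced.

    voiced : per-sample booleans from the voiced/silent determination
    """
    out = []
    for v, b, is_voiced in zip(voice, background, voiced):
        gain = bg_gain_voiced if is_voiced else bg_gain_silent
        out.append(v + gain * b)
    return out


# First sample: speaker active, background reduced. Second: silence, full background.
mixed = mix([0.5, 0.0], [0.1, 0.1], [True, False])
```

In practice the gain would be smoothed over time so that the background level does not jump audibly at each voiced/silent transition.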

【０１６８】例えば、図２９の背景中の人物の精細な画
像及び会話の内容を得たい場合、その人物画像を取り込
んだ他の入力装置に該当する人物画像及び音声の情報を
伝送してもらうよう要求し、その情報をもとに画像、音
声の再生を行えば良い。For example, to obtain a detailed image of the person in the background of FIG. 29 and the content of that person's conversation, the other input device that captured the person image is requested to transmit the corresponding person image and voice information, and the image and voice are reproduced from that information.

【０１６９】図２７及び図３０の入力装置においては、
背景と非背景の情報が分離されているため、必要に応じ
て背景部分を他の画像／音に置き換えたり、時刻、気象
条件等のパラメータを変化させることにより同一場所の
異なる条件の背景画像／音に置き換えて画像／音を合成
することが容易に行える。In the input devices of FIGS. 27 and 30, the background and non-background information are separated. The background portion can therefore easily be replaced with another image/sound as needed, or, by changing parameters such as the time and weather conditions, replaced with a background image/sound of the same place under different conditions, and the image/sound composited.

【０１７０】図２５と図２７及び図２８と図３０では入
力装置と再生装置が分離して存在する例について述べた
が、入力装置と再生装置は同一きょう体あるいは接続し
て用いても構わない。この場合、データベースや蓄積媒
体、ネットワークインタフェース等の共用化が可能とな
る。また、他の実施例において他の入力装置４６２と再
生装置４９０が接続されている場合には、そこから取り
込んだ画像４７１ｂはネットワークを介して伝送する必
要は無い。FIGS. 25 and 27 and FIGS. 28 and 30 describe examples in which the input device and the reproduction device exist separately, but the input device and the reproduction device may be housed in the same enclosure or used connected to each other. In that case, the database, storage medium, network interface, and so on can be shared. Also, in the other embodiment, when the other input device 462 and the reproduction device 490 are connected, the image 471b captured from that device need not be transmitted via the network.

【０１７１】次に、本発明に係る第５の実施例を図面に
従い説明する。図３３は同実施例を表すブロック図であ
る。この例では、３個のカメラからなる構成をとり、情
報源は人間の顔情報とする。図３３において、情報源５
１１はカメラＡ５１２ａ、カメラＢ５１２ｂ、カメラＣ
５１２ｃを介してそれぞれ異なる限定された情報として
入力される。この実施例では、カメラＡ５１２ａは右
目、カメラＢ５１２ｂは左目、カメラＣ５１２ｃは口情
報を得るのに適当な位置に設置されているとする。Next, a fifth embodiment of the present invention is described with reference to the drawings. FIG. 33 is a block diagram of this embodiment. In this example, the configuration consists of three cameras, and the information source is human face information. In FIG. 33, the information source 511 is input as differently limited information via the camera A 512a, the camera B 512b, and the camera C 512c. In this embodiment, the camera A 512a, the camera B 512b, and the camera C 512c are assumed to be installed at positions suitable for obtaining right-eye, left-eye, and mouth information, respectively.

【０１７２】カメラＡ５１２ａを介して、右目を含む画
像として右目情報を得る。該右目情報は右目状態獲得手
段５１３ａの入力となる。右目状態獲得手段５１３ａで
は、右目が開いているか閉じているかに応じて１ビット
の情報を出力する。目が開いているか閉じているかの判
定は、例えば右目情報にエッジ強調フィルタをかけ、雑
音除去フィルタをかけた後に、その出力信号を拡大・縮
退操作することで線分化する。このように線分化するこ
とを、今後簡単のため線分化処理すると呼ぶ。線分化処
理の流れ図を図３４に示す。また、拡大・縮退操作する
ことで線分化が行われる様子を図３５に示す。このよう
にして得られた線分化画像内に存在する最も面積の大き
な閉領域の大きさがある基準値より大きい場合、目が開
いていると判定し、基準値より小さい場合は目が閉じて
いると判定する。Right-eye information is obtained through camera A 512a as an image including the right eye. The right-eye information is input to the right-eye state acquisition means 513a, which outputs 1 bit of information according to whether the right eye is open or closed. To determine whether the eye is open or closed, for example, an edge enhancement filter and then a noise removal filter are applied to the right-eye information, and the output signal is converted into line segments by expansion and contraction operations. For simplicity, this conversion is hereinafter called line segmentation processing. FIG. 34 shows a flowchart of the line segmentation processing, and FIG. 35 shows how the expansion and contraction operations produce line segments. If the largest closed region in the resulting line-segmented image is larger than a reference value, the eye is judged to be open; if it is smaller than the reference value, the eye is judged to be closed.
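
The open/closed decision above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the line-segmented image is already available as a binary grid (1 = line pixel, 0 = background), treats a "closed region" as a 4-connected background area that does not touch the image border, and uses hypothetical names (`largest_closed_region_area`, `eye_state_bit`, `reference_area`). The source does not say what happens when the area equals the reference value; the sketch treats that case as closed.

```python
from collections import deque


def largest_closed_region_area(line_image):
    """Return the pixel count of the largest background region that is
    fully enclosed by line pixels (i.e. does not reach the image border)."""
    rows, cols = len(line_image), len(line_image[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if line_image[r0][c0] != 0 or seen[r0][c0]:
                continue
            # Flood-fill one 4-connected background region.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            area = 0
            touches_border = False
            while queue:
                r, c = queue.popleft()
                area += 1
                if r in (0, rows - 1) or c in (0, cols - 1):
                    touches_border = True
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and line_image[nr][nc] == 0):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if not touches_border:
                best = max(best, area)
    return best


def eye_state_bit(line_image, reference_area):
    """Return 1 (open) if the largest closed region exceeds the reference
    value, otherwise 0 (closed). The equality case is an assumption."""
    return 1 if largest_closed_region_area(line_image) > reference_area else 0
```

The same routine would serve the left eye unchanged, since the text states that the left-eye state is determined by the same method.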

【０１７３】また、カメラＢ５１２ｂより獲得される左
目情報は左目状態獲得手段５１３ｂの入力となり、該左
目状態獲得手段５１３ｂでは同様の方法で左目が開いて
いるか閉じているかを判定し、１ビットの情報を出力す
る。カメラＣ５１２ｃより獲得される口情報は、口状態
獲得手段５１３ｃの入力となり、その出力として口の形
を表す口状態を得る。口状態獲得手段５１３ｃでは、口
の形としてあらかじめ用意されているテンプレートテー
ブルに含まれるテンプレートの中で、該テンプレートと
口情報を線分化処理することにより得られる口線分情報
との間の類似度が最大となる候補を選択することで達成
される。The left-eye information obtained from camera B 512b is input to the left-eye state acquisition means 513b, which determines in the same way whether the left eye is open or closed and outputs 1 bit of information. The mouth information obtained from camera C 512c is input to the mouth state acquisition means 513c, which outputs a mouth state representing the shape of the mouth. The mouth state acquisition means 513c does this by selecting, from the templates in a template table prepared in advance as mouth shapes, the candidate whose similarity to the mouth line segment information (obtained by applying line segmentation processing to the mouth information) is greatest.

【０１７４】次に、類似度が最大となるテンプレートの
選択法の一例について説明する。前記口線分画像から口
の横方向の長さＨ、口の縦方向の長さＶを測定し、（Ｈ
とＶの関係を図３９に示す）この２つの値とテンプレー
トテーブルのＨｔ，Ｖｔとの間で、Ｅｔ＝｜Ｈ−Ｈｔ｜＋｜Ｖ−Ｖｔ｜但し、ｔ＝０〜Ｔ−１を求め、Ｅｔが最小となるテンプレートが類似度が最大
とみなし、テンプレート番号ｔを口状態として出力する
（テンプレートテーブルの一例を図４０に示す）。テン
プレートテーブルに含まれるテンプレートの数は、口が
閉じている状態を含め最低２個は必要でわり、音素の種
類だけ用意されていることが望ましい。Next, an example of a method for selecting the template with the maximum similarity is described. The horizontal length H and the vertical length V of the mouth are measured from the mouth line segment image (FIG. 39 shows the relationship between H and V). From these two values and the values Ht and Vt in the template table, Et = |H - Ht| + |V - Vt| is computed for t = 0 to T-1. The template with the smallest Et is regarded as having the maximum similarity, and its template number t is output as the mouth state (FIG. 40 shows an example of the template table). The template table must contain at least two templates, including one for the closed mouth, and preferably contains as many templates as there are phoneme types.
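
The selection rule above, minimizing Et = |H - Ht| + |V - Vt| over the table, can be sketched in a few lines. The function name and the table values used in any call are hypothetical; the actual table of FIG. 40 is not reproduced here. When two templates give the same Et, this sketch returns the lower template number, which is an assumption the source does not address.

```python
def select_mouth_template(h, v, template_table):
    """Return the template number t that minimizes |H - Ht| + |V - Vt|.

    template_table is a list of (Ht, Vt) pairs indexed by template number.
    """
    if len(template_table) < 2:
        # The text requires at least two templates (one is the closed mouth).
        raise ValueError("template table needs at least two entries")
    errors = [abs(h - ht) + abs(v - vt) for ht, vt in template_table]
    # min() returns the first minimum, so ties go to the lower number.
    return min(range(len(errors)), key=errors.__getitem__)
```

For example, with an illustrative table `[(40, 0), (40, 10), (30, 25)]`, a measured mouth of H = 38, V = 12 gives errors 14, 4, and 21, so template 1 is chosen.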

【０１７５】出力画像生成手段５１４では、人工画デー
タベース５１５から一つの人工画をあらかじめ選択して
おき、順次入力される右目状態、左目状態、口状態を基
に人工画データベースから小片を読みだし、人工画の対
応する各部分に当てはめて出力画像を生成し、表示機器
５１６に表示する。In the output image generation means 514, one artificial image is selected in advance from the artificial image database 515. Based on the right-eye state, left-eye state, and mouth state that are input in sequence, small image pieces are read from the artificial image database and fitted to the corresponding parts of the artificial image to generate an output image, which is displayed on the display device 516.

【０１７６】次に、本実施例の他の一例を図３７に従い
説明する。なお図３７に含まれる各名称で図３３と同じ
名称を持つものは、上述したものと同じ機能を有するの
でここでは説明を省略する。Another example of this embodiment will be described below with reference to FIG. Note that the names included in FIG. 37, which have the same names as those in FIG. 33, have the same functions as those described above, so description thereof will be omitted here.

【０１７７】ここで、図３７に示す実施例について図３
３で述べた実施例との違いは、音響マイク５２２が構成
に含まれることと、この音響マイク５２２から得られる
音声情報が音声コーデック５２２ａを介してスピーカ５
２２ｃから出力されるとともに、この音声情報から得ら
れる音声状態を用いて口状態を状態補正手段５２７によ
って補正する点にある。つまり、先の実施例では口情報
からのみ口状態を決定していたのに対し、この実施例で
は口情報と音声情報の両者から口状態を決定する点が異
なる。The embodiment shown in FIG. 37 differs from the embodiment described with FIG. 33 in that an acoustic microphone 522 is included in the configuration, the audio information obtained from the acoustic microphone 522 is output from the speaker 522c via the audio codec 522a, and the mouth state is corrected by the state correction means 527 using the voice state obtained from this audio information. That is, whereas the previous embodiment determined the mouth state from the mouth information alone, this embodiment determines the mouth state from both the mouth information and the audio information.

【０１７８】音声状態は音声情報を入力とする音声状態
獲得手段５２６の出力として得ることができる。音声状
態獲得手段５２６では、既存の音素認識技術を用いて得
られる音素情報から、図４０に示されるテンプレートテ
ーブルを参照し、テンプレート番号ｔｓを音声状態とし
て出力する。The voice state can be obtained as an output of the voice state acquisition means 526 which receives voice information. The speech state acquisition means 526 refers to the template table shown in FIG. 40 based on the phoneme information obtained by using the existing phoneme recognition technology, and outputs the template number ts as the speech state.

【０１７９】以下に、状態補正手段５２７の一実現法を
図３８を用いて説明する。状態補正手段５２７の入力と
して口状態ｔと音声状態ｔｓが与えられ、制御信号によ
って信頼性の高い状態が選択され出力される。制御信号
は、例えば背景雑音のレベルが小さいとき音声状態の信
頼度の方が口状態の信頼度より高いと考えられるので、
音声状態を選択するよう制御する。逆に、背景雑音レベ
ルの高いときは、音声状態より口状態の方が信頼性が高
いので（背景雑音により、音素が誤認識される可能性が
高いから）、口状態を選択するよう制御する。こうする
ことで、信頼性の高い口の動きを得ることができるた
め、より自然な合成画像を得ることができる。One way of realizing the state correction means 527 is described below with reference to FIG. 38. The mouth state t and the voice state ts are given as inputs to the state correction means 527, and the more reliable of the two is selected by a control signal and output. For example, when the background noise level is low, the voice state is considered more reliable than the mouth state, so the control signal selects the voice state. Conversely, when the background noise level is high, the mouth state is more reliable than the voice state (because background noise makes phoneme misrecognition likely), so the control signal selects the mouth state. In this way, reliable mouth movement is obtained, and therefore a more natural composite image.
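
The selection performed by the state correction means can be sketched as a simple switch. The text only distinguishes "low" and "high" background noise; the explicit threshold parameter and the function name below are assumptions added for illustration, as is the choice to treat a level exactly at the threshold as high noise.

```python
def correct_mouth_state(mouth_state, voice_state, noise_level, noise_threshold):
    """Select the more reliable template number.

    mouth_state  : template number t obtained from the mouth image
    voice_state  : template number ts obtained from phoneme recognition
    noise_level, noise_threshold : same (hypothetical) unit, e.g. dB
    """
    if noise_level < noise_threshold:
        # Quiet surroundings: phoneme recognition is trusted.
        return voice_state
    # Noisy surroundings: phonemes may be misrecognized, so use the image.
    return mouth_state
```

A practical system might smooth the noise estimate over time to avoid rapid switching between the two sources, but the source does not discuss this.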

【０１８０】次に、本実施例のさらに他の実施例を図４
１に従い説明する。なお図４１に含まれる各名称で図３
３と同じ名称を持つものは、上述したものと同じ機能を
有するのでここでは説明を省略する。Next, yet another example of this embodiment will be described with reference to FIG. 41. Elements in FIG. 41 that have the same names as those in FIG. 33 have the same functions as described above, so their description is omitted here.

【０１８１】ここで、図４１に示す実施例について図３
３で述べた実施例との違いは、先の実施例では人工画デ
ータベースに含まれる人工画の組み合わせから合成画像
を生成していたのに対し、第３の実施例では入力情報か
ら得られる線画の組み合わせにより合成画像を生成する
点が異なる。The embodiment shown in FIG. 41 differs from the embodiment described with FIG. 33 as follows: whereas the previous embodiment generated the composite image from a combination of artificial images contained in the artificial image database, this third example generates the composite image from a combination of line drawings obtained from the input information.

【０１８２】次に、右目線画獲得手段６０２について説
明する。右目線画獲得手段６０２では、右目情報に線分
化処理を施し右目線分化画像を得る。このようにして得
られた線分化画像内に存在する最も面積の大きな閉領域
を右目であると判定し、該閉領域の外側に存在する線分
は全て雑音だと考え、削除する。その様子を図４２に示
す。雑音削除された右目線分化画像を右目線画として出
力する。左目線画獲得手段６０３も同様に、左目情報か
ら左目線画を得ることができる。Next, the right-eye line drawing acquisition means 602 will be described. The right-eye line drawing acquisition means 602 applies line segmentation processing to the right-eye information to obtain a right-eye line-segmented image. The closed region with the largest area in this line-segmented image is judged to be the right eye, and all line segments outside that closed region are regarded as noise and deleted. This is shown in FIG. 42. The right-eye line-segmented image with the noise removed is output as the right-eye line drawing. The left-eye line drawing acquisition means 603 likewise obtains the left-eye line drawing from the left-eye information.

【０１８３】また口線画獲得手段６０４は、口情報に線
分化処理を施し口線分化画像を得て、前記線分化画像に
存在する最も大きな面積を有する閉領域を口であると判
定し、該閉領域の外側に存在する線分は全て雑音だと考
え削除する。雑音削除された口線分化画像を口線画とし
て出力する。The mouth line drawing acquisition means 604 applies line segmentation processing to the mouth information to obtain a mouth line-segmented image, judges the closed region with the largest area in that image to be the mouth, and regards all line segments outside that closed region as noise and deletes them. The mouth line-segmented image with the noise removed is output as the mouth line drawing.

【０１８４】出力線画作成手段６０５では、あらかじめ
適当な輪郭線画、髪型線画を線画データベース６０６か
ら選択しておき順次入力される右目線画、左目線画、口
線画を前記輪郭線画の対応する位置にはめ込み出力線画
を作成し、表示機器６０７に表示する。In the output line drawing creation means 605, a suitable contour line drawing and hairstyle line drawing are selected in advance from the line drawing database 606. The right-eye, left-eye, and mouth line drawings that are input in sequence are fitted into the corresponding positions of the contour line drawing to create an output line drawing, which is displayed on the display device 607.

【０１８５】次に、本実施例の他の一例を図４３に従い
説明する。なお図４３に含まれる各名称で図４１と同じ
名称を持つものは、上述したものと同じ機能を有するの
でここでは説明を省略する。Another example of this embodiment will be described below with reference to FIG. Note that the names included in FIG. 43 and having the same names as those in FIG. 41 have the same functions as those described above, so description thereof will be omitted here.

【０１８６】ここで、図４３に示す実施例について図４
１で述べた実施例との違いは、音響マイク６１２が構成
に含まれることと、この音響マイク６１２から得られる
音声情報が音声コーデック６１２ａを介してスピーカ６
１２ｂから出力されるとともに、この音声情報から得ら
れる音声状態を用いて口線画を線画補正手段６１８によ
って補正する点にある。つまり、先の実施例では口情報
からのみ口線画を決定していたのに対し、この実施例で
は口線画と音声状態の両者から口線画を決定する点が異
なる。The embodiment shown in FIG. 43 differs from the embodiment described with FIG. 41 in that an acoustic microphone 612 is included in the configuration, the audio information obtained from the acoustic microphone 612 is output from the speaker 612b via the audio codec 612a, and the mouth line drawing is corrected by the line drawing correction means 618 using the voice state obtained from this audio information. That is, whereas the previous embodiment determined the mouth line drawing from the mouth information alone, this embodiment determines it from both the mouth line drawing and the voice state.

【０１８７】状態補正手段６１８については、前述した
実施例の状態補正手段５２７と同じなので、ここでは説
明を省略する。ただし、この実施例の線画補正手段６１
８で音声状態が選択された場合には、音声状態に対応す
る音素の線画が選択され口状態として出力される点が実
施例の状態補正手段５２７と異なる。また、状態補正手
段６１８の一実現法として、背景雑音のレベルによって
切り替える方法があるが、これは前述した第２の実施例
で説明してあるので、ここで新たに説明することは避け
る。The state correction means 618 is the same as the state correction means 527 of the embodiment described above, so its description is omitted here. It differs from the state correction means 527, however, in that when the line drawing correction means 618 of this embodiment selects the voice state, the line drawing of the phoneme corresponding to the voice state is selected and output as the mouth state. One way of realizing the state correction means 618 is to switch according to the background noise level; this was explained in the second example above and is not repeated here.

【０１８８】次に本発明に係る第６の実施例について説
明する。図４５は本発明を顔画像の符号化に用いる場合
の実施例を示したものである。図中６４２〜６４４はカ
メラ、６４１はマイクである。カメラ６４２は背景を含
む顔全体、カメラ６４４は目の周辺、カメラ６４２は口
の周辺や顎をそれぞれ撮す。カメラ６４４，６４２は目
や口に追従できるように可動できる。カメラ６４３は固
定でもよい。マイク６４１は写っている人物の声を拾
う。これらの入力装置から以下の画像情報や音声情報を
得る。すなわちカメラ６４３，６４４，６４２から各々
顔全体の画像６４５、目の周辺の画像６４７、口元の画
像６４８、マイク６４１から音声情報を得る。Next, a sixth embodiment according to the present invention will be described. FIG. 45 shows an embodiment in which the present invention is used for encoding a face image. In the figure, 642 to 644 are cameras and 641 is a microphone. Camera 643 captures the whole face including the background, camera 644 the area around the eyes, and camera 642 the area around the mouth and the chin. Cameras 644 and 642 are movable so that they can follow the eyes and mouth. Camera 643 may be fixed. The microphone 641 picks up the voice of the person being captured. The following image and audio information are obtained from these input devices: the whole-face image 645, the image 647 of the area around the eyes, and the mouth image 648 from cameras 643, 644, and 642 respectively, and audio information from the microphone 641.

【０１８９】これらの情報をもとに処理装置は以下の処
理を行う。顔全体の画像６４５から瞳孔、鼻腔、口腔等
を識別し位置関係を調べることで顔全体の動きを検出す
る。すなわち同瞳孔や口の間隔が変化しなければ平行移
動、同一の割合で伸縮すれば前後方向の移動として検出
される。次に顔全体の動きを参照してカメラ６４４，６
４２を目や口に追従させる。目の周辺の画像６４７と口
元の画像６４８を別に得るのは顔面内で目と口元は動く
頻度が高く形状の変化も大きいためである。目の周辺や
口元の動き検出にはそれらの画像６４７，６４８を用い
る。合成の際の処理を図４６に示す。顔全体の画像６５
０に目の周辺と口元の画像６４７，６４８を内挿する。
その内挿処理による復号画像の歪を抑えるため、分析す
る時に予め内挿した画像と顔全体の画像ｘとの誤差が小
さくなるように目の周辺や口元の画像６４７，６４８の
大きさなどを調整しておく。なお、目や口の部分に限ら
ず、他の部分も画像品質の関係上必要があれば同様の方
法で撮るようにしてもよい。The processing device performs the following processing based on this information. The pupils, nostrils, mouth cavity, and so on are identified in the whole-face image 645, and the movement of the whole face is detected by examining their positional relationship. That is, if the spacing between the two pupils and the mouth does not change, the movement is detected as a parallel translation; if the spacings expand or contract by the same ratio, it is detected as movement in the front-back direction. Next, cameras 644 and 642 are made to follow the eyes and mouth with reference to the movement of the whole face. The image 647 of the area around the eyes and the mouth image 648 are obtained separately because, within the face, the eyes and mouth move frequently and change shape considerably. Images 647 and 648 are used to detect movement around the eyes and mouth. FIG. 46 shows the processing at the time of synthesis. The images 647 and 648 of the area around the eyes and of the mouth are inserted into the whole-face image 650. To suppress distortion of the decoded image caused by this insertion, the size and other properties of images 647 and 648 are adjusted at analysis time so that the error between the image with the insertions and the whole-face image x becomes small. Parts other than the eyes and mouth may also be captured in the same way if image quality requires it.
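
The whole-face motion rule above (unchanged spacing means parallel translation, spacing scaled by a common ratio means front-back movement) can be sketched as follows. The landmark dictionary layout, the tolerance `tol`, the label strings, and the use of the pupil midpoint for the pupil-to-mouth spacing are all assumptions made for this illustration; the source does not specify how the spacings are measured.

```python
import math


def classify_face_motion(prev, curr, tol=0.02):
    """Classify whole-face motion between two frames.

    prev and curr map 'left_pupil', 'right_pupil', and 'mouth' to (x, y).
    Returns 'parallel', 'forward_backward', or 'other'.
    """
    def spacings(p):
        lx, ly = p["left_pupil"]
        rx, ry = p["right_pupil"]
        eye_gap = math.dist((lx, ly), (rx, ry))
        midpoint = ((lx + rx) / 2.0, (ly + ry) / 2.0)
        mouth_gap = math.dist(midpoint, p["mouth"])
        return eye_gap, mouth_gap

    eye0, mouth0 = spacings(prev)
    eye1, mouth1 = spacings(curr)
    eye_ratio = eye1 / eye0
    mouth_ratio = mouth1 / mouth0

    if abs(eye_ratio - 1.0) <= tol and abs(mouth_ratio - 1.0) <= tol:
        return "parallel"          # spacings unchanged
    if abs(eye_ratio - mouth_ratio) <= tol:
        return "forward_backward"  # both spacings scaled by the same ratio
    return "other"                 # e.g. a rotation
```

Cases that fall into `'other'` would be handled by the rotation analysis that the description introduces later for the model-based scheme.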

【０１９０】顔画像を符号化する場合、目の周辺や口元
の画像は画像６４７，６４８、それ以外の部分は全体の
画像６４５に相当する画像を異なる時刻に撮り、それら
に対して動き検出、動き補償などの処理を行うようにす
る。When a face image is encoded, images corresponding to images 647 and 648 for the areas around the eyes and the mouth, and to the whole-face image 645 for the remaining parts, are captured at different times, and processing such as motion detection and motion compensation is applied to them.

【０１９１】また被撮影者の顔のモデルをワイヤフレー
ムモデル、または顔面や頭髪の形状、色等の詳細を記録
した実物に近いモデルなどを使用して動きや色の変化等
の符号化を行う方式において、本実施例では以下のよう
に各部の動きの情報を与える。例えば顔全体の動きとし
て回転があるが、先に述べた顔全体画像からの瞳孔、口
腔の位置検出を利用して回転の検出を行うことができ
る。同瞳孔が高さ方向に位置がずれれば顔面内の回転、
間隔が狭まりながら画面の左右に寄れば首回りの回転、
及び瞳孔と口腔の間隔が変化しながら上下方向にずれれ
ば前後の回転としてそれぞれ検出する。図４７に上述し
た顔の動きと瞳孔、口腔の位置の変化を示す。（ａ），
（ｂ），（ｃ）がそれぞれ顔面内の回転、首回りの回
転、前後の回転の様子を表している。また目や口等の顔
の局部的な動きの詳細な分析情報は目の周辺や口元の画
像ｘ，ｘから得る。口周辺の動きは複雑であるため、そ
れをパターン化し、音声認識と併せて発音にあったパタ
ーンを選ぶようにしてもよい。音節と声の大きさのみの
情報を得てそれに合わせて口を開閉するような更に単純
化した方法もある。またモデルと実際の画像には多くの
場合差分が生じるが必要に応じてそれを伝送するように
してもよい。In a scheme that encodes movement, color changes, and so on using a model of the subject's face, such as a wire-frame model or a near-realistic model recording details such as the shape and color of the face and hair, this embodiment supplies the motion information for each part as follows. For example, rotation is one movement of the whole face, and it can be detected using the positions of the pupils and mouth cavity detected from the whole-face image as described above. If the two pupils shift relative to each other in height, the movement is detected as rotation within the face plane; if they move toward the left or right of the screen while the spacing between them narrows, as rotation about the neck; and if they shift vertically while the spacing between the pupils and the mouth cavity changes, as forward-backward rotation. FIG. 47 shows these face movements and the corresponding changes in the positions of the pupils and mouth cavity: (a), (b), and (c) show in-plane rotation, rotation about the neck, and forward-backward rotation, respectively. Detailed analysis information on local movements of the face, such as the eyes and mouth, is obtained from the images x, x of the area around the eyes and of the mouth. Because movement around the mouth is complex, it may be classified into patterns, and the pattern matching the utterance may be selected in combination with speech recognition. There is also a more simplified method that obtains only syllable and loudness information and opens and closes the mouth accordingly. In many cases a difference arises between the model and the actual image; it may be transmitted as necessary.

【０１９２】図４８は目の周辺を撮す画像入力装置の別
の実施例である。入力装置６６０は眼鏡の形状をしてお
り、フレームにカメラ６６１が内蔵されている。眼鏡の
レンズ部分６６２はハーフミラーになっていて、それに
映った目の周辺部を横のフレームにあるカメラ本体６６
１で撮る。カメラ６６１に取り込まれた画像は先の実施
例同様有線または無線で伝送される。カメラが顔と一体
になっているため、目の位置を検出して追随させる機能
が不要になる。またこの入力装置にジャイロ等を内蔵し
ておくと顔の角度、回転を顔画像からの分析処理を経る
ことなく直接検知できる。この情報は先述のワイヤフレ
ームモデル等を動かすのに用いる。FIG. 48 shows another embodiment of the image input device that captures the area around the eyes. The input device 660 has the shape of eyeglasses, with a camera 661 built into the frame. The lens portion 662 of the eyeglasses is a half mirror, and the area around the eyes reflected in it is captured by the camera body 661 in the side frame. The image captured by camera 661 is transmitted by wire or wirelessly, as in the previous embodiment. Because the camera moves together with the face, the function of detecting the eye position and tracking it becomes unnecessary. If a gyro or similar sensor is built into this input device, the angle and rotation of the face can be detected directly, without analysis of the face image. This information is used to drive the wire-frame model and the like described above.

【０１９３】次に本発明をマンマシンインタフェースに
用いた場合の実施例を挙げる。図４９は顔全体の画像と
目の周辺の画像を利用して視線の方向を補捉する処理に
ついて示したものである。まず顔全体の画像６６５を参
照し、測距装置６６４で目の正確な位置を検出し、次に
目の周辺の画像６６６によって視線の向いている方向を
調べて何を見ているかを特定する。測距装置６６４には
例えば向きの制御と距離を出力することができる自動焦
点カメラを用いると目とカメラの間の距離及びカメラの
向きから目の位置を割り出すことができる。この処理は
一般的に、装置の操作画面などにも用いることができ
る。被撮影者の周囲を撮るカメラを別に用意しておけば
広い範囲にわたって被撮影者が見ている物を補捉するこ
とができる。Next, an embodiment in which the present invention is applied to a man-machine interface is described. FIG. 49 shows processing that captures the direction of the line of sight using the whole-face image and the image of the area around the eyes. First, with reference to the whole-face image 665, the precise position of the eyes is detected by the distance measuring device 664; then the direction of the line of sight is examined using the image 666 of the area around the eyes to determine what the person is looking at. If, for example, an autofocus camera that supports direction control and outputs distance is used as the distance measuring device 664, the eye position can be computed from the distance between the eye and the camera and the direction of the camera. This processing can be used generally, for example for device operation screens. If a separate camera that captures the subject's surroundings is provided, objects the subject is looking at can be captured over a wide range.
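
Computing the eye position from the camera direction and the measured distance can be sketched as below. The coordinate convention is an assumption chosen for this illustration (x to the right, y up, z along the camera's forward axis at zero pan and tilt), as are the function and parameter names; the source only states that position is derived from distance and direction.

```python
import math


def eye_position(camera_pos, pan_deg, tilt_deg, distance):
    """Return the (x, y, z) eye position seen by a steerable ranging camera.

    camera_pos : (x, y, z) of the camera
    pan_deg    : rotation about the vertical axis, 0 = straight ahead (+z)
    tilt_deg   : elevation above the horizontal plane
    distance   : measured camera-to-eye distance (e.g. from autofocus)
    """
    pan = math.radians(pan_deg)
    tilt = math.radians(tilt_deg)
    # Unit vector of the camera's viewing direction.
    direction = (
        math.cos(tilt) * math.sin(pan),
        math.sin(tilt),
        math.cos(tilt) * math.cos(pan),
    )
    return tuple(p + distance * d for p, d in zip(camera_pos, direction))
```

Combining this position with a gaze direction estimated from the eye-region image would give a ray that can be intersected with a screen or scene to find what is being looked at; that step is outside this sketch.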

【０１９４】顔全体の動きを検出するために赤外線カメ
ラによる画像を併せて用いるとより精度を上げることが
できる。瞳孔、鼻腔、口腔等は皮膚の表面より温度が低
いので赤外線カメラによる画像を用いてそれらを識別す
る。通常のカメラの顔全体画像ｘとの照合の方法として
は、赤外線の画像と通常のカメラの画像で各々顔の部分
の抽出を行い、目や口に対応する輝度の変化等が両者で
一致するかを比較すればよい。Using images from an infrared camera as well improves the accuracy of detecting the movement of the whole face. The pupils, nostrils, mouth cavity, and so on are lower in temperature than the skin surface, so they are identified using the infrared image. To collate this with the whole-face image x from the ordinary camera, the facial parts are extracted from the infrared image and from the ordinary camera image separately, and the two are compared to check whether, for example, the luminance changes corresponding to the eyes and mouth agree.

【０１９５】[0195]

【発明の効果】以上、本発明によれば、情報入力の手段
が複数あるので多元的に情報をとらえることができ、情
報の特性に合った処理で情報のエッセンスを効率的に獲
得することが可能となる。これらの多元情報を統合する
際は不要な情報は除き、互いの情報の属性をもとに情報
の統合化を行うことにより、情報の加工、合成が容易に
なる。また、符号化においても効率的に情報量の削減が
できるようになる。As described above, according to the present invention, since there are a plurality of means for inputting information, information can be grasped in a multidimensional manner, and the essence of information can be efficiently obtained by processing suitable for the characteristics of the information. It will be possible. When these pieces of multi-dimensional information are integrated, unnecessary information is excluded, and the information is integrated based on the attributes of the mutual information, which facilitates the processing and combining of the information. Also, the amount of information can be efficiently reduced in encoding.

【０１９６】また、本発明によれば、複数のカメラの動
きの中から、実際に符号化すべき画面内の動物体、ある
いは動部分の動きを最もよく表現する動きを初期動きベ
クトルとして利用できるので、実際の動きを全て動きベ
クトルとして表現する必要が無くなり、ＭＣによる動き
ベクトルの発生情報量を少なくすることが可能となる。
さらにカメラの動きによって動き量検出範囲が自動的に
拡大されるので、実際の動きベクトルを計算する際には
検出範囲を限定することが可能になり、動きベクトルに
割り当てる符号も減少して、動きベクトル情報をさらに
少なく抑えることも可能となる。Further, according to the present invention, the camera motion that best represents the motion of the moving object or moving part in the screen to be encoded can be selected from among the motions of the plural cameras and used as the initial motion vector. It is therefore no longer necessary to represent the entire actual motion as a motion vector, and the amount of motion vector information generated by motion compensation (MC) can be reduced. Furthermore, because the camera movement automatically extends the motion detection range, the search range can be limited when the actual motion vector is computed, the codes assigned to motion vectors are also reduced, and the motion vector information can be reduced further.

【０１９７】また、本発明のメディア処理システムは、
使用者の画像（客観視点の画像）と使用者が見ている環
境画像（主観視点の画像）を同時に１つの自然で臨場感
のある画像として提供できる。また、背景画像を秘匿化
する場合には、使用者側の音の変化に合った単調になら
ない秘匿化された背景画像を提供できる。Further, the media processing system of the present invention can simultaneously provide the image of the user (objective-viewpoint image) and the image of the environment the user is looking at (subjective-viewpoint image) as one natural, realistic image. When the background image is concealed, it can provide a concealed background image that follows changes in the sound on the user's side and does not become monotonous.

【０１９８】また、本発明によれば重要な部分の解像度
を向上でき、また、撮影環境の変化に自動的に追随する
ことが可能となる。Further, according to the present invention, it is possible to improve the resolution of an important portion, and it is possible to automatically follow the change of the photographing environment.

【０１９９】また、本発明によるメディア入力装置で
は、入力画像における背景部分のように入力条件がわか
ればおおよその画像を再生することが可能な部分を判定
し、この部分については入力条件情報のみを伝送／蓄積
するようにして伝送／蓄積に必要な情報量を大幅に削減
している。また、伝送／蓄積される情報は入力画像／音
が構成要素毎に分解されているため、再生装置において
特定の構成要素のみを取り出したり、ある構成要素を他
の画像／音に置き換えたりといった処理を容易に行うこ
とができる。Further, the media input device according to the present invention identifies portions of the input image, such as the background, for which an approximate image can be reproduced once the input conditions are known, and transmits/stores only the input condition information for those portions, greatly reducing the amount of information required for transmission/storage. In addition, because the transmitted/stored information has the input image/sound decomposed into its components, the reproducing device can easily extract only a specific component or replace a component with another image/sound.

【０２００】また、本発明によれば限定された情報から
画像を合成できるので非常に少ない情報で動画像を表現
することができる。また、高精細な人工画を基に動画像
を生成するため、歪のない高品質な動画を生成すること
ができる。また、好みの人工画をデータベースから選択
できるため、同一の情報源に対し様々な合成画像を得る
ことができる。さらに、合成画像の動きは音声情報によ
り補正されているため、違和感の無い合成画像を得るこ
とができる。さらに、前記人工画のかわりに線画による
画像合成が可能になる。Further, according to the present invention, since an image can be synthesized from limited information, a moving image can be expressed with very little information. Moreover, since a moving image is generated based on a high-definition artificial image, it is possible to generate a high-quality moving image without distortion. In addition, since a desired artificial image can be selected from the database, various synthetic images can be obtained for the same information source. Furthermore, since the movement of the composite image is corrected by the audio information, it is possible to obtain a composite image with no discomfort. Further, it is possible to synthesize an image by a line drawing instead of the artificial image.

【０２０１】さらに、本発明によれば、顔全体の動きと
顔面内の各部の局所的な動きに関する情報を各々分離し
て正確に得ることができる。従って顔画像を符号化する
際は顔の動きを効率良く符号化することができる。また
モデルベース符号化やマンマシンインタフェースへの適
用において、前記のように顔の動きを自然で合理的に分
析できるため性能向上を実現することができる。Furthermore, according to the present invention, it is possible to accurately obtain information regarding the movement of the entire face and the information regarding the local movement of each part in the face. Therefore, when the face image is encoded, the movement of the face can be efficiently encoded. Further, in the application to model-based coding and man-machine interface, the facial movement can be naturally and reasonably analyzed as described above, so that performance improvement can be realized.

[Brief description of drawings]

【図１】本発明の概略の構成を示すブロック図である。FIG. 1 is a block diagram showing a schematic configuration of the present invention.

【図２】本発明に係る第１の実施例の符号化側、および
復号化側の構成を示すブロック図である。FIG. 2 is a block diagram showing a configuration of an encoding side and a decoding side of the first example according to the present invention.

【図３】符号化器の内部構成を詳細に示すブロック図で
ある。FIG. 3 is a block diagram showing in detail an internal configuration of an encoder.

【図４】動き補償の一実施例を示すブロック図である。FIG. 4 is a block diagram showing an example of motion compensation.

【図５】本発明に係る他の一実施例の構成を示すブロッ
ク図である。FIG. 5 is a block diagram showing the configuration of another embodiment according to the present invention.

【図６】従来の構成を示すブロック図である。FIG. 6 is a block diagram showing a conventional configuration.

【図７】本発明に係る第２の実施例のメディア処理シス
テムを示すブロック図である。FIG. 7 is a block diagram showing a media processing system according to a second embodiment of the present invention.

【図８】第２の実施例に係る他の実施例のメディア処理
装置のブロック図である。FIG. 8 is a block diagram of a media processing device of another embodiment according to the second embodiment.

【図９】画像合成の一例の説明図である。FIG. 9 is an explanatory diagram of an example of image combination.

【図１０】第２の実施例に係る他の実施例のメディア処
理装置のブロック図である。FIG. 10 is a block diagram of a media processing device of another embodiment according to the second embodiment.

【図１１】第２の実施例に係る他の実施例のメディア処
理装置のブロック図である。FIG. 11 is a block diagram of a media processing device of another embodiment according to the second embodiment.

【図１２】従来の構成を示すブロック図である。FIG. 12 is a block diagram showing a conventional configuration.

【図１３】本発明に係る第３の実施例の構成を示すブロ
ック図である。FIG. 13 is a block diagram showing a configuration of a third exemplary embodiment of the present invention.

【図１４】第１の撮像画像を示す図である。FIG. 14 is a diagram showing a first captured image.

【図１５】第２の撮像画像を示す図である。FIG. 15 is a diagram showing a second captured image.

【図１６】合成画像を示す図である。FIG. 16 is a diagram showing a composite image.

【図１７】第３の実施例に係る他の実施例を示すブロッ
ク図である。FIG. 17 is a block diagram showing another embodiment according to the third embodiment.

【図１８】顔領域を示す図である。FIG. 18 is a diagram showing a face area.

【図１９】第３の実施例に係るさらに他の実施例を示す
ブロック図である。FIG. 19 is a block diagram showing still another embodiment according to the third embodiment.

【図２０】撮像手段の一実施例を示す外観図である。FIG. 20 is an external view showing an embodiment of an image pickup unit.

【図２１】図２０に示す実施例で適用される水平保持機
構を示す図である。FIG. 21 is a diagram showing a horizontal holding mechanism applied in the embodiment shown in FIG. 20.

【図２２】第３の実施例に係るさらに他の実施例を示す
外観図である。FIG. 22 is an external view showing still another embodiment according to the third embodiment.

【図２３】従来例を示す図である。FIG. 23 is a diagram showing a conventional example.

【図２４】従来例の画素配置を示す図である。FIG. 24 is a diagram showing a pixel arrangement of a conventional example.

【図２５】本発明に係る第４の実施例であるメディア入
力装置の構成を示すブロック図である。FIG. 25 is a block diagram showing a configuration of a media input device according to a fourth embodiment of the present invention.

【図２６】図２５のメディア入力装置からの入力画像と
背景／非背景分離を説明する図である。FIG. 26 is a diagram illustrating an input image from the media input device of FIG. 25 and background / non-background separation.

【図２７】図２５の入力装置に対応する画像再生装置の
構成を説明する図である。FIG. 27 is a diagram illustrating a configuration of an image reproducing device corresponding to the input device of FIG. 25.

【図２８】第４の実施例のメディア入力装置の他の実施
例を示すブロック図である。FIG. 28 is a block diagram showing another embodiment of the media input device of the fourth embodiment.

【図２９】図２８に示すメディア入力装置からの入力画
像と背景／非背景分離を説明する図である。FIG. 29 is a diagram illustrating an input image from the media input device shown in FIG. 28 and background / non-background separation.

【図３０】図２８に示す入力装置に対応する画像再生装
置の構成を示すブロック図である。FIG. 30 is a block diagram showing a configuration of an image reproducing device corresponding to the input device shown in FIG. 28.

【図３１】従来の技術によるメディア入力装置の一例を
示すブロック図である。FIG. 31 is a block diagram showing an example of a media input device according to a conventional technique.

【図３２】従来の技術によるメディア再生装置の一例を
示すブロック図である。FIG. 32 is a block diagram showing an example of a media reproducing device according to a conventional technique.

FIG. 33 is a block diagram showing the configuration of a fifth embodiment of the present invention.

FIG. 34 is a flowchart of the process of converting an input image into line segments.

FIG. 35 is a diagram illustrating that the conversion into line segments can be performed by expansion and shrinking operations.

FIG. 36 is a diagram illustrating how it is determined whether the eyes are open or closed.

FIG. 37 is a diagram showing the configuration of another example according to the fifth embodiment of the present invention.

FIG. 38 is a diagram illustrating an example of a method of implementing the state correction means.

FIG. 39 is a diagram illustrating the relationship between H and V.

FIG. 40 is a diagram showing a template table.

FIG. 41 is a diagram showing the configuration of another example of the fifth embodiment.

FIG. 42 is a diagram showing the appearance of noise in a line-segment image.

FIG. 43 is a diagram showing the configuration of still another example of the fifth embodiment.

FIG. 44 is a block diagram illustrating a conventional method.

FIG. 45 is a diagram showing the input section of an image encoding device according to a sixth embodiment of the present invention.

FIG. 46 is a diagram showing image processing during decoding in the image encoding device.

FIG. 47 is a diagram for explaining the movement of the face and the resulting changes in the positions of the pupils and the oral cavity.

FIG. 48 is a diagram showing another example of the image input device that captures the area around the eyes.

FIG. 49 is a diagram showing an example in which this embodiment is used for a man-machine interface.

FIG. 50 is a diagram showing motion detection by a conventional method.

[Explanation of symbols]

1 Information source
3, 110, 201 Limited input device
5, 140, 303 Processing device
7, 204, 213 Output device
120 Encoder
130 Motion amount setting circuit
150 Decoder
203, 207 Processing unit
205 Input device
208 Background separation/removal unit
209 Size setting unit
210 Range setting unit
211 Mirror image inversion unit
212 Image selector
301, 401, 512 Camera
304 Display
402, 641 Microphone
407 Background extraction unit
408 Background image/sound determination unit
409 Image/sound database
411 Multiplexing unit
513 State acquisition means
514 Output image creation means
515 Artificial image database
516 Display device
642 Camera that captures the mouth
643 Camera that captures an image of the entire face
644 Camera that captures the area around the eyes
645 Image of the entire face
647 Image of the area around the eyes
648 Image of the mouth
647 Decoded image of the area around the eyes
648 Decoded image of the mouth
650 Finally decoded image of the entire face
660 Input device for the image of the area around the eyes
661 Camera
662 Lens (half mirror)
663 Camera that captures the entire face
664 Camera with a distance measurement function that captures the area around the eyes
665 Image of the entire face
666 Image of the area around the eyes

Continuation of the front page

(72) Inventor: Yoshihiro Kikuchi, c/o Toshiba Corporation Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Masahiro Oshikiri, c/o Toshiba Corporation Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Susumu Kanba, c/o Toshiba Corporation Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa

Claims

[Claims]

1. A media processing system comprising: a plurality of input means, each of which acquires information by limiting the target information; processing means for integrating the acquired information based on attributes of the acquired information; and output means for outputting the processed information obtained through the integration by the processing means.

2. A media processing system comprising: a plurality of input means, each of which acquires information by limiting the target information; processing means for integrating the acquired information by removing information of low importance from it; and output means for outputting the processed information obtained through the integration by the processing means.

3. A media processing system comprising: a plurality of image information input means, each of which acquires image information relating to an information object; detection means for detecting a temporal change of the information object included in the acquired image information; and output means for outputting the temporal change information obtained by the detection means.

4. A media processing system comprising: one or more first information input means, each of which acquires, in a limited manner, information including image information relating to an information object; second information input means for acquiring information including image information, taking as its information object the subject that operates the first information input means; and synthesizing means for synthesizing the first information acquired by the first information input means and the second information acquired by the second information input means.

5. A media processing system comprising: one or more first image information input means, each of which acquires, in a limited manner, image information relating to an information object; second image information input means for acquiring image information relating to an arbitrary area included in the first image information acquired by the first image information input means; and processing means for inserting second pixel value data acquired by the second image information input means between first pixel value data acquired by the first image information input means.

6. A media processing system comprising: first input means for acquiring information on an information object; candidate accumulating means for accumulating candidates for replacing a part of the information object; input condition acquisition means for acquiring an input condition under which the information object is input; second input means for acquiring information relating to the information object; selection means for selecting, as the replacement source, information from at least one of the candidate accumulating means and the second input means, based on the input condition acquired by the input condition acquisition means and the input information from the first input means; and replacement means for replacing a part of the input information object with the replacement information.

7. A media processing system comprising: a plurality of image information input means, each of which acquires image information of an information object; acoustic information input means for acquiring acoustic information of the information object; state acquisition means for acquiring the state of each information object from the information acquired by the plurality of image information input means and the acoustic information input means; and correction means for correcting each part of an artificial image in accordance with the state of the information object acquired by the state acquisition means.

8. A media processing system comprising: first image input means for acquiring image information of an information object; second image input means for acquiring image information of a part or all of the image area acquired by the first image input means; extraction means for extracting a temporal change in the image information acquired by at least one of the first image input means and the second image input means; replacement means for partially replacing the image information acquired by the first image input means with the image information acquired by the second image input means, in accordance with the temporal change extracted by the extraction means; and control means for controlling the replacement of the image information by the replacement means so that the replaced part of the information object follows the overall movement of the information object included in the image information acquired by the first image input means.
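The extraction, replacement, and control steps recited in claim 8 can be illustrated with a minimal sketch. The claim does not specify how the temporal change is measured or how the overall movement is obtained, so everything below is an assumption made for illustration: images are plain 2D lists of gray levels, the temporal change is a mean absolute frame difference inside a rectangular region, and the overall movement is a (dy, dx) offset supplied by the caller. The names `mean_abs_diff`, `shift_region`, `replace_region`, and `compose` are hypothetical and do not appear in the patent.

```python
def mean_abs_diff(prev, curr, region):
    """Assumed change measure: mean absolute difference inside region.

    region = (top, left, height, width) in pixel coordinates.
    """
    top, left, h, w = region
    total = 0
    for y in range(top, top + h):
        for x in range(left, left + w):
            total += abs(curr[y][x] - prev[y][x])
    return total / (h * w)


def shift_region(region, motion):
    """Move the region by the overall movement (dy, dx) of the object."""
    top, left, h, w = region
    dy, dx = motion
    return (top + dy, left + dx, h, w)


def replace_region(wide, patch, region):
    """Return a copy of the wide image with region overwritten by patch."""
    top, left, h, w = region
    out = [row[:] for row in wide]
    for dy in range(h):
        out[top + dy][left:left + w] = patch[dy][:w]
    return out


def compose(prev_wide, curr_wide, patch, region, motion=(0, 0), threshold=10.0):
    """Replace the region only when its temporal change exceeds threshold.

    The region is first shifted by the overall movement so that the
    replaced part stays aligned with the moving object.
    """
    moved = shift_region(region, motion)
    if mean_abs_diff(prev_wide, curr_wide, moved) > threshold:
        return replace_region(curr_wide, patch, moved)
    return [row[:] for row in curr_wide]
```

Here `curr_wide` plays the role of the image from the first image input means and `patch` the image from the second. The threshold of 10.0 is an arbitrary illustrative value.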