JP2005191964A

JP2005191964A - Image processing method

Info

Publication number: JP2005191964A
Application number: JP2003431353A
Authority: JP
Inventors: Hidenori Takeshima (竹島秀則); Takashi Ida (井田孝); Yoshitaka Omori (大盛義啓); Nobuyuki Matsumoto (松本信幸); Yasunori Taguchi (田口安則)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2003-12-25
Filing date: 2003-12-25
Publication date: 2005-07-14
Anticipated expiration: 2023-12-25
Also published as: JP4188224B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processing method with which a plurality of participants can communicate while sharing one composite image.

SOLUTION: The image processing method includes a self-image input step of inputting a self-portrait taken by the camera 14 of the own terminal; an external terminal image input step of inputting, via a network, each of the external terminal images taken by a plurality of external terminals; an image inverting step of mirror-inverting each of the images input from the own terminal and from the other terminals that belong to a first terminal group containing the own terminal; and a combining step of combining the mirror-inverted images of the first terminal group with the external terminal images of a second terminal group, which does not belong to the first terminal group, to produce a composite image.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to an image processing method that allows, for example, a plurality of participants to communicate while sharing one composite image.

Systems such as video conferencing and video chat have been proposed to support communication between users in remote locations, but because these systems do not convey a sense of familiarity or unity, they have not become widely used. Systems that aim to convey such a sense are therefore being studied, in which the participants' images are combined on one screen and communication takes place while the composite image is shared.

In Non-Patent Document 1, cameras (S1206, S1208), mirror image inverting devices (S1204, S1207), chroma key devices (S1202, S1203), and screens (S1201, S1205) are connected as shown in FIG. 19. Images are captured by a plurality of cameras installed at remote locations and are sent and received via a network. The background portion of the foreground video is removed by the chroma key method (an image processing method in which a single-color background such as blue or green is prepared in advance within the camera's shooting range and, during operation, the portion of the captured image that differs from the background color is regarded as the subject and composited onto another image), and the videos are combined into one image. The resulting composite image is presented to each participant at close to life size using a large screen or a projector. In this system the number of cameras is limited to two and, as shown in the flowchart of FIG. 18, the user's own captured video is mirror-inverted (S1202), the other party's captured video is combined without mirror inversion (S1203), and the composite image is presented (S1204). This yields a communication system that feels as natural as looking into a mirror and conveys a sense of unity. Building this system requires preparing a special environment, namely a single-color background for the chroma key method.

Non-Patent Document 2, on the other hand, uses not the chroma key method but the background difference method (an image processing method in which the camera's shooting range is captured and stored in advance and, during operation, the portion of the captured image that differs from the stored background is regarded as the subject and composited onto another image) to cut out and combine the subjects, and builds a system that shares the composite image. Because the background difference method does not require a single-color background, the same kind of communication is possible without preparing a special environment.
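The background difference method described above can be illustrated with a minimal sketch. It assumes grayscale images held as nested lists and a fixed difference threshold; the function names and the threshold value are hypothetical and are not taken from Non-Patent Document 2.

```python
def subject_mask(frame, background, threshold=30):
    """True where the frame differs from the stored background by more than threshold."""
    return [[abs(f - b) > threshold for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def composite(frame, mask, target):
    """Copy the subject pixels (mask True) of frame onto target; keep target elsewhere."""
    return [[f if m else t for f, m, t in zip(frow, mrow, trow)]
            for frow, mrow, trow in zip(frame, mask, target)]

# Background stored in advance, then a frame with a bright two-pixel "subject".
background = [[10, 10, 10, 10] for _ in range(3)]
frame = [row[:] for row in background]
frame[1][1] = frame[1][2] = 200

mask = subject_mask(frame, background)
shared = composite(frame, mask, [[50] * 4 for _ in range(3)])
```

Only the two pixels that differ from the stored background are carried over to the shared image; a real system would also have to cope with noise and lighting changes, which this sketch ignores.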

However, it is difficult to cut out the subject accurately with the background difference alone. Therefore, to extract the contour of the subject more accurately, the shape of the subject obtained by the background difference is used as input to the fractal contour extraction method of Non-Patent Document 3, which extracts an accurate contour.

This fractal contour extraction method will be described with reference to the flowchart of FIG. 20.

The inputs are image data and rough shape data, that is, shape data that is not accurate but is close to the contour. The output is more accurate shape data.

S1309: Set the number of mappings M, which can be determined from the block size and which allows accurate extraction. M is a value such that the number of pixels X along each side of the block, multiplied by the block magnification A raised to the M-th power, is 1 or less; that is, M satisfies X × A^M ≤ 1.
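The condition on M can be checked numerically. The following is a minimal sketch, assuming that A in this inequality is a ratio smaller than 1 (for example 0.5, the reciprocal of the A = 2.0 enlargement given as an example in S1303), since no M could satisfy the inequality otherwise. The function name mapping_count is hypothetical.

```python
def mapping_count(block_pixels, contraction):
    """Smallest M with block_pixels * contraction**M <= 1.

    Assumption: `contraction` is the per-mapping shrink ratio, 0 < contraction < 1.
    """
    if not 0 < contraction < 1:
        raise ValueError("contraction must lie strictly between 0 and 1")
    m = 0
    size = float(block_pixels)
    while size > 1:
        size *= contraction
        m += 1
    return m

m32 = mapping_count(32, 0.5)  # 32 * 0.5**5 == 1, so M = 5
m4 = mapping_count(4, 0.5)    # 4 * 0.5**2 == 1, so M = 2
```

Under this assumption, larger processing blocks need more mappings, which agrees with the statement that the amount of calculation grows with the block size.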

S1302: Referring to the rough shape data, place a plurality of processing blocks (for example, squares of 32 × 32 pixels) along its contour.

S1303: Referring to the image data, search for the parent block of each processing block, that is, the block that is A times the size of the processing block (for example, A = 2.0) and has the smallest error (for example, the sum of the absolute per-pixel errors) with respect to it. The search is either a full search with a search step of one pixel, or a hierarchical search that first searches with a large search step and then searches with a smaller step only in the neighborhood of the solution found, repeating this several times until the search step becomes 1.
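The full-search variant of S1303 can be sketched as follows for A = 2. This is an illustration under several assumptions that are not stated in the source: the image is a grayscale nested list, the parent block is shrunk by averaging 2 × 2 cells before comparison, and the whole image is searched rather than only the neighborhood of the processing block. All function names are hypothetical.

```python
def shrink_half(img, x, y, size):
    """Average 2x2 cells of the (2*size)-pixel square at (x, y) down to size x size."""
    return [[(img[y + 2 * j][x + 2 * i] + img[y + 2 * j][x + 2 * i + 1]
              + img[y + 2 * j + 1][x + 2 * i] + img[y + 2 * j + 1][x + 2 * i + 1]) / 4
             for i in range(size)] for j in range(size)]

def sad(block_a, block_b):
    """Sum of absolute per-pixel differences between two equally sized blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))

def find_parent(img, bx, by, size, step=1):
    """Return (error, x, y) of the parent block with the smallest error."""
    height, width = len(img), len(img[0])
    child = [row[bx:bx + size] for row in img[by:by + size]]
    best = None
    for y in range(0, height - 2 * size + 1, step):
        for x in range(0, width - 2 * size + 1, step):
            err = sad(child, shrink_half(img, x, y, size))
            if best is None or err < best[0]:
                best = (err, x, y)
    return best

# An 8x8 image with a vertical edge: columns 0-3 are dark, columns 4-7 are bright.
img = [[0] * 4 + [100] * 4 for _ in range(8)]
best = find_parent(img, bx=3, by=2, size=2)  # 2x2 processing block straddling the edge
```

For this edge image the 4 × 4 block at x = 2 shrinks to exactly the same pattern as the processing block, so its error is zero. A hierarchical search would call the same routine with a larger `step` first and then refine around the best position.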

S1304: Next, in each block, map the shape data from the parent block to the processing block.

S1305: Repeat the processing of S1304 until M mappings have been completed.

S1306 to S1307: If the processing block is sufficiently small (for example, a square of 4 × 4 pixels), output the shape data and end the image processing. Otherwise, change to a smaller processing block (for example, a block with half the height and half the width) and return to S1302.
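The outer loop formed by S1302 to S1307 can be sketched as below. The callable `extract_pass` is a hypothetical stand-in for one pass of block placement, parent search, and M mappings at a given block size; the example sizes 32 and 4 are the ones quoted in the steps above.

```python
def coarse_to_fine(shape, extract_pass, start=32, stop=4):
    """Run extract_pass at halving block sizes; each result feeds the next pass."""
    size = start
    while True:
        shape = extract_pass(shape, size)  # S1302 to S1305 at this block size
        if size <= stop:                   # S1306: the block is small enough
            return shape
        size //= 2                         # S1307: halve the block and repeat

# Record which block sizes are visited, using a pass that leaves the shape unchanged.
visited = []

def record(shape, size):
    visited.append(size)
    return shape

result = coarse_to_fine("rough-shape", record)
```

With the example values, the passes run at block sizes 32, 16, 8, and 4 before the shape data is output.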

As shown in FIG. 17, the fractal contour extraction method has the property that the larger the processing block, the larger the applicable range, the greater the amount of calculation, and the lower the accuracy of the resulting contour. In S1306 to S1307, therefore, to obtain both a large applicable range and a highly accurate contour, contour extraction is first performed with a large processing block and is then repeated, with the result of the previous extraction as the next input, while the applicable range is reduced step by step. Consequently, enlarging the applicable range increases the amount of calculation.

By performing such image processing, shape data closer to the true shape than before the processing is obtained.

Non-Patent Document 1: Osamu Morikawa, "Supermirror: Toward an Attractive Video Dialogue Method", Transactions of Information Processing Society of Japan, Vol. 41-3, pp. 815-822, 2000.
Non-Patent Document 2: Hidenori Takeshima, Takashi Ida, Osamu Hori, "Development of a chat system for sharing participants' cut-out images in real time", Proceedings of the 2002 IEICE General Conference, SD-3-10, pp. 391-392, Waseda University, Mar. 2002.
Non-Patent Document 3: T. Ida and Y. Sambonsugi, "Self-affine mapping system and its application to object contour extraction", IEEE Trans. Image Processing, vol. 9, no. 11, pp. 1926-1936, Nov. 2000.

With the techniques described in the background art, if the number of cameras is limited to two, the self-portrait is mirror-inverted, and the subjects are cut out and combined using the background difference method, a communication system that feels as natural as looking into a mirror and conveys a sense of unity can be built without preparing a special environment such as a single-color background.

However, this system has the problem that communication among three or more locations is not possible.

The present invention therefore provides an image processing method with which three or more participants can communicate while sharing one composite image.

In addition, the fractal contour extraction method of Non-Patent Document 3 described above is computationally expensive image processing. For example, when a system is configured by the method of Non-Patent Document 2, the delay over a LAN connection is about 0.5 seconds.

The present invention therefore also provides an image processing method that runs at high speed when the fractal contour extraction method is used for post-processing.

The invention according to claim 1 is an image processing method in a terminal of a system that is composed of three or more terminals connected via a network and in which images captured by the imaging means of the plurality of terminals are input and output among the terminals via the network, the method comprising: dividing the plurality of terminals into a first terminal group to which the own terminal belongs and a second terminal group to which the own terminal does not belong; an image inversion step of mirror-inverting all images input from the own terminal and from the one or more other terminals belonging to the first terminal group; and a combining step of superimposing and combining all of the mirror-inverted images of the first terminal group and the images input from the one or more other terminals belonging to the second terminal group to generate a composite image.

The invention according to claim 2 is an image processing method comprising: a self-image input step of inputting a self-portrait captured by the imaging means of the own terminal; an external terminal image input step of inputting, via a network, each external terminal image captured at a plurality of external terminals; an image inversion step of mirror-inverting each image input from the own terminal and from the other external terminals belonging to a first terminal group that includes the own terminal; and a combining step of superimposing and combining the mirror-inverted images of the first terminal group and the external terminal images of a second terminal group that does not belong to the first terminal group to generate a composite image.

The invention according to claim 3 is the image processing method according to claim 2, wherein the own terminal and all of the external terminals belong to the first terminal group.

The invention according to claim 4 is an image processing method in a terminal of a system that is composed of a plurality of terminals connected via a network and in which images captured by the imaging means of the plurality of terminals are input and output among the terminals via the network, the method comprising: an image inversion step of mirror-inverting at least one of the images; an image replacement step of replacing only a designated inversion exclusion region of the mirror-inverted image with the image that has not been mirror-inverted; and a combining step of superimposing and combining the one image in which only the inversion exclusion region has been replaced with the non-inverted image and the other images to generate a composite image.
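The image inversion and image replacement steps of this claim can be illustrated with a minimal sketch. It assumes that the image is a nested list and that the inversion exclusion region is a rectangle placed symmetrically about the vertical center line, so that the same coordinates cover the same part of the scene before and after mirroring. The function names and the rectangle parameters are hypothetical.

```python
def mirror(img):
    """Flip an image left to right."""
    return [row[::-1] for row in img]

def mirror_except(img, x0, x1, y0, y1):
    """Mirror img, then restore columns x0..x1-1 of rows y0..y1-1 from the original."""
    out = mirror(img)
    for y in range(y0, y1):
        out[y][x0:x1] = img[y][x0:x1]
    return out

# One row of six pixels; the two center columns (2 and 3) are the exclusion region.
original = [[1, 2, 3, 4, 5, 6]]
result = mirror_except(original, x0=2, x1=4, y0=0, y1=1)
```

In the result, the outer pixels appear in mirrored order while the center pixels keep their original order, which is how characters inside the exclusion region would stay readable.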

The invention according to claim 5 is an image processing method in a terminal of a system that is composed of a plurality of terminals connected via a network and in which images captured by the imaging means of the plurality of terminals are input and output among the terminals via the network, the method comprising: a first subject step of calculating the subject, that is, the subject region, of one of the images; a second subject step of calculating the subject, that is, the subject region, of each image other than the one image; and a combining step of superimposing the subject of the one image and the subjects of the other images on the one image in a predetermined overlay order to obtain a composite image.

The invention according to claim 6 is the image processing method according to claim 5, wherein the overlay order in the combining step is determined by calculating the size of the subject from the subject of each image, comparing the calculated sizes, and deciding the order based on the result of the comparison.
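A size-based overlay order can be sketched as follows. The claim does not say which way the comparison is used; this sketch assumes that larger subjects are drawn first (at the back) so that smaller subjects are not hidden, and it measures size as the number of pixels in the subject mask. Both choices, and all function names, are assumptions made for illustration.

```python
def subject_size(mask):
    """Number of pixels marked as subject in a mask of 0/1 values."""
    return sum(sum(row) for row in mask)

def overlay_order(masks):
    """Subject indices from back to front; assumption: the largest goes to the back."""
    return sorted(range(len(masks)), key=lambda i: subject_size(masks[i]), reverse=True)

def composite(base, layers, masks):
    """Draw each layer's subject pixels onto a copy of base in overlay order."""
    out = [row[:] for row in base]
    for i in overlay_order(masks):
        for y, mask_row in enumerate(masks[i]):
            for x, is_subject in enumerate(mask_row):
                if is_subject:
                    out[y][x] = layers[i][y][x]
    return out

base = [[0, 0, 0, 0]]
layers = [[[7, 7, 7, 7]], [[9, 9, 9, 9]]]
masks = [[[1, 1, 1, 0]], [[0, 1, 0, 0]]]   # subject 0 covers 3 pixels, subject 1 covers 1
order = overlay_order(masks)
picture = composite(base, layers, masks)
```

Here the three-pixel subject is drawn first and the one-pixel subject is drawn over it, so both remain visible in the composite.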

The invention according to claim 7 is an image processing method of an own terminal in a system that is composed of the own terminal and one or more external terminals connected to it via a network and in which images captured by the imaging means of the terminals are input and output among the terminals via the network, the method comprising: an external terminal image input step of inputting an external terminal image captured by one of the external terminals; a reference image input step of inputting a reference image, which is an image captured by the one external terminal and to which information distinguishing it from the external terminal image has been added; a storing step of storing the input reference image as a stored reference image; a subject region calculation step of, when the external terminal image is input, detecting the portion in which the stored reference image and the external terminal image differ and calculating that portion as the subject region of the external terminal image; and a combining step of superimposing and combining the calculated subject region and either an external terminal image captured by an external terminal other than the one external terminal or a self-portrait captured by the own terminal to obtain a composite image.

The invention according to claim 8 is an image processing method comprising: a first step of inputting N-dimensional image data composed of a set of N-dimensional unit pixel data (where N is a natural number of 2 or more); a second step of inputting N-dimensional shape data indicating the rough shape of a predetermined shape represented in the N-dimensional image data; a third step of placing N-dimensional processing blocks of a predetermined size and shape on the boundary portion of the N-dimensional shape data; a fourth step of repeating M times each of a search process, which searches the N-dimensional image data, based on a predetermined conversion method, for the N-dimensional parent block most similar to the N-dimensional processing block, and a replacement process, which replaces the content of the N-dimensional processing block with the content of the found N-dimensional parent block within the N-dimensional shape data; and a fifth step of outputting the N-dimensional shape data obtained by the M repetitions as the data of the predetermined shape, wherein, in the fourth step, the N-dimensional processing block of the (k+1)-th repetition (where M > k ≥ 1) is smaller than that of the k-th repetition, and, in the search process, the search is coarser the larger the N-dimensional processing block of the (k+1)-th repetition is.

The invention according to claim 9 is an image processing apparatus comprising: self-image input means for inputting a self-portrait captured by the imaging means of the own terminal; external terminal image input means for inputting, via a network, each external terminal image captured at a plurality of external terminals; image inversion means for mirror-inverting each image input from the own terminal and from the other external terminals belonging to a first terminal group that includes the own terminal; and combining means for superimposing and combining the mirror-inverted images of the first terminal group and the external terminal images of a second terminal group that does not belong to the first terminal group to generate a composite image.

The invention according to claim 10 is the image processing apparatus according to claim 9, further comprising group storage means that stores in advance the external terminals belonging to the first terminal group and the external terminals belonging to the second terminal group.

The inventions according to claims 1, 2, 3, and 9 allow images to be shared in a way that is natural for the users even among three or more locations.

The invention according to claim 4 avoids mirror inversion of characters in the inversion exclusion region.

The invention according to claim 8 allows the fractal contour extraction method, used as post-processing, to run at high speed, so the delay until the composite image is displayed is reduced and smooth communication between users becomes possible.

(Embodiment 1)
The image processing method according to Embodiment 1 of the present invention will be described below. This embodiment relates to the connection of three or more terminals 10 (for example, four terminals 10) and will be described with reference to FIGS. 1 to 5.

FIG. 1 is a flowchart for implementing the image processing method of this embodiment; FIG. 2 shows the flow of image data during image input, subject extraction, mirror inversion, and image composition; FIG. 3 shows an example of the system configuration in this embodiment; and FIG. 4 shows an example of the internal configuration of a terminal 10 in the system. FIG. 5 is an explanatory diagram showing the mirror inversion states.

(1) System Overview
The system shown in FIG. 3 is a kind of multipoint video conference. As shown in FIG. 4, each of the plurality of terminals 10 connected to the network first captures images from its camera sequentially, that is, at predetermined time intervals, and at the same time takes in the camera images captured by the other terminals 10 (external terminals 10) connected via the network.

Next, the plurality of captured images are superimposed and combined, and the result is displayed on a screen (for example, a terminal screen, a television screen, or a projector).

For the displayed self-portrait to look natural, it must be mirror-inverted. Without inversion, when the user moves left or right, the user appears to move in the opposite direction on the displayed screen, which makes the system hard to use. In Non-Patent Document 1, the self-portrait can be mirror-inverted and then combined between two locations, but mirror inversion cannot be performed in communication among three or more locations.

In this embodiment, therefore, each of the four terminals 10 is assigned to one of two terminal groups. In the image processing of each terminal 10, as shown in FIG. 5, the images of the terminal group to which the terminal 10 belongs (hereinafter, the "own terminal group") are subject to mirror inversion, and the images of the terminal group to which the terminal 10 does not belong (hereinafter, the "partner group") are not. A self-portrait is thus displayed mirror-inverted on the terminals 10 in the own terminal group and displayed without mirror inversion on the terminals 10 in the partner group.
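The grouping rule can be written down in a few lines. The following sketch assumes four terminals labeled A to D split into groups 1 and 2; the labels, the dictionary layout, and the function name are hypothetical.

```python
def should_mirror(viewer, sender, group_of):
    """Mirror the sender's image only if it is in the viewer's own terminal group."""
    return group_of[sender] == group_of[viewer]

group_of = {"A": 1, "B": 1, "C": 2, "D": 2}

# For each viewing terminal, which senders' images are mirror-inverted?
view_from_a = {s: should_mirror("A", s, group_of) for s in group_of}
view_from_c = {s: should_mirror("C", s, group_of) for s in group_of}
```

Terminal A's image is mirrored on A's own screen and on B's, but is shown unmirrored on C's and D's, which matches the behavior of the self-portrait described above.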

(2) Configuration of Terminal 10
The configuration of the terminal 10 will be described with reference to FIGS. 2 and 4.

A monitor 12 and a camera 14 are connected to the terminal 10, and the terminal 10 is connected to the other three terminals via the network.

The terminal 10 contains an image composition unit 16, three cutout units 18, 18, 18, two mirror image inversion units 20, 20, and a capture unit 22. In FIG. 4, the arrows indicate the direction in which image data flows.

The image captured by the camera 14 (that is, the self-portrait) is input via the capture unit 22 to a mirror image inversion unit 20, where it is flipped left to right. The mirror-inverted image is input to a cutout unit 18, where only the subject is cut out from the background image. The cutout uses the technique described in the background art. The image containing only the cut-out subject is input to the image composition unit 16. The captured image is also transmitted via the network to the other three external terminals 10.

The first external terminal 10 belongs to the own terminal group. The image transmitted from the first external terminal 10 passes through a mirror image inversion unit 20 and a cutout unit 18, and the resulting image, mirror-inverted and with only the subject cut out, is input to the image composition unit 16.

The second external terminal 10 belongs to the partner group. The image transmitted from the second external terminal 10 is not mirror-inverted; only the subject is cut out by a cutout unit 18, and the resulting image is input to the image composition unit 16.

The third external terminal 10 belongs to the partner group. The image transmitted from the third external terminal 10 is neither mirror-inverted nor cut out from its background, and is input to the image composition unit 16 as it is.

The image composition unit 16 combines the mirror-inverted, cut-out subjects of the own terminal 10 and the first external terminal in the own terminal group, the cut-out but not mirror-inverted subject of the second external terminal in the partner group, and the image of the third external terminal including its background, and displays the result on the monitor 12.

This grouping is either set in advance or set when the system is run.

(2) Processing Method
FIG. 1 shows a flowchart of the image processing method of this embodiment.

Each terminal 10 is managed by assigning it a serial terminal number k. For the image of each terminal, with k running from 1 to the number of terminals (four in the case above), the processing of S102 to S107 described below is performed; the resulting output images are then combined (S108) and presented on display means such as a display or projector (S109). Because the mirror inversion of each image is independent of the others, the images can also be processed in parallel.

In S104 and S105, if the image comes from a terminal in the terminal group to which the own terminal belongs, the input image is mirror-inverted to form the output image. Otherwise, the input image is used as the output image unchanged.
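The per-terminal loop and the composition step can be sketched on pixel data as follows. The sketch assumes that each input is already a cut-out image in which 0 means "no subject here", and that later images are simply drawn over earlier ones; the cutout itself and the steps other than S104, S105, and S108 are omitted. The function names and the data layout are hypothetical.

```python
def mirror(img):
    """Flip an image left to right."""
    return [row[::-1] for row in img]

def process_inputs(own_id, inputs, group_of):
    """Mirror images from the own terminal group (S104, S105); pass the others through."""
    outputs = {}
    for terminal_id, img in inputs.items():
        if group_of[terminal_id] == group_of[own_id]:
            outputs[terminal_id] = mirror(img)
        else:
            outputs[terminal_id] = [row[:] for row in img]
    return outputs

def overlay(images):
    """Combine the output images (S108): non-zero pixels overwrite the canvas."""
    height, width = len(images[0]), len(images[0][0])
    canvas = [[0] * width for _ in range(height)]
    for img in images:
        for y in range(height):
            for x in range(width):
                if img[y][x] != 0:
                    canvas[y][x] = img[y][x]
    return canvas

# Terminal 1 is the own terminal; terminal 2 shares its group; terminal 3 does not.
inputs = {1: [[1, 0, 0]], 2: [[0, 2, 0]], 3: [[3, 0, 0]]}
group_of = {1: "own", 2: "own", 3: "partner"}
outputs = process_inputs(1, inputs, group_of)
screen = overlay(list(outputs.values()))
```

The subject of terminal 1 moves from the left edge to the right edge because it is mirrored, while the subject of terminal 3 stays on the left because it belongs to the partner group. Since each image is handled independently inside the loop, the loop body could also be run in parallel.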

As a specific example, consider a teacher giving a lesson to students while using a whiteboard.

In this case, the image captured at the teacher's terminal 10 includes the characters written on the whiteboard, so it is desirable that this image not be mirror-inverted at the students' terminals 10. Taking advantage of the two-group configuration of this embodiment, the teacher is assigned to the first group and the students to the second group. In the configuration of FIG. 4, the third external terminal 10 is preferably the teacher's terminal.

As a result, the teacher's image, including the whiteboard, is not mirror-inverted at the students' terminals 10. At the teacher's terminal 10, however, the teacher's own image is subject to mirror inversion, so the characters are mirror-inverted as well. A solution to this problem is described in Embodiment 2 below.

As described above, this embodiment allows images to be shared in a way that is natural for the users even among three or more locations.

(3) Modification
When no characters appear in the images, all the terminals 10 may be treated as a single group without grouping; in that case, mirror images are displayed on all the terminals 10.

(Embodiment 2)
The image processing method according to Embodiment 2 of the present invention is described below with reference to FIGS. 6 to 9. In this embodiment, part of the image is excluded from the region subject to mirror inversion.

鏡像反転を行わないで画面に合成する場合は、その画像は図６の５０１のようになり、文字は正しく表示される。 When the image is synthesized on the screen without mirror image reversal, the image is as shown by 501 in FIG. 6, and the characters are correctly displayed.

しかし、実施形態１で述べたように、鏡像画像の表示は、図７の６０１のようになり、文字は認識できない。 However, as described in the first embodiment, the display of the mirror image is as indicated by 601 in FIG. 7, and characters cannot be recognized.

Conventionally, this has been handled by, for example, preparing an electronic virtual whiteboard (electronic whiteboard), sharing the characters and figures written on it over a communication channel, and superimposing them on the displayed image. However, users who are unfamiliar with electronic devices are often reluctant to use an electronic whiteboard. Moreover, because the image is mirror-inverted unconditionally, characters cannot be shared by any means other than the electronic whiteboard. For example, even if a user wants to show characters printed on paper, that part of the image remains mirror-inverted.

そこで、本実施形態では、この問題点を解決するものであり、以下、その画像処理方法を、図８と図９のフローチャートに基づいて説明する。 Therefore, the present embodiment solves this problem, and the image processing method will be described below based on the flowcharts of FIGS. 8 and 9.

鏡像反転した画像内の、左右対称な領域をユーザに選択させる方法を用意しておく。少なくとも２端末における画像を逐次入力した後、そのうちのいくつかの画像はそのまま画像合成入力とし、いくつかの画像は鏡像反転を行ってから画像合成入力とする。 A method is prepared for allowing the user to select a symmetric region in the mirrored image. After sequentially inputting images from at least two terminals, some of the images are used as image composition inputs, and some images are subjected to mirror image inversion before being used as image composition inputs.

If the user has selected a region within the mirror-inverted (S802) image (S803), mirror inversion is applied again to only the designated region of the mirror-inverted image (S804; instead of this second inversion, the pre-inversion image may be pasted into the designated region only), and the result is input to the image composition unit 16.

領域が選択されていない場合は、通常の鏡像反転のみを行う。 When the area is not selected, only normal mirror image inversion is performed.
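The region handling of S802 to S804 can be sketched as below, assuming a rectangular region given in pixel coordinates of the mirrored image. The function names and the nested-list image model are illustrative assumptions. When the region is left-right symmetric about the image center, re-inverting it gives the same result as pasting in the pre-inversion pixels, which is consistent with asking the user to select a symmetric region.

```python
def mirror(image):
    """Return a left-right mirrored copy of the image (S802)."""
    return [list(reversed(row)) for row in image]


def mirror_except_region(image, x0, x1, y0, y1):
    """Mirror the image, then re-mirror only the designated region (S804).

    The region covers columns x0..x1-1 and rows y0..y1-1 of the mirrored
    image. Pixels outside the region stay mirrored.
    """
    out = mirror(image)
    for y in range(y0, y1):
        # Reverse the selected columns in place so text there reads normally.
        out[y][x0:x1] = out[y][x0:x1][::-1]
    return out
```

If no region has been selected, calling `mirror` alone corresponds to the ordinary mirror inversion.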

Thereafter, image composition (S805) and image presentation (S806) are performed using the obtained mirror images, as shown in FIG. 8.

これによって、選択された領域にユーザがいない限り、自画像は鏡像反転し、かつ文字の鏡像反転を避けることができる。 As a result, as long as there is no user in the selected area, the self-portrait is mirror-inverted, and the mirror image inversion of characters can be avoided.

本実施形態は、２個の端末であっても、多端末の構成であっても適用可能である。 This embodiment is applicable to two terminals or a multi-terminal configuration.

また、２つのグループに分ける構成を用いる場合は、通信回線を介して鏡像反転領域を共有すれば、鏡像反転領域を指定するユーザ操作は１箇所で行えばすむ。 Further, when using a configuration divided into two groups, if a mirror image inversion area is shared via a communication line, a user operation for designating the mirror image inversion area can be performed at one place.

本実施形態により、自画像等の鏡像画像においても、鏡像とすべきでない文字等の領域を自然にユーザに提示することが可能となる。 According to the present embodiment, even in a mirror image such as a self-portrait, it is possible to naturally present a region such as a character that should not be a mirror image to the user.

(Embodiment 3)
The image processing method according to Embodiment 3 of the present invention is described below with reference to FIG. 10. In this embodiment, the subject region is calculated and superimposed for the terminal that captures the image containing the background of the composite image (hereinafter, the "background-side terminal"). In FIG. 4, the third external terminal 10 corresponds to the background-side terminal.

Conventionally, the user of the background-side terminal has always been overlaid by the users of the other terminals. However, when the system is used by a teacher to instruct students, for example, it is often inconvenient for the teacher to be hidden by the overlay. It is therefore useful to provide a function that, at the user's instruction, places the user of the background-side terminal on top.

ここで、非特許文献１のようにクロマキー法を用いてシステムを構成する場合は、背景側端末では通常、単色の背景は用意しないため被写体を切り出せない。この理由は、単色の背景を合成することはユーザにあまり好まれないためである。 Here, when the system is configured using the chroma key method as in Non-Patent Document 1, the background-side terminal usually does not prepare a monochrome background, and thus the subject cannot be cut out. This is because it is not so much favored by the user to synthesize a monochrome background.

しかし、背景差分法を用いてシステムを構成する場合は、背景が単色でなくても被写体を切り出すことができるため、背景側端末の被写体を切り出すことができる。従って、背景側端末の被写体を他の端末で撮影した被写体の上に重ねて表示できる。これを利用して、ユーザの指示により背景側端末のユーザを上に重ね合わせる機能を提供する方法の一例を、図１０に示すフローチャートに従って説明する。 However, when the system is configured using the background subtraction method, the subject can be cut out even if the background is not a single color, and thus the subject of the background side terminal can be cut out. Therefore, the subject of the background side terminal can be displayed over the subject photographed by another terminal. An example of a method for providing a function of superimposing the user of the background side terminal on the basis of the user's instruction using this will be described according to the flowchart shown in FIG.

まず、別途指定された順序を重ねあわせ順序として設定する（Ｓ９０２）。例えば、端末番号１の画像上に、端末番号２、４、３の画像を順番に重ね合わせるなどである。 First, an order specified separately is set as an overlapping order (S902). For example, the images of terminal numbers 2, 4, and 3 are sequentially superimposed on the image of terminal number 1.

次に、ｋ番目の端末で撮影された画像に対し被写体の切り出しを行う（Ｓ９０４，Ｓ９０５）。これを、１から端末数までの各端末番号ｋに対して行う（Ｓ９０３〜Ｓ９０７）。なお、Ｓ９０３〜Ｓ９０７の処理は逐次処理であるが、各画像に対する被写体の切り出しは独立した処理であり、並列処理を行うことも可能である。 Next, the subject is cut out from the image captured by the k-th terminal (S904, S905). This is performed for each terminal number k from 1 to the number of terminals (S903 to S907). Note that the processes in S903 to S907 are sequential processes, but the extraction of the subject for each image is an independent process, and parallel processes can also be performed.

Next, the cut-out subject images are superimposed on the background image in the superposition order (S908). Because the subject cut out of the image captured by the background-side terminal is superimposed as well, the subject of the background-side terminal can be composited above the subjects captured by the other terminals, which meets the requirement described above.
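A minimal sketch of the superposition in S908 follows. It assumes each cut-out subject is given as an image plus a binary mask of the same size, and that the order lists terminal numbers from bottom to top. The helper `order_by_area`, which derives an order from mask area so that larger subjects end up on top, is one hypothetical way to build that list. All names here are illustrative.

```python
def subject_area(mask):
    """Count the pixels that belong to the subject."""
    return sum(1 for row in mask for inside in row if inside)


def compose(background, subjects, order):
    """Paint subjects onto a copy of the background, bottom to top.

    subjects maps a terminal number to (image, mask); order lists the
    terminal numbers, with the last entry ending up on top.
    """
    out = [row[:] for row in background]
    for k in order:
        image, mask = subjects[k]
        for y, mask_row in enumerate(mask):
            for x, inside in enumerate(mask_row):
                if inside:
                    out[y][x] = image[y][x]
    return out


def order_by_area(subjects):
    """Order terminals so that the largest subject is drawn last (on top)."""
    return sorted(subjects, key=lambda k: subject_area(subjects[k][1]))
```

Passing a fixed list such as `[2, 4, 3]` corresponds to a statically specified order, while `order_by_area` would be re-evaluated for every frame.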

The stacking relationship between the subject of the background-side terminal and the other subjects, set in S902, can be determined statically by a separately prepared setting, a button operation, or the like. Alternatively, it may be determined for each frame from the size of each subject (for example, its area or bounding box) obtained from the subject region (shape data). For example, if the order is chosen so that larger subjects are placed higher, a user is placed higher the closer the user comes to the camera. Depth can thus be simulated even with two-dimensional images that carry no depth information, which contributes to the naturalness of the communication space provided by the system.

(Embodiment 4)
The image processing method according to Embodiment 4 of the present invention is described below with reference to FIGS. 11 and 12. This embodiment performs the background subtraction method using a separately captured reference image. Here, the "reference image" is the image showing only the background, which serves as the basis for the difference when the background subtraction method is used to cut out, for example, only the subject from an image of the subject captured together with the background.

図１１は、外部端末から画像を入力し、背景差分法及び画像合成を行う端末（自端末）のフローチャートを示し、図１２は入力画像を送信する外部端末のフローチャートを示す。 FIG. 11 shows a flowchart of a terminal (own terminal) that inputs an image from an external terminal and performs the background difference method and image synthesis, and FIG. 12 shows a flowchart of the external terminal that transmits the input image.

(1) Processing at the own terminal
The own terminal has a function for storing a reference image (reference image memory) and sequentially performs the image processing shown in the flowchart of FIG. 11.

画像処理においては、まず、外部端末からの画像及び画像付加情報を受信する（Ｓ１００２）。この画像付加情報は、通常の撮影画像であるか参照画像であるかを識別するための情報（例えば、０を撮影画像、１を参照画像とする２値の情報）を含むものとする。 In the image processing, first, an image and image additional information from an external terminal are received (S1002). This image additional information includes information for identifying whether the image is a normal captured image or a reference image (for example, binary information in which 0 is a captured image and 1 is a reference image).

自端末は受信された画像付加情報を参照して入力画像であるか参照画像であるかを調べる（Ｓ１００３）。 The own terminal refers to the received image additional information and checks whether it is an input image or a reference image (S1003).

画像が参照画像であったときには、参照画像メモリに保存する（Ｓ１００７）。既に参照画像がメモリに保存されていれば、それを更新する（Ｓ１００７）。 If the image is a reference image, it is stored in the reference image memory (S1007). If the reference image is already stored in the memory, it is updated (S1007).

When the image is an input image, the region in which it differs from the reference image held in the reference image memory is detected (S1004), and the result is composited with the other images (S1005).
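The receiving-side branch of S1003, S1004, and S1007 can be sketched as follows. The flag values 0 and 1 follow the example given in the text. The class name, the per-pixel threshold test, and the behavior when no reference image has arrived yet are assumptions made for illustration.

```python
CAPTURED, REFERENCE = 0, 1  # flag values follow the example in the text


class Receiver:
    """Keeps the latest reference image and extracts the differing region."""

    def __init__(self, threshold=10):
        self.reference = None       # the reference image memory
        self.threshold = threshold  # hypothetical foreground threshold

    def receive(self, image, flag):
        """Return a foreground mask for captured images, None otherwise."""
        if flag == REFERENCE:
            # S1007: store the reference image, replacing any earlier one.
            self.reference = [row[:] for row in image]
            return None
        if self.reference is None:
            # Assumption: without a reference there is nothing to subtract.
            return None
        # S1004: mark pixels that differ from the stored reference.
        return [[abs(p - r) > self.threshold for p, r in zip(row, ref_row)]
                for row, ref_row in zip(image, self.reference)]
```

The returned mask would then feed the composition step (S1005).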

(2) Processing at the external terminal
Next, the flow of processing at the external terminal is described with reference to FIG. 12.

外部端末は画像を撮影する手段、及び参照画像の更新をユーザに指示させる手段を持っており、逐次画像の入力（Ｓ１１０２）を行っている。 The external terminal has means for capturing an image and means for instructing the user to update the reference image, and sequentially inputs images (S1102).

通信開始直後やユーザの指示による参照画像入力の指示（Ｓ１１０３）があったときには、画像付加情報に参照画像であるという情報を付与し（Ｓ１１０６）、画像及び画像付加情報を送信する。 Immediately after the start of communication or when there is a reference image input instruction (S1103) by a user instruction, information indicating that the image is a reference image is added to the image additional information (S1106), and the image and the image additional information are transmitted.

それ以外のときには、画像付加情報に入力画像であるという情報を付与し（Ｓ１１０４）、画像及び画像付加情報を送信する。 In other cases, information indicating that the image is an input image is added to the image additional information (S1104), and the image and the image additional information are transmitted.
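The sending-side tagging of S1103, S1104, and S1106 can be sketched as a generator that pairs each frame with its flag. Treating the first frame as the one sent "immediately after the start of communication" and modeling user requests as a set of frame indices are simplifying assumptions, and the function name is illustrative.

```python
CAPTURED, REFERENCE = 0, 1  # flag values follow the example in the text


def tag_frames(frames, reference_requests):
    """Yield (image, flag) pairs ready for transmission.

    The first frame, and any frame whose index appears in
    reference_requests (hypothetical user instructions), is tagged as a
    reference image. Every other frame is tagged as a captured image.
    """
    for index, image in enumerate(frames):
        is_reference = index == 0 or index in reference_requests
        yield image, (REFERENCE if is_reference else CAPTURED)
```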

(3) Effects of Embodiment 4
General video conference systems have no means of transmitting a reference image for the background subtraction method, yet the background subtraction method often requires the reference image to be updated whenever the background changes. With the above configuration, the external terminal can update the reference image at any time. When something occurs that adversely affects the background subtraction method, such as movement of the camera or a significant change in lighting conditions, the reference image can be updated and communication can continue.

(4) Modifications
If the image before subject extraction can be viewed at any time, this is useful, for example, for adjusting parameters of the background subtraction method (such as the threshold for judging a pixel to be foreground).

また、例えば参照画像の取り込みを行うといった端末の状態変化を送信し、外部端末の状況を受信側に表示しておけば、外部端末の被写体が参照画像の取り込みのために突然合成されなくなり受信側のユーザが戸惑うという問題を避けることができる。 For example, if the terminal status change such as capturing a reference image is transmitted and the status of the external terminal is displayed on the receiving side, the subject of the external terminal is not suddenly synthesized for capturing the reference image. The problem that the user is confused can be avoided.

In addition, when the equipment is adjusted at the start of image transmission and reception, exchanging voice communication and pre-composition images before starting the communication that shares the composite image allows the reference image and the equipment to be adjusted smoothly.

(Embodiment 5)
The image processing method according to Embodiment 5 of the present invention is described below.

A method for executing, at high speed, the fractal contour extraction method, which extracts an accurate contour from the subject shape obtained by background subtraction, is described with reference to FIGS. 13 to 17.

(1) Fractal contour extraction method of this embodiment
The fractal contour extraction method of this embodiment is described below with reference to FIGS. 14 to 16.

In FIGS. 14 and 15, X1702 is the true contour line and X1703 is the given rough contour line. In the contour extraction process, a plurality of processing blocks are arranged along the rough (approximate) contour line. X1701 and X1801 each represent one of these processing blocks.

図１４のＸ１７０１のように真の輪郭線が処理ブロックに含まれていれば、輪郭抽出処理は正しく行われる。一方、図１５のＸ１８０１のように真の輪郭線が処理ブロックに含まれていないと、輪郭抽出処理は失敗する。つまり、与えられた大まかな輪郭線に許容されるずれの程度は、処理ブロックが配置された場合に真の輪郭線が処理ブロックに含まれる程度である。 If a true contour line is included in the processing block as in X1701 in FIG. 14, the contour extraction processing is performed correctly. On the other hand, if the true contour line is not included in the processing block as in X1801 of FIG. 15, the contour extraction process fails. That is, the degree of deviation allowed for a given rough contour is such that the true contour is included in the processing block when the processing block is arranged.

In the conventional fractal contour extraction method, every contour extraction pass has been performed with high calculation accuracy. However, the result of one pass is used as the input to the next pass, so the result of the earlier pass only needs to fall within the range that the next pass can handle (a range that is narrower than before, while the accuracy is higher than before). High calculation accuracy is therefore unnecessary in the earlier pass. Only the final pass needs high calculation accuracy, but given the characteristics of the fractal contour extraction method (FIG. 17), the processing blocks used in the final pass are small, so the amount of computation is small even when the calculation is performed with high accuracy.

このことを使えばフラクタル輪郭抽出法はより高速になる。 Using this, the fractal contour extraction method becomes faster.

First, suppose that the rough contour line X1703 is given by the method shown in FIG. 14, as in FIG. 16. Suppose further that the processing block X1901 of the next contour extraction pass is placed on the basis of the next rough contour line X1902, which is the result of the pass performed at the current stage. In this case, it suffices that the true contour line is contained in processing block X1901. The computation of the current pass can therefore be reduced, and the method made faster, as long as processing block X1901 still contains the true contour line.

(3) Flowchart description of the fractal contour extraction processing
The processing of the fractal contour extraction method of this embodiment is described with reference to the flowchart of FIG. 13.

In S1301, the search step and the number of mappings M are set appropriately for the block size. Speed is gained by performing the block-matching search (S1303) only every S pixels (S is a natural number) and by reducing the number of contraction mappings (S1304).

However, when the pass through S1302 to S1305 is the final pass, S1301 sets a full search (step = 1) and a value of M for which the error is less than 1 (for example, M = 3 for a block of 4 pixels in height and width).

(3-1) Block-matching search
First, performing the block-matching search of S1303 at high speed is described.

Because each processing block is centered on the given rough contour line, the deviation between the given rough contour line and the true contour line must not exceed 1/2 of the size of the processing block.

The next processing block is 1/2 the size (length along each axis) of the current processing block. For the true contour line to be contained in the next processing block, the deviation of the contour line obtained by the current contour extraction pass must be no more than 1/4 of the current processing block. That is, when the size of the current processing block is B, the allowable error of the current pass is B/4.

When the ratio of the size of the parent block to that of the processing block is r (where r > 1), shifting the position of the parent block by one pixel shifts the resulting contour line by approximately 1/(r−1) pixels. This value follows from the fact that the shift of the contour is approximately equal to the shift of the fixed point of the mapping from the parent block to the processing block when the parent block is moved by one pixel. As explained in the background art, the "parent block" of a processing block is the block that is A times the size of the processing block (for example, A = 2.0) and has the smallest error with respect to it (for example, the sum of the absolute per-pixel errors).
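The 1/(r−1) figure can be checked with a one-dimensional sketch. Assume, as a simplification that is not the patent's own notation, that the mapping from a parent block with origin p to a processing block with origin c is w(x) = c + (x − p)/r. Its fixed point and the effect of moving the parent block by one pixel are then:

```latex
\begin{align*}
x^{*} &= c + \frac{x^{*} - p}{r}
  \;\Longrightarrow\;
  x^{*} = \frac{r\,c - p}{r - 1},\\[4pt]
x^{*}(p + 1) - x^{*}(p)
  &= \frac{r\,c - (p + 1)}{r - 1} - \frac{r\,c - p}{r - 1}
   = -\frac{1}{r - 1}.
\end{align*}
```

Under this model the fixed point moves by 1/(r−1) pixels in magnitude, which for r = 2 is exactly one pixel per pixel of parent displacement.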

When searching every S pixels, it is assumed that the parent block closest to the one that a pixel-by-pixel search would find is always obtained. That is, the deviation of the parent block is assumed to be at most S/2 pixels.

すると、（１／（ｒ−１））×（Ｓ／２）が許容されるずれを超えないようにＳを決めればよい。 Then, S may be determined so that (1 / (r−1)) × (S / 2) does not exceed an allowable deviation.

(3-2) Contraction mapping
Next, the contraction mapping of S1304 is described.

In this embodiment, the contraction mapping is not repeated until it converges, as it is in Non-Patent Document 3. Instead, it is cut off after M iterations to gain speed.

Each application of the contraction mapping multiplies the contour error by 1/r, so after M applications the contour error is (1/r)^M times the initial allowable error B/2.

Therefore, as long as the error caused by searching only every S pixels plus the error caused by cutting off the contraction mapping after M iterations is B/4 or less, contour extraction can be performed at high speed without affecting the result. With the example ratio r = 2, this condition is satisfied by, for example, S = B/4 and M = 3, since the two errors are then B/8 and B/16.
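The error budget can be checked numerically with a small sketch. It combines the search error (1/(r−1)) × (S/2) and the truncation error (1/r)^M × (B/2) from the text. The function names and the `allowed_fraction` parameter (0.25 for the B/4 bound) are assumptions introduced here.

```python
import math


def contour_error(B, r, S, M):
    """Worst-case contour error for block size B, ratio r, step S, M mappings."""
    search_error = (1.0 / (r - 1.0)) * (S / 2.0)
    truncation_error = (1.0 / r) ** M * (B / 2.0)
    return search_error + truncation_error


def max_search_step(B, r, M, allowed_fraction=0.25):
    """Largest integer step S keeping the error within allowed_fraction * B.

    Returns 0 when even a one-pixel step would exceed the budget.
    """
    budget = allowed_fraction * B - (1.0 / r) ** M * (B / 2.0)
    if budget <= 0:
        return 0
    return math.floor(2.0 * (r - 1.0) * budget)
```

For B = 16, r = 2, and M = 3, the step S = B/4 = 4 gives an error of 3 pixels against a budget of 4, so the condition holds with some margin.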

(4) Effects
The above shows that M need not be increased even when the processing block size B is large, and that S can be made larger as the processing block size increases.

また、先に述べたように処理ブロックのサイズＢが大きいほど輪郭抽出処理に多くの時間を必要とするから、本実施形態によって処理ブロックのサイズＢが大きいときに必要な時間を短縮できれば、フラクタル輪郭抽出法を高速化できる。 Further, as described above, the larger the processing block size B is, the more time is required for the contour extraction processing. Therefore, if the time required when the processing block size B is large can be shortened according to this embodiment, the fractal can be reduced. The contour extraction method can be speeded up.

そして、本実施形態により、時間のかかる後処理であるフラクタル輪郭抽出法を高速に行うことができ、クロマキーに比べて計算量の多い処理を導入することによる円滑なコミュニケーションの阻害を解決することができる。 In addition, according to the present embodiment, the fractal contour extraction method, which is time-consuming post-processing, can be performed at high speed, and smooth communication hindrance due to the introduction of processing with a larger amount of calculation than chroma key can be solved. it can.

(5) Modifications
The above describes speeding up the two-dimensional fractal contour extraction method, but the method is also applicable to images of three or more dimensions, such as a three-dimensional image obtained by MRI, a three-dimensional spatiotemporal image formed by a time series of two-dimensional images, or a four-dimensional spatiotemporal image formed by a time series of three-dimensional images.

例えば、非特許文献４（竹島秀則，井田孝，堀修，松本信幸，「時空間画像の自己相似性を用いたオブジェクト輪郭の抽出」，電子情報通信学会技術研究報告（信学技報）、Vol.103、No.325、IE2003-62、Sept.2003.）の時空間の３次元のフラクタル輪郭抽出法に適用できる。 For example, Non-Patent Document 4 (Hidenori Takeshima, Takashi Ida, Osamu Hori, Nobuyuki Matsumoto, “Extraction of object contour using self-similarity of spatio-temporal image”, IEICE Technical Report (Science Technical Report), Vol.103, No.325, IE2003-62, Sept.2003.) It is applicable to the spatio-temporal three-dimensional fractal contour extraction method.

すなわち、非特許文献４では時間方向において、現段階の処理ブロックの大きさと次の処理ブロックの大きさが同じ場合がある。この場合は同様に考えると、探索をＳ画素おきに行ったために発生する誤差と、縮小写像をＭ回で打ち切ったことによる誤差があわせてＢ／２以下である限り、輪郭抽出結果に影響を与えずに高速に輪郭抽出を行えることがわかる。 That is, in Non-Patent Document 4, the size of the current processing block may be the same as the size of the next processing block in the time direction. In this case, if the same consideration is made, the contour extraction result will be affected as long as the error caused by performing the search every S pixels and the error caused by cutting the reduced map M times are equal to or less than B / 2. It can be seen that contour extraction can be performed at high speed without giving.

本発明は、例えば、複数の参加者が１つの合成画像を共有しながらコミュニケーションを行うための画像処理方法に好適である。 The present invention is suitable, for example, for an image processing method in which a plurality of participants communicate while sharing one composite image.

FIG. 1 is a flowchart of the method of Embodiment 1 of the present invention, in which mirror inversion is performed only for the own terminal group.
FIG. 2 is an example of image data when image processing is performed on images from four terminals.
FIG. 3 is an example of a system configuration using this embodiment.
FIG. 4 is a configuration example of a terminal of a system using this embodiment.
FIG. 5 is an explanatory diagram showing the relationship of mirror inversion.
FIG. 6 is an example of a screen of Embodiment 2 without mirror inversion.
FIG. 7 is an example of a screen in which the whole image is mirror-inverted.
FIG. 8 is an example of a screen in which only the character region is excluded from mirror inversion.
FIG. 9 is a flowchart of the method of Embodiment 3 for excluding part of a region from mirror inversion.
FIG. 10 is a flowchart of the image processing method of Embodiment 4 in which the image of the own terminal can be superimposed on top.
FIG. 11 shows a method of processing the reference image and the target image separately by using additional information.
FIG. 12 shows a method of transmitting an image with added information for distinguishing the reference image from the target image.
FIG. 13 is a flowchart of the high-speed fractal contour extraction method of Embodiment 5.
FIG. 14 is an explanatory diagram of the case in which the true contour line is contained in the processing block.
FIG. 15 is an explanatory diagram of the case in which the true contour line is not contained in the processing block.
FIG. 16 is an explanatory diagram in which the processing block of the next contour extraction pass has been placed.
FIG. 17 is an explanatory diagram showing the characteristic that the larger the processing block, the wider the range that can be handled, the greater the amount of computation, and the lower the accuracy of the resulting contour.
FIG. 18 is a flowchart of the image processing method in a conventional system.
FIG. 19 is a configuration example of a conventional system.
FIG. 20 is a flowchart of the conventional fractal contour extraction method.

Explanation of symbols

10 Terminal
12 Monitor
14 Camera
16 Image composition unit
18 Cut-out unit
20 Mirror inversion unit
22 Capture unit

Claims

An image processing method in a terminal of a system that is composed of three or more terminals connected via a network and in which images taken by the respective photographing means of the terminals are input and output among the terminals via the network, the method comprising:
an image inversion step of dividing the plurality of terminals into a first terminal group to which the own terminal belongs and a second terminal group to which the own terminal does not belong, and mirror-inverting all images input from the own terminal and the one or more other terminals belonging to the first terminal group; and
a synthesis step of generating a composite image by superimposing and synthesizing each mirror-inverted image of the first terminal group and the images input from one or more other terminals belonging to the second terminal group.

A self-portrait input step for inputting a self-portrait photographed by the photographing means of the self-terminal;
An external terminal image input step of inputting each external terminal image captured by a plurality of external terminals via a network;
An image inversion step of mirror-inverting each image input from the own terminal belonging to the first terminal group including the own terminal and other external terminals;
A synthesis step of superposing and synthesizing each image of the first terminal group that has been mirror-inverted and an external terminal image of a second terminal group that does not belong to the first terminal group, and generating a synthesized image;
An image processing method comprising:

The image processing method according to claim 2, wherein all of the own terminal and the external terminal belong to the first terminal group.

In an image processing method in a terminal comprising a plurality of terminals connected via a network, and constituting a system for inputting and outputting images taken by the respective photographing means of the plurality of terminals to each other via the network,
An image reversing step of mirror inverting at least one of the images;
An image replacement step of replacing only the inversion processing exclusion area designated in the mirror image inverted image with an image not mirror-inverted;
A synthesis step of superimposing and synthesizing one image in which only the inversion processing exclusion area is replaced with an image that is not mirror-inverted and another image, and generating a synthesized image;
An image processing method comprising:

In an image processing method in a terminal comprising a plurality of terminals connected via a network, and constituting a system for inputting and outputting images taken by the respective photographing means of the plurality of terminals to each other via the network,
A first subject step for calculating a subject that is a subject area of one of the images;
A second subject step for calculating a subject that is a subject region of an image other than the one image;
A composition step of superimposing and synthesizing the subject of the one image and the subject of an image other than the one image on the one image in accordance with a predetermined superposition sequence;
An image processing method comprising:

The image processing method according to claim 5, wherein the superposition order in the synthesis step is determined by:
Calculating the size of the subject from the subject of each image;
Comparing the calculated sizes of the subjects; and
Deciding the order based on the result of the comparison.
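
As a non-limiting illustration, a size-based superposition order might be sketched as follows. The sketch represents each subject as a binary mask and takes its size to be the number of mask pixels. Drawing larger subjects first, so that smaller subjects end up in front, is an assumption: the claim states only that the order is decided from the comparison. The names `subject_size` and `compose_by_size` are hypothetical.

```python
# Illustrative sketch only: subjects are binary masks over grayscale images.
# "Larger first, smaller in front" is an assumed ordering rule.

def subject_size(mask):
    """Size of a subject, taken here as the number of pixels in its mask."""
    return sum(sum(row) for row in mask)


def compose_by_size(base, layers):
    """Paint each (image, mask) layer onto a copy of base, largest subject first."""
    canvas = [row[:] for row in base]
    ordered = sorted(layers, key=lambda layer: subject_size(layer[1]), reverse=True)
    for image, mask in ordered:
        for y, mask_row in enumerate(mask):
            for x, inside in enumerate(mask_row):
                if inside:
                    canvas[y][x] = image[y][x]
    return canvas


base = [[0, 0, 0, 0]]
small = ([[2, 2, 2, 2]], [[0, 1, 1, 0]])   # subject size 2
large = ([[1, 1, 1, 1]], [[1, 1, 1, 0]])   # subject size 3
# The input order does not matter; the layers are re-ordered by subject size.
result = compose_by_size(base, [small, large])
print(result)  # [[1, 2, 2, 0]]
```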

In an image processing method of an own terminal in a system that is composed of the own terminal and one or a plurality of external terminals connected to the own terminal via a network, and that inputs and outputs images taken by the respective photographing means of the terminals via the network,
An external terminal image input step of inputting an external terminal image captured by one external terminal among the external terminals;
A reference image input step of inputting a reference image that is an image taken by the one external terminal and to which information distinguishing it from the external terminal image is added;
A storage step of storing the input reference image as a stored reference image;
A subject area calculation step of, when the external terminal image is input, detecting the parts in which the stored reference image and the external terminal image differ and calculating those parts as the subject area of the external terminal image;
A synthesis step of superimposing the calculated subject area on an external terminal image captured by an external terminal other than the one external terminal, or on the self-portrait captured by the own terminal, to obtain a composite image;
An image processing method comprising:
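
As a non-limiting illustration, the subject area calculation and synthesis steps above might be sketched as a simple background difference. The per-pixel absolute difference, the threshold value of 10, and the names `subject_mask` and `paste_subject` are assumptions made for illustration; the claim requires only that differing parts be detected.

```python
# Illustrative sketch only: grayscale images are lists of rows.
# The difference measure and the threshold are assumptions.

def subject_mask(reference, frame, threshold=10):
    """Mark pixels where the frame differs from the stored reference image."""
    return [
        [abs(f - r) > threshold for f, r in zip(frame_row, ref_row)]
        for frame_row, ref_row in zip(frame, reference)
    ]


def paste_subject(target, frame, mask):
    """Superimpose the masked subject pixels of frame onto a copy of target."""
    return [
        [f if inside else t for t, f, inside in zip(target_row, frame_row, mask_row)]
        for target_row, frame_row, mask_row in zip(target, frame, mask)
    ]


stored_reference = [[50, 50, 50, 50]]      # reference image received earlier
external_frame = [[50, 200, 210, 52]]      # later frame containing a subject
self_portrait = [[1, 2, 3, 4]]             # image captured by the own terminal

mask = subject_mask(stored_reference, external_frame)
composite = paste_subject(self_portrait, external_frame, mask)
print(composite)  # [[1, 200, 210, 4]]
```

Because the reference image is stored once, each later external terminal image can be segmented at the own terminal using only this comparison.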

A first step of inputting N-dimensional image data composed of a set of N-dimensional unit pixel data (where N is a natural number of 2 or more);
A second step of inputting N-dimensional shape data indicating a rough shape of a predetermined shape represented in the N-dimensional image data;
A third step of arranging an N-dimensional processing block having a predetermined size and shape at a boundary portion of the N-dimensional shape data;
A fourth step of repeating, M times each, a search process of searching the N-dimensional image data, based on a predetermined conversion method, for the N-dimensional parent block most similar to the N-dimensional processing block, and a replacement process of replacing, in the N-dimensional shape data, the contents of the N-dimensional processing block with the contents of the searched N-dimensional parent block;
A fifth step of outputting the N-dimensional shape data obtained by the M repetitions as data of the predetermined shape;
An image processing method comprising the above steps, wherein, in the fourth step,
The size of the (k+1)-th N-dimensional processing block (where M > k ≥ 1) is reduced from the size of the k-th N-dimensional processing block; and
In the search process, the larger the size of the (k+1)-th N-dimensional processing block, the more coarsely the search is performed.
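
As a non-limiting illustration, the first to fifth steps might be sketched for N = 2 as follows. Several points are assumptions made for illustration and are not taken from the claim: the "predetermined conversion method" is modelled as a 2:1 average reduction of a parent block twice the size of the processing block, parent candidates are limited to blocks that contain the processing block, the search stride is set to half the block size, and the names `shrink`, `block_error`, `on_boundary`, and `refine_shape` are hypothetical.

```python
# Illustrative sketch only (N = 2). The 2:1 reduction, the candidate range,
# and the stride rule are assumptions, not part of the claim.

def shrink(grid, top, left, size):
    """Average-reduce the (2*size x 2*size) parent block to size x size."""
    return [
        [
            (grid[top + 2 * y][left + 2 * x] + grid[top + 2 * y][left + 2 * x + 1]
             + grid[top + 2 * y + 1][left + 2 * x]
             + grid[top + 2 * y + 1][left + 2 * x + 1]) / 4
            for x in range(size)
        ]
        for y in range(size)
    ]


def block_error(image, top, left, size, candidate):
    """Sum of absolute differences between an image block and a reduced parent."""
    return sum(
        abs(image[top + y][left + x] - candidate[y][x])
        for y in range(size) for x in range(size)
    )


def on_boundary(shape, top, left, size):
    """True if the block contains both inside (1) and outside (0) shape pixels."""
    values = {shape[top + y][left + x] for y in range(size) for x in range(size)}
    return len(values) > 1


def refine_shape(image, shape, sizes=(4, 2)):
    """Run M = len(sizes) passes; the block size shrinks from pass to pass."""
    height, width = len(image), len(image[0])
    shape = [row[:] for row in shape]
    for size in sizes:
        step = max(1, size // 2)  # larger block -> coarser search stride
        updated = [row[:] for row in shape]
        for top in range(0, height - size + 1, size):
            for left in range(0, width - size + 1, size):
                if not on_boundary(shape, top, left, size):
                    continue
                best = None
                # Candidate parents: blocks of twice the size containing the block.
                for pt in range(max(0, top - size), min(height - 2 * size, top) + 1, step):
                    for pl in range(max(0, left - size), min(width - 2 * size, left) + 1, step):
                        err = block_error(image, top, left, size,
                                          shrink(image, pt, pl, size))
                        if best is None or err < best[0]:
                            best = (err, pt, pl)
                if best is None:
                    continue
                # Replace the shape block with the reduced shape of the best parent.
                patch = shrink(shape, best[1], best[2], size)
                for y in range(size):
                    for x in range(size):
                        updated[top + y][left + x] = 1 if patch[y][x] >= 0.5 else 0
        shape = updated
    return shape


SIDE = 16
image = [[0 if x < 8 else 100 for x in range(SIDE)] for _ in range(SIDE)]
rough = [[0 if x < 6 else 1 for x in range(SIDE)] for _ in range(SIDE)]
refined = refine_shape(image, rough, sizes=(4, 2))
```

Here `rough` plays the role of the input shape data and `refined` the output of the fifth step; the two-pass schedule `sizes=(4, 2)` is chosen only to keep the example small.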

Self-portrait input means for inputting a self-portrait photographed by the photographing means of the own terminal;
An external terminal image input means for inputting each external terminal image captured by a plurality of external terminals via a network;
Image inverting means for mirror-inverting each of the images input from the own terminal and the other external terminals that belong to a first terminal group including the own terminal;
Combining means for superimposing and synthesizing each mirror-inverted image of the first terminal group and the external terminal images of a second terminal group that does not belong to the first terminal group, thereby generating a synthesized image;
An image processing apparatus comprising: