JP2006190207A

JP2006190207A - Image generating method, device and program, image processing method, device and program

Info

Publication number: JP2006190207A
Application number: JP2005003083A
Authority: JP
Inventors: Hidenori Takeshima (竹島秀則); Takashi Ida (井田孝); Takeshi Mita (三田雄志); Koji Yamamoto (山本晃司); Koichi Masukura (増倉孝一); Yasunori Taguchi (田口安則); Kenzo Isogawa (五十川賢造)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2005-01-07
Filing date: 2005-01-07
Publication date: 2006-07-20

Abstract

PROBLEM TO BE SOLVED: To present the image of an object that is hidden in a moving image.

SOLUTION: A series of image frames is input, together with region information indicating the region occupied by an object in the series of image frames and a point of interest in a first image frame of the series. The hidden object region of the object corresponding to the point of interest is estimated on the basis of the region information, and luminance values for the estimated hidden object region are obtained from the series of image frames to generate an image frame for presentation.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to an image generation method, image generation apparatus, and image generation program, and to an image processing method, image processing apparatus, and image processing program, for estimating and presenting the hidden region of a subject when, in a moving image captured by, for example, a two-dimensional camera, the subject is hidden by another object in part of a video scene.

There is demand for viewing a subject that is hidden in a captured video scene.

Conventionally, as described in Non-Patent Document 1, this has been achieved by capturing in advance a large number of images that include the otherwise hidden regions and storing data corresponding to a three-dimensional space. When the user instructs a change of position (that is, of viewpoint) in the three-dimensional space, an image of the three-dimensional space seen from the changed viewpoint is presented.

The image presentation process described in Non-Patent Document 1 proceeds roughly as follows. To let the user see a hidden region, the process first accepts a user instruction to change the viewpoint in the three-dimensional space, that is, to move to a viewpoint that reveals the hidden region. In accordance with the instruction, the image presentation program generates a two-dimensional display image from the stored three-dimensional image information: it searches the multi-viewpoint images captured in advance for the image closest to the instructed viewpoint and interpolates as necessary. Presenting the generated two-dimensional display image lets the user see the region that was hidden in the three-dimensional scene.

Non-Patent Document 1: S. E. Chen, "QuickTime VR: an image-based approach to virtual environment navigation", Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques (ACM SIGGRAPH 95), pp. 29-38, 1995.

When an object hidden in three-dimensional space is to be presented as an image, conventional techniques such as that of Non-Patent Document 1 are effective only when image information corresponding to arbitrary viewpoints has already been obtained. However, images such as those captured by a two-dimensional video camera have a fixed viewpoint or lack viewpoint information; that is, images corresponding to arbitrary viewpoints in three-dimensional space are not always available. It is therefore inherently impossible to reconstruct the image to be presented in response to a user's instruction to change the viewpoint.

There is also a need for a technique that can present a hidden subject as an image even for images captured by such a two-dimensional video camera (for example, a security camera).

Accordingly, an object of the present invention is to provide an image generation method, image generation apparatus, and image generation program, as well as an image processing method, image processing apparatus, and image processing program, that make it possible to present a hidden subject as an image even when the viewpoint is fixed or viewpoint information is missing, as with images captured by a two-dimensional video camera.

An image generation method according to one aspect of the present invention comprises the steps of: inputting a series of image frames; inputting region information indicating the region occupied by an object in the series of image frames; inputting a point of interest in a first image frame of the series; estimating, on the basis of the region information, the hidden object region of the object corresponding to the point of interest; and generating an image frame for presentation by obtaining luminance values for the estimated hidden object region from the series of image frames.

According to the present invention, the image of a subject hidden in a moving image can be presented.

(Definition of terms)
In describing the embodiments of the present invention, the following terms are defined.

A subset of the two-dimensional Euclidean metric space is called a “screen”. A unit region (a two-dimensional subset of the screen) that makes up the screen is called a “pixel”. Pixels are, for example, the regions obtained by dividing the screen at equal intervals horizontally and vertically. The division is not limited to this: it may, for example, be distorted, or the screen may be divided into a honeycomb pattern of regular hexagons. The vector associated with a pixel is called its “luminance value”. A luminance value corresponds, for example, to the three-dimensional RGB or YUV value of a digital camera and represents a color. The entire screen, with a luminance value associated with each pixel, is called an “image” or “still image”. A plurality of images arranged in time order is called a “moving image”, and a single image in a moving image is called an “image frame”. That is, a moving image consists of a series of image frames.

Determining unknown information by image processing based on input information and the like is called “estimation”. Data that distinguish, for each pixel in an image, whether it belongs to the subject or to the background (for example, the binary values 0 and 1, or fractional values in the range 0 to 1) are called a “mask image”. Data indicating which region each pixel in an image belongs to (for example, data in which each pixel is assigned one of the region numbers 1, 2, 3, 4, ..., N corresponding to N regions) are called a “label image”.
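As an illustration of the two data structures just defined, the following Python sketch stores a label image as a list of rows and derives the binary mask image of one region from it. Plain nested lists and the use of 0 for background pixels are assumptions made here for readability; `region_numbers` and `mask_from_labels` are hypothetical names.

```python
from typing import List

LabelImage = List[List[int]]   # label_image[y][x]: region number (0 = background, an assumption)
MaskImage = List[List[int]]    # mask_image[y][x]: 1 = subject, 0 = background


def region_numbers(label_image: LabelImage) -> List[int]:
    """Distinct region numbers that occur in the label image, in ascending order."""
    return sorted({lab for row in label_image for lab in row})


def mask_from_labels(label_image: LabelImage, region: int) -> MaskImage:
    """Binary mask image treating region `region` as the subject and all else as background."""
    return [[1 if lab == region else 0 for lab in row] for row in label_image]


labels = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 2, 2, 0],
]
print(region_numbers(labels))       # [0, 1, 2]
print(mask_from_labels(labels, 2))  # [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
```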

FIG. 1 is a block diagram of an image generation apparatus according to an embodiment of the present invention. As shown in the figure, the image generation apparatus 100 comprises a moving image storage unit 101, a region information storage unit 102, a hidden region estimation unit 103, and an image generation unit 104. A display device 105 and a mouse 106 are connected to the image generation apparatus 100. The moving image storage unit 101 stores the series of image frames of a moving image input via a moving image input unit 107. The region information storage unit 102 stores the region information of objects input via a region information input unit 108; this region information indicates the region occupied by an object in each of the series of image frames of the input moving image.

By operating the mouse 106 while viewing the display device 105, the user can input a point of interest in a first image frame of the series. The hidden region estimation unit 103 estimates the hidden object region of the object corresponding to the input point of interest on the basis of the region information stored in the region information storage unit 102. The image generation apparatus 100 generates an image frame for presentation by obtaining luminance values for the estimated hidden object region from the moving image data stored in the moving image storage unit 101. The generated image frame for presentation is displayed, for example, on the display device 105.

The image generation apparatus 100 may also include a region information estimation unit that estimates the region information of objects from the series of image frames stored in the moving image storage unit 101. Estimation of object region information is described in detail later.

The present invention can also be implemented as a program that causes a computer to function as the image generation apparatus 100. In this case, the program is stored in a program storage device in the computer, for example a nonvolatile semiconductor memory or a magnetic disk device. The program is read into random access memory (RAM) under the control of a CPU (not shown) and executed by the CPU, causing the computer to function as the image generation apparatus 100. The computer also runs an operating system that manages computer resources and provides a file system, communication functions, a graphical user interface (GUI), and the like.

An overview of the image generation method according to an embodiment of the present invention is described with reference to FIG. 2.

In a moving image, the image of a region that is hidden in one frame is often contained in other frames. FIG. 2 shows an example: image frames of a moving image in which a subject 201 and an obstacle 202, which blocks the subject 201 as seen from the camera viewpoint, are captured together. In the center frame of FIG. 2, the area around the middle of the subject 201 is blocked by the obstacle 202 and cannot be seen. The part that cannot be seen in the center frame is, however, visible in the left and right frames. Taking the center frame as the frame of interest, the image generation method according to the embodiment estimates, from the images visible in the other frames, the hidden object region that cannot be seen in the center frame because of the obstacle 202 or the like, and generates an image frame for presentation by obtaining luminance values for that hidden object region in the center frame.

(Processing procedure when object region information is known)
When object region information is attached to the moving image and is available, it is input to the apparatus 100 via the region information input unit 108 and stored in the region information storage unit 102. The processing procedure for this case is described below with reference to the flowchart of FIG. 3. Specifically, object region information consists of data representing the region of the image in which an object is present. For example, the object region information of the subject 201 in the left frame of FIG. 2 can be represented as the black region 701 in FIG. 4.

As shown in FIG. 3, an input of a point of interest, for example an object designation by the user, is first accepted (S101). As described above, the point of interest can be input with the mouse 106 or the like. Next, on the basis of the object region information stored in the region information storage unit 102, the hidden object region of the object containing the point of interest (the designated object) is estimated for a plurality of frames (S102). The image of the hidden object region in the frame of interest is then obtained from the images of the object region in the other frames, and the image of the frame of interest is updated (S103). Finally, the updated frame of interest, including the hidden object region, is displayed (S104). In S104, the entire frame of interest containing the subject 201 may be displayed, or an image consisting only of the subject 201, with the luminance values of the part corresponding to the hidden object region updated, may be extracted and presented.
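To make the S101 to S103 flow concrete, here is a deliberately simplified Python sketch. It assumes that object region information is supplied as one label image per frame, that luminance is a single gray value, and that the designated object does not move between frames, so the affine search of S102 (described below) reduces to the identity. `fill_hidden_region` is a hypothetical name, not part of the source.

```python
from typing import List, Tuple

Frame = List[List[float]]   # frame[y][x]: gray luminance (a simplification of the luminance vector)
Labels = List[List[int]]    # labels[y][x]: object number, 0 = background (assumption)


def fill_hidden_region(frames: List[Frame], labels: List[Labels],
                       target: int, point: Tuple[int, int]) -> Frame:
    """Toy S101-S103 for an object that does not move between frames."""
    x0, y0 = point
    obj = labels[target][y0][x0]                 # S101: object under the point of interest
    if obj == 0:
        raise ValueError("the point of interest lies on the background")
    out = [row[:] for row in frames[target]]     # work on a copy of the frame of interest
    for y, row in enumerate(out):
        for x in range(len(row)):
            if labels[target][y][x] == obj:
                continue                         # already visible in the frame of interest
            # S102: a pixel covered by the object in another frame but not here is hidden.
            # S103: copy its luminance from the first frame in which it is visible.
            for k, lab in enumerate(labels):
                if k != target and lab[y][x] == obj:
                    row[x] = frames[k][y][x]
                    break
    return out


# Object 1 is partly covered by object 2 (luminance 9) in frame 0 but fully visible in frame 1.
frames = [[[5.0, 9.0, 0.0]], [[5.0, 5.0, 0.0]]]
labels = [[[1, 2, 0]], [[1, 1, 0]]]
print(fill_hidden_region(frames, labels, target=0, point=(0, 0)))  # [[5.0, 5.0, 0.0]]
```

A real implementation would replace the identity correspondence with the deformation estimated in S102 and then display the result (S104).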

(Preview display)
It is preferable to use the object region information to give the user feedback (a preview) while a point of interest is being designated to select an image region. This is described with reference to the flowchart of FIG. 5. Suppose the user moves the mouse cursor 301 over an object in the image, as shown on the left of FIG. 6, without clicking. By referring to the object region information, the object region corresponding to the mouse cursor 301 can be determined (S601). The region about to be selected with the mouse cursor 301 is then displayed on the screen as a schematic shape mark 401, as shown on the right of FIG. 6 (S602). This is applicable even when the object region information contains only an outline, as with MPEG-7 metadata. Even when no object region information is attached, the same processing can be achieved by obtaining the individual object regions before the image is presented to the user.
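A minimal Python sketch of S601 and S602, assuming the object region information is available as a label image and that the schematic shape mark 401 is drawn as the bounding rectangle of the region under the cursor. Both the rectangle shape and the name `preview_box` are illustrative choices, not taken from the source.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), inclusive


def preview_box(labels: List[List[int]], cursor: Tuple[int, int]) -> Optional[Box]:
    """S601: look up the object under the (unclicked) cursor in the label image.
    S602: return the rectangle to draw as the mark, or None on background (label 0)."""
    cx, cy = cursor
    obj = labels[cy][cx]
    if obj == 0:
        return None
    pts = [(x, y) for y, row in enumerate(labels) for x, lab in enumerate(row) if lab == obj]
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    return (min(xs), min(ys), max(xs), max(ys))


labels = [
    [0, 0, 0, 0],
    [0, 3, 3, 0],
    [0, 0, 3, 0],
]
print(preview_box(labels, (2, 2)))  # (1, 1, 2, 2)
print(preview_box(labels, (0, 0)))  # None
```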

(Modification) Although this embodiment uses operation with the mouse 106 as an example, the operation of moving the mouse cursor 301 and confirming a selection may instead be performed with a keyboard, a joystick, a remote control (using buttons for up/down/left/right movement and for confirmation), voice recognition (instructing up/down/left/right and confirmation by voice; when the objects are known, an object order may be prepared in advance so that the user instructs a move to the next object by voice instead of moving the cursor up, down, left, or right), or the like.

(Method of estimating the hidden region)
The specific processing of S102 in FIG. 3 (estimation of the object region corresponding to the point of interest) is described with reference to the flowchart of FIG. 7. When object region information is given, the object region in each frame is known, as shown by 801 in FIG. 8, but the hidden region is unknown because it is treated in the same way as the background. For example, it is not yet known at this point that the lower part of the subject 801 in FIG. 8 is missing. It is known, however, that the subject 801 in FIG. 8 and the subject in another frame (the subject 701 in FIG. 4) are the same subject. Therefore, given the shape 801 in the target frame (S1501) and the shape in a reference frame (S1502), many deformed subjects, that is, many items of deformed region information, are prepared by applying, for example, affine transformations (deformations such as translation, rotation, and scaling; see FIG. 9) to the subject 701 (S1503). For example, patterns are prepared by varying translation over −16 to +16 pixels, rotation over −15 to +15 degrees, and scale over 0.80 to 1.20, each at separately specified intervals. Each deformed subject is compared with the subject 801 to compute a similarity H (S1504), and the deformed subject most similar to the subject 801 is selected (S1505). This yields the affine transformation parameters that associate the subject 701 in FIG. 4 with the subject 801 in FIG. 8.
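The exhaustive search of S1503 to S1505 can be sketched in Python as follows. Assumptions made here, not stated in the source: regions are sets of (x, y) pixels, rotation and scaling are applied about the reference region's centroid, the deformed region is sampled by inverse mapping with nearest-pixel rounding, and the similarity H is the number of differing pixels (the second option named in the next paragraph). All function names are hypothetical.

```python
import math
from itertools import product
from typing import Iterable, Optional, Set, Tuple

Point = Tuple[int, int]
Params = Tuple[int, int, float, float]   # (tx, ty, angle_deg, scale)


def warp_region(ref: Set[Point], p: Params, size: Tuple[int, int],
                center: Tuple[float, float]) -> Set[Point]:
    """Scale and rotate `ref` about `center`, then translate by (tx, ty).

    Inverse mapping: pixel (x, y) of a width x height grid belongs to the deformed
    region when its pre-image, rounded to the nearest pixel, lies in `ref`.
    """
    tx, ty, deg, s = p
    cx, cy = center
    c, sn = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    w, h = size
    out: Set[Point] = set()
    for y in range(h):
        for x in range(w):
            u, v = x - tx - cx, y - ty - cy   # undo translation, move origin to center
            ru = (c * u + sn * v) / s         # undo rotation, then scaling
            rv = (-sn * u + c * v) / s
            if (round(ru + cx), round(rv + cy)) in ref:
                out.add((x, y))
    return out


def mismatch(a: Set[Point], b: Set[Point]) -> int:
    """Similarity H: number of pixels in exactly one of the two regions (0 = identical)."""
    return len(a ^ b)


def search_affine(ref: Set[Point], target: Set[Point], size: Tuple[int, int],
                  shifts: Iterable[int], angles: Iterable[float],
                  scales: Iterable[float]) -> Tuple[int, Params]:
    """Exhaustive search over the parameter grid; returns (best H, best parameters)."""
    cx = sum(x for x, _ in ref) / len(ref)    # rotate and scale about the centroid
    cy = sum(y for _, y in ref) / len(ref)
    shifts = list(shifts)                     # reused for both tx and ty
    best: Optional[Tuple[int, Params]] = None
    for p in product(shifts, shifts, list(angles), list(scales)):
        h = mismatch(warp_region(ref, p, size, (cx, cy)), target)
        if best is None or h < best[0]:
            best = (h, p)
    assert best is not None, "parameter grid must not be empty"
    return best


# A 3x2 block moved 2 pixels right and 1 pixel down is recovered exactly.
ref = {(x, y) for x in range(2, 5) for y in range(2, 4)}
moved = {(x + 2, y + 1) for x, y in ref}
print(search_affine(ref, moved, (10, 10), range(-3, 4), [0.0], [1.0]))  # (0, (2, 1, 0.0, 1.0))
```

For the ranges quoted in the text one might pass, say, `shifts=range(-16, 17, 4)`, `angles=[-15, -10, -5, 0, 5, 10, 15]` and `scales=[0.8, 0.9, 1.0, 1.1, 1.2]`; the step sizes are arbitrary here, and the cost grows with the product of all grid sizes times the image area.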

The affine transformation parameters thus represent the deformation from the region in the reference frame to the region in the current frame. They concretely specify an affine transformation such as “shift the previous frame 1 pixel to the left and 4 pixels down, rotate it 10 degrees counterclockwise, and reduce it to 0.9 times its size” (other transformations can also be expressed). With the affine transformation parameters obtained as above, it is possible to determine which part of the reference frame corresponds to the part missing in the current frame; that is, the source luminance values to be copied from the reference frame can be determined.
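How the parameters identify the copy source can be shown by writing the affine transformation as a 2×3 matrix and inverting it. This Python sketch assumes scaling and rotation about the image origin followed by translation, with x to the right and y downward; whether a positive angle looks counterclockwise on screen depends on that axis convention, so the angle sign used below is an assumption. `make_affine`, `apply`, and `copy_source` are hypothetical names.

```python
import math
from typing import List, Tuple

Affine = List[List[float]]   # [[a, b, tx], [c, d, ty]]: reference frame -> current frame


def make_affine(dx: float, dy: float, angle_deg: float, scale: float) -> Affine:
    """Scale by `scale` and rotate by `angle_deg` about the origin, then shift by (dx, dy)."""
    t = math.radians(angle_deg)
    c, s = scale * math.cos(t), scale * math.sin(t)
    return [[c, -s, dx], [s, c, dy]]


def apply(m: Affine, x: float, y: float) -> Tuple[float, float]:
    """Map a reference-frame position to the current frame."""
    (a, b, tx), (c, d, ty) = m
    return (a * x + b * y + tx, c * x + d * y + ty)


def copy_source(m: Affine, x: float, y: float) -> Tuple[float, float]:
    """Invert the map: reference-frame position whose luminance fills current pixel (x, y)."""
    (a, b, tx), (c, d, ty) = m
    det = a * d - b * c                      # non-zero whenever scale > 0
    u, v = x - tx, y - ty
    return ((d * u - b * v) / det, (-c * u + a * v) / det)


# The example from the text: 1 pixel left, 4 pixels down (y grows downward),
# rotate 10 degrees (sign convention assumed), shrink to 0.9.
m = make_affine(-1.0, 4.0, 10.0, 0.9)
src = copy_source(m, 20.0, 30.0)
print(tuple(round(v, 6) for v in apply(m, *src)))  # (20.0, 30.0)
```

In practice the source position is generally non-integer, so the luminance would be read with nearest-neighbour or bilinear interpolation.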

As the similarity H for comparing region information, one can use, for example, the extended Hausdorff distance described in Reference 1 (Haifeng Xu et al., "Automatic Moving Object Extraction for Content-Based Applications", IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 6, pp. 796-812, June 2004), or the number of pixels at which the deformed subject and the subject 801 differ.

The extended Hausdorff distance is explained here. First, the distance between two points is defined as the Euclidean distance or the city-block distance. The distance between a point x in region X and region Y is then the minimum of the distances from x to all points y in region Y. The extended Hausdorff distance from region X to region Y is obtained by computing this point-to-region distance for each of the N points in region X and taking the K-th (0 ≦ K ≦ N) of these values in ascending order. The extended Hausdorff distance from X to Y and that from Y to X are not necessarily equal. In particular, when region X is contained in region Y, the distance from X (the smaller region) to Y (the larger region) is 0. Using this directed extended Hausdorff distance, the similarity is defined, for example, as the maximum of the two directed distances or as either one of them; a smaller distance means the two regions are more alike.
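A Python sketch of the directed and symmetric distances described above. Regions are assumed to be sets of (x, y) pixels. Instead of an absolute index K, the sketch takes a fraction `frac` and returns the ceil(frac·N)-th smallest value (1-based); this parameterization is an assumption made so that regions of different sizes can share one setting, and `frac = 1` gives the classic directed Hausdorff distance. Function names are hypothetical.

```python
import math
from typing import Callable, Set, Tuple

Point = Tuple[int, int]
Dist = Callable[[Point, Point], float]


def euclidean(p: Point, q: Point) -> float:
    return math.hypot(p[0] - q[0], p[1] - q[1])


def city_block(p: Point, q: Point) -> float:
    return float(abs(p[0] - q[0]) + abs(p[1] - q[1]))


def directed_ehd(X: Set[Point], Y: Set[Point], frac: float = 1.0,
                 dist: Dist = euclidean) -> float:
    """Extended Hausdorff distance from X to Y.

    For every x in X take min over y in Y of dist(x, y), sort these N values in
    ascending order and return the K-th one, with K = ceil(frac * N) (1-based).
    """
    d = sorted(min(dist(x, y) for y in Y) for x in X)
    k = max(1, math.ceil(frac * len(d)))
    return d[k - 1]


def ehd_similarity(X: Set[Point], Y: Set[Point], frac: float = 1.0,
                   dist: Dist = euclidean) -> float:
    """Larger of the two directed distances; smaller values mean more similar shapes."""
    return max(directed_ehd(X, Y, frac, dist), directed_ehd(Y, X, frac, dist))


small = {(0, 0), (1, 0)}
large = {(0, 0), (1, 0), (5, 0)}
print(directed_ehd(small, large))       # 0.0  (small is contained in large)
print(directed_ehd(large, small))       # 4.0
print(directed_ehd(large, small, 0.5))  # 0.0  (the outlying point is ignored)
```

Taking a value below the maximum (frac < 1) is what makes the distance tolerant of a few stray pixels, which matters for noisy segmentation masks.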

The black regions in FIG. 9 represent the labels corresponding to the object region. For the target frame (X03 in FIG. 9), the most similar deformed shapes are found from reference frames X01 and X02 and overlaid on X03 (that is, their OR region is computed), producing X04 and X05, in each of which part of the subject is still missing. Overlaying these in turn (computing their OR region) reveals the shape of the subject before it was hidden by the obstacle (X06). The part that is present in the resulting shape X06 but absent from X03 becomes the target of the subsequent restoration. This example computes the OR region over all reference frames; however, when there are many combined shapes of the target frame with individual reference frames (shapes like X04 and X05), taking the OR of all of them makes it more likely that noise is included. In that case, it is preferable to set a threshold in advance, count for each pixel how many of the combined shapes contain it, and treat the pixel as part of the object region only when the count is at least the threshold, which suppresses the influence of noise.
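The OR-and-vote procedure above can be sketched in Python as follows, assuming each reference region has already been aligned to the target frame (for example with parameters from the affine search) and that regions are sets of (x, y) pixels. `restore_shape` and `min_votes` are illustrative names; with `min_votes=1` the result is the plain OR of all combined shapes.

```python
from collections import Counter
from typing import List, Set, Tuple

Point = Tuple[int, int]


def restore_shape(target: Set[Point], aligned_refs: List[Set[Point]],
                  min_votes: int = 1) -> Tuple[Set[Point], Set[Point]]:
    """Return (estimated un-occluded shape, region still to be restored).

    Each aligned reference region is OR-ed with the target region (like X04, X05).
    A pixel is kept when at least `min_votes` of these combined shapes contain it;
    min_votes=1 is the plain OR of all of them (like X06). Use min_votes <= len(aligned_refs).
    """
    votes: Counter = Counter()
    for ref in aligned_refs:
        votes.update(target | ref)          # one vote per combined shape containing the pixel
    shape = {p for p, n in votes.items() if n >= min_votes}
    return shape, shape - target            # in the estimated shape but not in the target


target = {(0, 0), (0, 2)}                   # the middle pixel (0, 1) is occluded
refs = [{(0, 0), (0, 1)}, {(0, 1), (0, 2)}, {(0, 0), (0, 2), (5, 5)}]  # (5, 5) is noise
print(sorted(restore_shape(target, refs)[1]))               # [(0, 1), (5, 5)]
print(sorted(restore_shape(target, refs, min_votes=2)[1]))  # [(0, 1)]
```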

(Another method of estimating the hidden region)
For the object region estimation in S102 (estimation of the object region corresponding to the point of interest), a method has been described that uses similarity based on the extended Hausdorff distance to obtain the affine transformation parameters. The affine transformation parameters can also be estimated with, for example, the method described in Reference 2 (Toshimitsu Kaneko and Osamu Hori, "Robust object tracking using multiple block matching of small regions", IEICE Transactions D-II, Vol. J85-D-II, No. 7, pp. 1188-1200, July 2002). The type of deformation is also not limited to affine transformation. For example, a second-order transformation, that is, one that takes into account x², y², and the product xy in addition to the affine terms, may be used.
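Written out with coefficient names chosen here for illustration (a₁ to a₁₂ are not from the source), the second-order transformation mentioned above maps a reference-frame point (x, y) to (x′, y′) as:

```latex
\begin{aligned}
x' &= a_1 x + a_2 y + a_3 + a_7 x^2 + a_8 y^2 + a_9 xy \\
y' &= a_4 x + a_5 y + a_6 + a_{10} x^2 + a_{11} y^2 + a_{12} xy
\end{aligned}
```

Setting a₇ through a₁₂ to zero recovers the six-parameter affine transformation, so the affine case is a special case of this model; the parameter search simply gains six more dimensions.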

(Method of obtaining the luminance values of the missing region)
The specific processing of S103 in FIG. 3 (estimating and updating the object region of the image frame) is described. To obtain the luminance values of the missing region, it suffices to overwrite onto the original image only those parts of the regions associated in S102 that are included in the deformed subject but not in 801.
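In array form, this overwrite is a masked copy. The Python sketch below assumes that the reference frame's luminance has already been resampled into current-frame coordinates with the estimated affine parameters (`warped_lum`), that `obj_mask` marks the visible object region 801, and that `warped_mask` marks the deformed subject; these names are hypothetical.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[int]]


def overwrite_missing(frame: Image, obj_mask: Mask, warped_mask: Mask,
                      warped_lum: Image) -> Image:
    """Copy luminance only where the deformed subject covers a pixel that the
    visible object region (801) does not; every other pixel is left unchanged."""
    out = [row[:] for row in frame]
    for y, row in enumerate(out):
        for x in range(len(row)):
            if warped_mask[y][x] and not obj_mask[y][x]:
                row[x] = warped_lum[y][x]
    return out


# Pixel 1 is covered by an obstacle (9.0); the aligned reference shows the subject there (3.0).
print(overwrite_missing([[1.0, 9.0, 0.0]], [[1, 0, 0]], [[1, 1, 0]], [[2.0, 3.0, 4.0]]))
# [[1.0, 3.0, 0.0]]
```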

Overwriting directly, however, leaves unnatural artifacts near the boundary between the subject 801 and the deformed subject. These can be reduced by blurring the boundary in the spatial or the temporal direction. “Blurring in the spatial direction” means, for example, replacing the luminance value of each pixel within one pixel of the boundary between the subject 801 and the deformed subject with the average luminance of the 3×3 pixels centered on it. The boundary then becomes moderately blurred, which reduces the unnaturalness caused by overwriting. “Blurring in the temporal direction” means, for example, estimating the luminance values of the subject 801 in the same way from a plurality of frames to obtain a plurality of deformed subjects, and obtaining each luminance value as the average over the subject 801 and those deformed subjects (for each pixel, over the ones in which it is not hidden).
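A Python sketch of both blurring options, under assumptions not fixed by the source: gray luminance, regions given as sets of (x, y) pixels, “pixels around the boundary” taken to be the pixels of either region that have a 4-neighbour in the other region, the 3×3 window clipped at the image border, and averages computed from the un-blurred values. `blur_seam` and `temporal_mean` are hypothetical names.

```python
from typing import List, Optional, Set, Tuple

Point = Tuple[int, int]
Image = List[List[float]]


def blur_seam(img: Image, visible: Set[Point], pasted: Set[Point]) -> Image:
    """Spatial blurring: 3x3 mean on the one-pixel band where the visible and pasted regions meet."""
    h, w = len(img), len(img[0])

    def touches(p: Point, other: Set[Point]) -> bool:
        x, y = p
        return any((x + dx, y + dy) in other for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)))

    seam = {p for p in visible if touches(p, pasted)} | {p for p in pasted if touches(p, visible)}
    out = [row[:] for row in img]
    for x, y in seam:
        win = [img[j][i] for j in range(max(0, y - 1), min(h, y + 2))
               for i in range(max(0, x - 1), min(w, x + 2))]
        out[y][x] = sum(win) / len(win)     # averaged from the original, un-blurred image
    return out


def temporal_mean(samples: List[Optional[float]]) -> Optional[float]:
    """Temporal blurring for one pixel: mean of per-frame estimates, skipping frames where it is hidden (None)."""
    vals = [v for v in samples if v is not None]
    return sum(vals) / len(vals) if vals else None


img = [[0.0, 0.0, 6.0, 6.0]]
print(blur_seam(img, visible={(0, 0), (1, 0)}, pasted={(2, 0), (3, 0)}))  # [[0.0, 2.0, 4.0, 6.0]]
print(temporal_mean([1.0, None, 3.0]))                                     # 2.0
```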

(Method 1 when object region information is not given: stationary background)
The processing procedure when object region information is known has been described above. The case where object region information is not attached to the moving image, and is therefore unknown, is described next. Even in this case, the method can be applied by adding a step that estimates the object region information before S102 in the flowchart of FIG. 3. If the background is assumed to be stationary (a reasonable assumption for video from a surveillance camera, for example), the object region information is estimated, for example, in the three steps described below.

Step 1: Moving object detection
Step 2: Association of the same object across a plurality of image frames
Step 3: Calculation of object region information in each frame
For Steps 1 and 2, the method described in Reference 3 (Hironobu Fujiyoshi, Nobuyoshi Enomoto, Osamu Hasegawa, and Takeo Kanade, "Activity monitoring: summarization of outdoor surveillance video and a display/retrieval system on the WWW", Proceedings of the 7th Image Sensing Symposium, pp. 423-428, June 2001) can be used, for example. This method takes a moving image as input and obtains rectangular regions containing objects: moving objects are detected on the assumption that the background is stationary, and the same object is associated across frames by Kalman-filter-based tracking. At this point, regarding the moving image (spatio-temporal image) as a three-dimensional image, a three-dimensional tube-shaped region has been obtained for each object. In Step 3, an accurate region is computed from this rough region in the spatio-temporal image space by the method described in Reference 4 (Hidenori Takeshima, Takashi Ida, Osamu Hori, and Nobuyuki Matsumoto, "Extraction of object contours using self-similarity of spatio-temporal images", IEICE Technical Report OIS2003-37, pp. 39-44, Sept. 2003) or in Reference 5 (T. Yoshida and T. Shimosato, "Motion Image Segmentation Using 3-d Watershed Algorithm", in Proceedings of IEEE International Conference on Image Processing 2001, vol. II, pp. 773-776). Alternatively, in Step 3 the true shape may be obtained from the rough shape frame by frame using the watershed method described in Reference 1, the method of Reference 6 (M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models", International Journal of Computer Vision, vol. 1, no. 4, pp. 321-331, 1987), or the method of Reference 7 (Takashi Ida and Yoko Sambonsugi, "Self-Affine Mapping System and Its Application to Object Contour Extraction", IEEE Transactions on Image Processing, vol. 9, no. 11, November 2000). Once the object region information has been obtained in this way, S102 and the subsequent steps in the flowchart of FIG. 3 can be carried out.
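The method of Reference 3 is not reproduced here. As a much simpler stand-in for Step 1 under the same stationary-background assumption, the following Python sketch estimates the background as the per-pixel temporal median and marks pixels that differ from it by more than a threshold. This is plain background subtraction, not the Reference 3 method; it omits the Kalman-filter association of Step 2 and the contour refinement of Step 3, and the function names are hypothetical.

```python
from statistics import median
from typing import List

Frame = List[List[float]]   # frame[y][x]: gray luminance
Mask = List[List[int]]      # 1 = moving object, 0 = background


def median_background(frames: List[Frame]) -> Frame:
    """Per-pixel temporal median; valid when each pixel shows background in most frames."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)] for y in range(h)]


def moving_object_masks(frames: List[Frame], thresh: float) -> List[Mask]:
    """One mask image per frame: pixels differing from the background by more than `thresh`."""
    bg = median_background(frames)
    return [[[1 if abs(v - b) > thresh else 0 for v, b in zip(row, bg_row)]
             for row, bg_row in zip(frame, bg)]
            for frame in frames]


# A bright object (10) moving one pixel per frame over a dark, static background (0).
frames = [[[10, 0, 0]], [[0, 10, 0]], [[0, 0, 10]]]
print(moving_object_masks(frames, thresh=5))  # [[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]]
```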

(Method 2 when object area information is not given: no assumption of a stationary background)
Another method for the case where object area information is not given, using a spatio-temporal image, is described next. It consists of the three steps below and does not require the assumption that the background is stationary.

Step 1: Segment the spatio-temporal image into regions.
Step 2: For each object, compute feature values in each image frame (Step 1 usually over-segments the image, but Step 2 is applied to this over-segmented result as is).
Step 3: Merge objects considered identical, based on the results of Step 2.
In Step 1, the moving image is segmented into regions using the spatio-temporal segmentation described in Reference 8 (M. A. El Saban and B. S. Manjunath, "Video region segmentation by spatio-temporal watersheds," IEEE International Conference on Image Processing (ICIP), Barcelona, Spain, vol. 1, pp. 349-352, Sep. 2003) or Reference 9 (Y. Ueshige, Y. Kuroki, and T. Ohta, "Region Segmentation for Video by Using 3D-IFS Coding," in Proceedings of the 22nd Picture Coding Symposium, pp. 378-380). In Step 2, feature values such as the mean and variance of the RGB colors are computed for each object. In Step 3, the absolute differences of the six parameters (R mean, G mean, B mean, R variance, G variance, B variance) are computed. Two objects are judged identical and merged if the weighted sum of these differences (with separately determined weights) is at or below a separately determined threshold. This yields regions of the moving image that are divided object by object. Once the object area information has been obtained in this way, S102 and the subsequent steps of the flowchart of FIG. 3 can be performed, as in the previous case.
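Steps 2 and 3 can be sketched as follows. This is a minimal, non-authoritative illustration under several assumptions not stated in the text:

- Each entry is one (region, frame) pair.
- Every pair of entries is compared, with no adjacency check.
- Merging is made transitive with a union-find structure.
- The variance is the population variance.
- The weights and the threshold are free parameters.

The names `region_features`, `feature_distance` and `merge_regions` are hypothetical.

```python
from statistics import fmean, pvariance

def region_features(pixels):
    """Step 2: (R mean, G mean, B mean, R var, G var, B var) of one region in one frame.
    pixels: list of (r, g, b) tuples belonging to the region."""
    channels = list(zip(*pixels))
    return [fmean(c) for c in channels] + [pvariance(c) for c in channels]

def feature_distance(f1, f2, weights):
    """Weighted sum of the absolute differences of the six features."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, f1, f2))

def merge_regions(features, weights, threshold):
    """Step 3: merge regions whose weighted distance is <= threshold.
    features: dict region_id -> six features. Returns dict region_id -> merged id."""
    parent = {r: r for r in features}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    ids = sorted(features)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if feature_distance(features[a], features[b], weights) <= threshold:
                ra, rb = find(a), find(b)
                if ra != rb:
                    parent[max(ra, rb)] = min(ra, rb)  # smaller id represents the group
    return {r: find(r) for r in ids}

red1 = region_features([(200, 10, 10), (210, 12, 8)])   # means 205, 11, 9; variances 25, 1, 1
red2 = region_features([(205, 11, 9), (205, 11, 9)])    # means 205, 11, 9; variances 0, 0, 0
green = region_features([(10, 200, 10), (12, 190, 14)])
print(merge_regions({1: red1, 2: red2, 3: green}, [1.0] * 6, 30.0))  # {1: 1, 2: 1, 3: 3}
```

With unit weights, the two reddish regions are 27 apart and merge. The green region is far from both and stays separate.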

(Positioning of 3D-IFS region segmentation by multi-label reduction mapping)
The spatio-temporal region segmentation shown in Reference 9 is based on an expansion mapping, obtained by extending the still-image technique of Reference 10 (Japanese Patent Laid-Open No. 8-329255, fractal region segmentation) to space-time. It is robust to noise, but the region boundaries obtained with an expansion mapping are unstable. Furthermore, sequential processing is difficult, because regions are determined by the K-means method or by classifying periodic points only after the entire spatio-temporal image has been processed.

The embodiment of the present invention therefore uses a new method (hereinafter "3D-IFS region fitting"). It improves on the still-image method described in Reference 11 (Japanese Patent Laid-Open No. 2001-188910, multi-label fitting and region segmentation; hereinafter "2D-IFS region fitting"). Because 2D-IFS region fitting uses a reduction (contraction) mapping, its region boundaries are stable, but no prior art suggests applying it to spatio-temporal images. The embodiment targets moving images (spatio-temporal images) but, like Reference 11, does not use an expansion mapping. As a result, the region segmentation required when using spatio-temporal images is performed stably. The embodiment further provides higher accuracy through combination with Watershed, and a method for processing spatio-temporal images sequentially.

(3D-IFS region segmentation method by multi-label reduction mapping)
A spatio-temporal image is a kind of three-dimensional image composed of three-dimensional pixels. Region segmentation of a spatio-temporal image is described below with reference to the flowchart of FIG. 10; the method applies unchanged to any three-dimensional image.

First, a spatio-temporal image and initial labels are given as input (S1601). Next, three-dimensional processing boxes are arranged in the three-dimensional space-time (S1602). Then, for each processing box, a region similar to the image inside the box is searched for in the image (S1603). This search is known from fractal coding. For example, given the child box 1201 of FIG. 12 as the processing box in the spatio-temporal image of FIG. 11, the search finds the parent box 1202 of FIG. 12, a portion of the image nearly similar to the child box 1201. This yields, for each processing box (child box), a corresponding similar region (parent box).

The shape of the parent box may be chosen in any way that makes the mapping from the parent box to the child box a reduction mapping. In the following, the parent box is assumed to have each side twice as long as the corresponding side of the child box. The resulting mapping from the parent box to the child box then maps the contents of 4 pixels to 1 pixel, as illustrated in FIG. 13 (with all three sides doubled, each child voxel corresponds to a 2×2×2 block of 8 parent voxels). This mapping is applied to a label image, and each mapped label value is decided by majority vote over those pixels. Ties between labels are broken by a predetermined rule, for example choosing the maximum or the minimum label value. As in Reference 11, pixels in the same region then receive the same label.

As the initial labels in S1601, a distinct label value may be given to every pixel, for example. Alternatively, the output of another region segmentation method may be used as the input to 3D-IFS region fitting; an example using the output of Watershed is described later.

The processing boxes in S1602 may be arranged so as to cover all pixels of the spatio-temporal image, as in fractal coding. Alternatively, the methods described in Reference 7 and Reference 12 (Japanese Patent Laid-Open No. 2004-240913, spatio-temporal fractal contour extraction), which target mask images, may be extended to general label images. In that case the label image is scanned, and a processing box is placed wherever the neighborhood of the pixel of interest contains a different label value and no processing box has yet been placed.
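Steps S1601 to S1604 can be sketched in pure Python, with a volume stored as nested lists indexed `[t][y][x]`. This is a minimal illustration under stated assumptions, not the embodiment's exact procedure:

- The boxes are cubes of side `b` that tile the volume.
- Parent candidates are limited to boxes of side `2b` that contain the child box.
- The match criterion is a least-squares comparison of mean-removed intensities, with no contrast scaling.
- Ties are broken by taking the smallest label.
- The mapping is applied a fixed number of times.

The names `ifs_region_fit`, `search_parent` and the other helpers are hypothetical.

```python
from collections import Counter
from itertools import product

def dims(vol):
    return len(vol), len(vol[0]), len(vol[0][0])  # (T, H, W)

def majority(values):
    """Most frequent label; ties are broken by taking the smallest label."""
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, n in counts.items() if n == top)

def child_values(vol, t0, y0, x0, b):
    return [vol[t0 + dt][y0 + dy][x0 + dx] for dt, dy, dx in product(range(b), repeat=3)]

def parent_values(vol, t0, y0, x0, b):
    """Parent box of side 2b, averaged over 2x2x2 cells down to side b."""
    out = []
    for dt, dy, dx in product(range(b), repeat=3):
        cell = [vol[t0 + 2 * dt + i][y0 + 2 * dy + j][x0 + 2 * dx + k]
                for i, j, k in product((0, 1), repeat=3)]
        out.append(sum(cell) / 8.0)
    return out

def centered(values):
    m = sum(values) / len(values)
    return [v - m for v in values]

def search_parent(vol, t0, y0, x0, b):
    """S1603: among parent boxes containing the child box, pick the one whose
    mean-removed, downsampled intensities best match the child (least squares)."""
    T, H, W = dims(vol)

    def starts(c, n):  # parent origins s such that [s, s+2b) contains [c, c+b)
        return range(max(0, c - b), min(c, n - 2 * b) + 1)

    child = centered(child_values(vol, t0, y0, x0, b))
    best, best_err = None, float("inf")
    for pt in starts(t0, T):
        for py in starts(y0, H):
            for px in starts(x0, W):
                par = centered(parent_values(vol, pt, py, px, b))
                err = sum((p - c) ** 2 for p, c in zip(par, child))
                if err < best_err:
                    best, best_err = (pt, py, px), err
    if best is None:
        raise ValueError("volume too small for a parent box of side 2b")
    return best

def ifs_region_fit(vol, labels, b, iterations):
    """S1602-S1604: tile the volume with child boxes, pair each with a parent box,
    then repeatedly map parent labels onto child labels by majority vote."""
    T, H, W = dims(vol)
    if T % b or H % b or W % b:
        raise ValueError("each dimension must be a multiple of b")
    boxes = list(product(range(0, T, b), range(0, H, b), range(0, W, b)))
    pairs = [(c, search_parent(vol, *c, b)) for c in boxes]
    for _ in range(iterations):
        new = [[[None] * W for _ in range(H)] for _ in range(T)]
        for (t0, y0, x0), (pt, py, px) in pairs:
            for dt, dy, dx in product(range(b), repeat=3):
                cell = [labels[pt + 2 * dt + i][py + 2 * dy + j][px + 2 * dx + k]
                        for i, j, k in product((0, 1), repeat=3)]
                new[t0 + dt][y0 + dy][x0 + dx] = majority(cell)
        labels = new
    return labels
```

The `labels` argument plays the role of the initial labels of S1601. It can hold a distinct value per voxel, or the output of another segmentation method such as Watershed.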

An advantage of 3D-IFS region fitting over the method of Reference 9 is that it can be processed sequentially. FIG. 14 shows the relationship between the time axis and the frames to be processed. The portion labeled as the frames being processed in FIG. 14 corresponds approximately to the whole image processed by the 3D-IFS region fitting described above. FIG. 15 is a flowchart of this method.

First, the head frame is set to the beginning of the moving image (S2301). The end frame is set to a frame offset from it by a separately determined number of frames, for example the head frame + 20 frames (S2302). Next, 3D-IFS region fitting is performed on 1801 in FIG. 14 (S2303). It is then checked whether processing has reached the end of the moving image (S2304). If not, the head frame is shifted by one frame and the steps from S2301 onward are repeated.

Specifically, if 1801 in FIG. 14 was processed first, 3D-IFS region fitting is next performed on 1802, which is shifted by one frame. The fitting is then repeated in the same way while shifting one frame at a time, as shown by 1803 and 1804. Because the number of frames processed at once is constant, even a large number of input frames can be processed with limited memory, given enough time. The resulting label images take temporal continuity into account.
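The sliding-window loop of FIG. 15 can be sketched as below. This is a minimal illustration under stated assumptions:

- `fit` is a hypothetical stand-in for 3D-IFS region fitting on one window.
- Frame indices are 0-based and inclusive.
- `span` is the offset of the end frame from the head frame (20 in the example above).

The window simply stops growing at the last frame of the video.

```python
def sequential_fitting(frames, span, fit):
    """Apply `fit` to frames[first:first + span + 1], shifting the window by one frame.
    Returns a list of (first, last, result) tuples with inclusive frame indices."""
    if not frames:
        return []
    results = []
    first = 0                                        # S2301: head of the video
    while True:
        last = min(first + span, len(frames) - 1)    # S2302: end frame of the window
        results.append((first, last, fit(frames[first:last + 1])))  # S2303
        if last == len(frames) - 1:                  # S2304: end of the video reached
            return results
        first += 1                                   # shift the head frame by one

# 25 frames, windows of 21 frames: (0, 20), (1, 21), ..., (4, 24)
print([(f, l) for f, l, _ in sequential_fitting(list(range(25)), 20, len)])
```

Only `span + 1` frames are handed to `fit` at a time, which is what bounds the memory use.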

Next, a search range limiting method that enhances the effect of 3D-IFS region fitting is outlined (the similarity ratio between the parent box and the child box is 1/2). In three-dimensional fractal coding, for example, every box that contains the child box is a parent box candidate. FIG. 16 illustrates this. When the child box 2401 spans frames 9 to 14, the parent box 2402 candidates spanning frames 4-14, 5-15, ..., 9-19 are searched, and the parent box spanning frames 7-17 has been selected. In contrast, as shown in FIG. 17, the search can always be performed so that the end frame of the child box coincides with the end frame of the parent box (hereinafter the "search range limiting method"). In this example, that restricts the search to the single candidate ending at frame 14 (frames 4-14), and the following two effects are obtained.
(Effect A) Region segmentation in which the number of labels does not increase is achieved.
(Effect B) Label data corresponding to the end frame are always obtained.

The region segmentation used here serves to associate a hidden portion with other frames. A method that creates no extra regions during this association is therefore desirable, which makes Effect A important.

First, the reason Effect A is obtained is explained. With the child box and parent box obtained by the search range limiting method, the mapping in S1604 copies frames 4-5 to frame 9, frames 6-7 to frame 10, ..., and frames 13-14 to frame 14. Every frame other than frame 14 is therefore copied from earlier frames. Suppose the data of frame 14 are treated as nonexistent before the mapping. If the label value of frame 14 in the mapping of S1604 is then decided by the maximum (or minimum) of the two pixels of frame 13, instead of by majority vote over the four pixels of frames 13 and 14, Effect A is obtained. Effect A is likewise obtained by copying the contents of frame 13 into frame 14 before S1604. As a further alternative, if ties are broken by choosing the minimum (maximum) value, Effect A is also obtained by a suitable assignment of new labels. Every pixel of frame 14 is given a distinct, unused label value that is larger (smaller) than every label in frame 13.
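Effect A can be illustrated on a two-dimensional (time, x) cross-section of the label image. This is a minimal sketch under stated assumptions:

- The spatial part of the parent search is omitted, and the parent box is hypothetically fixed to start at the child's x position.
- The similarity ratio is 1/2.
- Ties are broken by taking the smallest label.
- With `exclude_end=True`, data of the child's end frame are ignored, which reduces to taking the minimum of the two pixels of the preceding frame.

`map_end_aligned` is a hypothetical name.

```python
from collections import Counter

def majority(values):
    """Most frequent label; ties go to the smallest label."""
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, n in counts.items() if n == top)

def map_end_aligned(labels, t0, x0, b, exclude_end=True):
    """Map an end-aligned parent box (side 2b) onto the b x b child box at (t0, x0)
    of a (time, x) label slice and return the child's new labels."""
    end = t0 + b - 1           # end frame of the child box
    pt = end - 2 * b + 1       # search range limiting: the parent ends on the same frame
    px = x0                    # hypothetical: spatial search omitted
    if pt < 0 or px + 2 * b > len(labels[0]):
        raise ValueError("parent box does not fit in the slice")
    out = []
    for dt in range(b):
        row = []
        for dx in range(b):
            cell = [labels[pt + 2 * dt + i][px + 2 * dx + j]
                    for i in (0, 1) for j in (0, 1)
                    # Effect A: ignore the not-yet-known data of the end frame
                    if not (exclude_end and pt + 2 * dt + i == end)]
            row.append(majority(cell))
        out.append(row)
    return out

slice_tx = [
    [1, 1, 2, 2],      # frame 0
    [1, 1, 2, 2],      # frame 1
    [1, 2, 2, 2],      # frame 2
    [99, 99, 99, 99],  # frame 3 = end frame, carrying a new label
]
print(map_end_aligned(slice_tx, 2, 0, 2))                     # [[1, 2], [1, 2]]
print(map_end_aligned(slice_tx, 2, 0, 2, exclude_end=False))  # [[1, 2], [99, 2]]
```

When the end frame takes part in the vote, its label 99 leaks into the result. When it is excluded, every output label already existed in earlier frames, so no new label appears.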

Next, the reason Effect B is obtained is explained. With the search range limiting method, the image needed for the search never extends beyond the end frame of the child box. Therefore, if a child box is placed so that its end frame coincides with the end frame of the frames being processed, the label data corresponding to that child box are obtained.

The search range limiting method itself is not limited to 3D-IFS region fitting. It also applies when the input is a mask image rather than a label image (the tracking method described in Reference 12). Under the search range limiting method, the mask image of the end frame is copied from known mask images. The process of predicting future frames required in Reference 12 therefore becomes unnecessary. Since that prediction is not easy, eliminating it also benefits the method of Reference 12.

The flow of the search range limiting method for both mask images and label images (hereinafter collectively "region information images") is described with reference to FIG. 10 and FIGS. 18 to 21. In this example, "label image" in FIG. 10 is read as "region information image".

In S1601, the image to be processed and its region information image are given. FIG. 21 shows a region information image in space-time, in which 2901 is a known subject region (region 1 in the label image). When it is cut by a plane perpendicular to the x-axis, the region boundary of the region information image appears, for example, as 2603 in FIG. 18. Here 2601 is the subject in the luminance image. Initially, then, the subject in the region information image is greatly displaced from the subject in the luminance image. One of the child boxes placed in S1602 is shown as 2602.

A parent box for 2602 is then searched for (S1603) with the search range limiting method, so that the right-end frames of the two boxes coincide; this finds 2701 in FIG. 19. Mapping the region information image (S1604) with the pair 2602 and 2701 moves the region boundary closer to the right-end frame (the frame where the two boxes overlap), as shown by 2801 in FIG. 20. Because the similarity ratio between the child box and the parent box is 1/2, applying S1604 copies in the region on the left side.

After one application of the mapping S1604, the frames from the left-end frame of the child box up to frame (old boundary frame + right-end frame)/2, the new boundary frame, are copies of frames before the old boundary frame. The frames from the new boundary frame to the right-end frame are not such copies. As in the previous example, however, this can be remedied in two ways. (1) The data from the new boundary frame to the right-end frame are excluded when the majority vote (or the method of Reference 12) is performed. (2) The data before the old boundary frame are copied to the frames after the old boundary frame before the mapping S1604. If both are done, every frame up to the right-end frame becomes a copy from the old boundary frame or earlier, and the benefit of the search range limiting method is retained. To simplify processing, the data after the old boundary frame may instead be filled with a predetermined pattern, or with a predetermined single region. An example of such a pattern is one in which subject and background pixels alternate pixel by pixel within a frame.
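The boundary update stated above can be derived as follows. This sketch treats frame indices as continuous, so discrete rounding is ignored. R denotes the right-end frame shared by the child and parent boxes, B the old boundary frame and B' the new boundary frame. The last line assumes the same box pair is applied repeatedly.

```latex
% Child frame c reads parent frame p(c); similarity ratio 1/2, right ends aligned at R:
p(c) = R - 2\,(R - c) = 2c - R
% Frame c is a copy of data from before the old boundary B iff p(c) \le B:
2c - R \le B \iff c \le \frac{B + R}{2} =: B'
% Repeating the mapping halves the distance from the boundary to the right-end frame:
R - B_{k+1} = \frac{R - B_k}{2} \;\Longrightarrow\; R - B_k = \frac{R - B_0}{2^{k}}
```

Under these assumptions the boundary approaches the right-end frame geometrically, which matches the movement from 2603 to 2801 described above.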

The example above describes a search range in which the end frame (in time) of the child box coincides with the end frame of the parent box. The applicability of the search range limiting method is not limited to this case, however. In general, the difference between the center frame of the parent box and that of the child box can be fixed to a predetermined value. The label image (or mask image; likewise below) of a desired frame can then be estimated from the label images of its surrounding frames. For example, the search range may be set so that the center frames of the parent box and the child box coincide, that is, the parent's center frame is mapped onto the child's center frame by the mapping S1604. The label image of the coinciding center frame is then always copied from other frames, like the end frame in the earlier example of FIG. 16.

(Fitting an image segmented by Watershed)
Combining Watershed with 3D-IFS region fitting yields more accurate region segmentation. The technique applies equally to the combination of Watershed with 2D-IFS region fitting. For convenience of explanation, the two-dimensional case is illustrated. FIG. 22 is an example input image. FIG. 23 shows the result of 2D-IFS region fitting, and FIG. 24 the result of region segmentation by Watershed (white lines are region boundaries). FIG. 25 summarizes the properties of the region boundaries in this example.

Watershed over-segments highly textured areas, whereas IFS region fitting over-segments areas with little texture; over-segmentation here means boundaries generated where none are wanted. 2D-IFS and 3D-IFS fitting can take the results of other region segmentation methods as input, so they are easy to combine with such methods. In particular, combining them with a method whose properties differ from those of IFS region fitting, such as Watershed, can yield a method with the advantages of both. FIG. 26 shows the result when the label image obtained by Watershed is used as the input to 2D-IFS fitting. Compared with applying either method alone, unnecessary region boundaries are reduced. Similarly, when Watershed is combined with 3D-IFS region fitting, Watershed is performed in two or three dimensions and 3D-IFS region fitting is applied to its result.

The Watershed used here is described, for example, in Reference 13 (L. Vincent and P. Soille, "Watersheds in Digital Spaces: An Efficient Algorithm Based on Immersion Simulations," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 6, June 1991). It consists specifically of the following steps.
(Step 1) Detect the edge strength of each pixel.
(Step 2) Place seeds at the local minima of the edge strength.
(Step 3) Initialize the water level (threshold), for example to the minimum edge strength.
(Step 4) For each pixel that has no label, is adjacent to labeled pixels, and has an edge strength at or below the water level:
(Step 4-a) if the surrounding labels are of only one kind, assign that label;
(Step 4-b) if the surrounding labels are of two or more kinds, make the pixel a region boundary.
(Step 5) Repeat (Step 4) while raising the water level until every pixel has been given a label value (or marked as a boundary).
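Steps 1 to 5 can be sketched for a two-dimensional grayscale image stored as nested lists. This is a deliberately simple and slow rendering of the listed steps, not the sorted-queue algorithm of Reference 13. The following choices are assumptions:

- The edge operator is the sum of the absolute forward differences in x and y.
- Neighbors are 4-connected.
- Seeds are connected plateaus with no lower neighbor.
- Step 4 updates all qualifying pixels synchronously in each pass.
- Boundary pixels get the hypothetical label 0.

Pixels enclosed entirely by boundary pixels would stay `None` in this version.

```python
BOUNDARY = 0  # watershed (region boundary) pixels; regions are labeled 1, 2, ...

def neighbors(y, x, h, w):
    """4-connected neighbors inside the image."""
    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
        if 0 <= ny < h and 0 <= nx < w:
            yield ny, nx

def edge_strength(img):
    """Step 1 (assumed operator): |forward difference in x| + |forward difference in y|."""
    h, w = len(img), len(img[0])
    return [[(abs(img[y][x + 1] - img[y][x]) if x + 1 < w else 0)
             + (abs(img[y + 1][x] - img[y][x]) if y + 1 < h else 0)
             for x in range(w)] for y in range(h)]

def watershed(img):
    e = edge_strength(img)
    h, w = len(e), len(e[0])
    lab = [[None] * w for _ in range(h)]   # None = not labeled yet
    seen = [[False] * w for _ in range(h)]
    next_label = 1
    # Step 2: one seed per regional minimum (a connected plateau with no lower neighbor)
    for y in range(h):
        for x in range(w):
            if seen[y][x]:
                continue
            comp, stack = [], [(y, x)]
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                comp.append((cy, cx))
                for ny, nx in neighbors(cy, cx, h, w):
                    if not seen[ny][nx] and e[ny][nx] == e[cy][cx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if all(e[ny][nx] >= e[y][x] for cy, cx in comp for ny, nx in neighbors(cy, cx, h, w)):
                for cy, cx in comp:
                    lab[cy][cx] = next_label
                next_label += 1
    # Steps 3-5: raise the water level through the distinct edge strengths
    for level in sorted({v for row in e for v in row}):
        while True:  # repeat Step 4 at this level until nothing changes
            updates = []
            for y in range(h):
                for x in range(w):
                    if lab[y][x] is None and e[y][x] <= level:
                        near = {lab[ny][nx] for ny, nx in neighbors(y, x, h, w)
                                if lab[ny][nx] not in (None, BOUNDARY)}
                        if len(near) == 1:
                            updates.append((y, x, near.pop()))  # Step 4-a
                        elif len(near) > 1:
                            updates.append((y, x, BOUNDARY))    # Step 4-b
            if not updates:
                break
            for y, x, v in updates:
                lab[y][x] = v
    return lab

img = [[0, 0, 0, 10, 10, 10] for _ in range(3)]
print(watershed(img)[0])  # [1, 1, 0, 2, 2, 2]
```

The resulting label image is the kind of input that the IFS fitting step described next would receive as its initial labels.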

FIG. 27 is a flowchart showing the procedure of region segmentation using Watershed and 2D-IFS fitting.

First, the Watershed threshold is set (S1701). Next, regions are grown according to (Step 4) of Watershed described above (S1702). S1702 is then repeated while the threshold is raised according to (Step 5). This yields a label image, which is used as the input to 2D-IFS or 3D-IFS fitting, and the same processing as S1602 and the subsequent steps in the flowchart of FIG. 10 is performed. Although only the two- and three-dimensional cases have been described here, images of four or more dimensions may also be targeted. An example is a four-dimensional image obtained by adding time to a three-dimensional spatial image.

Moreover, because each label in Watershed spreads only to adjacent pixels, Watershed can also be computed sequentially, as described in Reference 14 (T. Yoshida and T. Shimosato, "Motion Image Segmentation Using 3-D Watershed Algorithm," in Proceedings of the IEEE International Conference on Image Processing 2001, pp. 773-776, 2001). Reference 14 deals with binary labels, but the approach applies unchanged to multi-valued labels. Accordingly, the method combining Watershed with 3D-IFS region fitting can also be processed sequentially. Watershed is run sequentially, and its output is fed sequentially to 3D-IFS region fitting.

Region merging may also be performed after Watershed, as in Reference 15 (D. Zhong et al., "Video object model and segmentation for content-based video indexing," in Proceedings of the IEEE Symposium on Circuits and Systems 1997, pp. 1492-1495). In that case, 2D or 3D IFS fitting may be applied after Watershed and region merging performed afterward, or IFS fitting may be performed first.

As described above, the improved region segmentation makes the display of hidden regions more accurate. This applies when, on the user's instruction, a hidden region is displayed in an image captured by a two-dimensional video camera. Take the case where the subject is a car. Two problems are alleviated: only part of the missing portion being displayed in response to a click, and the hidden region not being displayed at all because the clicked region is wrongly judged to be entirely visible already.

According to the embodiments of the present invention described above, a hidden region of an image captured by a two-dimensional video camera can be displayed in response to a user instruction. This improves the operability of equipment that incorporates the method of the present invention.

The present invention is not limited to the above embodiments as they are. At the implementation stage, the constituent elements may be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiments. For example, some constituent elements may be removed from all those shown in an embodiment. Furthermore, constituent elements from different embodiments may be combined as appropriate.

Brief Description of Drawings
FIG. 1 is a block diagram of an image generation apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the outline of the image generation method according to the embodiment, showing examples of image frames in an input moving image.
FIG. 3 is a flowchart showing the processing procedure when object area information is known.
FIG. 4 shows the object area information corresponding to the left image frame of the moving image shown in FIG. 2.
FIG. 5 is a flowchart showing the procedure for presenting feedback on the selected area to the user.
FIG. 6 shows an example of feedback presentation.
FIG. 7 is a flowchart showing the procedure for estimating the object region corresponding to a point of interest.
FIG. 8 shows the object area information corresponding to the center image frame of the moving image shown in FIG. 2.
FIG. 9 is a diagram for explaining the method of estimating a hidden region.
FIG. 10 is a flowchart showing the procedure of the 3D-IFS region segmentation method by multi-label reduction mapping.
FIG. 11 shows an example of a spatio-temporal image.
FIG. 12 shows a child box (processing box) and a parent box in a spatio-temporal image.
FIG. 13 shows the similarity relationship between a child box and a parent box.
FIG. 14 shows the frames being processed along the time axis.
FIG. 15 is a flowchart showing the sequential processing procedure in 3D-IFS region fitting.
FIG. 16 shows the selection range of a child box and a parent box.
FIG. 17 shows the selection range of a child box and a parent box.
FIG. 18 is a diagram for explaining the search range limiting method.
FIG. 19 is a diagram for explaining the search range limiting method.
FIG. 20 is a diagram for explaining the search range limiting method.
FIG. 21 is a diagram for explaining the search range limiting method, showing a region information image in space-time.
FIG. 22 shows an example of an input image.
FIG. 23 shows the result of 2D-IFS region fitting applied to the input image of FIG. 22.
FIG. 24 shows the result of region segmentation by Watershed applied to the input image of FIG. 22.
FIG. 25 shows a comparison of the properties of the region boundaries produced by IFS region fitting and by Watershed.
FIG. 26 shows the processing result when the label image obtained by Watershed is used as the input to 2D-IFS fitting.
FIG. 27 is a flowchart showing the procedure of region segmentation using Watershed and 2D-IFS fitting.

Explanation of symbols

100 ... image generation apparatus;
101 ... moving image storage unit;
102 ... region information storage unit;
103 ... hidden region estimation unit;
104 ... image generation unit;
105 ... display device;
106 ... mouse;
107 ... moving image input unit;
108 ... region information input unit

Claims

1. An image generation method comprising:
inputting a series of image frames;
inputting region information indicating a region occupied by an object in the series of image frames;
inputting a point of interest in a first image frame of the series of image frames;
estimating, based on the region information, a hidden object region of the object corresponding to the point of interest; and
generating an image frame for presentation by obtaining luminance values for the estimated hidden object region from the series of image frames.
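The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patent's implementation: it assumes (hypothetically) that the region information is a per-frame label image in which the same object keeps the same label, and that the object's motion from the first frame to each other frame is a known integer translation. The hidden object region is then taken as the pixels that the selected object covers in some other frame (after undoing the motion) but not in the first frame, and their luminance values are copied from the first frame that shows them. The names `reveal_hidden` and `offsets` are illustrative only.

```python
from typing import List, Tuple

Image = List[List[int]]


def reveal_hidden(frames: List[Image], labels: List[Image],
                  offsets: List[Tuple[int, int]], point: Tuple[int, int]):
    """Estimate the hidden part of the object under `point` in frame 0 and fill it.

    Assumptions (hypothetical, for illustration only):
      * labels[t][y][x] is an object label, and one object keeps the same label in all frames;
      * offsets[t] = (dy, dx) moves a pixel of the object from frame 0 to frame t
        (pure translation, offsets[0] == (0, 0)).
    """
    h, w = len(frames[0]), len(frames[0][0])
    py, px = point
    target = labels[0][py][px]            # object selected by the point of interest
    out = [row[:] for row in frames[0]]   # presentation frame starts as a copy of frame 0
    hidden = []
    for y in range(h):
        for x in range(w):
            if labels[0][y][x] == target:
                continue                  # already visible in frame 0
            for t in range(1, len(frames)):
                dy, dx = offsets[t]
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w and labels[t][yy][xx] == target:
                    # the object covers (y, x) but is seen only in frame t
                    hidden.append((y, x))
                    out[y][x] = frames[t][yy][xx]
                    break                 # first frame that shows the pixel wins
    return hidden, out
```

With `frames = [[[10, 11, 99, 0, 0]], [[0, 0, 99, 11, 12]]]`, `labels = [[[1, 1, 2, 0, 0]], [[0, 0, 2, 1, 1]]]` and `offsets = [(0, 0), (0, 2)]`, selecting the point `(0, 0)` marks pixel `(0, 2)` as hidden behind object 2 and fills it with the value 12 seen in the second frame.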

2. The image generation method according to claim 1, comprising, in place of the step of inputting the region information, a step of estimating the region information from the series of image frames.

3. The image generation method according to claim 1, further comprising:
estimating an approximate shape of the hidden object region; and
presenting the estimated approximate shape on the first image frame.

4. The image generation method according to claim 1, further comprising:
estimating a luminance value of the hidden object region in each of a plurality of image frames; and
determining a single luminance value from the plurality of luminance values estimated for the plurality of image frames.
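Claim 4 leaves open how the single luminance value is chosen. One plausible reading, sketched below under that assumption, collects the luminance values observed for a hidden pixel in several frames and reduces them with a median (or, optionally, a mean). `fuse_luminance` and its `rule` parameter are hypothetical names.

```python
from statistics import median
from typing import Dict, List, Tuple


def fuse_luminance(candidates: Dict[Tuple[int, int], List[float]],
                   rule: str = "median") -> Dict[Tuple[int, int], float]:
    """Reduce several per-frame luminance estimates to one value per hidden pixel."""
    fused = {}
    for pixel, values in candidates.items():
        if not values:
            continue                       # no frame shows this pixel; leave it unfilled
        if rule == "median":
            fused[pixel] = median(values)  # robust to one badly aligned or noisy frame
        elif rule == "mean":
            fused[pixel] = sum(values) / len(values)
        else:
            raise ValueError(f"unknown rule: {rule}")
    return fused
```

The median is a natural default here because a single frame with a misaligned estimate (for example 200 among 12 and 13) does not pull the result away from the consensus value.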

5. An image processing method comprising:
inputting first region information indicating a region occupied by an object in a first image frame;
inputting second region information indicating a region occupied by the object in a second image frame;
generating a plurality of pieces of modified second region information by applying different transformations to the second region information while varying a transformation parameter;
calculating a similarity between the first region information and each piece of modified second region information;
selecting the transformation parameter based on the calculated similarities; and
outputting the selected transformation parameter as information for associating the same object between frames.
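The parameter search of claim 5 can be sketched as an exhaustive search. Here, as assumptions not taken from the claim, the region information is a set of pixel coordinates, the transformation is an integer translation `(dy, dx)`, and the similarity is the intersection-over-union of the two pixel sets; claim 6 describes a distance-based similarity instead. All function names are illustrative.

```python
from typing import Set, Tuple

Region = Set[Tuple[int, int]]


def translate(region: Region, dy: int, dx: int) -> Region:
    """One transformation of the region: an integer translation (assumed transform family)."""
    return {(y + dy, x + dx) for (y, x) in region}


def jaccard(a: Region, b: Region) -> float:
    """Assumed similarity measure: intersection over union of two pixel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_translation(first: Region, second: Region, search: int = 3):
    """Try every (dy, dx) in [-search, search]^2 and keep the most similar result."""
    best_params, best_score = (0, 0), -1.0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            score = jaccard(first, translate(second, dy, dx))
            if score > best_score:
                best_params, best_score = (dy, dx), score
    return best_params, best_score
```

Richer transform families (scaling, rotation) would follow the same pattern: enumerate parameters, transform, score, and keep the best.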

6. The image processing method according to claim 5, wherein the step of calculating the similarity includes:
a distance calculation step of calculating either the city-block distance or the Euclidean distance from a point on the first region information to the modified second region information, or the city-block distance or the Euclidean distance from a point on the second region information to the modified first region information;
executing the distance calculation step for N points on the first region information or N points on the second region information; and
outputting, as the similarity, the K-th (0 ≦ K ≦ N) value among the distances from the N points obtained in the distance calculation step.
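The similarity of claim 6 is a ranked distance: compute the distance from each of N points to the other region, sort, and report the K-th value. With K = N this is the directed Hausdorff distance; a smaller K ignores the farthest points, as in the partial Hausdorff distance, which makes the score less sensitive to occluded parts. The sketch below uses brute-force nearest-pixel search and treats K as a 1-based rank in ascending order, which is an assumption since the claim's range 0 ≦ K ≦ N does not fix the convention.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[int, int]


def point_to_region(p: Point, region: List[Point], metric: str) -> float:
    """Distance from p to the nearest pixel of region (brute force)."""
    y, x = p
    if metric == "cityblock":
        return min(abs(y - v) + abs(x - u) for v, u in region)
    if metric == "euclidean":
        return min(math.hypot(y - v, x - u) for v, u in region)
    raise ValueError(f"unknown metric: {metric}")


def kth_distance(points: List[Point], region: Iterable[Point], k: int,
                 metric: str = "cityblock") -> float:
    """K-th smallest point-to-region distance; K is a 1-based rank (assumption)."""
    region = list(region)                  # iterated once per point, so materialise it
    dists = sorted(point_to_region(p, region, metric) for p in points)
    if not 1 <= k <= len(dists):
        raise ValueError("k must be between 1 and the number of points")
    return dists[k - 1]
```

For large regions, a distance transform of the target region (city-block or Euclidean) would replace the brute-force `min` with a table lookup.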

7. An image processing method for estimating the region information of an object in the image generation method according to claim 1, comprising:
preparing a three-dimensional label image that gives, for each three-dimensional pixel of a spatio-temporal image formed from the series of image frames, information for distinguishing regions;
placing a three-dimensional processing box in the spatio-temporal image;
obtaining a parent box that has a larger volume than the three-dimensional processing box and whose image data are similar to those of the three-dimensional processing box; and
mapping the three-dimensional label image from the parent box to the three-dimensional processing box.

8. The image processing method according to claim 7, further comprising a step of switching the frame of interest, using as input a spatio-temporal image and a three-dimensional label image in which an arbitrary frame of interest is the head frame and a frame shifted from the head frame by a predetermined number of frames is the end frame.

9. An image processing method comprising:
inputting n-dimensional image data (n being an integer of 2 or more);
a region division step of generating an n-dimensional label image by performing region division on the n-dimensional image data;
placing an n-dimensional processing box in the n-dimensional image data;
obtaining, for the n-dimensional processing box, a parent box that has a larger measure than the n-dimensional processing box and whose image data are similar; and
mapping the n-dimensional label image from the parent box to the n-dimensional processing box.
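Claim 9 describes the IFS region fitting named in the figure list: each processing (child) box is paired with a larger, similar-looking parent box, and labels are copied from parent to child. The following is a minimal 2-D (n = 2) sketch under several assumptions that are not taken from the source: square child boxes tiling the image, parent boxes of twice the side length searched in a small neighborhood, similarity measured by the sum of absolute differences against a 2x2-averaged parent, nearest-neighbor subsampling of parent labels, and a fixed number of repeated mappings. Function names are hypothetical; the same loops extend to 3-D boxes for the spatio-temporal case.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]


def downsample2(img: Grid, y0: int, x0: int, size: int) -> Grid:
    """Average 2x2 blocks of the (2*size)-square parent box whose top-left is (y0, x0)."""
    return [[(img[y0 + 2 * i][x0 + 2 * j] + img[y0 + 2 * i][x0 + 2 * j + 1]
              + img[y0 + 2 * i + 1][x0 + 2 * j] + img[y0 + 2 * i + 1][x0 + 2 * j + 1]) / 4.0
             for j in range(size)] for i in range(size)]


def sad(a: Grid, b: Grid) -> float:
    """Sum of absolute differences (assumed similarity measure)."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def find_parents(img: Grid, block: int, search: int) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """For each block x block child box, pick the most similar 2*block parent box nearby."""
    h, w = len(img), len(img[0])
    parents = {}
    for cy in range(0, h - block + 1, block):
        for cx in range(0, w - block + 1, block):
            child = [row[cx:cx + block] for row in img[cy:cy + block]]
            best, best_err = None, float("inf")
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    py, px = cy - block // 2 + dy, cx - block // 2 + dx
                    if py < 0 or px < 0 or py + 2 * block > h or px + 2 * block > w:
                        continue          # parent box must lie inside the image
                    err = sad(child, downsample2(img, py, px, block))
                    if err < best_err:
                        best, best_err = (py, px), err
            if best is not None:
                parents[(cy, cx)] = best
    return parents


def map_labels(labels: List[List[int]], parents, block: int, iterations: int):
    """Repeatedly copy subsampled parent labels into each child box."""
    cur = [row[:] for row in labels]
    for _ in range(iterations):
        nxt = [row[:] for row in cur]
        for (cy, cx), (py, px) in parents.items():
            for i in range(block):
                for j in range(block):
                    nxt[cy + i][cx + j] = cur[py + 2 * i][px + 2 * j]
        cur = nxt
    return cur
```

For an 8x8 image with an intensity step at x = 4 and initial labels whose boundary sits one column too far left, one mapping moves the label boundary onto the step and further iterations leave it there. A pixel that maps onto itself (a fixed point of its child/parent pair) keeps its initial label, so the initial labels still need to be roughly right near the boundary.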

10. The image processing method according to claim 9, wherein the region division step includes:
inputting the n-dimensional image data and its edge strength;
inputting labels for some pixels of the n-dimensional image data;
setting an initial value of an edge-strength threshold T;
a label region growing step of assigning, to each unlabelled pixel whose edge strength is equal to or less than the threshold T, the label of an adjacent pixel if that adjacent pixel has been labelled; and
repeating the label region growing step while increasing the threshold T, as long as a separately defined condition is satisfied.
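The region division of claim 10 grows seed labels outward, admitting pixels of higher and higher edge strength, in a way that resembles marker-based watershed flooding. A minimal 2-D sketch follows; the 4-neighborhood, the fixed threshold step, and the stopping rule (stop at `t_max` or when every pixel is labelled) are assumptions standing in for the claim's "separately defined condition". The function name `grow_labels` is hypothetical.

```python
from typing import List


def grow_labels(edge: List[List[float]], labels: List[List[int]],
                t0: float, step: float, t_max: float) -> List[List[int]]:
    """Threshold-raising label growth; 0 means 'no label'.

    Assumed stopping condition: continue while t <= t_max and some pixel is unlabelled.
    """
    h, w = len(edge), len(edge[0])
    lab = [row[:] for row in labels]
    t = t0
    while t <= t_max and any(v == 0 for row in lab for v in row):
        changed = True
        while changed:                    # grow until nothing changes at this threshold
            changed = False
            for y in range(h):
                for x in range(w):
                    if lab[y][x] != 0 or edge[y][x] > t:
                        continue
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and lab[ny][nx] != 0:
                            lab[y][x] = lab[ny][nx]   # inherit a labelled neighbor's label
                            changed = True
                            break
        t += step
    return lab
```

Because labels are updated in place in raster order, a pixel reachable from two seeds at the same threshold takes whichever neighbor label is checked first; a priority-queue formulation would remove that order dependence.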

11. The image processing method according to claim 10, wherein the edge strength is the absolute value of a weighted sum of the pixel of interest and its surrounding pixels.
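Claim 11 defines edge strength as the absolute value of a weighted sum over the pixel of interest and its neighbors. The weights are not given; the sketch below assumes the 4-neighbor Laplacian kernel and replicates border pixels, both of which are illustrative choices.

```python
from typing import List, Sequence

# Assumed weights: 4-neighbor Laplacian (centre -4, direct neighbors +1).
LAPLACIAN = ((0, 1, 0), (1, -4, 1), (0, 1, 0))


def edge_strength(img: List[List[float]],
                  kernel: Sequence[Sequence[float]] = LAPLACIAN) -> List[List[float]]:
    """|weighted sum of the pixel and its neighbors| for every pixel."""
    h, w = len(img), len(img[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0.0
            for i, row in enumerate(kernel):
                for j, wgt in enumerate(row):
                    yy = min(max(y + i - r, 0), h - 1)   # replicate border pixels
                    xx = min(max(x + j - r, 0), w - 1)
                    s += wgt * img[yy][xx]
            out[y][x] = abs(s)
    return out
```

Flat areas give zero, so seeds in the growing step spread freely there and stop at strong intensity changes until the threshold has been raised.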

12. The image processing method according to claim 9, further comprising switching the frame of interest, using as input n-dimensional image data and an n-dimensional label image in which an arbitrary frame of interest is the head frame and a frame shifted from the head frame by a predetermined number of frames is the end frame.

13. An image processing method comprising:
a first step of inputting data in which an arbitrary frame of interest of spatio-temporal image data, formed by arranging the image frames of a moving image in time series, is set as the start frame and a frame shifted from the start frame by a predetermined number of frames is set as the end frame;
a second step of preparing a spatio-temporal region information image corresponding to either a spatio-temporal label image, which gives for each pixel of the spatio-temporal image data information for distinguishing regions, or a spatio-temporal mask image, which gives for each pixel information for distinguishing a subject region from a background region;
a third step of placing a child box in the spatio-temporal image data;
a fourth step of determining a parent box that has a larger volume than the child box and whose spatio-temporal image data are similar to those in the child box;
a fifth step of mapping the three-dimensional region information image from the parent box to the child box; and
performing the first to fifth steps while switching the frame of interest,
wherein, in the fourth step, the frames that are candidates for the parent box are limited to frames for which the difference between the center frame of the parent box and the center frame of the child box equals a separately determined value.
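Claims 8, 12, and 13 process a sliding window of frames: the frame of interest is the head frame and the end frame lies a fixed number of frames later. Claim 13 further restricts parent-box candidates to those whose center frame differs from the child's center frame by a set value. The sketch assumes the window advances one frame at a time, the parent spans twice as many frames as the child, and the difference is taken as an absolute value; all names are hypothetical.

```python
from typing import Iterator, List, Tuple


def windows(num_frames: int, shift: int) -> Iterator[Tuple[int, int]]:
    """Yield (head, end) as the frame of interest advances one frame at a time (assumed)."""
    for head in range(num_frames - shift):
        yield head, head + shift


def parent_start_candidates(child_start: int, child_len: int, head: int, end: int,
                            center_diff: int) -> List[int]:
    """Start frames of parents (2 * child_len frames long, assumed) lying inside
    [head, end] whose center frame differs from the child's by exactly center_diff."""
    parent_len = 2 * child_len
    child_c2 = 2 * child_start + child_len - 1      # doubled center keeps half frames integral
    result = []
    for ps in range(head, end - parent_len + 2):    # parent must end at or before `end`
        parent_c2 = 2 * ps + parent_len - 1
        if abs(parent_c2 - child_c2) == 2 * center_diff:
            result.append(ps)
    return result
```

With `center_diff = 0` the parent is centered on the child in time: for a child covering frames 2 to 3, the only parent inside frames 0 to 7 covers frames 1 to 4, which keeps the temporal search small.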

14. An image generation apparatus comprising:
means for inputting a series of image frames;
means for inputting region information indicating a region occupied by an object in the series of image frames;
means for inputting a point of interest in a first image frame of the series of image frames;
means for estimating, based on the region information, a hidden object region of the object corresponding to the point of interest; and
means for generating an image frame for presentation by obtaining luminance values for the estimated hidden object region from the series of image frames.

15. An image processing apparatus comprising:
means for inputting first region information indicating a region occupied by an object in a first image frame;
means for inputting second region information indicating a region occupied by the object in a second image frame;
means for generating a plurality of pieces of modified second region information by applying different transformations to the second region information while varying a transformation parameter;
means for calculating a similarity between the first region information and each piece of modified second region information;
means for selecting the transformation parameter based on the calculated similarities; and
means for outputting the selected transformation parameter as information for associating the same object between frames.

16. An image processing apparatus comprising:
means for inputting n-dimensional image data (n being an integer of 2 or more);
region division means for generating an n-dimensional label image by performing region division on the n-dimensional image data;
means for placing an n-dimensional processing box in the n-dimensional image data;
means for obtaining a parent box that has a larger measure than the n-dimensional processing box and whose image data are similar to those of the n-dimensional processing box; and
means for mapping the n-dimensional label image from the parent box to the n-dimensional processing box.

17. An image generation program causing a computer to execute:
a procedure of inputting a series of image frames;
a procedure of inputting region information indicating a region occupied by an object in the series of image frames;
a procedure of inputting a point of interest in a first image frame of the series of image frames;
a procedure of estimating, based on the region information, a hidden object region of the object corresponding to the point of interest; and
a procedure of generating an image frame for presentation by obtaining luminance values for the estimated hidden object region from the series of image frames.

18. An image processing program causing a computer to execute:
a procedure of inputting first region information indicating a region occupied by an object in a first image frame;
a procedure of inputting second region information indicating a region occupied by the object in a second image frame;
a procedure of generating a plurality of pieces of modified second region information by applying different transformations to the second region information while varying a transformation parameter;
a procedure of calculating a similarity between the first region information and each piece of modified second region information;
a procedure of selecting the transformation parameter based on the calculated similarities; and
a procedure of outputting the selected transformation parameter as information for associating the same object between frames.

19. An image processing program causing a computer to execute:
a procedure of inputting n-dimensional image data (n being an integer of 2 or more);
a region division procedure of generating an n-dimensional label image by performing region division on the n-dimensional image data;
a procedure of placing an n-dimensional processing box in the n-dimensional image data;
a procedure of obtaining a parent box that has a larger measure than the n-dimensional processing box and whose image data are similar to those of the n-dimensional processing box; and
a procedure of mapping the n-dimensional label image from the parent box to the n-dimensional processing box.