JP2000082145A

JP2000082145A - Object extraction device

Info

Publication number: JP2000082145A
Application number: JP11001891A
Authority: JP
Inventors: Yoko Sanbonsugi; 陽子三本杉; Toshiaki Watanabe; 敏明渡邊; Takashi Ida; 孝井田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1998-01-07
Filing date: 1999-01-07
Publication date: 2000-03-21
Anticipated expiration: 2019-01-07
Also published as: JP3612227B2

Abstract

PROBLEM TO BE SOLVED: To precisely extract and track a target object in a moving image without being affected by extraneous surrounding motion other than that of the target object. SOLUTION: Rectangles R(i-1), R(i) and R(i+1) surrounding the object are set on temporally consecutive frames f(i-1), f(i) and f(i+1), respectively. Difference images fd(i-1, i) and fd(i, i+1) are obtained from the interframe difference between the current frame f(i) and a first reference frame f(i-1) and from the interframe difference between the current frame f(i) and a second reference frame f(i+1). For each of the polygon Rd(i-1, i) = R(i-1) or R(i) and the polygon Rd(i, i+1) = R(i) or R(i+1), a background area is determined and the remaining area is selected as an object area candidate. Taking the intersection of these object area candidates yields the object area O(i) of the current frame f(i).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an object extraction device for images, and more particularly to an object extraction device that detects the position of a target object in an input moving image and tracks and extracts the moving object.

[0002]

[Description of the Related Art] Algorithms for tracking and extracting objects in moving images have long been studied. Such techniques extract only a particular object from an image in which various objects and a background are mixed. They are useful for processing and editing moving images; for example, a person extracted from a moving image can be composited onto a different background.

[0003] A known method used for object extraction is a region segmentation technique based on spatio-temporal image region segmentation (Echigo and Iisaku, "Spatio-temporal image region segmentation for video mosaic," 1997 IEICE Information and Systems Society Conference, D-12-81, p. 273, September 1997).

[0004] In this region segmentation method, each frame of the moving image is divided into small regions by color texture, and the regions are then merged using the motion relationship between frames. Segmenting the image within a frame requires an initial segmentation, and the segmentation result depends strongly on it. The method turns this dependence to its advantage: it changes the initial segmentation in another frame, thereby obtaining different segmentation results, and merges the segments that are contradicted by the inter-frame motion.

[0005] However, if this method is applied as is to tracking and extracting an object in a moving image, the motion vectors are affected by extraneous motion other than that of the target moving object. Their reliability is therefore often insufficient, and erroneous merging becomes a problem.

[0006] Japanese Patent Application Laid-Open No. Hei 8-241414 discloses a moving object detection and tracking device that uses a plurality of moving object detectors in combination. This conventional device is used, for example, in surveillance systems with surveillance cameras; it detects a moving object in an input moving image and tracks it. In this device, the input moving image is supplied to an image dividing unit, an inter-frame difference type moving object detection unit, a background difference type moving object detection unit, and a moving object tracking unit. The image dividing unit divides the input moving image into blocks of a predetermined size, and the division result is sent to both the inter-frame difference type and the background difference type moving object detection units. The inter-frame difference type unit detects moving objects in the input moving image using the inter-frame difference for each division result. Here, the frame interval used for the inter-frame difference is set based on the detection result of the background difference type unit, so that a moving object can be detected regardless of its speed. The background difference type unit detects the moving object by taking the difference between the moving object and a background image created, for each division result, from the moving images input so far. An integration processing unit integrates the detection results of the two detection units and extracts motion information of the moving object. After objects have been extracted in each frame, the moving object tracking unit associates corresponding moving objects with each other across frames.

[0007] In this configuration, moving objects are detected using not only the inter-frame difference but also the background difference, so detection accuracy is higher than when only the inter-frame difference is used. However, because the scheme detects moving objects anywhere in the entire input moving image by inter-frame and background differences, both detection results are affected by extraneous motion other than that of the target moving object. As a result, the target moving object cannot be extracted and tracked well in images whose background contains complicated motion.

[0008] As another object extraction technique, a method is also known in which a background image is first generated from a plurality of frames, and regions where the difference in pixel values between the background image and the input image is large are extracted as objects.

[0009] An existing example of object extraction using such a background image is disclosed, for example, in Japanese Patent Application Laid-Open No. Hei 8-55222, entitled "Moving Object Detection Device, Background Extraction Device, and Unattended Object Detection Device."

[0010] The image signal of the current processing frame is input to a frame memory that stores one frame of image, to first motion detection means, to second motion detection means, and to a switch. The image signal of the preceding frame is read from the frame memory and input to the first motion detection means. Meanwhile, the background image signal generated up to that point is read from a frame memory provided to hold the background image, and is input to the second motion detection means and to the switch. The first and second motion detection means each extract an object region using, for example, the difference between their two input image signals, and both object regions are sent to a logic operation circuit. The logic operation circuit takes the logical AND of its two input images and outputs the result as the final object region. The object region is also sent to the switch. Under control of the object region, the switch selects the background pixel signal for pixels belonging to the object region and the image signal of the current processing frame for pixels not belonging to it. The selected signal is sent to the frame memory as an overwrite signal, and the pixel values in the frame memory are overwritten.

[0011] With this method, as shown in Japanese Patent Application Laid-Open No. Hei 8-55222, the background image gradually becomes correct as processing proceeds, and the object eventually comes to be extracted correctly. In the early part of the moving image sequence, however, the object is mixed into the background image, so extraction accuracy is poor. Moreover, when the object moves only slightly, the image of the object remains in the background image indefinitely and extraction accuracy never becomes high.

[0012]

[Problems to Be Solved by the Invention] As described above, conventional object extraction and tracking methods detect moving objects from within the entire input moving image. They are therefore affected by extraneous motion other than that of the target moving object and cannot extract and track the target moving object accurately.

[0013] In addition, object extraction methods that use a background image have poor extraction accuracy in the early part of the moving image sequence, and when the object moves only slightly, the background image is never completed, so extraction accuracy does not improve.

[0014] An object of the present invention is to provide an object extraction device for moving images that can extract and track a target object accurately without being affected by extraneous surrounding motion other than that of the target object.

[0015] Another object of the present invention is to provide an object extraction device that determines the background image accurately and thereby achieves high extraction accuracy regardless of how much the object moves, in the early part of the moving image sequence just as in its final part.

[0016]

[Means for Solving the Problems] The present invention provides an object extraction device comprising: background region determining means for determining a first background region common to a current frame, from which an object is to be extracted, and a first reference frame temporally different from the current frame, based on the difference between the current frame and the first reference frame, and for determining a second background region common to the current frame and a second reference frame temporally different from the current frame, based on the difference between the current frame and the second reference frame; means for extracting, as an object region, a region within the in-figure image of the current frame that belongs to neither the first background region nor the second background region; and object stillness detecting means for detecting a stationary object region.

[0017] In this object extraction device, two reference frames are prepared for each current frame from which an object is to be extracted. A first difference image between the current frame and the first reference frame determines a first common background region used in common by the current frame and the first reference frame, and a second difference image between the current frame and the second reference frame determines a second common background region used in common by the current frame and the second reference frame. Because the object region on the current frame is contained in both the first and second difference images, the object region on the current frame is extracted by detecting, among the regions that belong to neither the first nor the second common background region, the region contained in the in-figure image of the current frame. When this object region corresponds to a stationary object, a stationary object region is detected when there is no difference between the previous object region and the current object region.
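By way of illustration only, the extraction rule of paragraph [0017] can be sketched as follows. This is a minimal sketch under stated assumptions, not the device itself: frames are small grayscale lists of lists, the figure is an axis-aligned rectangle, one fixed threshold decides what counts as background, and the function names `common_background` and `extract_object` are hypothetical.

```python
def common_background(cur, ref, threshold):
    """Mark pixels whose inter-frame absolute difference is at or below the
    threshold as background common to both frames (True = background)."""
    return [[abs(c - r) <= threshold for c, r in zip(cur_row, ref_row)]
            for cur_row, ref_row in zip(cur, ref)]


def extract_object(cur, ref1, ref2, rect, threshold=10):
    """Return a boolean mask of the object region on the current frame.

    rect = (top, left, bottom, right), inclusive, is the figure set on the
    current frame (an assumed rectangular figure). A pixel is kept only if it
    lies inside the figure and belongs to neither common background region.
    """
    bg1 = common_background(cur, ref1, threshold)
    bg2 = common_background(cur, ref2, threshold)
    top, left, bottom, right = rect
    mask = []
    for y, row in enumerate(cur):
        mask.append([
            top <= y <= bottom and left <= x <= right
            and not bg1[y][x] and not bg2[y][x]
            for x in range(len(row))
        ])
    return mask


# A bright one-pixel object moves right by one pixel per frame on a dark row.
prev_frame = [[0, 200, 0, 0, 0]]
cur_frame = [[0, 0, 200, 0, 0]]
next_frame = [[0, 0, 0, 200, 0]]
object_mask = extract_object(cur_frame, prev_frame, next_frame, rect=(0, 0, 0, 4))
print(object_mask)
```

In this toy sequence each difference image also responds at the position the object occupies in the reference frame, but only the position occupied in the current frame is non-background in both difference images, so only that pixel survives.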

[0018] By thus determining, as the object to be extracted, the region that belongs to none of the plurality of common background regions determined from temporally different reference frames, and tracking the object accordingly, the target object can be extracted and tracked accurately without being affected by extraneous surrounding motion other than that of the target object.

[0019] It is preferable to further include background correction means for correcting the background motion of each reference frame or of the current frame so that the background motion between each of the first and second reference frames and the current frame becomes relatively zero. By providing this background correction means at the input stage of either the figure setting means or the background region determining means, the background image can be made pseudo-constant between frames even when it changes gradually across consecutive frames, for example when the camera is panned. Taking the difference between the current frame and the first or second reference frame then cancels the background between those frames, so that common background region detection and object region extraction can be performed without being affected by the background change. The background correction means can be realized by motion compensation processing.

[0020] The background region determining means preferably comprises: means for detecting, in the difference image between the current frame and the first or second reference frame, the difference value of each pixel near the contour of the region belonging to the in-figure image of the current frame or to the in-figure image of the first or second reference frame; and means for determining, from the difference values of the pixels near the contour, the difference value to be judged as the common background region. The determined difference value is then used as the threshold for background/object region judgment to determine the common background region from the difference image. Focusing on the difference values of pixels near the contour in this way makes it easy to determine the threshold without examining the entire difference image.
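The threshold determination of paragraph [0020] might be sketched as below. The text does not state here how the contour values are turned into a threshold, so the rule used (take the largest absolute difference found on the contour) is purely an assumption for illustration, as are the rectangular figure and the names `contour_pixels`, `threshold_from_contour`, and `background_mask`.

```python
def contour_pixels(rect):
    """Yield (y, x) coordinates on the border of an inclusive rectangle."""
    top, left, bottom, right = rect
    for x in range(left, right + 1):
        yield top, x
        if bottom != top:
            yield bottom, x
    for y in range(top + 1, bottom):
        yield y, left
        if right != left:
            yield y, right


def threshold_from_contour(diff, rect):
    """Assumed rule: the largest absolute difference on the contour is the
    largest value still counted as common background."""
    return max(abs(diff[y][x]) for y, x in contour_pixels(rect))


def background_mask(diff, threshold):
    """True where the difference is small enough to be common background."""
    return [[abs(v) <= threshold for v in row] for row in diff]


# Difference image: noise of up to 3 on the figure border, large values inside.
diff_image = [
    [1, 2, 3, 1],
    [2, 90, 80, 0],
    [1, 0, 2, 3],
]
th = threshold_from_contour(diff_image, rect=(0, 0, 2, 3))
bg = background_mask(diff_image, th)
print(th, bg)
```

Only the ten border pixels are inspected to obtain the threshold; the interior of the difference image is touched only when the threshold is applied.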

[0021] The figure setting means preferably comprises: means for dividing the in-figure image of the reference frame into a plurality of blocks; means for searching, for each of the blocks, for the region on the input frame that minimizes the error against the input frame; and means for setting, on the input frame, a figure enclosing the plurality of regions found. This makes it possible to set a new figure that is optimal for the target input frame regardless of the shape and size of the initially set figure.
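The figure setting of paragraph [0021] can be illustrated with a small block-matching sketch. The error measure (sum of absolute differences), the exhaustive search window, the rectangular figure, and the names `sad`, `best_match`, and `set_figure` are all assumptions chosen for brevity; blocks that do not fit completely inside the reference figure are simply ignored here.

```python
def sad(ref, cur, ry, rx, cy, cx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(ref[ry + dy][rx + dx] - cur[cy + dy][cx + dx])
               for dy in range(size) for dx in range(size))


def best_match(ref, cur, ry, rx, size, search):
    """Exhaustively search within +/-search pixels for the block position on
    the input frame with the smallest error."""
    height, width = len(cur), len(cur[0])
    best = None
    for cy in range(max(0, ry - search), min(height - size, ry + search) + 1):
        for cx in range(max(0, rx - search), min(width - size, rx + search) + 1):
            err = sad(ref, cur, ry, rx, cy, cx, size)
            if best is None or err < best[0]:
                best = (err, cy, cx)
    return best[1], best[2]


def set_figure(ref, cur, rect, size=2, search=2):
    """Match every block of the reference figure on the input frame and return
    the rectangle (top, left, bottom, right), inclusive, enclosing them all."""
    top, left, bottom, right = rect
    matches = [best_match(ref, cur, y, x, size, search)
               for y in range(top, bottom - size + 2, size)
               for x in range(left, right - size + 2, size)]
    ys = [y for y, _ in matches]
    xs = [x for _, x in matches]
    return min(ys), min(xs), max(ys) + size - 1, max(xs) + size - 1


# A textured 2x2 object moves two pixels to the right between the frames.
ref_frame = [[0, 0, 0, 0, 0, 0],
             [0, 10, 20, 0, 0, 0],
             [0, 30, 40, 0, 0, 0],
             [0, 0, 0, 0, 0, 0]]
cur_frame = [[0, 0, 0, 0, 0, 0],
             [0, 0, 0, 10, 20, 0],
             [0, 0, 0, 30, 40, 0],
             [0, 0, 0, 0, 0, 0]]
new_rect = set_figure(ref_frame, cur_frame, rect=(1, 1, 2, 2))
print(new_rect)
```

Because the new figure is the bounding rectangle of the matched blocks, it can grow or shrink relative to the figure on the reference frame when the blocks move apart or together.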

[0022] The present invention further comprises: prediction means for predicting, from frames whose object regions have already been extracted, the position or shape of the object on the current frame from which an object is to be extracted; and means for selecting, based on the position or shape of the object on the current frame predicted by the prediction means, the first and second reference frames to be used by the background region determining means.

[0023] Selecting appropriate frames as the reference frames in this way makes it possible to obtain good extraction results consistently.

[0024] Here, let O_i, O_j, and O_curr be the objects in the reference frames f_i and f_j and in the current frame f_curr to be extracted, respectively. The optimal reference frames f_i and f_j for extracting the shape of the object correctly are frames that satisfy (O_i ∩ O_j) ⊆ O_curr, that is, frames f_i and f_j for which the intersection of O_i and O_j lies within O_curr.
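The selection condition of paragraph [0024] translates directly into set operations. In the sketch below, object regions are assumed to be available as sets of (y, x) pixel coordinates (for example from the prediction means), and the names `valid_reference_pairs` and `span` are hypothetical helpers introduced only for this example.

```python
from itertools import combinations


def valid_reference_pairs(predicted, curr):
    """Return all pairs (i, j) of candidate frames whose object regions satisfy
    (O_i & O_j) <= O_curr. Regions are sets of (y, x) pixel coordinates."""
    o_curr = predicted[curr]
    candidates = sorted(k for k in predicted if k != curr)
    return [(i, j) for i, j in combinations(candidates, 2)
            if predicted[i] & predicted[j] <= o_curr]


def span(start, stop):
    """Helper: a one-pixel-high object covering columns start..stop-1."""
    return {(0, x) for x in range(start, stop)}


# An object four pixels wide moving right by one pixel per frame, frames 0..4.
regions = {t: span(t, t + 4) for t in range(5)}
pairs = valid_reference_pairs(regions, curr=2)
print(pairs)
```

In this example every pair that straddles the current frame satisfies the condition, while the pairs (0, 1) and (3, 4), which lie on one side of it, do not, because their overlap extends beyond the object in the current frame.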

[0025] The present invention is also characterized by providing a plurality of object extraction means that extract objects by mutually different methods, and by extracting the object while selectively switching among them. In this case, it is desirable to combine first object extraction means, which extracts the object using the differences between the current frame and each of at least two reference frames temporally offset from the current frame, with second object extraction means, which extracts the object by predicting the object region of the current frame, using inter-frame prediction, from a frame on which object extraction has already been performed. Then, even when the object is partially stationary and no difference from the reference frames can be detected, the object extraction means based on inter-frame prediction can compensate.

[0026] When a plurality of object extraction means are provided, it is preferable to further include means for extracting, from the current frame from which the object is to be extracted, an image feature quantity for at least a partial region of the frame, and to switch among the plurality of object extraction means based on the extracted feature quantity.

[0027] For example, if it is known in advance whether the background moves, that property should be exploited. When the background moves, background motion compensation is performed, but complete compensation is not always possible; in frames with complicated motion, almost no compensation may be possible. Such frames can be screened out in advance by the amount of compensation error of the background motion compensation, so measures such as excluding them from the reference frame candidates are possible. When the background does not move, however, this processing is unnecessary. This is because, if another object is moving, erroneous background motion compensation may be performed or a frame may be dropped from the reference frame candidates, so that even the frame that best satisfies the reference frame selection condition is not selected and extraction accuracy falls. Moreover, diverse properties may be mixed within a single image. The motion and texture of an object also differ from part to part, and the same tracking and extraction method, device, or parameters may not extract it well. It is therefore better for the user to designate a part of the image that has special properties, or for differences within the image to be detected automatically as feature quantities, and, based on those feature quantities, to switch the tracking and extraction method or change the parameters locally, for example per block within a frame.

[0028] Switching among a plurality of object extraction means based on image feature quantities in this way makes it possible to extract the shapes of objects accurately in a wide variety of images.

[0029] When first object extraction means using the differences between the current frame and at least two reference frames temporally offset from it is combined with second object extraction means using inter-frame prediction, it is desirable to switch selectively between the first and second object extraction means for each block within the frame, based on the amount of prediction error. When the prediction error of the second object extraction means is within a predetermined range, its extraction result is used as the object region; when the prediction error exceeds the predetermined range, the extraction result of the first object extraction means is used as the object region.
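The per-block switching rule of paragraph [0029] can be sketched as follows. The dictionary layout keyed by block position, the scalar error per block, and the name `select_per_block` are assumptions made for this illustration; how the two extraction results and the prediction errors are actually computed is outside the sketch.

```python
def select_per_block(pred_masks, diff_masks, pred_errors, max_error):
    """For each block, use the inter-frame prediction result when its
    prediction error is within range, otherwise the difference-based result."""
    chosen = {}
    for block, error in pred_errors.items():
        if error <= max_error:
            chosen[block] = ("prediction", pred_masks[block])
        else:
            chosen[block] = ("difference", diff_masks[block])
    return chosen


# Two blocks identified by (row, column); masks are tiny 2x2 examples.
pred_masks = {(0, 0): [[1, 1], [0, 0]], (0, 1): [[0, 0], [0, 0]]}
diff_masks = {(0, 0): [[1, 0], [0, 0]], (0, 1): [[1, 1], [1, 0]]}
pred_errors = {(0, 0): 1.5, (0, 1): 12.0}

chosen = select_per_block(pred_masks, diff_masks, pred_errors, max_error=5.0)
print(chosen)
```

Block (0, 0) is predicted well and keeps the prediction-based mask, while block (0, 1) exceeds the allowed error and falls back to the difference-based mask.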

[0030] The second object extraction means is characterized by performing inter-frame prediction in an order different from the input frame order, so that the interval between the reference frame and the current frame from which the object is to be extracted is at least a predetermined number of frames. Because the amount of motion between frames is then larger than when inter-frame prediction is performed sequentially in input frame order, prediction accuracy improves, and extraction accuracy improves as a result.

[0031] That is, depending on the frame interval, the motion may be too small or too complicated for shape prediction based on inter-frame prediction to cope. Therefore, for example, when the shape prediction error does not fall to or below a threshold, widening the interval to the already-extracted frame used for prediction raises prediction accuracy and, as a result, extraction accuracy. Likewise, when the background moves, the background motion between each reference frame candidate and the frame being extracted is obtained and compensated, but depending on the frame interval the background motion may be too small or too complicated for accurate compensation. In this case as well, widening the frame interval can raise motion compensation accuracy. Adaptively controlling the order in which frames are extracted in this way makes it possible to extract the object shape more reliably.

[0032] The present invention also provides an object extraction device that receives moving image data and shape data representing the object region on a predetermined frame among the plurality of frames constituting the moving image data, and extracts the object region from the moving image data using the shape data, the device comprising: means for reading the moving image data from a storage device in which it is recorded and generating shape data for each frame constituting the read moving image data by motion-compensating the shape data; means for generating a background image of the moving image data by successively overwriting, onto a background memory, the image data of the background region of each frame as determined by the generated shape data; and means for reading the moving image data again from the storage device, obtaining, for each frame of the read moving image data, the difference from the corresponding pixel of the background image stored in the background memory, and determining pixels whose absolute difference is larger than a predetermined threshold to be the object region.

[0033] In this object extraction device, the background image is generated in the background memory during a first scan that reads the moving image data from the storage device. A second scan is then performed, in which the object region is extracted using the background image completed in the first scan. By exploiting the fact that the moving image data is held in a storage device and scanning it twice in this way, the object region can be extracted with sufficiently high accuracy from the very beginning of the moving image sequence.
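The two-scan procedure of paragraphs [0032] and [0033] is sketched below under simplifying assumptions: the per-frame shape data is taken as already available (the motion compensation that produces it is omitted), frames are small grayscale lists of lists, pixels never observed as background are treated as non-object, and the names `build_background` and `extract_with_background` are hypothetical.

```python
def build_background(frames, shapes):
    """First scan: overwrite the background memory with every pixel that the
    frame's shape data marks as background. None = never observed."""
    height, width = len(frames[0]), len(frames[0][0])
    background = [[None] * width for _ in range(height)]
    for frame, shape in zip(frames, shapes):
        for y in range(height):
            for x in range(width):
                if not shape[y][x]:
                    background[y][x] = frame[y][x]
    return background


def extract_with_background(frame, background, threshold):
    """Second scan: a pixel is object when it differs from the completed
    background by more than the threshold (unobserved pixels are skipped)."""
    return [[b is not None and abs(p - b) > threshold
             for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


# A two-pixel-wide bright object slides right over the background 10, 20, 30, 40.
frames = [[[200, 200, 30, 40]],
          [[10, 200, 200, 40]],
          [[10, 20, 200, 200]]]
shapes = [[[True, True, False, False]],
          [[False, True, True, False]],
          [[False, False, True, True]]]

background = build_background(frames, shapes)
masks = [extract_with_background(f, background, threshold=50) for f in frames]
print(background, masks[0])
```

Because the background memory is complete before the second scan starts, the very first frame is extracted with the same background as the last one.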

[0034] The present invention further comprises means for selectively outputting, as the object extraction result, either the object region determined by the shape data of each frame or the object region determined based on the absolute value of the difference from the background image. Depending on the image, the object region determined by the shape data obtained in the first scan may have higher extraction accuracy than the object region obtained in the second scan using the difference from the background image. Making it possible to output selectively either the object region from the first scan or that from the second scan therefore improves extraction accuracy further.

[0035] The present invention also provides an object extraction device that receives moving image data and shape data representing the object region on a predetermined frame among the plurality of frames constituting the moving image data, and successively obtains the shape data of each frame by using, as a reference frame, a frame for which shape data was given or has already been obtained, the device comprising: means for dividing the current processing frame into blocks; means for searching the reference frame, for each block, for a similar block whose image data pattern is similar to that of the block and whose area is larger than that of the current processing block; means for pasting, onto each block of the current processing frame, the shape data of the similar block cut out from the reference frame and reduced; and means for outputting the pasted shape data as the shape data of the current processing frame.

【００３６】この物体抽出装置においては、物体抽出対
象の現フレームの各ブロック毎に、画像データ（テクス
チャ）の図柄が相似であり、且つ面積が現処理ブロック
よりも大きい相似ブロックの探索処理と、探索された相
似ブロックのシェイプデータを切り出して縮小したもの
を現処理フレームのブロックに貼り付ける処理とが行わ
れる。このように現処理ブロックよりも大きい相似ブロ
ックのシェイプデータを縮小して張り付けることによ
り、シェイプデータで与えられる物体領域の輪郭線がず
れていてもそれを正しい位置に補正することが可能とな
る。したがって、例えばユーザがマウスなどで最初のフ
レーム上の物体領域の輪郭を大まかになぞったものをシ
ェイプデータとして与えるだけで、以降の入力フレーム
全てにおいて物体領域を高い精度で抽出することが可能
となる。In this object extraction device, for each block of the current frame from which the object is to be extracted, a search is made for a similar block whose image data (texture) pattern is similar and whose area is larger than that of the current processing block, and the shape data of the similar block found is cut out, reduced, and pasted onto the block of the current processing frame. Because the shape data of a similar block larger than the current processing block is reduced before being pasted, a contour of the object region that is displaced in the given shape data can be corrected to its proper position. Therefore, for example, if the user merely traces a rough outline of the object region on the first frame with a mouse or the like and supplies it as shape data, the object region can be extracted with high accuracy in all subsequent input frames.
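
The search, reduce, and paste steps described above can be sketched as follows. This is only a minimal illustration, not the device's actual implementation: it assumes grayscale images and binary shape data stored as nested lists, a fixed 2:1 size ratio between the similar block and the current block, and a sum-of-absolute-differences match. All function names are hypothetical.

```python
def crop(img, top, left, size):
    """Cut a size x size block out of a 2-D list."""
    return [row[left:left + size] for row in img[top:top + size]]

def shrink2(block):
    """Reduce a 2N x 2N block to N x N by averaging each 2 x 2 cell."""
    n = len(block) // 2
    return [[(block[2 * y][2 * x] + block[2 * y][2 * x + 1]
              + block[2 * y + 1][2 * x] + block[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(n)] for y in range(n)]

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def find_similar_block(cur_img, ref_img, top, left, n):
    """Find the 2n x 2n block of ref_img whose reduced texture best matches
    the n x n block of cur_img at (top, left). Returns its (top, left)."""
    target = crop(cur_img, top, left, n)
    best = None
    for y in range(len(ref_img) - 2 * n + 1):
        for x in range(len(ref_img[0]) - 2 * n + 1):
            err = sad(shrink2(crop(ref_img, y, x, 2 * n)), target)
            if best is None or err < best[0]:
                best = (err, y, x)
    return best[1], best[2]

def paste_shape(cur_shape, ref_shape, top, left, n, sim_top, sim_left):
    """Reduce the similar block's shape data and paste it into cur_shape."""
    small = shrink2(crop(ref_shape, sim_top, sim_left, 2 * n))
    for dy in range(n):
        for dx in range(n):
            cur_shape[top + dy][left + dx] = 1 if small[dy][dx] >= 0.5 else 0

# Toy data: the reference frame has an object edge between columns 3 and 4.
ref_img = [[0, 0, 0, 0, 100, 100] for _ in range(4)]
ref_shape = [[0, 0, 0, 0, 1, 1] for _ in range(4)]
cur_img = [[0, 100], [0, 100]]      # current 2 x 2 block straddles the edge
cur_shape = [[1, 1], [1, 1]]        # rough shape data to be corrected
sim = find_similar_block(cur_img, ref_img, 0, 0, 2)
paste_shape(cur_shape, ref_shape, 0, 0, 2, sim[0], sim[1])
```

In this toy case the rough shape data, which marks the whole block as object, is replaced by a contour that follows the texture edge.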

【００３７】また、本発明は、画像データと、その画像
の物体領域を表すシェイプデータを入力し、そのシェイ
プデータを用いて前記画像データから物体領域を抽出す
る物体抽出装置において、前記シェイプデータの輪郭部
分にブロックを設定し、各ブロック毎に、前記画像デー
タの図柄が相似であり、且つ前記ブロックよりも大きい
相似ブロックを同じ画像の中から探索する手段と、前記
各ブロックのシェイプデータを各々の前記相似ブロック
のシェイプデータを縮小したもので置き換える手段と、
前記置き換えを所定の回数だけ繰り返す手段と、前記置
き換えを繰り返されたシェイプデータを補正されたシェ
イプデータとして出力する手段とを具備する。Further, the present invention provides an object extraction device that receives image data and shape data representing an object region of the image and extracts the object region from the image data using the shape data, the device comprising: means for setting blocks on the contour portion of the shape data and, for each block, searching the same image for a similar block whose image data pattern is similar and which is larger than the block; means for replacing the shape data of each block with a reduced version of the shape data of its similar block; means for repeating the replacement a predetermined number of times; and means for outputting the shape data that has undergone the repeated replacement as corrected shape data.

【００３８】このようにフレーム内のブロックマッチン
グによって、相似ブックを用いた置き換え処理を行うこ
とにより、シェイプデータによって与えられる輪郭線を
正しい位置に補正することが可能となる。また、フレー
ム内のブロックマッチングであるので相似ブックの探索
および置き換えを同一ブロックについて繰り返し行うこ
とができ、これにより補正精度をさらに高めることが可
能となる。Performing the replacement process with similar blocks found by intra-frame block matching in this way makes it possible to correct the contour given by the shape data to its proper position. In addition, because the block matching is performed within a single frame, the search for a similar block and the replacement can be repeated for the same block, which further improves the correction accuracy.

【００３９】[0039]

【発明の実施の形態】図１には、本発明の第１実施形態
に係る動画像の物体追跡／抽出装置の全体の構成が示さ
れている。この物体追跡／抽出装置は、入力動画像信号
から目的とする物体の動きを追跡するためのものであ
り、初期図形設定部１と、物体追跡・抽出部２とから構
成されている。初期図形設定部１は、外部からの初期図
形設定指示信号ａ０に基づいて、追跡／抽出対象となる
目的の物体を囲むような図形を入力動画像信号ａ１に対
して初期設定するために使用され、初期図形設定指示信
号ａ０によって例えば長方形、円、楕円などの任意の形
状の図形が目的の物体を囲むように入力動画像信号ａ１
の初期フレーム上に設定される。初期図形設定指示信号
ａ０の入力方法としては、例えば入力動画像信号ａ１を
表示する画面上に利用者がペンやマウスなどのポインテ
ィングデバイスを用いて図形そのものを直接書き込んだ
り、あるいはそれらポインティングデバイスを用いて入
力する図形の位置や大きさを指定するなどの手法を用い
ることができる。これにより、目的の物体が現れる初期
フレーム画像上において、追跡／抽出対象となる物体を
外部から容易に指示することが可能となる。DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS FIG. 1 shows the overall configuration of a moving image object tracking/extraction device according to a first embodiment of the present invention. This object tracking/extraction device tracks the movement of a target object in an input moving image signal, and comprises an initial figure setting unit 1 and an object tracking/extraction unit 2. The initial figure setting unit 1 is used to initially set, on the input moving image signal a1, a figure that surrounds the target object to be tracked/extracted, based on an initial figure setting instruction signal a0 supplied from outside. In response to the initial figure setting instruction signal a0, a figure of arbitrary shape, such as a rectangle, a circle, or an ellipse, is set on the initial frame of the input moving image signal a1 so as to surround the target object. The initial figure setting instruction signal a0 can be input, for example, by having the user draw the figure directly on a screen displaying the input moving image signal a1 with a pointing device such as a pen or a mouse, or by having the user specify the position and size of the figure with such a pointing device. This makes it easy to designate, from outside, the object to be tracked/extracted on the initial frame image in which the target object appears.

【００４０】また、ユーザによる図形入力ではなく、図
形の初期設定は、通常のフレーム画像を解析する処理に
よって例えば人や動物の顔、体の輪郭などを検出し、そ
れを囲むように図形を自動設定することによっても実現
できる。Instead of having the user input the figure, the figure can also be initialized automatically: for example, an ordinary frame image analysis process detects the contour of the face or body of a person or an animal, and a figure is set so as to surround it.

【００４１】物体追跡・抽出部２は、初期図形設定部１
で設定された図形内に含まれる図形内画像を基準として
物体の追跡および抽出を行う。この場合、動物体の追跡
・抽出処理では、図形で指定された物体に着目してその
物体の動きが追跡される。従って、目的とする動物体以
外の周囲の余分な動きに影響を受けずに目的とする動物
体の抽出／追跡を行える。The object tracking/extraction unit 2 tracks and extracts the object using, as a reference, the in-figure image contained in the figure set by the initial figure setting unit 1. In this moving object tracking/extraction process, the movement of the object is tracked by focusing on the object designated by the figure. The target moving object can therefore be extracted/tracked without being affected by extraneous surrounding motion other than that of the target moving object.

【００４２】図２には、物体追跡・抽出部２の好ましい
構成の一例が示されている。FIG. 2 shows an example of a preferred configuration of the object tracking / extracting section 2.

【００４３】この物体追跡・抽出部は、図示のように、
メモリ（Ｍ）１０，１４、図形設定部１１、背景領域決
定部１２、および物体抽出部１３から構成されている。As illustrated, this object tracking/extraction unit comprises memories (M) 10 and 14, a figure setting unit 11, a background region determination unit 12, and an object extraction unit 13.

【００４４】図形設定部１１は、これまでに入力および
図形設定した任意のフレームを参照フレームとして使用
しながら入力フレームに対して順次図形を設定するため
に使用される。この図形設定部１１は、現フレーム画像
１０１、参照フレームの図形内画像およびその位置１０
３と、参照フレームの物体抽出結果１０６を入力し、現
フレームの任意の図形で囲まれた領域内を表す画像１０
２を出力する。すなわち、図形設定部１１による図形設
定処理では、参照フレームの図形内画像１０３と現フレ
ーム画像１０１との相関に基づいて、参照フレームの図
形内画像１０３との誤差が最小となる現フレーム画像上
の領域が探索され、その領域を囲むような図形が現フレ
ーム画像１０１に対して設定される。設定する図形は、
長方形、円、楕円、エッジで囲まれた領域、など、何で
も良い。以下では、簡単のために長方形の場合について
述べる。また、図形設定部１１の具体的な構成について
は、図５を参照して後述する。なお、物体を囲む図形を
用いない場合は、図形内画像は全画像とし、位置を入出
力する必要がない。The figure setting unit 11 sequentially sets a figure on each input frame, using as a reference frame any frame that has already been input and had a figure set on it. The figure setting unit 11 receives the current frame image 101, the in-figure image of the reference frame and its position 103, and the object extraction result 106 of the reference frame, and outputs an image 102 representing the region of the current frame enclosed by an arbitrary figure. That is, in the figure setting process, the region on the current frame image whose error with respect to the in-figure image 103 of the reference frame is smallest is searched for, based on the correlation between the in-figure image 103 of the reference frame and the current frame image 101, and a figure surrounding that region is set on the current frame image 101. The figure may be of any kind, such as a rectangle, a circle, an ellipse, or a region bounded by edges. For simplicity, the rectangular case is described below. The specific configuration of the figure setting unit 11 is described later with reference to FIG. 5. When no figure surrounding the object is used, the whole image is treated as the in-figure image, and the position need not be input or output.
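
As a rough illustration of the figure setting process for the rectangular case, the following sketch searches the current frame for the rectangle whose content has the smallest error with respect to the in-figure image of the reference frame. The exhaustive search over the whole frame and the sum-of-absolute-differences error are assumptions made for this example; the specification only requires that the error be minimized, and the function names are hypothetical.

```python
def sad(patch_a, patch_b):
    """Sum of absolute differences between two equally sized patches."""
    return sum(abs(p - q) for ra, rb in zip(patch_a, patch_b)
               for p, q in zip(ra, rb))

def set_figure(cur_frame, ref_patch):
    """Return ((top, left, bottom, right), error) of the rectangle in
    cur_frame that best matches ref_patch (bottom/right are exclusive)."""
    ph, pw = len(ref_patch), len(ref_patch[0])
    h, w = len(cur_frame), len(cur_frame[0])
    if ph > h or pw > w:
        raise ValueError("reference patch is larger than the current frame")
    best = None
    for top in range(h - ph + 1):
        for left in range(w - pw + 1):
            window = [row[left:left + pw] for row in cur_frame[top:top + ph]]
            err = sad(window, ref_patch)
            if best is None or err < best[0]:
                best = (err, top, left)
    err, top, left = best
    return (top, left, top + ph, left + pw), err

# The in-figure image of the reference frame reappears at row 1, column 2.
cur_frame = [[0, 0, 0, 0, 0],
             [0, 0, 9, 8, 0],
             [0, 0, 7, 6, 0],
             [0, 0, 0, 0, 0]]
ref_patch = [[9, 8],
             [7, 6]]
rect, err = set_figure(cur_frame, ref_patch)
```

A practical implementation would normally restrict the search to a window around the figure position in the reference frame rather than scanning the whole frame.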

【００４５】メモリ１０には、これまでに入力および図
形設定されたフレームが少なくとも３つ程度保持され
る。保持される情報は、図形設定されたフレームの画
像、設定された図形の位置や形状、図形内の画像などで
ある。また、入力フレームの画像全体ではなく、その図
形内画像だけを保持するようにしても良い。The memory 10 holds at least about three frames that have already been input and had figures set on them. The information held includes the image of each such frame, the position and shape of the figure set on it, and the image inside the figure. Instead of the entire image of an input frame, only its in-figure image may be held.

【００４６】背景領域決定部１２は、物体抽出対象の現
フレーム毎にその現フレームとは時間的に異なるフレー
ムの中の少なくとも二つの任意のフレームを参照フレー
ムとして使用し、各参照フレーム毎に現フレームとの差
分を取ることによってそれら各参照フレームと現フレー
ム間の共通の背景領域を決定する。この背景領域決定部
１２は、メモリ１０で保持された、現フレームの任意の
図形内画像およびその位置１０２と、少なくとも２つの
フレームの任意の図形内画像およびその位置１０３と、
該少なくとも２つのフレームの物体抽出結果１０６を入
力し、現フレームと該少なくとも２つのフレームそれぞ
れの図形内画像との共通の背景領域１０４を出力する。
すなわち、参照フレームとして第１および第２の２つの
フレームを使用する場合には、現フレームと第１の参照
フレームとの間のフレーム間差分を取ることなどによっ
て得られた第１の差分画像により、現フレームと第１の
参照フレームのどちらにおいても背景領域として用いら
れている共通の第１の背景領域が決定されると共に、現
フレームと第２の参照フレームとの間のフレーム間差分
を取ることなどによって得られた第２の差分画像によ
り、現フレームと第２の参照フレームのどちらにおいて
も背景領域として用いられている共通の第２の背景領域
が決定されることになる。この背景領域決定部１２の具
体的な構成は、図４を参照して後述する。また、背景メ
モリを使って、共通の背景を得る手法もある。For each current frame from which an object is to be extracted, the background region determination unit 12 uses at least two arbitrary frames that differ in time from the current frame as reference frames, and determines the background region common to each reference frame and the current frame by taking the difference between that reference frame and the current frame. The background region determination unit 12 receives the arbitrary in-figure image of the current frame and its position 102 held in the memory 10, the arbitrary in-figure images of at least two frames and their positions 103, and the object extraction results 106 of those at least two frames, and outputs the background regions 104 common to the in-figure image of the current frame and that of each of the at least two frames. That is, when first and second frames are used as the reference frames, a first difference image, obtained for example by taking the inter-frame difference between the current frame and the first reference frame, determines a first common background region that serves as background in both the current frame and the first reference frame. Likewise, a second difference image, obtained for example by taking the inter-frame difference between the current frame and the second reference frame, determines a second common background region that serves as background in both the current frame and the second reference frame. The specific configuration of the background region determination unit 12 is described later with reference to FIG. 4. A common background can also be obtained using a background memory.

【００４７】なお、物体を囲む図形を用いない場合は、
図形内画像は全画像とし、位置を入出力する必要がな
い。When no figure surrounding the object is used, the whole image is treated as the in-figure image, and the position need not be input or output.

【００４８】物体抽出部１３は、背景領域決定部１２に
て決定された共通の背景領域を用いて現フレームの図形
内画像から物体領域のみを抽出するために使用され、該
現フレームと少なくとも２つのフレームそれぞれとの共
通の背景領域１０４を入力し、現フレームの物体抽出結
果１０６を出力する。第１および第２の差分画像のどち
らにも現フレーム上の物体領域が共通に含まれているた
め、第１の共通背景領域と第２の共通背景領域のどちら
にも属さない領域の中で、現フレームの図形内画像に含
まれる領域を検出することにより、現フレーム上の物体
領域が抽出される。これは、共通の背景領域以外の領域
が物体領域の候補となることを利用している。つまり、
第１の差分画像上においては第１の共通背景領域以外の
領域が物体領域候補となり、第２の差分画像上において
は第２の共通背景領域以外の領域が物体領域候補となる
ので、２つの物体領域候補の重複する領域を、現フレー
ムの物体領域と判定することができる。物体抽出結果１
０６としては、物体領域の位置や形状を示す情報を使用
することができる。また、その情報を用いて実際に現フ
レームから物体領域の画像を取り出すようにしても良
い。The object extraction unit 13 extracts only the object region from the in-figure image of the current frame using the common background regions determined by the background region determination unit 12. It receives the background regions 104 common to the current frame and each of the at least two frames, and outputs the object extraction result 106 of the current frame. Because the object region on the current frame is contained in both the first and second difference images, the object region on the current frame is extracted by detecting, among the regions belonging to neither the first nor the second common background region, the region contained in the in-figure image of the current frame. This relies on the fact that regions other than the common background region are object region candidates. In other words, on the first difference image the region other than the first common background region is an object region candidate, and on the second difference image the region other than the second common background region is an object region candidate, so the region where the two candidates overlap can be determined to be the object region of the current frame. Information indicating the position and shape of the object region can be used as the object extraction result 106. That information may also be used to actually cut the image of the object region out of the current frame.

【００４９】メモリ１４は、少なくとも２つの物体抽出
結果を保持し、既に抽出されている結果をフイードバッ
クして抽出精度をあげるために用いられる。The memory 14 holds at least two object extraction results, and is used to feed back already extracted results to improve extraction accuracy.

【００５０】ここで、図８を参照して、本実施形態で用
いられる物体抽出・追跡処理の方法について説明する。Here, with reference to FIG. 8, a method of object extraction / tracking processing used in the present embodiment will be described.

【００５１】ここでは、時間的に連続する３つのフレー
ムｆ（ｉ−１），ｆ（ｉ），ｆ（ｉ＋１）を用いて、現
フレームｆ（ｉ）から物体を抽出する場合を例示して説
明する。The following description takes as an example the case where an object is extracted from the current frame f(i) using three temporally consecutive frames f(i-1), f(i), and f(i+1).

【００５２】まず、前述の図形設定部１１によって図形
設定処理が行われる。３つのフレームｆ（ｉ−１），ｆ
（ｉ），ｆ（ｉ＋１）についてもそれぞれ任意の参照フ
レームを使用することにより図形設定処理が行われ、そ
のフレーム上の物体を囲むように長方形Ｒ（ｉ−１），
Ｒ（ｉ），Ｒ（ｉ＋１）が設定される。なお、長方形の
図形Ｒ（ｉ−１），Ｒ（ｉ），Ｒ（ｉ＋１）は位置およ
び形状の情報であり、画像として存在するものではな
い。First, the figure setting unit 11 described above performs the figure setting process. For each of the three frames f(i-1), f(i), and f(i+1), the figure setting process is performed using an arbitrary reference frame, and rectangles R(i-1), R(i), and R(i+1) are set so as to surround the object on the respective frames. Note that the rectangular figures R(i-1), R(i), and R(i+1) are position and shape information; they do not exist as images.

【００５３】次に、背景領域決定部１２にて共通背景領
域が決定される。Next, a common background area is determined by the background area determination unit 12.

【００５４】この場合、まず、現フレームｆ（ｉ）と第
１の参照フレームｆ（ｉ−１）との間のフレーム間差分
が取られ、第１の差分画像ｆｄ（ｉ−１，ｉ）が求めら
れる。同様にして、現フレームｆ（ｉ）と第２の参照フ
レームｆ（ｉ＋１）との間のフレーム間差分も取られ、
第２の差分画像ｆｄ（ｉ，ｉ＋１）が求められる。In this case, the inter-frame difference between the current frame f(i) and the first reference frame f(i-1) is taken first, yielding the first difference image fd(i-1, i). Similarly, the inter-frame difference between the current frame f(i) and the second reference frame f(i+1) is taken, yielding the second difference image fd(i, i+1).

【００５５】第１の差分画像ｆｄ（ｉ−１，ｉ）を得る
ことにより、現フレームｆ（ｉ）と第１の参照フレーム
ｆ（ｉ−１）とで共通の画素値を持つ部分については画
素値が相殺されるためその画素の差分値は零となる。し
たがって、フレームｆ（ｉ−１）とｆ（ｉ）の背景がほ
ぼ同様であれば、基本的には、第１の差分画像ｆｄ（ｉ
−１，ｉ）には、長方形Ｒ（ｉ−１）の図形内画像と、
長方形Ｒ（ｉ）の図形内画像とのＯＲに相当する画像が
残ることになる。この残存画像を囲む図形は、図示のよ
うに、多角形Ｒｄ（ｉ−１，ｉ）＝Ｒ（ｉ−１）ＯＲ
Ｒ（ｉ）となる。現フレームｆ（ｉ）と第１の参照フ
レームｆ（ｉ−１）の共通の背景領域は、多角形Ｒｄ
（ｉ−１，ｉ）内の実際の物体領域（ここでは、２つの
丸を一部重ねた結果得られる８の字の形状をした領域）
以外の全領域となる。In the first difference image fd(i-1, i), pixel values cancel wherever the current frame f(i) and the first reference frame f(i-1) have the same pixel value, so the difference value of such pixels is zero. Therefore, if the backgrounds of frames f(i-1) and f(i) are almost the same, what basically remains in the first difference image fd(i-1, i) is an image corresponding to the OR of the in-figure image of rectangle R(i-1) and the in-figure image of rectangle R(i). As illustrated, the figure surrounding this residual image is the polygon Rd(i-1, i) = R(i-1) OR R(i). The background region common to the current frame f(i) and the first reference frame f(i-1) is the entire region other than the actual object region within the polygon Rd(i-1, i) (here, a figure-eight-shaped region obtained by partially overlapping two circles).
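
The difference image and the polygon Rd(i-1, i) = R(i-1) OR R(i) can be illustrated with the short sketch below. It assumes frames stored as nested lists of luminance values and rectangles given as (top, left, bottom, right) with exclusive bottom and right; these representations and the function names are choices made for the example, not part of the specification.

```python
def abs_diff(frame_a, frame_b):
    """Per-pixel absolute inter-frame difference of two equally sized frames."""
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]

def rect_mask(height, width, rect):
    """Binary mask of a rectangle (top, left, bottom, right), end-exclusive."""
    top, left, bottom, right = rect
    return [[1 if top <= y < bottom and left <= x < right else 0
             for x in range(width)] for y in range(height)]

def union_mask(mask_a, mask_b):
    """Pixel-wise OR of two masks, i.e. the polygon Rd as a mask."""
    return [[a | b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]

# Rectangles set around the object in f(i-1) and in f(i) on a 4 x 6 frame.
r_prev = rect_mask(4, 6, (1, 1, 3, 3))
r_cur = rect_mask(4, 6, (1, 2, 3, 4))
rd = union_mask(r_prev, r_cur)   # region in which the residual image lies
```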

【００５６】また、第２の差分画像ｆｄ（ｉ，ｉ＋１）
についても、長方形Ｒ（ｉ）の図形内画像と、長方形Ｒ
（ｉ＋１）の図形内画像とのＯＲに相当する画像が残る
ことになる。この残存画像を囲む図形は、図示のよう
に、多角形Ｒｄ（ｉ，ｉ＋１）＝Ｒ（ｉ）ＯＲＲ
（ｉ＋１）となる。現フレームｆ（ｉ）と第２の参照フ
レームｆ（ｉ＋１）の共通の背景領域は、多角形Ｒｄ
（ｉ，ｉ＋１）内の実際の物体領域（ここでは、２つの
丸を一部重ねた結果得られる８の字の形状をしたもの）
以外の全領域となる。Likewise, what remains in the second difference image fd(i, i+1) is an image corresponding to the OR of the in-figure image of rectangle R(i) and the in-figure image of rectangle R(i+1). As illustrated, the figure surrounding this residual image is the polygon Rd(i, i+1) = R(i) OR R(i+1). The background region common to the current frame f(i) and the second reference frame f(i+1) is the entire region other than the actual object region within the polygon Rd(i, i+1) (here, a figure-eight-shaped region obtained by partially overlapping two circles).

【００５７】この後、第１の差分画像ｆｄ（ｉ−１，
ｉ）から現フレームｆ（ｉ）と第１の参照フレームｆ
（ｉ−１）の共通の背景領域を決定する処理が行われ
る。After this, the background region common to the current frame f(i) and the first reference frame f(i-1) is determined from the first difference image fd(i-1, i).

【００５８】共通背景領域・物体領域の判定のためのし
きい値となる差分値が必要になる。これは、ユーザーが
与えてもよいし、画像のノイズや性質を検出して自動設
定してもよい。その場合、一画面で一つのしきい値でな
くとも、画像中の部分的な性質に応じて部分的に決定し
てもよい。画像の性質は、エッジの強さや差分画素の分
散などが考えれれる。また、物体を追跡する図形を用い
て求めることもできる。A difference value is needed as the threshold for distinguishing the common background region from the object region. It may be supplied by the user, or it may be set automatically by detecting the noise and properties of the image. In that case, rather than using a single threshold for the whole picture, the threshold may be determined locally according to local properties of the image. Possible image properties include edge strength and the variance of the difference pixels. The threshold can also be obtained using the figure that tracks the object.

【００５９】この場合、共通背景領域／物体領域の判定
のためのしきい値となる差分値が求められ、しきい値以
下の差分値を持つ画素の領域が共通背景領域として決定
される。このしきい値は、第１の差分画像ｆｄ（ｉ−
１，ｉ）の多角形Ｒｄ（ｉ−１，ｉ）の外側の一ライ
ン、つまり多角形Ｒｄ（ｉ−１，ｉ）の輪郭線上に沿っ
た各画素の差分値のヒストグラムを用いて決定すること
ができる。ヒストグラムの横軸は画素値（差分値）、縦
軸はその差分値を持つ画素数である。たとえば、画素数
が多角形Ｒｄ（ｉ−１，ｉ）からなる枠線上に存在する
全画素数の半分となるような差分値が、前述のしきい値
として決定される。このようにしきい値を決定すること
により、第１の差分画像ｆｄ（ｉ−１，ｉ）全体にわた
って画素値の分布を調べることなく、容易にしきい値を
決定することが可能となる。In this case, a difference value serving as the threshold for distinguishing the common background region from the object region is obtained, and the region of pixels whose difference value is at or below the threshold is determined to be the common background region. This threshold can be determined from a histogram of the difference values of the pixels on the one line just outside the polygon Rd(i-1, i) of the first difference image fd(i-1, i), that is, the pixels along the contour of the polygon Rd(i-1, i). The horizontal axis of the histogram is the pixel value (difference value), and the vertical axis is the number of pixels having that difference value. For example, the difference value at which the number of pixels reaches half of the total number of pixels on the frame line formed by the polygon Rd(i-1, i) is determined to be the threshold. Determining the threshold in this way makes it easy to obtain without examining the distribution of pixel values over the entire first difference image fd(i-1, i).
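
One possible reading of this threshold rule is sketched below. It assumes that the "one line outside the polygon" consists of the pixels outside the mask that are 4-adjacent to a mask pixel, and that the threshold is the smallest difference value at which the cumulative histogram count reaches half of those contour pixels. Both points are interpretations made for this example, and the function names are hypothetical.

```python
from collections import Counter

def contour_pixels(mask):
    """Pixels just outside the mask: not in the mask, 4-adjacent to it."""
    h, w = len(mask), len(mask[0])
    found = []
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                continue
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny][nx]:
                    found.append((y, x))
                    break
    return found

def contour_threshold(diff, mask):
    """Smallest difference value covering at least half the contour pixels."""
    values = [diff[y][x] for y, x in contour_pixels(mask)]
    hist = Counter(values)              # difference value -> pixel count
    needed = (len(values) + 1) // 2
    seen = 0
    for value in sorted(hist):
        seen += hist[value]
        if seen >= needed:
            return value
    raise ValueError("mask has no contour pixels inside the image")

# 5 x 5 difference image: noise level 2 on the background, large values
# inside the polygon, and two outliers on the contour line.
diff = [[2] * 5 for _ in range(5)]
mask = [[1 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)]
        for y in range(5)]
for y in range(1, 4):
    for x in range(1, 4):
        diff[y][x] = 30
diff[0][1] = 50
diff[4][3] = 40
threshold = contour_threshold(diff, mask)
```

Only the twelve contour pixels are examined here, which reflects the stated advantage that the distribution over the whole difference image need not be inspected.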

【００６０】次に、このしきい値を用いて、第１の差分
画像ｆｄ（ｉ−１，ｉ）の多角形Ｒｄ（ｉ−１，ｉ）内
における共通背景領域が決定される。共通背景領域以外
の領域はオクルージョンを含んだ物体領域となる。これ
により、多角形Ｒｄ（ｉ−１，ｉ）内の領域は背景領域
と物体領域に２分され、背景領域の画素値が“０”、物
体領域の画素値が“１”の２値画像に変換される。Next, this threshold is used to determine the common background region within the polygon Rd(i-1, i) of the first difference image fd(i-1, i). The region other than the common background region is the object region, including occlusion. The region within the polygon Rd(i-1, i) is thereby divided into a background region and an object region, and is converted into a binary image in which background pixels have the value "0" and object pixels have the value "1".

【００６１】第２の差分画像ｆｄ（ｉ，ｉ＋１）につい
ても、同様にして、現フレームｆ（ｉ）と第２の参照フ
レームｆ（ｉ＋１）の共通の背景領域を決定する処理が
行われ、多角形Ｒｄ（ｉ，ｉ＋１）内の領域が画素値
“０”の背景領域と、画素値“１”の物体領域に変換さ
れる。Similarly, for the second difference image fd(i, i+1), the background region common to the current frame f(i) and the second reference frame f(i+1) is determined, and the region within the polygon Rd(i, i+1) is converted into a background region with pixel value "0" and an object region with pixel value "1".

【００６２】この後、物体抽出部１３による物体抽出力
が行われる。After this, the object extraction unit 13 performs object extraction.

【００６３】ここでは、第１および第２の差分画像との
間で、多角形Ｒｄ（ｉ−１，ｉ）内の２値画像と多角形
Ｒｄ（ｉ，ｉ＋１）内の２値画像とのＡＮＤ処理を画素
毎に行う演算処理が行われ、これによってオクルージヨ
ン入りの物体の交わりが求められ、現フレームｆ（ｉ）
上の物体Ｏ（ｉ）が抽出される。Here, the binary image within the polygon Rd(i-1, i) of the first difference image and the binary image within the polygon Rd(i, i+1) of the second difference image are ANDed pixel by pixel. This yields the intersection of the two object regions, each of which includes occlusion, and the object O(i) on the current frame f(i) is thereby extracted.
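
The binarization and pixel-wise AND can be sketched as follows, using toy one-row frames. The sketch assumes that the thresholds have already been determined, that the whole frame serves as the region when no figure is used, and that the object has some texture; a perfectly uniform object would cancel in the overlapping part of the difference image. The function names are hypothetical.

```python
def binarize(diff, region, threshold):
    """1 = object candidate (inside region, difference above threshold)."""
    return [[1 if r and d > threshold else 0 for d, r in zip(drow, rrow)]
            for drow, rrow in zip(diff, region)]

def and_masks(mask_a, mask_b):
    """Pixel-wise AND of two binary images."""
    return [[a & b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]

def extract_object(fd_prev, fd_next, rd_prev, rd_next, thr_prev, thr_next):
    """Object region of the current frame from two difference images."""
    cand_prev = binarize(fd_prev, rd_prev, thr_prev)  # object + old position
    cand_next = binarize(fd_next, rd_next, thr_next)  # object + new position
    return and_masks(cand_prev, cand_next)

# A textured object (values 5, 9) moves one pixel to the right per frame.
f_prev = [[0, 5, 9, 0, 0, 0, 0]]
f_cur = [[0, 0, 5, 9, 0, 0, 0]]
f_next = [[0, 0, 0, 5, 9, 0, 0]]
fd_prev = [[abs(a - b) for a, b in zip(f_prev[0], f_cur[0])]]
fd_next = [[abs(a - b) for a, b in zip(f_cur[0], f_next[0])]]
whole = [[1] * 7]                      # no figure: whole frame is the region
obj = extract_object(fd_prev, fd_next, whole, whole, 0, 0)
```

Each candidate contains the object in the current frame plus the area it occupied or will occupy in the reference frame; the AND removes those extra areas.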

【００６４】なお、ここでは、フレーム差分画像内の物
体領域以外の他の全ての領域を共通背景領域として求め
る場合について説明したが、各フレームから図形内画像
だけを取り出し、フレーム上での各図形内画像の位置を
考慮してそれら図形内画像同士の差分演算を行うように
してもよく、この場合には、図形外の背景領域を意識す
ることなく、多角形Ｒｄ（ｉ−１，ｉ）内、および多角
形Ｒｄ（ｉ，ｉ＋１）内の共通背景領域だけが決定され
ることになる。The description above covers the case where every region of the frame difference image other than the object region is obtained as the common background region. Alternatively, only the in-figure image may be taken from each frame, and the difference between those in-figure images may be computed taking into account the position of each in-figure image on its frame. In that case, only the common background regions within the polygon Rd(i-1, i) and within the polygon Rd(i, i+1) are determined, and the background region outside the figures need not be considered.

【００６５】このように、本実施形態では、１）現フレームとこの現フレームに対し時間的に異なる
第１および第２の少なくとも２つの参照フレームそれぞ
れとの差分画像を求めることにより、現フレームと第１
参照フレーム間の図形内画像のＯＲと、現フレームと第
２参照フレーム間の図形内画像のＯＲとを求め、２）それら図形内画像のＯＲ処理により得られた差分画
像をＡＮＤ処理し、これによって、現フレームの図形内
画像から目的の物体領域を抽出するという、図形内画像
に着目したＯＲＡＮＤ法による物体抽出が行われる。As described above, in the present embodiment, object extraction is performed by an OR-AND method that focuses on the in-figure images: 1) difference images are obtained between the current frame and each of at least two reference frames, first and second, that differ in time from the current frame, which yields the OR of the in-figure images of the current frame and the first reference frame and the OR of the in-figure images of the current frame and the second reference frame; and 2) the difference images obtained through this OR processing of the in-figure images are ANDed, whereby the target object region is extracted from the in-figure image of the current frame.

【００６６】また、現フレームと２つの参照フレームと
の時間関係は前述の例に限らず、例えば、現フレームｆ
（ｉ）に対して時間的に先行する２つのフレームｆ（ｉ
−ｍ），ｆ（ｉ−ｎ）を参照フレームとして使用した
り、時間的に連続する２つのフレームｆ（ｉ＋ｍ），ｆ
（ｉ＋ｎ）を参照フレームとして使用することも可能で
ある。The temporal relationship between the current frame and the two reference frames is not limited to the above example. For example, two frames f(i-m) and f(i-n) that temporally precede the current frame f(i) may be used as the reference frames, or two temporally consecutive frames f(i+m) and f(i+n) may be used as the reference frames.

【００６７】例えば、図８において、フレームｆ（ｉ−
１），ｆ（ｉ）を参照フレームとして使用し、これら参
照フレームそれぞれとフレームｆ（ｉ＋１）との差分を
取って、それら差分画像に対して同様の処理を行えば、
フレームｆ（ｉ＋１）から物体を抽出することができ
る。For example, in FIG. 8, if frames f(i-1) and f(i) are used as the reference frames, the difference between each of these reference frames and frame f(i+1) is taken, and the same processing is applied to the resulting difference images, the object can be extracted from frame f(i+1).

【００６８】図３には、物体追跡・抽出部２の第２の構
成例が示されている。FIG. 3 shows a second configuration example of the object tracking / extracting section 2.

【００６９】図２の構成との主な違いは、背景動き削除
部２１が設けられている点である。この背景動き削除部
２１は、各参照フレームと現フレームとの間で背景の動
きが相対的に零となるように背景の動きを補正するため
に使用される。The main difference from the configuration of FIG. 2 is that a background motion deletion unit 21 is provided. The background motion deletion unit 21 is used to correct the background motion so that the background motion is relatively zero between each reference frame and the current frame.

【００７０】以下、図３の装置について、具体的に説明
する。Hereinafter, the apparatus shown in FIG. 3 will be specifically described.

【００７１】背景動き削除部２１は、現フレーム２０１
と時間的にずれた少なくとも２つのフレームの任意の図
形内画像およびその位置２０６を入力し、時間的にずれ
た少なくとも２つのフレームの背景の動きを削除した画
像２０２を出力する。この背景動き削除部２１の具体的
な構成例については、図６で後述する。The background motion deletion unit 21 receives the current frame 201 and the arbitrary in-figure images, with their positions 206, of at least two frames shifted in time from it, and outputs images 202 of those at least two frames from which the background motion has been removed. A specific configuration example of the background motion deletion unit 21 is described later with reference to FIG. 6.
【００７２】図形設定部２２は、図２の図形設定部１１
に対応し、現フレーム２０１と、該背景の動きを削除し
た少なくとも２つの画像２０２と、画像２０２の物休抽
出結果２０６を入力し、現フレームおよび該少なくとも
２つの画像２０２の、任意の図形に囲まれた領域内を表
す画像２０３を出力する。The figure setting unit 22 corresponds to the figure setting unit 11 in FIG. 2. It receives the current frame 201, the at least two images 202 from which the background motion has been removed, and the object extraction results 206 of the images 202, and outputs images 203 representing the regions enclosed by arbitrary figures in the current frame and in the at least two images 202.

【００７３】メモリ２６は、任意の図形内画像とその位
置を保持する。The memory 26 holds an image in an arbitrary figure and its position.

【００７４】背景領域決定部２３は、図２の背景領域決
定部１２に対応し、該任意の図形内画像およびその位置
２０３と、画像２０２の物体抽出結果２０６を入力し、
現フレームと該少なくとも２つの画像２０２との共通の
背景領域２０４を出力する。物体抽出部２４は、図２の
物体抽出部１３に対応し、該現フレームと少なくとも２
つの画像との共通の背景領域２０４を入力し、現フレー
ムの物体抽出結果２０５を出力する。メモリ２５は、少
なくとも２つの物体抽出結果を保持する。これは、図２
のメモリ１４に相当する。The background region determination unit 23 corresponds to the background region determination unit 12 in FIG. 2. It receives the arbitrary in-figure images and their positions 203 and the object extraction results 206 of the images 202, and outputs the background regions 204 common to the current frame and the at least two images 202. The object extraction unit 24 corresponds to the object extraction unit 13 in FIG. 2. It receives the background regions 204 common to the current frame and the at least two images, and outputs the object extraction result 205 of the current frame. The memory 25 holds at least two object extraction results, and corresponds to the memory 14 in FIG. 2.

【００７５】このように背景動き削除部２１を設けるこ
とにより、例えばカメラをパンした時などのように背景
映像が連続するフレーム間で徐々に変化するような場合
であってもそれらフレーム間で背景映像を擬似的に一定
にすることができる。よって、現フレームと参照フレー
ムとの差分を取った時に、それらフレーム間で背景を相
殺することが可能となり、背景変化に影響されない共通
背景領域検出処理および物体領域抽出処理を行うことが
できる。Providing the background motion deletion unit 21 in this way makes the background picture effectively constant between frames even when it changes gradually across consecutive frames, for example when the camera is panned. As a result, the background cancels out when the difference between the current frame and a reference frame is taken, so the common background region detection and object region extraction are not affected by background changes.
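
As a simplified illustration of removing background motion, the sketch below estimates a single global translation between the current frame and a reference frame and then shifts the reference frame to cancel it. The pure-translation model, the exhaustive search, and the mean-absolute-difference criterion are assumptions made for this example; the configuration actually used by the background motion deletion unit 21 is the one described with reference to FIG. 6. The function names are hypothetical.

```python
def estimate_shift(cur, ref, max_shift):
    """Find (dy, dx) such that cur[y][x] is closest to ref[y + dy][x + dx],
    using the mean absolute difference over the overlapping area."""
    h, w = len(cur), len(cur[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total = 0
            count = 0
            for y in range(h):
                for x in range(w):
                    ry, rx = y + dy, x + dx
                    if 0 <= ry < h and 0 <= rx < w:
                        total += abs(cur[y][x] - ref[ry][rx])
                        count += 1
            if count == 0:
                continue
            err = total / count
            if best is None or err < best[0]:
                best = (err, dy, dx)
    return best[1], best[2]

def compensate(ref, dy, dx, fill=0):
    """Shift ref so that its background lines up with the current frame."""
    h, w = len(ref), len(ref[0])
    return [[ref[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w
             else fill for x in range(w)] for y in range(h)]

# Reference frame with a simple gradient background; the current frame is
# the same scene after the camera has panned by one pixel.
ref = [[10 * y + x for x in range(6)] for y in range(4)]
cur = compensate(ref, 0, 1)
shift = estimate_shift(cur, ref, 2)
aligned = compensate(ref, shift[0], shift[1])
```

After compensation, the inter-frame difference between `cur` and `aligned` is zero over the background, which is the condition the difference-based processing relies on.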

【００７６】なお、背景動き削除部２１を背景領域決定
部２３の入力段に設け、これにより参照フレームの背景
の動きを現フレームに合わせて削除するようにしても良
い。Note that the background motion deleting unit 21 may be provided at the input stage of the background area determining unit 23 so that the background motion of the reference frame is deleted according to the current frame.

【００７７】図４（ａ）には、背景領域決定部１２（ま
たは２３）の具体的な構成利一例が示されている。FIG. 4A shows an example of a specific configuration of the background area determination unit 12 (or 23).

【００７８】変化量検出部３１は、現フレームと前述の
第１および第２の参照フレームとの差分を取るために使
用され、現フレームと、時間的にずれたフレームの任意
の図形内画像およびその位置３０２と、時間的にずれた
フレームの物体抽出結果３０１を入力し、現フレームと
時間的にずれたフレームの任意の図形内画像間の変化量
３０３を出力する。変化量は、例えば、フレーム間の輝
度差分や色の変化、オプティカルフロー、などを用いる
ことができる。時間的にずれたフレームの物体抽出結果
を使えば、フレーム間で物体が変化しない場合でも物体
は抽出できる。例えば、変化量をフレーム間差分とする
と、物体に属するフレーム間差分ゼロの部分は、物体が
静止しているということなので、時間的にずれたフレー
ムの物体抽出結果と同じになる。The change amount detection unit 31 is used to take the difference between the current frame and the first and second reference frames described above. It receives the arbitrary in-figure images and positions 302 of the current frame and of a frame shifted in time from it, together with the object extraction result 301 of the temporally shifted frame, and outputs the amount of change 303 between the in-figure images of the current frame and the temporally shifted frame. The amount of change can be, for example, the inter-frame luminance difference, a color change, or optical flow. Using the object extraction result of the temporally shifted frame allows the object to be extracted even when it does not change between frames. For example, when the amount of change is the inter-frame difference, a part of the object where the inter-frame difference is zero is a part where the object is stationary, so there the result is the same as the object extraction result of the temporally shifted frame.

【００７９】代表領域決定部３２では、現フレームの任
意の図形内画像およびその位置３０２を入力し、任意の
図形内画像の背景を代表領域３０４として出力する。代
表領域は、任意の図形内で最も背景が多いと予想される
領域を選ぶ。例えば、図８で説明した差分画像上の図形
の輪郭線などのように、図形内の最も外側に帯状の領域
を設定する。図形は物体を囲むように設定されるので、
背景となる可能性が高い。The representative region determination unit 32 receives the arbitrary in-figure image of the current frame and its position 302, and outputs the background of the in-figure image as a representative region 304. The representative region is chosen as the region within the figure that is expected to contain the most background. For example, a band-shaped region is set along the outermost part of the figure, like the contour of the figure on the difference image described with reference to FIG. 8. Because the figure is set so as to surround the object, this band is likely to be background.

【００８０】背景変化量決定部３３では、代表領域３０
４と、該変化量３０３を入力し、背景を判定する変化量
３０５を出力する。背景変化量の決定は、図８で説明し
たように代表領域の差分値の変化量のヒストグラムをと
り、例えば、全体の両素数の半分（過半数）以上の画素
数に相当する変化量、つまり差分値をもつ領域を背景領
域と決定する。The background change amount determination unit 33 receives the representative region 304 and the amount of change 303, and outputs the amount of change 305 used to judge the background. As described with reference to FIG. 8, the background change amount is determined from a histogram of the change amounts (difference values) in the representative region: for example, the region whose change amount, that is, difference value, corresponds to at least half (a majority) of the total number of pixels is determined to be the background region.

【００８１】代表領域の背景決定部３４では、背景の変
化量３０５を入力し、代表領域の背景３０６を判定し、
出力する。代表領域の背景領域の決定は、先に決定した
背景変化量かどうかで判定する。背景領域決定部３５で
は、変化量３０３と、背景判定のしきい値３０５と、代
表領域の背景３０６を入力し、代表領域以外の領域の背
景３０７を出力する。代表領域以外の背景領域の決定
は、代表領域から成長法で行う。例えば、決定済みの画
素と図形の内部方向に隣接する未決定画素が、背景変化
量と一致すれば、背景と決定する。背景と隣接しない画
素や、背景変化量と一致しない画素は、背景以外と判定
される。また、単純に先に決定した背景変化量かどうか
で判定してもよい。このようにして、差分画像上の図形
の輪郭線から内周に向かって絞り込みを行うことによ
り、図形内画像の中でどこまでが背景領域であるかを決
定することができる。The representative-region background determination unit 34 receives the background change amount 305, determines the background 306 of the representative region, and outputs it. Whether a pixel of the representative region is background is judged by whether its change amount matches the background change amount determined above. The background region determination unit 35 receives the amount of change 303, the background decision threshold 305, and the background 306 of the representative region, and outputs the background 307 of the region other than the representative region. The background outside the representative region is determined by region growing from the representative region. For example, if an undetermined pixel adjacent, toward the inside of the figure, to an already determined pixel matches the background change amount, it is determined to be background. Pixels that are not adjacent to the background, and pixels that do not match the background change amount, are determined to be non-background. Alternatively, the judgment may simply be made by whether each pixel matches the previously determined background change amount. Narrowing inward from the contour of the figure on the difference image in this way determines how far the background region extends within the in-figure image.
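
The inward region growing can be sketched as a breadth-first search seeded on the outermost band of the figure. The sketch assumes that "matches the background change amount" means a difference value at or below a threshold and that adjacency is 4-connectivity; both are simplifications chosen for this example, and the function name is hypothetical.

```python
from collections import deque

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))

def grow_background(diff, figure, threshold):
    """Grow the background inward from the outermost band of the figure.
    Returns a mask with 1 for background pixels inside the figure."""
    h, w = len(figure), len(figure[0])

    def inside(y, x):
        return 0 <= y < h and 0 <= x < w and figure[y][x]

    background = [[0] * w for _ in range(h)]
    queue = deque()
    # Seed: figure pixels on the outer band (a neighbour lies outside the
    # figure or the image) whose change amount is background-like.
    for y in range(h):
        for x in range(w):
            if not figure[y][x] or diff[y][x] > threshold:
                continue
            if any(not inside(y + dy, x + dx) for dy, dx in NEIGHBOURS):
                background[y][x] = 1
                queue.append((y, x))
    # Grow: an undetermined neighbour becomes background only if it also
    # has a background-like change amount.
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if (inside(ny, nx) and not background[ny][nx]
                    and diff[ny][nx] <= threshold):
                background[ny][nx] = 1
                queue.append((ny, nx))
    return background

# The centre pixel has a small difference but is enclosed by the object,
# so growing from the contour never reaches it.
diff = [[0, 0, 0, 0, 0],
        [0, 9, 9, 9, 0],
        [0, 9, 0, 9, 0],
        [0, 9, 9, 9, 0],
        [0, 0, 0, 0, 0]]
figure = [[1] * 5 for _ in range(5)]
bg = grow_background(diff, figure, 1)
```

Unlike plain thresholding, the growing rule keeps the low-difference centre pixel in the object region because it is not connected to the background band.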

【００８２】また、逆に、図形の輪郭線から外側に向か
って図形からはみ出した物体領域を検出する。例えば、
背景以外と決定された画素と図形の外側方向に隣接する
未決定画素が、背景変化量と一致しなければ、背景以外
と決定する。背景以外の画素と隣接しない画素や背景変
化量と一致する画素は背景と判定される。このようにし
て差分画像上の図形の輪郭線から外側に向かって広げる
ことにより、図形外の画像のどこまでが背景外領域であ
るかを決定することができる。この場合は、図形外でも
変化量を求める必要があるので、任意の図形を数画素太
くして充分物体がはみ出さない図形を新たに設定して、
その内部で変化量を求めるか、単純にフレーム全体で変
化量を求めてもよい。また、図形内部だけ変化量を求め
ておき、図形外部の判定の時に随時変化量を求めながら
上記処理を行ってもよい。当然、物体が図形をはみ出さ
ない場合は、例えば、輪郭線上に背景以外の画素がない
場合は、図形外部の処理を行う必要がない。Conversely, object areas that protrude from the figure are detected by working outward from the contour of the figure. For example, if an undetermined pixel that is adjacent, toward the outside of the figure, to a pixel determined to be non-background does not match the background change amount, it is determined to be non-background. Pixels that are not adjacent to non-background pixels, and pixels that match the background change amount, are determined to be background. By expanding outward in this way from the contour of the figure on the difference image, it is possible to determine how much of the image outside the figure is non-background area. In this case the change amount must also be obtained outside the figure, so either a new figure may be set by thickening the figure by several pixels so that the object clearly does not protrude from it, and the change amount obtained inside that figure, or the change amount may simply be obtained for the entire frame. Alternatively, the change amount may be obtained only inside the figure in advance, and computed as needed while performing the above processing when judging the area outside the figure. Naturally, when the object does not protrude from the figure, for example when there are no non-background pixels on the contour, no processing outside the figure is needed.

【００８３】ところで、現フレームと参照フレームとの
間で物体または物体の一部が静止している場合、現フレ
ームと参照フレームとの差分が検出されず、物体の形状
が正しく抽出されない場合がある。そこで、既に抽出さ
れている参照フレームを使って現フレームの物体を検出
する方法を図４（ｂ）を参照して説明する。When the object, or a part of it, is stationary between the current frame and a reference frame, no difference is detected between the two frames and the shape of the object may not be extracted correctly. A method of detecting the object in the current frame using an already extracted reference frame is therefore described with reference to FIG. 4(b).

【００８４】図４（ｂ）は、静止物体領域検出部３７を
備えた背景領域決定部１２（または２３）を示してい
る。これによると、変化量検出部３１は現フレームの図
形内画像及びその位置と、時間的にずれた少なくとも２
つのフレームの任意の図形内画像及びその位置３１１を
入力とし、現フレームと参照フレームの図形内画像の変
化量３１３を検出する。FIG. 4(b) shows the background area determination unit 12 (or 23) provided with a stationary object area detection unit 37. Here, the change amount detection unit 31 receives the in-figure image of the current frame and its position, together with arbitrary in-figure images of at least two temporally shifted frames and their positions 311, and detects the change amount 313 between the in-figure images of the current frame and the reference frames.

【００８５】形状予測部３６は、現フレームの図形内画
像及びその位置と、時間的にずれた少なくとも２つのフ
レームの任意の図形内画像及びその位置３１１と、既に
抽出されたフレームの画像及び物体形状３１７とを入力
とし、現フレームと時間的にずれたフレームのうち未だ
物体が抽出されていないフレームについては物体の形状
３１２を予測し、出力する。The shape prediction unit 36 receives the in-figure image of the current frame and its position, arbitrary in-figure images of at least two temporally shifted frames and their positions 311, and the image and object shape 317 of an already extracted frame. For those temporally shifted frames from which the object has not yet been extracted, it predicts and outputs the object shape 312.

【００８６】静止物体領域決定部３７は予測した物体形
状３１２と、参照フレームと現フレームとの変化量３１
３と、既に抽出されたフレームの物体形状３１７を入力
とし、少なくとも２つのフレームから現フレームに対し
て静止している物体領域３１４を決定する。The stationary object area determination unit 37 receives the predicted object shape 312, the change amount 313 between the reference frames and the current frame, and the object shape 317 of an already extracted frame, and determines, from the at least two frames, the object area 314 that is stationary with respect to the current frame.

【００８７】背景領域決定部３５は少なくとも２つのフ
レームに関する、現フレームに対して静止している物体
領域３１４と、参照フレームと現フレームとの変化量３
１３とを入力とし、少なくとも２つのフレームと現フレ
ームとのそれぞれの共通の背景領域３１６を決定し、出
力する。The background area determination unit 35 receives, for each of the at least two frames, the object area 314 that is stationary with respect to the current frame and the change amount 313 between the reference frame and the current frame, and determines and outputs the common background area 316 between the current frame and each of the at least two frames.

【００８８】まず、参照フレームの物体が抽出されてい
る場合は、現フレームで参照フレームとのフレーム間差
分がゼロの領域について、参照フレームでは同じ位置の
領域が物体の一部であれば、現フレームでのその領域は
静止した物体の一部として抽出できる。逆に、その領域
が参照フレームでは、背景の一部であれば、現フレーム
でのその領域は背景となる。First, when the object in the reference frame has already been extracted, consider a region of the current frame whose inter-frame difference from the reference frame is zero. If the region at the same position in the reference frame is part of the object, that region of the current frame can be extracted as part of a stationary object. Conversely, if that region is part of the background in the reference frame, it is background in the current frame as well.
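
As a rough per-pixel illustration of this rule, the sketch below labels zero-difference pixels using the object mask of an already extracted reference frame. The 2-D list representation and the name `classify_zero_difference` are assumptions made for the example.

```python
def classify_zero_difference(diff, ref_is_object):
    """Label pixels whose inter-frame difference is zero.

    diff:          2-D list of inter-frame differences (current vs. reference)
    ref_is_object: 2-D list of booleans, True where the reference frame's
                   already extracted shape marks the pixel as object
    Returns a 2-D list with "object", "background", or None (non-zero difference).
    """
    labels = []
    for diff_row, ref_row in zip(diff, ref_is_object):
        row = []
        for d, is_object in zip(diff_row, ref_row):
            if d != 0:
                row.append(None)          # changed pixel: handled by the difference-based steps
            elif is_object:
                row.append("object")      # stationary part of the object
            else:
                row.append("background")  # unchanged background
        labels.append(row)
    return labels
```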

【００８９】しかし、参照フレームの物体が未だ抽出さ
れていない場合、静止した物体または物体の一部は上記
方法では抽出できない。その場合、既に物体が抽出され
ている他のフレームを用いて、未だ抽出されていない参
照フレームの物体形状を予測して、物体の一部であるか
どうかを判定することができる。予測の方法は、画像の
符号化でよく用いられるブロックマッチング法やアフェ
イン変換法などを用いる。However, when the object in the reference frame has not yet been extracted, a stationary object or stationary part of an object cannot be extracted by the above method. In that case, the object shape in the not-yet-extracted reference frame can be predicted using another frame from which the object has already been extracted, and it can then be determined whether the region is part of the object. For the prediction, methods commonly used in image coding, such as block matching or affine transformation, are used.

【００９０】一例として、図１３で示すようなブロック
マッチング法が考えられる。このようにして物体の形状
が予測されれば、フレーム間差分が検出されない領域に
ついて静止した物体の一部か、背景かを判定することが
可能となる。As an example, a block matching method as shown in FIG. 13 can be considered. If the shape of the object is predicted in this manner, it is possible to determine whether the region where no inter-frame difference is detected is a part of the stationary object or the background.

【００９１】物体を囲む図形を用いない場合は、図形内
画像は全画像とし、位置を入出力する必要がない。この
形状予測は、参照フレームを選択する場合の形状予測と
同じものを使うことができる。また、別の物体抽出法と
切り換える実施例では、別の物体抽出法で得た物体形状
を用いることができる。When a figure surrounding an object is not used, the image in the figure is an entire image, and there is no need to input and output positions. For this shape prediction, the same shape prediction as that used when a reference frame is selected can be used. Further, in the embodiment in which switching is performed with another object extraction method, an object shape obtained by another object extraction method can be used.

【００９２】図５には、図形設定部１１（または２２）
の具体的な構成の一例が示されている。FIG. 5 shows an example of a specific configuration of the figure setting unit 11 (or 22).

【００９３】分割部４１では、現フレームと時間的にず
れたフレームの任意の図形内画像とその位置４０２を入
力し、分割された該画像４０３を出力する。任意の図形
内画像の分割は、２等分、４等分、でもよいし、エッジ
を検出し、エッジにそった分割をおこなってもよい。以
下、簡単に、２等分とし、分割された図形は、ブロック
と呼ぶことにする。動き検出部４２では、該分割された
任意の図形内画像とその位置４０３と、現フレームの任
意の図形内画像とその位置４０１を入力し、該分割され
た画像の動きと誤差４０４を出力する。ここでは、ブロ
ックが現フレームに対応する位置を、誤差が最小になる
よう探索し、動きと誤差を求める。分割判定部４３で
は、該動きと誤差４０４と、時間的にずれたフレームの
物体抽出結果４０７を入力し、時間的にずれたフレーム
の任意の図形内画像を分割するか否かの判定結果４０６
と、分割しない場合、動き４０５を出力する。ここで
は、時間的にずれたフレームの物体抽出結果が分割され
たブロック内に含まれていなければ、そのブロックは図
形内から削除する。そうでなけれは、求めた誤差から、
誤差が閾値以上であれば、更に分割し、動きを求め直
す。そうでなければ、ブロックの動きを決定する。図形
決定部４４は、動き４０５を入力とし、現フレームの図
形内画像とその位置４０７を出力する。ここでは、図形
決定部４４は、各ブロックの、現フレームへの位置対応
を求め、対応した位置のブロックを全て含むように、新
しい図形を決定する。新しい図形は、全ブロックの結び
つきも良く、全てのブロックを含むような長方形や円で
もよい。The division unit 41 receives an arbitrary in-figure image of a frame temporally shifted from the current frame and its position 402, and outputs the divided image 403. The in-figure image may be divided into two or four equal parts, or edges may be detected and the image divided along them. In the following, for simplicity, the image is divided into two equal parts, and each part is called a block. The motion detection unit 42 receives the divided in-figure image and its position 403, together with the in-figure image of the current frame and its position 401, and outputs the motion and error 404 of the divided image. Here, the position in the current frame corresponding to each block is searched for so as to minimize the error, yielding the motion and the error. The division determination unit 43 receives the motion and error 404 and the object extraction result 407 of the temporally shifted frame, and outputs the determination result 406 as to whether the in-figure image of the temporally shifted frame should be divided further and, when it is not divided, the motion 405. Here, if a block contains none of the object extraction result of the temporally shifted frame, that block is deleted from the figure. Otherwise, if the error is equal to or larger than a threshold, the block is divided further and the motion is obtained again; if not, the motion of the block is fixed. The figure determination unit 44 receives the motion 405 and outputs the in-figure image of the current frame and its position 407. Here, the figure determination unit 44 obtains the position in the current frame corresponding to each block and determines a new figure that contains all of the blocks at their corresponding positions. The new figure may be the union of all the blocks, or a rectangle or circle that contains all of them.
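
The last step, choosing a new figure that contains every block at its corresponding position, can be sketched for the rectangular case. The tuple layout `(x, y, w, h, dx, dy)` and the name `enclosing_rectangle` are assumptions for illustration; the source equally allows the union of the blocks or a circle.

```python
def enclosing_rectangle(blocks):
    """Smallest axis-aligned rectangle containing all motion-displaced blocks.

    blocks: list of (x, y, w, h, dx, dy), where (x, y) is the block's top-left
            corner in the reference frame, (w, h) its size, and (dx, dy) the
            motion found for it toward the current frame.
    Returns (left, top, right, bottom) with right/bottom exclusive.
    """
    left = min(x + dx for x, y, w, h, dx, dy in blocks)
    top = min(y + dy for x, y, w, h, dx, dy in blocks)
    right = max(x + dx + w for x, y, w, h, dx, dy in blocks)
    bottom = max(y + dy + h for x, y, w, h, dx, dy in blocks)
    return left, top, right, bottom
```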

【００９４】このようにして、参照フレームの図形内画
像を複数のブロックに分割し、複数のブロックそれぞれ
について現フレームとの誤差が最小となる領域を探索
し、そして探索された複数の領域を囲む図形を現フレー
ムに設定することにより、初期設定された図形の形状や
大きさによらず、図形設定対象となる入力フレームに対
して最適な形状の新たな図形を設定することが可能とな
る。In this way, the in-figure image of the reference frame is divided into a plurality of blocks, the region of the current frame with the minimum error is searched for each block, and a figure enclosing the regions found is set in the current frame. This makes it possible to set a new figure of optimal shape for the input frame, regardless of the shape and size of the initially set figure.

【００９５】なお、図形設定に用いる参照フレームは既
に図形が設定されていて且つ現フレームと時間的にずれ
たフレームであればよく、通常の符号化技術で前方向予
測と後方向予測とが使用されていることと同様、現フレ
ームよりも時間的に後のフレームを図形設定のための参
照フレームとして用いることも可能である。The reference frame used for figure setting need only be a frame for which a figure has already been set and which is temporally shifted from the current frame. Just as ordinary coding techniques use both forward and backward prediction, a frame later in time than the current frame can also be used as the reference frame for figure setting.

【００９６】図６には、背景動き削除部２１の具体的な
構成の一例が示されている。FIG. 6 shows an example of a specific configuration of the background motion deletion section 21.

【００９７】代表背景領域設定部５１は、時間的にずれ
た任意の図形内画像とその位置５０１を入力とし、代表
背景領域５０３を出力する。代表背景領域とは、任意の
図形内のグローバルな動き、つまり図形内における背景
の動きを代表して表す領域で、例えば、任意の図形を長
方形にした場合、図７に示すような、長方形を囲む数画
素幅の帯状の枠領域を設定する。また、図形内の外側の
数画素を使っても良い。動き検出部５２では、現フレー
ム５０２と、該代表背景領域５０３を入力し、動き５０
４を出力する。先の例を用いると、長方形周囲の帯状の
枠領域の現フレームに対する動きを検出する。枠領域を
ーつの領域として検出しても良い。また、図７のよう
に、複数のブロックに分割して動きを求め、各々の平均
動きを出力しても良いし、最も多い動きを出力しても良
い。The representative background area setting unit 51 receives an arbitrary in-figure image of a temporally shifted frame and its position 501, and outputs a representative background area 503. The representative background area is an area that represents the global motion within the figure, that is, the motion of the background within the figure. For example, when the figure is a rectangle, a band-shaped frame area several pixels wide surrounding the rectangle is set, as shown in FIG. 7. Alternatively, the outermost several pixels inside the figure may be used. The motion detection unit 52 receives the current frame 502 and the representative background area 503, and outputs a motion 504. In the example above, the motion of the band-shaped frame area around the rectangle relative to the current frame is detected. The frame area may be treated as a single area for detection. Alternatively, as shown in FIG. 7, it may be divided into a plurality of blocks, a motion obtained for each, and either the average of the block motions or the most frequent motion output.
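
The two ways of combining per-block motions of the frame area, the average and the most frequent motion, can be sketched as below. The motion vectors are assumed to be `(dx, dy)` tuples already estimated for each block; both function names are hypothetical.

```python
from collections import Counter


def most_frequent_motion(block_motions):
    """Return the (dx, dy) vector that occurs most often among the blocks."""
    return Counter(block_motions).most_common(1)[0][0]


def average_motion(block_motions):
    """Return the component-wise mean of the block motion vectors."""
    count = len(block_motions)
    mean_dx = sum(dx for dx, _ in block_motions) / count
    mean_dy = sum(dy for _, dy in block_motions) / count
    return mean_dx, mean_dy
```

The most frequent motion is less sensitive to a few blocks that happen to cover part of the moving object, whereas the average is pulled toward such outliers.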

【００９８】動き補償部５３では、該時間的にずれたフ
レーム５０１と、該動き５０４を入力とし、動き補償画
像５０５を出力する。先に求めた動きを使って、時間的
にずれたフレームの動きを現フレームに合わせて削除す
る。動き補償は、ブロックマッチング動き補償またはア
フィン変換を使った動き補償でもよい。The motion compensation unit 53 receives the temporally shifted frame 501 and the motion 504, and outputs a motion-compensated image 505. Using the motion obtained above, the background motion of the temporally shifted frame is cancelled so that the frame is aligned with the current frame. The motion compensation may be block-matching motion compensation or motion compensation using an affine transformation.

【００９９】なお、動き削除は、図形内の背景に対して
のみならず、フレーム全体に対して行うようにしても良
い。Note that the motion may be deleted not only for the background in the figure but also for the entire frame.

【０１００】以上のように、本実施形態においては、
（１）物体の輪郭ではなく、その物体を大まかに囲む図
形を用いて物体を追跡していくこと、（２）現フレーム
に任意の図形を設定し、現フレームと少なくとも２つの
フレームそれぞれの図形内画像との共通の背景領域を決
定し、現フレームの物体を抽出すること、（３）該時間
的にずれた少なくとも２つのフレームの背景の動きを削
除すること、（４）任意の図形内画像間の変化量を検出
し、代表領域を決定し、現フレームと少なくとも２つの
フレームの図形内画像とその位置の、背景に対応する変
化量を決定し、変化量と代表領域との関係から背景かど
うかを判定すること、（５）図形内画像を分割し、任意
の図形内画像又は分割された図形内画像の一部の動きを
検出し、任意の図形内画像又は分割された図形内画像の
一部を分割するか否かを判定し、現フレームの任意の図
形内画像と位置を決定すること、（６）背景を代表する
領域を設定し、背景の動きを検出し、該時間的にずれた
フレームの背景の動きを削除した画像を作ることによ
り、目的の物体以外の周囲の余分な動きに影響を受けず
に、且つ比較的簡単な処理で、目的の物体を精度良く抽
出・追跡することが可能となる。As described above, in the present embodiment, the target object can be extracted and tracked accurately, with relatively simple processing and without being affected by extraneous motion around the target object, by:
(1) tracking the object using a figure that roughly surrounds it rather than the contour of the object;
(2) setting an arbitrary figure in the current frame, determining the background area common to the in-figure image of the current frame and that of each of at least two frames, and extracting the object of the current frame;
(3) removing the background motion of the at least two temporally shifted frames;
(4) detecting the change amount between arbitrary in-figure images, determining a representative area, determining the change amount corresponding to the background between the in-figure image and position of the current frame and those of the at least two frames, and judging from the relationship between the change amount and the representative area whether each area is background;
(5) dividing the in-figure image, detecting the motion of an arbitrary in-figure image or of a part of the divided in-figure image, judging whether to divide that image or part further, and determining the in-figure image and position in the current frame; and
(6) setting an area that represents the background, detecting the background motion, and creating an image in which the background motion of the temporally shifted frame has been removed.

【０１０１】また、本実施形態の物体抽出・追跡処理の
手順はソフトウェア制御によって実現することもでき
る。この場合でも、基本的な手順は全く同じであり、図
形の初期設定を行った後に、入力フレームに対して順次
図形設定処理を行い、この図形設定処理と並行してある
いは図形設定処理完了後に、背景領域決定処理、物体抽
出処理を行えばよい。Further, the procedure of the object extraction / tracking processing of the present embodiment can be realized by software control. Even in this case, the basic procedure is exactly the same, and after performing the initial setting of the figure, the figure setting processing is sequentially performed on the input frame, and in parallel with this figure setting processing or after the completion of the figure setting processing, The background area determination processing and the object extraction processing may be performed.

【０１０２】次に、本発明の第２実施形態を説明する。Next, a second embodiment of the present invention will be described.

【０１０３】前述の第１実施形態では、ＯＲＡＮＤ法に
よる一つの物体抽出手段のみを備えていたが、入力画像
によってはその手段だけでは十分な抽出性能が得られな
いこともある。また、第１実施形態のＯＲＡＮＤ法で
は、物体抽出対象となる現フレームと、この現フレーム
に対し時間的に異なる第１の参照フレームとの差分に基
づいて共通の背景を設定し、また別の現フレームと時間
的に異なる第２の参照フレームとの差分に基づいて共通
の背景を設定している。しかし、この第１及び第２の参
照フレームの選択方法は特に与えられていない。第１及
び第２の参照フレームの選択によっては、物体の抽出結
果に大きな差が生まれ、良好な結果を得られないことが
あり得る。In the above-described first embodiment, only one object extracting means based on the ORAND method is provided. However, depending on an input image, sufficient extracting performance may not be obtained by only that means. In the ORAND method according to the first embodiment, a common background is set based on a difference between a current frame from which an object is to be extracted and a first reference frame that is temporally different from the current frame. A common background is set based on a difference between the current frame and a second reference frame that is temporally different. However, there is no specific method for selecting the first and second reference frames. Depending on the selection of the first and second reference frames, there may be a large difference in the object extraction result, and good results may not be obtained.

【０１０４】そこで、本第２実施形態では、入力画像に
よらずに物体を高精度に抽出できるように第１実施形態
を改良している。Therefore, in the second embodiment, the first embodiment is improved so that an object can be extracted with high accuracy regardless of the input image.

【０１０５】まず、図９のブロック図を用いて、第２実
施形態に係る物体追跡／抽出装置の第１の構成例につい
て説明をする。First, a first configuration example of the object tracking / extracting apparatus according to the second embodiment will be described with reference to the block diagram of FIG.

【０１０６】以下では、第１実施形態の物体追跡／抽出
部２に対応する構成のみについて説明する。In the following, only the configuration corresponding to the object tracking / extracting section 2 of the first embodiment will be described.

【０１０７】図形設定部６０は、図２で説明した第１実
施形態の図形設定部１１と同じものであり、フレーム画
像６０１と、初期フレームまたは既に他の入力フレーム
に対して設定した図形６０２とを入力とし、フレーム画
像６０１に図形を設定して出力する。スイッチ部ＳＷ６
１は、既に行われた物体抽出の結果６０５を入力とし、
それに基づいて、使用すべき物体抽出部を切り替えるた
めの信号６０４を出力する。The figure setting unit 60 is the same as the figure setting unit 11 of the first embodiment described with reference to FIG. 2. It receives a frame image 601 and a figure 602 that was set for the initial frame or for another, already processed input frame, sets a figure on the frame image 601, and outputs it. The switch unit SW61 receives the result 605 of object extraction already performed and, based on it, outputs a signal 604 for switching the object extraction unit to be used.

【０１０８】物体追跡・抽出部６２は、図示のように第
１乃至第Ｋの複数の物体追跡・抽出部から構成されてい
る。これら物体追跡・抽出部はそれぞれ異なる手法で物
体抽出を行う。物体追跡・抽出部６２の中には、第１実
施形態で説明したＯＲＡＮＤ法を用いるものが少なくと
も含まれている。また、別の方法による物体追跡・抽出
部としては、例えばブロックマッチングによる形状予測
法を用いたものや、アフィン変換による物体形状予測な
どを使用することができる。これら形状予測では、既に
物体抽出されたフレームと現フレームとのフレーム間予
測により現フレーム上の物体領域の位置または形状が予
測され、その予測結果に基づいて現フレームの図形内画
像６０３から物体領域が抽出される。The object tracking/extraction unit 62 consists of first to K-th object tracking/extraction units, as illustrated, each of which extracts objects by a different method. At least one of them uses the ORAND method described in the first embodiment. The units using other methods may employ, for example, shape prediction by block matching or object shape prediction by affine transformation. In these shape predictions, the position or shape of the object area in the current frame is predicted by inter-frame prediction between an already extracted frame and the current frame, and the object area is extracted from the in-figure image 603 of the current frame on the basis of the prediction result.

【０１０９】ブロックマッチングによる形状予測の一例
を図１３に示す。現フレームの図形内画像は図示のよう
に同じ大きさのブロックに分割される。各ブロック毎に
絵柄（テクスチャ）が最も類似したブロックが、既に物
体の形状及び位置が抽出されている参照フレームから探
索される。この参照フレームについては、物体領域を表
すシェイプデータが既に生成されている。シェイプデー
タは、物体領域に属する画素の画素値を“２５５”、そ
れ以外の画素値を“０”で表したものである。この探索
されたブロックに対応するシェイプデータが、現フレー
ムの対応するブロック位置に張り付けられる。このよう
なテクスチャの探索およびシェイプデータの張り付け処
理を現フレームの図形内画像を構成する全ブロックにつ
いて行うことにより、現フレームの図形内画像は、物体
領域と背景領域を区別するシェイプデータによって埋め
られる。よって、このシェイプデータを用いることによ
り、物体領域に対応する画像（テクスチャ）を抽出する
ことができる。FIG. 13 shows an example of shape prediction by block matching. The in-figure image of the current frame is divided into blocks of equal size, as illustrated. For each block, the block with the most similar pattern (texture) is searched for in a reference frame in which the shape and position of the object have already been extracted. Shape data representing the object area has already been generated for this reference frame. In the shape data, pixels belonging to the object area have the value 255 and all other pixels have the value 0. The shape data corresponding to the block found is pasted at the corresponding block position in the current frame. By performing this texture search and shape-data pasting for every block of the in-figure image of the current frame, the in-figure image is filled with shape data that distinguishes the object area from the background area. The image (texture) corresponding to the object area can then be extracted using this shape data.
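
A compact sketch of this texture search and shape-data pasting is given below. It is only illustrative: images are plain 2-D lists of integers, similarity is measured by the sum of absolute differences (SAD), which the source does not specify, the search is limited to a small window, and the names `block_sad` and `predict_shape` are hypothetical.

```python
def block_sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between a current block and a reference block."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def predict_shape(cur, ref, ref_shape, size=2, search=2):
    """Predict the current frame's shape data by block matching.

    cur, ref:  2-D lists of pixel values (current and reference frames)
    ref_shape: 2-D list, 255 for object pixels of the reference frame, else 0
    size:      block edge length; search: search range in pixels (assumptions)
    """
    height, width = len(cur), len(cur[0])
    shape = [[0] * width for _ in range(height)]
    for cy in range(0, height - size + 1, size):
        for cx in range(0, width - size + 1, size):
            best = None  # (error, ry, rx) of the most similar reference block
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    ry, rx = cy + dy, cx + dx
                    if 0 <= ry <= height - size and 0 <= rx <= width - size:
                        err = block_sad(cur, ref, cy, cx, ry, rx, size)
                        if best is None or err < best[0]:
                            best = (err, ry, rx)
            _, ry, rx = best
            # Paste the shape data of the matched reference block.
            for j in range(size):
                for i in range(size):
                    shape[cy + j][cx + i] = ref_shape[ry + j][rx + i]
    return shape
```

The minimum error found for each block is also what a per-block switching rule could use to judge how reliable the prediction is.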

【０１１０】スイッチ部ＳＷ６１は、例えば第１の物体
追跡抽出部と同様の操作を行ない、抽出精度が良い場合
は第１の物体追跡抽部を選ぶよう切り替え、そうでない
場合は別の物体追跡抽出部を選ぶよう切り替える。例え
ば、第１の物体追跡抽出部が、ブロックマッチングによ
る物体形状予測手段であるとすれば、マッチング誤差の
大きさによって物体追跡抽出部の切り替えを制御すれば
よい。また、アフィン変換による物体形状予測であれ
ば、アフィン変換係数の推定誤差の大きさによって物体
追跡抽出部を切り替えることができる。スイッチ部ＳＷ
６１での切替の単位は、フレーム単位ではなく、フレー
ム内の小領域、例えばブロック毎や、輝度や色に基づい
て分割した領域毎である。これにより、使用する物体抽
出法をよりきめ細かに選択することが出来、抽出精度を
高めることができる。The switch unit SW61, for example, performs the same operation as the first object tracking/extraction unit; if the extraction accuracy is good it switches to select the first object tracking/extraction unit, and otherwise it switches to select another object tracking/extraction unit. For example, if the first object tracking/extraction unit performs object shape prediction by block matching, the switching may be controlled according to the magnitude of the matching error. For object shape prediction by affine transformation, the switching can be based on the magnitude of the estimation error of the affine transformation coefficients. Switching by the switch unit SW61 is performed not frame by frame but for each small area within a frame, for example for each block or for each area obtained by dividing the frame on the basis of luminance or color. This allows the object extraction method to be selected at a finer granularity and improves extraction accuracy.

【０１１１】図１０には、第２実施形態に係る動画像物
体追跡／抽出装置の第２の例が示されている。FIG. 10 shows a second example of the moving image object tracking / extracting apparatus according to the second embodiment.

【０１１２】図形設定部７０は図２で説明した第１実施
形態の図形設定部１１と同じものであり、画像７０１
と、初期フレームまたは既に他の入力フレームに対して
設定した図形７０２とを入力し、フレーム画像７０１に
図形を設定して出力する。The figure setting unit 70 is the same as the figure setting unit 11 of the first embodiment described with reference to FIG. 2. It receives an image 701 and a figure 702 that was set for the initial frame or for another, already processed input frame, sets a figure on the frame image 701, and outputs it.

【０１１３】第２の物体追跡・抽出部７１は、ブロック
マッチング法やアフィン変換などの形状予測によって物
体領域を抽出するために使用され、図形設定部７０から
出力される現フレームの図形内画像７０３と、既に抽出
されている別の参照フレーム上の物体の形状及び位置７
０７を入力とし、現フレームの図形内画像７０３から物
体の形状及び位置を予測する。The second object tracking/extraction unit 71 is used to extract the object area by shape prediction such as block matching or affine transformation. It receives the in-figure image 703 of the current frame output from the figure setting unit 70 and the shape and position 707 of the object in another, already extracted reference frame, and predicts the shape and position of the object from the in-figure image 703 of the current frame.

【０１１４】参照フレーム選択部７２は、第２の物体追
跡・抽出部７１によって予測された現フレームの物体の
予測形状及び位置７０４と、既に抽出されている物体の
形状及び位置７０７とを入力し、少なくとも２つの参照
フレームを選択する。ここで、参照フレームの選択方法
について説明する。The reference frame selection unit 72 receives the predicted shape and position 704 of the object in the current frame, as predicted by the second object tracking/extraction unit 71, and the shape and position 707 of already extracted objects, and selects at least two reference frames. The method of selecting the reference frames is described below.

【０１１５】Ｏ_i，Ｏ_j，Ｏ_currは各々フレームｉ，ｊ
及び抽出中のフレームcurrの物体とする。２つの時間的
に異なる参照フレームｆ_i，ｆ_jとの差分ｄ_i，ｄ_jを
取って、これら差分をＡＮＤ処理して現フレームｆ_curr
の物体を抽出すると、抽出したい物体Ｏ_curr以外に、物
体Ｏ_i ，Ｏ_jの重なり部分が時間的に異なるフレームの
ＡＮＤ処理により抽出される。勿論、Ｏ_i∩ Ｏ_j＝
φ、つまり物体Ｏ_i ，Ｏ_j の重なり部分が存在せず、物
体Ｏ_i ，Ｏ_jの重なりが空集合となる場合には問題な
い。Let $O_i$, $O_j$, and $O_{curr}$ denote the object in frames $i$, $j$, and the frame $curr$ being extracted, respectively. When the differences $d_i$ and $d_j$ from two temporally different reference frames $f_i$ and $f_j$ are taken and ANDed to extract the object in the current frame $f_{curr}$, the AND of the temporally different frames yields not only the desired object $O_{curr}$ but also the overlap of the objects $O_i$ and $O_j$. Of course, there is no problem when $O_i \cap O_j = \emptyset$, that is, when the objects $O_i$ and $O_j$ do not overlap and their intersection is the empty set.

【０１１６】しかし物体Ｏ_i ，Ｏ_jの重なり部分が存在
し（Ｏ_i∩ Ｏ_j≠φ）、かつ、この重なり部分が抽出
したい物体の外部に存在する場合は、Ｏ_currと、Ｏ_i∩
Ｏ_j の二つが抽出結果として残る。However, when the objects $O_i$ and $O_j$ do overlap ($O_i \cap O_j \neq \emptyset$) and the overlap lies outside the object to be extracted, both $O_{curr}$ and $O_i \cap O_j$ remain in the extraction result.

【０１１７】この場合、図１４（ａ）のように、Ｏ_curr
の背景領域（Ｏ_curr￣）と、物体Ｏ_i と、Ｏ_jとの全て
の共通領域が存在しない場合｛Ｏ_curr￣∩ （Ｏ_i∩
Ｏ_j）＝φ｝であれば、問題はない。しかし、図１４
（ｂ）のように、Ｏ_currの背景領域（Ｏ_curr￣）と、物
体Ｏ_i と、Ｏ_jとの全ての共通領域が存在する場合｛Ｏ_curr￣
∩ （Ｏ_i∩ Ｏ_j）≠φ｝は、Ｏ_currが斜線で示
すような誤った形状で抽出される。In this case there is no problem if, as in FIG. 14(a), the background of $O_{curr}$ (its complement $\overline{O_{curr}}$) has no region in common with both $O_i$ and $O_j$, that is, if $\overline{O_{curr}} \cap (O_i \cap O_j) = \emptyset$. However, if such a common region exists, as in FIG. 14(b), that is, if $\overline{O_{curr}} \cap (O_i \cap O_j) \neq \emptyset$, then $O_{curr}$ is extracted with an erroneous shape, as indicated by the hatched area.

【０１１８】従って、正しく物体の形状を抽出する最適
な参照フレームｆ_i，ｆ_jとは、（Ｏ_i∩ Ｏ_j）⊆ Ｏ_curr …（１）を満たすフレーム、つまり、Ｏ_i，Ｏ_jの重なり部分が
Ｏ_curr内に属するようなフレームｆ_i，ｆ_jである（図
１４（ａ））。Therefore, the optimal reference frames $f_i$ and $f_j$ for correctly extracting the shape of the object are frames that satisfy $(O_i \cap O_j) \subseteq O_{curr}$ …(1), that is, frames $f_i$ and $f_j$ for which the overlap of $O_i$ and $O_j$ lies within $O_{curr}$ (FIG. 14(a)).

【０１１９】また、２つ以上の参照フレームを選ぶ場合
は、（Ｏ_i∩ Ｏ_j∩…∩Ｏ_k）⊆ Ｏ_curr …（２）となる。When two or more reference frames are selected, the condition becomes $(O_i \cap O_j \cap \dots \cap O_k) \subseteq O_{curr}$ …(2).

【０１２０】したがって、物体抽出対象となる現フレー
ム上の物体の位置または形状の予測結果に基づいて、
（１）式または（２）式を満足するような参照フレーム
を選択することにより、確実に物体の形状を抽出するこ
とが可能になる。Therefore, based on the prediction result of the position or shape of the object on the current frame from which the object is to be extracted,
By selecting a reference frame that satisfies the expression (1) or the expression (2), the shape of the object can be reliably extracted.
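
The selection rule can be illustrated with Python sets, reading the relation in expression (1) as set inclusion: the overlap of the two reference objects must lie inside the predicted object of the current frame. Object regions are modelled as sets of `(y, x)` pixel coordinates, and `select_reference_frames` is a hypothetical name; returning the first satisfying pair is a simplification, since the source does not say how to choose among several valid pairs.

```python
from itertools import combinations


def select_reference_frames(predicted_curr, extracted):
    """Pick two reference frames whose object overlap lies inside the current object.

    predicted_curr: set of (y, x) pixels predicted to belong to the current object
    extracted:      dict mapping frame index -> set of (y, x) object pixels
    Returns the first pair (i, j) with (O_i & O_j) <= predicted_curr, or None.
    """
    for i, j in combinations(sorted(extracted), 2):
        overlap = extracted[i] & extracted[j]
        if overlap <= predicted_curr:  # subset test, corresponding to expression (1)
            return i, j
    return None
```

Because the current object is only a prediction at this stage, the subset test is itself approximate; a practical version might tolerate a small number of overlap pixels outside the predicted shape.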

【０１２１】第１の物体追跡・抽出部７３では、参照フ
レーム選択部７２で選択された少なくとも２つの参照フ
レーム７０５と、現画像７０１を入力し、ＯＲＡＮＤ法
により物体を抽出してその形状及び位置７０６を出力す
る。The first object tracking/extraction unit 73 receives the at least two reference frames 705 selected by the reference frame selection unit 72 and the current image 701, extracts the object by the ORAND method, and outputs its shape and position 706.

【０１２２】メモリ７４には、抽出された物体形状及び
位置７０６が保持されている。The memory 74 holds the extracted object shape and position 706.

【０１２３】図１１には、第２実施形態に係る物体追跡
／抽出装置の第３の構成例が示されている。FIG. 11 shows a third configuration example of the object tracking / extracting apparatus according to the second embodiment.

【０１２４】この物体追跡／抽出装置は、図示のよう
に、図形設定部８０、第２の物体追跡・抽出部８１、ス
イッチ部ＳＷ８２、および第１の物体抽出部８３から構
成されている。図形設定部８０、第２の物体追跡・抽出
部８１、および第１の物体抽出部８３は、それぞれ図１
０の図形設定部７０、第２の物体追跡・抽出部７１、お
よび第１の物体抽出部７３に対応している。本例では、
スイッチ部ＳＷ８２によって、第２の物体追跡・抽出部
８１の抽出結果と第１の物体抽出部８３の抽出結果が選
択的に使用される。As illustrated, this object tracking/extraction apparatus comprises a figure setting unit 80, a second object tracking/extraction unit 81, a switch unit SW82, and a first object extraction unit 83. The figure setting unit 80, the second object tracking/extraction unit 81, and the first object extraction unit 83 correspond to the figure setting unit 70, the second object tracking/extraction unit 71, and the first object extraction unit 73 of FIG. 10, respectively. In this example, the switch unit SW82 selects between the extraction result of the second object tracking/extraction unit 81 and that of the first object extraction unit 83.

【０１２５】すなわち、図形設定部８０では、画像８０
１と初期図形の形状及び位置８０２を入力し、図形の形
状及び位置８０３を出力する。第２の物体追跡・抽出部
８１では、図形の形状及び位置８０３と、既に抽出され
ている物体の形状及び位置８０６を入力し、未だ抽出さ
れていない物体の予測形状及び位置８０４を予測し、出
力する。スイッチ部ＳＷ８２では、第２の物体抽出部で
予測された物体の形状及び位置８０４を入力し、第１の
物体追跡・抽出部を行なうかどうかを切り替える信号８
０５を出力する。物体追跡・抽出部８３では、既に抽出
されている物体の形状及び位置８０６と、未だ抽出され
ていない物体の予測形状及び位置８０４を入力し、物体
の形状及び位置８０５を決定し、出力する。That is, the figure setting unit 80 receives an image 801 and the shape and position 802 of the initial figure, and outputs the shape and position 803 of the figure. The second object tracking/extraction unit 81 receives the shape and position 803 of the figure and the shape and position 806 of already extracted objects, and predicts and outputs the predicted shape and position 804 of the object not yet extracted. The switch unit SW82 receives the shape and position 804 of the object predicted by the second object extraction unit, and outputs a signal 805 that switches whether the first object tracking/extraction unit is to be run. The object tracking/extraction unit 83 receives the shape and position 806 of already extracted objects and the predicted shape and position 804 of the object not yet extracted, and determines and outputs the shape and position 805 of the object.

【０１２６】スイッチ部ＳＷ８２での切替の単位は、上
記で述べた例と同様にブロック毎に切り替えても良い
し、輝度や色に基づいて分割した領域毎に切り替えても
良い。切替を判断する方法として、例えば物体予測した
ときの予測誤差を用いることができる。すなわち、フレ
ーム間予測を用いて物体抽出を行う第２の物体追跡・抽
出部８１における予測誤差が所定のしきい値以下の場合
には、第２の物体追跡・抽出部８１によって得られた予
測形状が抽出結果として使用されるようにスイッチ部Ｓ
Ｗ８２による切替が行われ、第２の物体追跡・抽出部８
１における予測誤差が所定のしきい値を越えた場合に
は、第１の物体追跡・抽出部８３によってＯＲＡＮＤ法
にて物体抽出が行われるようにスイッチ部ＳＷ８２によ
る切替が行われ、その抽出結果が外部に出力される。Switching by the switch unit SW82 may be performed block by block, as in the example described above, or for each area obtained by dividing the frame on the basis of luminance or color. The prediction error of the object prediction, for example, can be used to decide the switching. That is, when the prediction error in the second object tracking/extraction unit 81, which extracts the object using inter-frame prediction, is at or below a predetermined threshold, the switch unit SW82 switches so that the predicted shape obtained by the second object tracking/extraction unit 81 is used as the extraction result. When the prediction error in the second object tracking/extraction unit 81 exceeds the threshold, the switch unit SW82 switches so that the first object tracking/extraction unit 83 extracts the object by the ORAND method, and that extraction result is output externally.
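
The threshold rule for switching can be sketched per block as follows. The dictionary keyed by block position, the string labels, and the name `choose_extractor` are assumptions made for the example; the threshold value itself is not given in the source.

```python
def choose_extractor(block_errors, threshold):
    """Decide, for each block, which extraction result to use.

    block_errors: dict mapping a block identifier, e.g. (row, col), to the
                  prediction (matching) error of the inter-frame prediction
    threshold:    the predetermined error threshold (value assumed by the caller)
    Returns a dict mapping each block to "prediction" or "ORAND".
    """
    return {
        block: "prediction" if error <= threshold else "ORAND"
        for block, error in block_errors.items()
    }
```

For example, `choose_extractor({(0, 0): 3, (0, 1): 12}, threshold=10)` keeps the predicted shape for block `(0, 0)` and falls back to the ORAND method for block `(0, 1)`.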

【０１２７】図１５は、予測の単位となるブロック毎に
マッチング誤差に基づいて、使用する抽出部を切り替え
た場合の抽出結果の例を示している。FIG. 15 shows an example of an extraction result when the extraction unit to be used is switched based on the matching error for each block serving as a prediction unit.

【０１２８】ここで、網目で示した領域は第２の物体追
跡・抽出部８１による予測で得られた物体形状であり、
斜線で示した領域は第１の物体追跡・抽出部８３によっ
て得られた物体形状である。Here, the area shown by the mesh is the object shape obtained by the prediction by the second object tracking / extracting section 81,
The shaded area is the object shape obtained by the first object tracking / extracting unit 83.

【０１２９】図１２には、本第２実施形態に係る動画像
物体追跡／抽出装置の第４の構成例が示されている。FIG. 12 shows a fourth configuration example of the moving image object tracking / extracting apparatus according to the second embodiment.

【０１３０】この物体追跡／抽出装置は、図１１の構成
に加え、図１０の参照フレーム選択部を追加したもので
ある。This object tracking / extracting apparatus is obtained by adding a reference frame selecting unit shown in FIG. 10 to the configuration shown in FIG.

【０１３１】図形設定部９０では、画像９０１と初期図
形の形状及び位置９０２を入力し、図形の形状及び位置
９０３を出力する。第２の物体追跡・抽出部９１では、
図形の形状及び位置９０３と、既に抽出されている物体
の形状及び位置９０８を入力し、未だ抽出されていない
物体の予測形状及び位置９０４を予測し、出力する。ス
イッチ部ＳＷ９２は、その予測した物体の形状及び位置
９０４を入力とし、予測物体の精度が良いか否かを判断
し、第２の物体抽出部で抽出された物体の出力を切り替
える信号９０５を出力する。参照フレーム選択部９３
は、未だ抽出されていない物体の予測形状及び位置９０
４と既に抽出されている物体の形状及び位置９０８を入
力とし、少なくとも２つの参照フレームの物体または予
測物体の形状及び位置９０６を選択し、出力する。物体
追跡・抽出部９４は、現画像９０１と、少なくとも２つ
の参照フレームの物体又は予測物体の形状及び位置９０
６を入力とし、物体を抽出して、その形状及び位置９０
７を出力する。メモリ９５は、抽出した物体の形状及び
位置９０７と、その予測した物体の形状及び位置９０４
いずれかを保持する。The figure setting unit 90 receives the image 901 and the shape and position 902 of the initial figure, and outputs the shape and position 903 of the figure. The second object tracking / extracting unit 91 receives the shape and position 903 of the figure and the shape and position 908 of the already extracted object, predicts the shape and position 904 of an object that has not yet been extracted, and outputs the prediction. The switch unit SW92 receives the predicted shape and position 904 of the object, judges whether the prediction is accurate, and outputs a signal 905 for switching the output of the object extracted by the second object extraction unit. The reference frame selecting unit 93 receives the predicted shape and position 904 of the not-yet-extracted object and the shape and position 908 of the already extracted object, and selects and outputs the shapes and positions 906 of the objects or predicted objects of at least two reference frames. The object tracking / extracting unit 94 receives the current image 901 and the shapes and positions 906 of the objects or predicted objects of the at least two reference frames, extracts the object, and outputs its shape and position 907. The memory 95 holds either the shape and position 907 of the extracted object or the predicted shape and position 904 of the object.

【０１３２】以下、図１６を参照して、本例における物
体追跡／抽出方法の手順を説明する。Hereinafter, the procedure of the object tracking / extracting method in this example will be described with reference to FIG. 16.

【０１３３】（ステップＳ１）参照フレームの候補とし
ては、現フレームと時間的にずれたフレームが予め設定
される。これは現フレーム以外の全てのフレームでも良
いし、現フレーム前後の数フレームと限定してもよい。
例えば、初期フレームと、現フレームより前３フレー
ム、現フレームより後１フレームの合計５フレームに限
定する。ただし、前フレームが３フレームない場合はそ
の分後のフレームの候補を増やし、後フレームが１フレ
ームない場合はその分前４フレームを候補とする。(Step S1) Frames that are temporally displaced from the current frame are set in advance as reference frame candidates. The candidates may be all frames other than the current frame, or may be limited to several frames before and after the current frame. For example, the candidates are limited to five frames in total: the initial frame, the three frames preceding the current frame, and the one frame following it. If fewer than three preceding frames exist, correspondingly more following frames are added as candidates; if no following frame exists, the four preceding frames are used as candidates.
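
The five-frame example in Step S1 can be sketched as below. The function name `candidate_reference_frames`, the zero-based frame indices, and the handling of duplicates (the initial frame may coincide with a preceding frame) are assumptions made for this illustration.

```python
def candidate_reference_frames(current, n_frames, initial=0, n_before=3, n_after=1):
    """Return sorted candidate frame indices for the frame `current`."""
    before = [f for f in range(current - n_before, current) if f >= 0]
    after = [f for f in range(current + 1, current + 1 + n_after) if f < n_frames]
    missing_before = n_before - len(before)
    missing_after = n_after - len(after)
    if missing_before:
        # Not enough preceding frames: take more following frames instead.
        start = current + 1 + n_after
        after += [f for f in range(start, start + missing_before) if f < n_frames]
    if missing_after:
        # Not enough following frames: take more preceding frames instead.
        stop = current - n_before
        before = [f for f in range(stop - missing_after, stop) if f >= 0] + before
    # The current frame itself is never a reference candidate.
    return sorted({initial, *before, *after} - {current})
```

Near the start of the sequence the initial frame can coincide with one of the preceding frames, in which case fewer than five distinct candidates are returned.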

【０１３４】（ステップＳ２）まず、ユーザが初期フレ
ームに抽出したい物体を書こむ図形を例えば長方形で設
定する。以降のフレームの図形は初期設定の図形をブロ
ックに分割し、マッチングを取って対応する位置にブロ
ックを張り付ける。全ての張り付けたブロックを含むよ
うに新たに長方形を設定することで物体を追跡する。全
ての参照フレーム候補に物体追跡の図形を設定する。物
体が抽出される度にそれを使って先のフレームの物体追
跡図形を求め直すほうが抽出エラーを防ぐことができ
る。また、ユーザは初期フレームの物体形状を入力す
る。(Step S2) First, the user sets a figure, for example a rectangle, that encloses the object to be extracted in the initial frame. For each subsequent frame, the initially set figure is divided into blocks, each block is matched against the frame, and the block is pasted at the corresponding position. A new rectangle is then set so as to contain all the pasted blocks, and the object is tracked in this way. An object tracking figure is set for every reference frame candidate. Extraction errors are better prevented if, each time an object is extracted, the result is used to recompute the object tracking figures of the subsequent frames. The user also inputs the object shape for the initial frame.
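
The rectangle tracking of Step S2 can be sketched with exhaustive block matching. This is a sketch under assumptions that are not in the specification: frames are lists of rows of gray values, the matching cost is the sum of absolute differences, the rectangle lies inside the image and its sides are multiples of the block size, and ties are resolved in favour of the smallest displacement.

```python
def sad(ref, cur, top, left, dy, dx, size):
    """Sum of absolute differences between a block of `ref` and the
    block of `cur` displaced by (dy, dx)."""
    return sum(
        abs(ref[top + y][left + x] - cur[top + dy + y][left + dx + x])
        for y in range(size)
        for x in range(size)
    )


def track_rectangle(ref, cur, rect, size=2, search=2):
    """Track `rect` = (top, left, bottom, right), bottom/right exclusive,
    from frame `ref` to frame `cur`; return the new bounding rectangle."""
    height, width = len(cur), len(cur[0])
    top0, left0, bottom0, right0 = rect
    # Try small displacements first so that ties favour the smallest motion.
    offsets = sorted(
        ((dy, dx) for dy in range(-search, search + 1)
         for dx in range(-search, search + 1)),
        key=lambda d: abs(d[0]) + abs(d[1]),
    )
    pasted = []  # (top, left) of every block after matching
    for top in range(top0, bottom0 - size + 1, size):
        for left in range(left0, right0 - size + 1, size):
            valid = [
                (dy, dx) for dy, dx in offsets
                if 0 <= top + dy <= height - size
                and 0 <= left + dx <= width - size
            ]
            dy, dx = min(
                valid, key=lambda d: sad(ref, cur, top, left, d[0], d[1], size)
            )
            pasted.append((top + dy, left + dx))
    # New rectangle that contains all pasted blocks.
    return (
        min(t for t, _ in pasted),
        min(l for _, l in pasted),
        max(t for t, _ in pasted) + size,
        max(l for _, l in pasted) + size,
    )
```

Because each block moves independently, the new rectangle can grow or shrink relative to the initial one when the object deforms.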

【０１３５】以下、抽出するフレームは現フレームと
し、現フレームより前のフレームは物体が既に抽出され
ており、先のフレームは抽出されていないとする。In the following, the frame to be extracted is called the current frame; it is assumed that the object has already been extracted from the frames preceding the current frame and has not yet been extracted from the frames following it.

【０１３６】（ステップＳ３）参照フレーム候補の図形
の周辺に適当な領域を設定し、現フレームとの背景の動
きを検出して参照フレームの図形内の背景を削除する。
背景の動きを検出する方法として、図形の周囲数画素の
幅の領域を設定し、この領域を現フレームに対してマッ
チングを取り、マッチング誤差が最小となる動きベクト
ルを背景の動きとする。(Step S3) A suitable region is set around the figure of each reference frame candidate, the background motion relative to the current frame is detected, and the background inside the figure of the reference frame is deleted. To detect the background motion, a region several pixels wide is set around the figure and matched against the current frame; the motion vector that minimizes the matching error is taken as the background motion.
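
The background motion detection of Step S3 can be sketched as follows. The assumptions here are illustrative only: a single translational motion vector, a mean absolute difference as the matching error, and an exhaustive search over a small window; the names `background_motion`, `band`, and `search` are hypothetical.

```python
def background_motion(ref, cur, rect, band=1, search=1):
    """Estimate the background motion from a band of `band` pixels around
    `rect` = (top, left, bottom, right), bottom/right exclusive.
    Returns ((dy, dx), matching_error)."""
    height, width = len(ref), len(ref[0])
    top, left, bottom, right = rect
    # Pixels in the band surrounding the figure, excluding the figure itself.
    band_pixels = [
        (y, x)
        for y in range(max(0, top - band), min(height, bottom + band))
        for x in range(max(0, left - band), min(width, right + band))
        if not (top <= y < bottom and left <= x < right)
    ]
    # Small displacements are tried first so that ties favour them.
    offsets = sorted(
        ((dy, dx) for dy in range(-search, search + 1)
         for dx in range(-search, search + 1)),
        key=lambda d: abs(d[0]) + abs(d[1]),
    )
    best_vector, best_error = None, None
    for dy, dx in offsets:
        pairs = [
            (ref[y][x], cur[y + dy][x + dx])
            for y, x in band_pixels
            if 0 <= y + dy < height and 0 <= x + dx < width
        ]
        if not pairs:
            continue
        error = sum(abs(a - b) for a, b in pairs) / len(pairs)
        if best_error is None or error < best_error:
            best_vector, best_error = (dy, dx), error
    return best_vector, best_error
```

The returned matching error is the quantity that a later step can use to drop reference frame candidates whose background motion was detected poorly.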

【０１３７】（ステップＳ４）背景動き削除時の動きベ
クトル検出誤差が大きい参照フレームは候補から外すこ
とにより、背景動き削除が適当でない場合の抽出エラー
を防ぐことができる。また、参照フレーム候補が減った
場合、新たに参照フレーム候補を選び直してもよい。新
たに付け加えた参照フレーム候補の図形設定や背景動き
削除が行なわれていない場合は、新たに図形設定および
背景動き削除を行なう必要がある。(Step S4) Reference frames having a large motion vector detection error at the time of background motion deletion are excluded from candidates, thereby preventing an extraction error when background motion deletion is not appropriate. When the number of reference frame candidates decreases, a new reference frame candidate may be selected. If the figure setting and background motion deletion of the newly added reference frame candidate have not been performed, it is necessary to newly perform the figure setting and background motion deletion.

【０１３８】（ステップＳ５）次に、未だ物体が抽出さ
れていない現フレームと、現フレームより先の参照フレ
ームの候補の物体形状を予測する。現フレーム又は先の
参照フレームの候補に設定された長方形を例えばブロッ
クに分割して既に物体が抽出されているフレーム（前の
フレーム）とマッチングを取り、対応する物体形状を張
り付けて物体形状を予測する。物体が抽出される度にそ
れを使って先のフレームの物体予測をやり直すほうが抽
出エラーを防ぐことができる。(Step S5) Next, the object shape is predicted for the current frame, from which the object has not yet been extracted, and for the reference frame candidates that follow the current frame. The rectangle set on the current frame or on a following reference frame candidate is divided, for example, into blocks; each block is matched against a frame from which the object has already been extracted (a preceding frame), and the corresponding object shape is pasted to predict the object shape. Extraction errors are better prevented if, each time an object is extracted, the result is used to redo the object prediction for the following frames.

【０１３９】（ステップＳ６）この時、予測誤差が小さ
いブロックは、予測した形状を抽出結果としてそのまま
出力する。また物体形状の予測をブロック単位で処理を
行なうとマッチング誤差によりブロック歪みが生じる場
合があるので、それを消去するようなフィルターをか
け、全体の物体形状を滑らかにしてもよい。(Step S6) At this point, for blocks with a small prediction error, the predicted shape is output as the extraction result without change. Because predicting the object shape block by block can produce block distortion due to matching errors, a filter that removes this distortion may be applied to smooth the overall object shape.

【０１４０】物体追跡及び物体形状予測の時に行なう長
方形の分割は、固定ブロックサイズで行なっても良い
し、マッチング閾値によって階層的ブロックマッチング
によって行なっても良い。The rectangle division performed at the time of object tracking and object shape prediction may be performed with a fixed block size, or may be performed by hierarchical block matching using a matching threshold.

【０１４１】予測誤差が大きいブロックについては、以
下の処理を行う。For a block having a large prediction error, the following processing is performed.

【０１４２】（ステップＳ７）参照フレームの候補から
仮の参照フレームを設定し、各々の組合せについて、式
（１）又は式（２）を満たす参照フレームのセットを選
ぶ。全参照フレーム候補のどの組合せも式（１）又は式
（２）を満たさなかった場合、Ｏ_i ∩ Ｏ_j内の画素数
が最小のものを選ぶのがよい。また、背景動き削除時の
動きベクトル検出誤差がなるべく小さいフレームを選ぶ
ように、参照フレーム候補の組合せを考慮するほうがよ
い。具体的には、式（１）又は式（２）による条件が同
じ参照フレームセットがあるばあい、背景動き削除時の
動きベクトル検出誤差が小さい方を選ぶ、などの方法が
ある。以下、参照フレームは２フレーム選択されたとす
る。(Step S7) Provisional reference frames are chosen from the reference frame candidates, and among the possible combinations a set of reference frames that satisfies expression (1) or expression (2) is selected. If no combination of the reference frame candidates satisfies expression (1) or (2), it is advisable to select the combination for which the number of pixels in O_i ∩ O_j is smallest. It is also better to take the combination of reference frame candidates into account so that frames with as small a motion vector detection error as possible during background motion deletion are selected. Concretely, when several reference frame sets meet the condition of expression (1) or (2) equally, the set with the smaller motion vector detection error during background motion deletion can be chosen. In the following, it is assumed that two reference frames have been selected.
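
The selection logic of Step S7 can be sketched for pairs of frames as follows. Expressions (1) and (2) are defined elsewhere in the specification and are not reproduced here, so the caller supplies them through the hypothetical predicate `satisfies`; representing each shape O_i as a set of pixel coordinates and summing the two motion vector errors for the tie-break are likewise assumptions of this sketch.

```python
from itertools import combinations


def select_reference_pair(shapes, mv_error, satisfies):
    """shapes: frame index -> set of (y, x) pixels of the (predicted) object.
    mv_error: frame index -> motion vector detection error of that frame.
    satisfies: callable(shape_i, shape_j) -> bool, standing in for
    expressions (1)/(2). Returns the selected pair of frame indices."""
    pairs = list(combinations(sorted(shapes), 2))

    def total_error(pair):
        return mv_error[pair[0]] + mv_error[pair[1]]

    acceptable = [p for p in pairs if satisfies(shapes[p[0]], shapes[p[1]])]
    if acceptable:
        # Several pairs meet the condition: prefer the smaller detection error.
        return min(acceptable, key=total_error)
    # No pair meets the condition: minimise the pixel count of O_i ∩ O_j.
    return min(
        pairs,
        key=lambda p: (len(shapes[p[0]] & shapes[p[1]]), total_error(p)),
    )
```

Passing a predicate that always returns `False` exercises the fallback, in which the pair with the smallest overlap is chosen.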

【０１４３】（ステップＳ８）参照フレームが選択され
ると、現フレームとのフレーム間差分を求め、設定され
た図形内のフレーム間差分に注目する。設定された図形
の外側１ライン画素の差分の絶対値のヒストグラムを求
め、多く現れる差分の絶対値を背景領域の差分値とし、
設定された図形外側の１ライン画素の背景画素を決定す
る。設定された図形外側の１ライン画素の背景画素から
内側に向けて、隣接する背景領域の差分値をもつ画素を
背景画素と決定し、背景画素でないと判定されるまで順
次続ける。この背景画素は、現フレームと一つの参照フ
レームとの共通の背景領域となる。この時、ノイズの影
響で背景領域とそれ以外の部分の境界が不自然であるこ
とがあるので、境界を滑らかにするフィルターや、余分
やノイズ領域を削除するフィルターをかけてもよい。(Step S8) Once the reference frames are selected, the inter-frame difference between each of them and the current frame is computed, and attention is directed to the inter-frame difference inside the set figure. A histogram of the absolute differences is built for the one-pixel-wide line just outside the set figure, the absolute difference value that occurs most often is taken as the difference value of the background region, and the background pixels on that outer line are determined. Starting from those background pixels and moving inward, each adjacent pixel that has the background difference value is determined to be a background pixel, and this is continued in sequence until a pixel is judged not to be a background pixel. The background pixels found in this way form the background region common to the current frame and one reference frame. Because noise can make the boundary between the background region and the remaining part look unnatural, a filter that smooths the boundary or a filter that removes superfluous regions and noise regions may be applied.
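
Step S8 can be sketched as a histogram over the outer line followed by inward region growing. The following is a sketch under assumptions: frames are lists of rows of gray values, the figure is a rectangle whose outer line lies inside the image, growth uses 4-connectivity, and the tolerance `tol` around the most frequent difference value is a hypothetical parameter (the text only names the most frequent value).

```python
from collections import Counter, deque


def common_background(cur, ref, rect, tol=0):
    """Return the set of background pixels (outer line plus the pixels grown
    inward) for `rect` = (top, left, bottom, right), bottom/right exclusive."""
    top, left, bottom, right = rect
    diff = [[abs(c - r) for c, r in zip(crow, rrow)] for crow, rrow in zip(cur, ref)]
    # One-pixel-wide line just outside the figure.
    ring = [
        (y, x)
        for y in range(top - 1, bottom + 1)
        for x in range(left - 1, right + 1)
        if y in (top - 1, bottom) or x in (left - 1, right)
    ]
    # Most frequent absolute difference on the line = background difference.
    bg_value = Counter(diff[y][x] for y, x in ring).most_common(1)[0][0]

    def is_background(y, x):
        return abs(diff[y][x] - bg_value) <= tol

    seeds = [p for p in ring if is_background(*p)]
    background = set(seeds)
    queue = deque(seeds)
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            inside = top <= ny < bottom and left <= nx < right
            if inside and (ny, nx) not in background and is_background(ny, nx):
                background.add((ny, nx))
                queue.append((ny, nx))
    return background
```

Pixels inside the figure that are not reached by the growth remain as the object area candidate for this reference frame.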

【０１４４】（ステップＳ９）各々の参照フレームに対
して共通の背景領域が求まると、二つの共通背景領域に
含まれない領域を検出し、それを物体領域として抽出す
る。先に予測した物体の形状を用いない部分について、
ここでの結果を出力し、物体全体の形状を出力する。(Step S9) Once the common background region has been obtained for each reference frame, the region that belongs to neither of the two common background regions is detected and extracted as the object region. For the parts in which the previously predicted object shape is not used, this result is output, so that the shape of the whole object is output.
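
Step S9 amounts to intersecting the two object area candidates, which can be sketched with set operations. The pixel-set representation and the name `extract_object` are assumptions for illustration.

```python
def extract_object(figure_pixels, background_a, background_b):
    """Pixels of the figure that belong to neither common background region.

    Each argument is a set of (y, x) pixel coordinates."""
    candidate_a = figure_pixels - background_a  # object candidate vs. reference A
    candidate_b = figure_pixels - background_b  # object candidate vs. reference B
    return candidate_a & candidate_b


# One-row example: the object occupies x = 2, 3 in the current frame,
# x = 0, 1 in reference A and x = 4, 5 in reference B.
figure = {(0, x) for x in range(6)}
background_a = {(0, 4), (0, 5)}
background_b = {(0, 0), (0, 1)}
object_pixels = extract_object(figure, background_a, background_b)
```

Each candidate alone still contains the object's position in its reference frame; only the intersection isolates the object in the current frame.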

【０１４５】共通の背景から求めた形状を用いる部分と
先に予測した物体形状を用いる部分の整合性が取れない
場合、最後にフィルターをかけて出力結果を見ために良
いものにできる。When the part that uses the shape obtained from the common background and the part that uses the previously predicted object shape do not fit together consistently, a filter can be applied at the end to make the output visually acceptable.

【０１４６】以上説明したように、本第２実施形態によ
れば、入力画像によらずに物体を精度良く抽出できる。
又は、物体抽出に適した参照フレームを選択することが
できる。As described above, according to the second embodiment, an object can be accurately extracted regardless of an input image.
Alternatively, a reference frame suitable for object extraction can be selected.

【０１４７】次に、本発明の第３実施形態を説明する。Next, a third embodiment of the present invention will be described.

【０１４８】まず、図１７のブロック図を用いて、第３
実施形態に係る物体追跡／抽出装置の第１の例を説明を
する。First, a first example of the object tracking / extracting apparatus according to the third embodiment will be described with reference to the block diagram of FIG. 17.

【０１４９】ここでは、物体抽出対象となる現フレーム
から、その少なくとも一部の領域についての画像の特徴
量を抽出し、その特徴量に基づいて複数の物体抽出手段
を切り替える構成が採用されている。Here, a configuration is adopted in which an image feature amount is extracted from at least a partial region of the current frame targeted for object extraction, and a plurality of object extraction means are switched based on that feature amount.

【０１５０】すなわち、本物体追跡／抽出装置は、図示
のように、図形設定部１１０、特徴量抽出部１１１、ス
イッチ部ＳＷ１１２、複数の物体追跡・抽出部１１３、
およびメモリ１１４から構成されている。図形設定部１
１０、スイッチ部ＳＷ１１２、複数の物体追跡・抽出部
１１３は、それぞれ第２実施形態で説明した図９の図形
設定部６０、スイッチ部ＳＷ６１、および複数の物体追
跡・抽出部６２と同じであり、特徴量抽出部１１１によ
って抽出された現フレームの画像の特徴量に基づいて、
使用する物体追跡・抽出部の切替が行われる点が異なっ
ている。That is, as shown in the figure, this object tracking / extracting apparatus comprises a figure setting unit 110, a feature amount extraction unit 111, a switch unit SW112, a plurality of object tracking / extracting units 113, and a memory 114. The figure setting unit 110, the switch unit SW112, and the plurality of object tracking / extracting units 113 are the same as the figure setting unit 60, the switch unit SW61, and the plurality of object tracking / extracting units 62 of FIG. 9 described in the second embodiment, respectively. The difference is that the object tracking / extracting unit to be used is switched on the basis of the feature amount of the current-frame image extracted by the feature amount extraction unit 111.

【０１５１】図形設定部１１０は、抽出フレーム１１０
１と、ユーザ設定による初期図形１１０２と、既に抽出
されたフレームの抽出結果１１０６を入力とし、抽出フ
レームに図形を設定してその図形を出力する。図形は長
方形、円、楕円、など幾何図形でもよいし、ユーザが物
体形状を図形設定部１１０に入力してもよい。その場
合、図形は精密な形状でなくても、大まかな形状でもよ
い。特徴量検出部１１１は、図形が設定された抽出フレ
ーム１１０３と、既に抽出されたフレームの抽出結果１
１０６とを入力とし、特徴量１１０４を出力する。スイ
ッチ部ＳＷ１１２は、特徴量１１０４と、既に抽出され
たフレームの抽出結果１１０６とを入力とし、図形が設
定された抽出フレーム１１０３の物体追跡・抽出部への
入力を制御する。The figure setting unit 110 receives the extraction frame 1101, the initial figure 1102 set by the user, and the extraction result 1106 of the already extracted frames, sets a figure in the extraction frame, and outputs that figure. The figure may be a geometric figure such as a rectangle, a circle, or an ellipse, or the user may input the object shape to the figure setting unit 110. In that case, the figure need not be a precise shape; a rough shape suffices. The feature amount detection unit 111 receives the extraction frame 1103 in which the figure is set and the extraction result 1106 of the already extracted frames, and outputs a feature amount 1104. The switch unit SW112 receives the feature amount 1104 and the extraction result 1106 of the already extracted frames, and controls the input of the extraction frame 1103, in which the figure is set, to the object tracking / extracting units.

【０１５２】スイッチ部ＳＷ１１２は、画像全体に対し
て特徴量を得た場合は、画像の性質を検出し、画像に対
して適当な物体追跡・抽出部への入力の制御に用いるこ
とができる。図形内部は適当な大きさに分割され、特徴
量は各分割図形毎に与えても良い。特徴量は分散や輝度
勾配、エッジ強度などであり、この場合は、これらを自
動的に算出することができる。また、人間が視覚的に認
知した物体の性質がユーザによってスイッチ部Ｗ１１２
に与えられてもよい。例えば、目的とする物体が人物で
あれば、エッジが不鮮明な髪の毛を指定して抽出時のパ
ラメータが特別に選ばれ、前処理にエッジ補正してから
抽出されてもよい。When a feature amount is obtained for the entire image, the switch unit SW112 can detect the nature of the image and use it to control the input to the object tracking / extracting unit that suits the image. The inside of the figure may be divided into parts of a suitable size, and a feature amount may be given for each part. The feature amount is, for example, a variance, a luminance gradient, or an edge strength, and in that case it can be calculated automatically. Alternatively, properties of the object that a person perceives visually may be given to the switch unit SW112 by the user. For example, if the target object is a person, the user may designate the hair, whose edges are indistinct, so that special extraction parameters are chosen for it and the extraction is performed after edge correction as preprocessing.
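
Automatically computed feature amounts of the kind named above can be sketched per block as follows. The text names variance and luminance gradient but does not define them numerically, so the population variance and the mean absolute difference between neighbouring pixels used here are assumptions, as is the name `block_features`.

```python
from statistics import pvariance


def block_features(image, top, left, size):
    """Variance and mean luminance gradient of a size x size block
    (size must be at least 2)."""
    block = [row[left:left + size] for row in image[top:top + size]]
    pixels = [p for row in block for p in row]
    # Absolute differences between horizontal and vertical neighbours.
    gradients = [abs(row[x + 1] - row[x]) for row in block for x in range(size - 1)]
    gradients += [
        abs(block[y + 1][x] - block[y][x])
        for y in range(size - 1)
        for x in range(size)
    ]
    return {
        "variance": pvariance(pixels),
        "mean_gradient": sum(gradients) / len(gradients),
    }
```

A flat block yields zero for both values, which is the kind of block for which a shape predicted by matching is least reliable.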

【０１５３】特徴量は、設定された図形内部（物体及び
その周辺）に関してだけでなく、図形外部（背景部）に
関する特徴量でも良い。The feature amount may relate not only to the inside of the set figure (the object and its surroundings) but also to the outside of the figure (the background part).

【０１５４】複数（第１〜ｋ）の物体追跡・抽出部１１
３の各々では、図形が設定された抽出フレーム１１０３
と、既に抽出されたフレームの抽出結果１１０６とを入
力とし、物体を追跡・抽出した結果１１０５を出力す
る。Each of the plurality of (first to k-th) object tracking / extracting units 113 receives the extraction frame 1103 in which the figure is set and the extraction result 1106 of the already extracted frames, and outputs a result 1105 of tracking and extracting the object.

【０１５５】複数の物体追跡・抽出部１１３は、ＯＲＡ
ＮＤ法を使用して物体を抽出するもの、クロマキーを使
用して物体を抽出するもの、ブロックマッチングやアフ
ィン変換によって物体を抽出するものなどを含む。The plurality of object tracking / extracting units 113 include one that extracts the object using the ORAND method, one that extracts the object using chroma keying, one that extracts the object by block matching or affine transformation, and the like.

【０１５６】なお、実施形態１では、設定された図形の
周囲の画素値のフレーム間差分のヒストグラムを用い
て、背景画素が決定されているが、単純に、フレーム間
差分が閾値以下の画素が背景画素と決定されても良い。
また、実施形態１では、設定された図形から図形内部に
向かって背景画素（差分値が一定値以下）が決定されて
いるが、図形から図形外部へ向けて物体画素（差分値が
一定値以上）も決定されても良いし、任意の操作順でも
よい。In the first embodiment, the background pixels are determined using a histogram of the inter-frame differences of the pixel values around the set figure; more simply, however, pixels whose inter-frame difference is equal to or smaller than a threshold may be determined to be background pixels. Also, in the first embodiment, background pixels (pixels whose difference value is equal to or smaller than a fixed value) are determined from the set figure toward its inside; object pixels (pixels whose difference value is equal to or larger than a fixed value) may additionally be determined from the figure toward its outside, and the processing may proceed in any order.

【０１５７】メモリ１１４は、物体を追跡・抽出した結
果１１０５を入力とし、それを保持する。The memory 114 receives the result 1105 obtained by tracking and extracting the object, and holds the result.

【０１５８】以下、画像の性質を示す特徴量によって、
追跡／抽出方法を切替えると、よりよい抽出結果が得ら
れる理由について説明する。The following explains why switching the tracking / extraction method according to feature amounts that indicate the nature of the image gives better extraction results.

【０１５９】例えば、背景の動きがあるかどうか予め分
かるならば、その性質を使った方がよい。背景の動きが
ある場合は、背景動き補償が行なわれるが、完全に補償
できるかわからない。複雑な動きをするフレームではほ
とんど動き補償できない。このようなフレームは、背景
動き補償の補償誤差で予め分かるので、参照フレーム候
補にしないなど工夫が可能である。しかし、背景の動き
がない場合は、この処理は不必要である。別の物体が動
いていると、誤った背景動き補償が行なわれたり、その
フレームは参照フレーム候補から外れたりして参照フレ
ーム選択条件に最適なフレームであっても選ばれず、抽
出精度が落ちることがある。For example, if it is known in advance whether the background moves, it is better to exploit that knowledge. When the background moves, background motion compensation is performed, but it is not certain that the motion can be compensated completely, and for frames with complex motion compensation is hardly possible at all. Such frames can be identified beforehand from the compensation error of the background motion compensation, so measures such as excluding them from the reference frame candidates are possible. When the background does not move, however, this processing is unnecessary. If another object is moving, erroneous background motion compensation may be performed, or the frame may be dropped from the reference frame candidates and therefore not selected even though it best meets the reference frame selection condition, so that extraction accuracy falls.

【０１６０】また、一つの画像中にも多様な性質が混在
している。物体の動きやテクスチャも部分的に異なり、
同じ追跡・抽出方法及び装置やパラメータではうまく物
体が抽出できないことがある。従って、ユーザが画像中
の特殊な性質を持つ一部を指定したり、画像中の違いを
自動的に特徴量として検出して、部分的に追跡・抽出方
法を切替えて物体を抽出したり、パラメータを変更した
方がよい。Various properties also coexist within a single image. The motion and texture of the object differ from part to part, and the object may not be extracted well with a single tracking / extraction method, apparatus, or parameter set. It is therefore better for the user to designate a part of the image that has special properties, or for differences within the image to be detected automatically as feature amounts, so that the tracking / extraction method is switched or the parameters are changed for each part when the object is extracted.

【０１６１】このようにして、複数の物体の追跡・抽出
手段を切替えれば、様々な画像中の物体の形状を精度良
く抽出することが可能になる。By switching the means for tracking / extracting a plurality of objects as described above, it becomes possible to accurately extract the shapes of objects in various images.

【０１６２】次に、図１８のブロック図を用いて、第３
実施形態に係る動画像物体追跡／抽出装置の第２の構成
例について説明する。Next, a second configuration example of the moving image object tracking / extracting apparatus according to the third embodiment will be described with reference to the block diagram of FIG. 18.

【０１６３】図形設定部１２０は、抽出フレーム１２０
１と、ユーザ設定による初期図形１２０２と、既に抽出
されたフレームの抽出結果１２０７を入力とし、抽出フ
レームに図形を設定して出力する。第２の物体追跡・抽
出部１２１は、ブロックマッチング法やアフィン変換な
どの形状予測によって物体領域を抽出するために使用さ
れ、図形が設定された抽出フレーム１２０３と、既に抽
出されたフレームの抽出結果１２０７を入力とし、物体
の追跡・抽出結果１２０４を出力する。The figure setting unit 120 receives the extraction frame 1201, the initial figure 1202 set by the user, and the extraction result 1207 of the already extracted frames, sets a figure in the extraction frame, and outputs it. The second object tracking / extracting unit 121 is used to extract the object region by shape prediction such as the block matching method or affine transformation; it receives the extraction frame 1203 in which the figure is set and the extraction result 1207 of the already extracted frames, and outputs an object tracking / extraction result 1204.

【０１６４】特徴量抽出部１２２は、物体の追跡・抽出
結果１２０４を入力とし、物体の特徴量１２０５をスイ
ッチ部ＳＷ１２３に出力する。スイッチ部ＳＷ１２３
は、物体の特徴量１２０５を入力として、第一の物体追
跡・抽出部への物体の追跡・抽出結果１２０４の入力を
制御する。例えば、第２の物体追跡・抽出部１２１でブ
ロックマッチング法により物体形状が追跡・抽出された
場合、特徴量をマッチング誤差として、このマッチング
誤差が小さい部分は第２の物体追跡・抽出部１２１によ
る予測形状の抽出結果として出力される。また他の特徴
量として、ブロック毎に輝度勾配や分散、テクスチャの
複雑さを表すパラメータ（フラクタル次元など）があ
る。輝度勾配を用いた場合、輝度勾配がほとんどないブ
ロックに対しては、ＯＲＡＮＤ法による第１の物体追跡
・抽出部１２４の結果が使用されるように第一の物体追
跡・抽出部への入力が制御される。またエッジ検出をし
て、エッジの有無や強度を特徴量とした場合、エッジの
ない所、弱い所では第１の物体追跡・抽出部１２４の結
果が使用されるように第一の物体追跡・抽出部への入力
が制御される。このように、画像の一部分であるブロッ
ク単位や領域単位で、切替の制御が変えられる。切替の
閾値を大きくしたり小さくしたりすることで、適応的な
制御ができる。The feature amount extraction unit 122 receives the object tracking / extraction result 1204 and outputs a feature amount 1205 of the object to the switch unit SW123. The switch unit SW123 receives the feature amount 1205 of the object and controls the input of the object tracking / extraction result 1204 to the first object tracking / extracting unit. For example, when the second object tracking / extracting unit 121 tracks and extracts the object shape by the block matching method, the matching error is used as the feature amount, and for portions where the matching error is small the shape predicted by the second object tracking / extracting unit 121 is output as the extraction result. Other feature amounts include, for each block, the luminance gradient, the variance, and parameters that express the complexity of the texture (such as the fractal dimension). When the luminance gradient is used, the input to the first object tracking / extracting unit is controlled so that, for blocks with almost no luminance gradient, the result of the first object tracking / extracting unit 124 using the ORAND method is used. When edges are detected and the presence or strength of edges is used as the feature amount, the input to the first object tracking / extracting unit is controlled so that the result of the first object tracking / extracting unit 124 is used where there are no edges or only weak ones. In this way, the switching control can be varied per block or per region, each being a part of the image. Adaptive control is possible by raising or lowering the switching threshold.

【０１６５】第１の物体追跡・抽出部１２４は、抽出フ
レーム１２０１と、物体の追跡・抽出結果１２０４と、
既に抽出されたフレームの抽出結果１２０７を入力と
し、抽出フレームの追跡・抽出結果１２０６をメモリ１
２５に出力する。メモリ１２５は、抽出フレームの追跡
・抽出結果１２０６を入力とし、保持する。The first object tracking / extracting unit 124 receives the extraction frame 1201, the object tracking / extraction result 1204, and the extraction result 1207 of the already extracted frames, and outputs the tracking / extraction result 1206 of the extraction frame to the memory 125. The memory 125 receives the tracking / extraction result 1206 of the extraction frame and holds it.

【０１６６】次に、図１９のブロック図を用いて、本第
３実施形態に係る物体追跡／抽出装置の第３の構成例を
する。Next, a third configuration example of the object tracking / extracting apparatus according to the third embodiment will be described with reference to the block diagram of FIG. 19.

【０１６７】この物体追跡／抽出装置は、図１８の構成
に加え、第２実施形態で説明した参照フレーム選択部を
追加したものである。すなわち、この物体追跡／抽出装
置は、図示のように、図形設定部１３０、第２の物体追
跡・抽出部１３１、特徴量抽出部１３２、スイッチ部Ｓ
Ｗ１３３、参照フレーム選択部１３４、第１の物体追跡
・抽出部１３５、およびメモリ１３６から構成されてい
る。This object tracking / extracting apparatus is obtained by adding the reference frame selecting unit described in the second embodiment to the configuration of FIG. 18. That is, as shown in the figure, it comprises a figure setting unit 130, a second object tracking / extracting unit 131, a feature amount extraction unit 132, a switch unit SW133, a reference frame selecting unit 134, a first object tracking / extracting unit 135, and a memory 136.

【０１６８】図形設定部１３０では、抽出フレーム１３
０１と、ユーザ設定による初期図形１３０２と、既に抽
出されたフレームの抽出結果１３０８を入力とし、抽出
フレームに図形を設定して出力する。第２の物体追跡・
抽出部１３１は、ブロックマッチング法やアフィン変換
などの形状予測によって物体領域を抽出するためのもの
であり、図形を設定された抽出フレーム１３０３と、既
に抽出されたフレームの抽出結果１３０８を入力とし、
物体の追跡・抽出結果１３０４を出力する。The figure setting unit 130 receives the extraction frame 1301, the initial figure 1302 set by the user, and the extraction result 1308 of the already extracted frames, sets a figure in the extraction frame, and outputs it. The second object tracking / extracting unit 131 extracts the object region by shape prediction such as the block matching method or affine transformation; it receives the extraction frame 1303 in which the figure is set and the extraction result 1308 of the already extracted frames, and outputs an object tracking / extraction result 1304.

【０１６９】特徴量抽出部１３２は、物体の追跡・抽出
結果１３０４を入力とし、物体の特徴量１３０５を出力
する。スイッチ部ＳＷ１３３は、物体の特徴量１３０５
を入力とし、第１の物体追跡・抽出部１３５への物体の
追跡・抽出結果１３０４の入力を制御する。The feature amount extraction unit 132 receives the object tracking / extraction result 1304 and outputs a feature amount 1305 of the object. The switch unit SW133 receives the feature amount 1305 of the object and controls the input of the object tracking / extraction result 1304 to the first object tracking / extracting unit 135.

【０１７０】参照フレーム選択部１３４は、第１の物体
追跡・抽出部１３５への物体の追跡・抽出結果１３０４
と、既に抽出されたフレームの抽出結果１３０８を入力
とし、参照フレーム１３０６を出力する。The reference frame selecting unit 134 receives the object tracking / extraction result 1304 destined for the first object tracking / extracting unit 135 and the extraction result 1308 of the already extracted frames, and outputs reference frames 1306.

【０１７１】物体の特徴の一例として、動きの複雑さが
ある。第２の物体追跡・抽出部１３１でブロックマッチ
ング法により物体を追跡・抽出する場合、マッチング誤
差が大きい部分に対して、第１の物体抽出結果が出力さ
れる。部分的に複雑な動きがあると、その部分はマッチ
ング誤差が大きくなり、第１の物体追跡・抽出部１３５
で抽出されることになる。従って、このマッチング誤差
を特徴量として第１の物体追跡・抽出部１３５で用いる
参照フレームの選択方法が切替えられる。具体的には物
体形状全体ではなく、第１の物体追跡・抽出部１３５で
抽出する部分だけについて第２実施形態で説明した式
（１）または（２）の選択条件を満たすように参照フレ
ームの選択方法を選ぶのがよい。One example of an object feature is the complexity of its motion. When the second object tracking / extracting unit 131 tracks and extracts the object by the block matching method, the extraction result of the first object tracking / extracting unit is output for portions where the matching error is large. Where the motion is locally complex, the matching error in that portion becomes large, and the portion is extracted by the first object tracking / extracting unit 135. The method of selecting the reference frames used by the first object tracking / extracting unit 135 is therefore switched using this matching error as the feature amount. Specifically, it is advisable to choose the selection method so that the selection condition of expression (1) or (2) described in the second embodiment is satisfied not for the whole object shape but only for the portion extracted by the first object tracking / extracting unit 135.

【０１７２】背景の特徴量の例は、１）背景が静止して
いる画像である、２）ズームがある、３）パーンがある
という情報等である。この特徴量はユーザが入力しても
良いし、カメラから得たパラメータが特徴量として入力
されても良い。背景の特徴量としては、背景の動きベク
トル、背景動き補正画像の精度、背景の輝度分布、テク
スチャ、エッジなどがある。例えば背景動き補正画像の
精度を背景動き補正画像と補正前画像との差分平均を特
徴量として、参照フレーム選択方法が制御できる。制御
例としては、差分平均が非常に多い場合、そのフレーム
は参照フレームの候補にしなかったり、そのフレームの
選択順位を下げてフレームを選んだりできる。背景が静
止している場合や、背景動き補正がすべてのフレームに
ついて完全であると差分がゼロとなる。参照フレーム選
択法は、第２実施形態と同じ方法を用いることができ
る。Examples of background feature amounts are information such as 1) the background of the image is stationary, 2) there is zooming, and 3) there is panning. These feature amounts may be input by the user, or parameters obtained from the camera may be input as feature amounts. Further background feature amounts include the background motion vector, the accuracy of the background-motion-corrected image, and the luminance distribution, texture, and edges of the background. For example, the reference frame selection method can be controlled by expressing the accuracy of the background-motion-corrected image as the mean difference between the corrected image and the image before correction and using it as a feature amount. As an example of such control, when the mean difference is very large, the frame can be excluded from the reference frame candidates or given a lower selection priority. When the background is stationary, or when the background motion correction is perfect for every frame, the difference is zero. The reference frame selection itself can use the same method as in the second embodiment.
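
The control by mean difference described above can be sketched as follows. The use of the mean absolute difference, the rejection threshold `reject_above`, and the function names are assumptions for illustration; the text only states that frames with a very large mean difference are excluded or given lower priority.

```python
def mean_abs_difference(image_a, image_b):
    """Mean absolute pixel difference between two equally sized images
    (lists of rows of gray values)."""
    total = sum(
        abs(p - q)
        for row_a, row_b in zip(image_a, image_b)
        for p, q in zip(row_a, row_b)
    )
    return total / (len(image_a) * len(image_a[0]))


def rank_candidates(mean_differences, reject_above):
    """mean_differences: frame index -> mean difference of that frame.
    Drop frames above the threshold and order the rest so that frames
    with a smaller difference are selected first."""
    kept = {f: d for f, d in mean_differences.items() if d <= reject_above}
    return sorted(kept, key=kept.get)
```

Lowering the priority rather than rejecting outright corresponds to calling `rank_candidates` with a very large `reject_above`.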

【０１７３】第１の物体追跡・抽出部１３５は、抽出フ
レーム１３０１と、参照フレーム１３０６と、既に抽出
されたフレームの抽出結果１３０８を入力とし、ＯＲＡ
ＮＤ法により抽出フレームの追跡・抽出結果１３０７を
メモリ１３５に出力する。メモリ１３５は、抽出フレー
ムの追跡・抽出結果１３０７を入力し、保持する。The first object tracking / extracting unit 135 receives the extraction frame 1301, the reference frames 1306, and the extraction result 1308 of the already extracted frames, and outputs the tracking / extraction result 1307 of the extraction frame, obtained by the ORAND method, to the memory 136. The memory 136 receives the tracking / extraction result 1307 of the extraction frame and holds it.

【０１７４】次に、図２２を用いて、先に挙げた例のう
ち、第２の物体追跡・抽出部からの出力から特徴量を得
て、それによって複数の参照フレーム選択部を切替える
例を第４の構成例として説明する。Next, referring to FIG. 22, a fourth configuration example will be described: of the examples given above, the one in which a feature amount is obtained from the output of the second object tracking / extracting unit and is used to switch among a plurality of reference frame selecting units.

【０１７５】図形設定部１６０は、抽出フレーム１６０
１と、ユーザが設定した初期図形１６０２と、既に物体
が抽出されたフレーム１６０８を入力とし、設定図形１
６０３を出力する。第２の物体追跡・抽出部１６１は、
ブロックマッチング法やアフィン変換などの形状予測に
よって物体領域を抽出するために使用され、設定図形１
６０３と、既に物体が抽出されたフレーム１６０８を入
力とし、物体追跡・抽出結果１６０４を出力する。特徴
量検出部１６３は、物体追跡・抽出結果１６０４を入力
とし、特徴量１６０５をスイッチ部ＳＷ１６４に出力す
る。スイッチ部ＳＷ１６４は、特徴量１６０５を入力
し、参照フレーム選択部への物体追跡・抽出結果１６０
４の入力を制御する。The figure setting unit 160 receives the extraction frame 1601, the initial figure 1602 set by the user, and the frame 1608 from which the object has already been extracted, and outputs a set figure 1603. The second object tracking / extracting unit 161 is used to extract the object region by shape prediction such as the block matching method or affine transformation; it receives the set figure 1603 and the frame 1608 from which the object has already been extracted, and outputs an object tracking / extraction result 1604. The feature amount detection unit 163 receives the object tracking / extraction result 1604 and outputs a feature amount 1605 to the switch unit SW164. The switch unit SW164 receives the feature amount 1605 and controls the input of the object tracking / extraction result 1604 to the reference frame selecting units.

【０１７６】複数の参照フレーム選択部１６５は、物体
追跡・抽出結果１６０４と、既に物体が抽出されたフレ
ーム１６０８を入力とし、少なくとも２つの参照フレー
ム１６０６を出力する。The plurality of reference frame selection units 165 receive an object tracking / extraction result 1604 and a frame 1608 from which an object has already been extracted, and output at least two reference frames 1606.

【０１７７】第１の物体追跡・抽出部１６６は、ＯＲＡ
ＮＤ法により物体抽出を行うために使用され、参照フレ
ーム１６９６と、抽出フレーム１６０１を入力とし、物
体の追跡・抽出結果１６０７をメモリ１６７に出力す
る。メモリ１６７は、物体の追跡・抽出結果１６０７を
入力とし、保持する。The first object tracking / extracting unit 166 is used to extract the object by the ORAND method; it receives the reference frames 1606 and the extraction frame 1601, and outputs an object tracking / extraction result 1607 to the memory 167. The memory 167 receives the object tracking / extraction result 1607 and holds it.

【０１７８】次に、ブロック図２３を用いて、先に述べ
た例のうち、背景の情報を得て、背景動き補正の誤差に
よって複数の参照フレーム選択部の入力を制御する例に
ついて説明する。Next, with reference to the block diagram of FIG. 23, a description will be given of another of the examples mentioned above: the one in which background information is obtained and the input to a plurality of reference frame selecting units is controlled according to the error of the background motion correction.

【０１７９】図形設定部１７０は、抽出フレーム１７０
１と、ユーザが設定した初期図形１７０２と、既に物体
が抽出されたフレーム１７１０を入力とし、設定図形１
７０３を出力する。第２の物体追跡・抽出部１７１は、
設定図形１７０３と、既に物体が抽出されたフレーム１
７１０を入力とし、物体の追跡・抽出結果１７０４を出
力する。スイッチ部ＳＷ１７２では、ユーザが指定した
背景の情報１７０５を入力し、背景動き補正部１７３へ
の抽出フレーム１７０１の入力を制御する。The figure setting unit 170 receives the extraction frame 1701, the initial figure 1702 set by the user, and the frame 1710 from which the object has already been extracted, and outputs a set figure 1703. The second object tracking / extracting unit 171 receives the set figure 1703 and the frame 1710 from which the object has already been extracted, and outputs an object tracking / extraction result 1704. The switch unit SW172 receives background information 1705 specified by the user and controls the input of the extraction frame 1701 to the background motion correction unit 173.

【０１８０】背景動き補正部１７３は、抽出フレーム１
７０１と、既に物体が抽出されたフレーム１７１０を入
力とし、背景動きを補正したフレーム１７０６を出力す
る。The background motion correction unit 173 receives the extracted frame 1701 and the frame 1710 from which an object has already been extracted, and outputs the frame 1706 in which the background motion has been corrected.

【０１８１】背景特徴量検出部１７４は、抽出フレーム
１７０１と、背景動きを補正したフレーム１７０６を入
力とし、背景の特徴量１７０７をスイッチ部ＳＷ１７５
へ出力する。このスイッチ部ＳＷ１７５は、背景の特徴
量１７０７を受け、参照フレーム選択部１７６への物体
の追跡・抽出結果１７０４の入力を制御する。参照フレ
ーム選択部１７６は、物体の追跡・抽出結果１７０４
と、既に物体が抽出されたフレーム１７１０を入力し、
少なくとも２つの参照フレーム１７０８を出力する。The background feature amount detection unit 174 receives the extracted frame 1701 and the frame 1706 in which the background motion has been corrected, and outputs the background feature amount 1707 to the switch unit SW175. The switch unit SW175 receives the background feature amount 1707 and controls the input of the object tracking / extraction result 1704 to the reference frame selection unit 176. The reference frame selection unit 176 receives the object tracking / extraction result 1704 and the frame 1710 from which an object has already been extracted, and outputs at least two reference frames 1708.

【０１８２】第１の物体追跡・抽出部１７７は、少なく
とも２つの参照フレーム１７０８と、抽出フレーム１７
０１を入力とし、物体の追跡・抽出結果１７０９をメモ
リ１７８に出力する。メモリ１７８は、物体の追跡・抽
出結果１７０９を受け、保持する。The first object tracking / extracting unit 177 receives at least two reference frames 1708 and the extracted frame 1701, and outputs the object tracking / extraction result 1709 to the memory 178. The memory 178 receives and holds the object tracking / extraction result 1709.

【０１８３】次に、図２０のブロック図を用いて、本第
３実施形態に係る物体追跡／抽出装置の第５の構成例を
説明する。Next, a fifth configuration example of the object tracking / extracting apparatus according to the third embodiment will be described with reference to the block diagram of FIG. 20.

【０１８４】抽出フレーム出力制御部１４０は、画像１
４０１と抽出するフレームの順序１４０５を入力とし、
抽出フレーム１４０２を出力する。フレーム順序制御部
１４１は、ユーザが与えたフレーム順序に関する情報１
４０５を入力とし、フレーム順序１４０６を出力する。
物体追跡・抽出装置１４２は、動画像信号から目的とす
る物体の抽出／追跡を行なう物体追跡／抽出方法及び装
置であり、抽出フレーム１４０２を入力とし、追跡・抽
出結果１４０３を追跡・抽出結果出力制御部１４３に出
力する。追跡・抽出結果出力制御部１４３は、追跡・抽
出結果１４０３と、フレーム順序１４０６を入力とし、
フレーム順序を画像１４０１の順序に並び変えて出力す
る。The extraction frame output control unit 140 receives the image 1401 and the order 1405 of frames to be extracted, and outputs the extracted frame 1402. The frame order control unit 141 receives the information 1405 on the frame order given by the user and outputs the frame order 1406. The object tracking / extracting device 142 is an object tracking / extracting method and device that extracts / tracks a target object from a moving image signal; it receives the extracted frame 1402 and outputs the tracking / extraction result 1403 to the tracking / extraction result output control unit 143. The tracking / extraction result output control unit 143 receives the tracking / extraction result 1403 and the frame order 1406, rearranges the results into the order of the image 1401, and outputs them.
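
The reordering performed by the tracking / extraction result output control unit 143 can be illustrated with a minimal Python sketch. The function name `restore_input_order` and the representation of the frame order 1406 as a list of input-frame indices are assumptions made for this illustration; the specification does not prescribe a data format.

```python
def restore_input_order(results, frame_order):
    """Return the results rearranged into input-frame order.

    results[k] is the extraction result produced at processing step k and
    frame_order[k] is the input index of the frame processed at step k
    (a hypothetical encoding of the frame order 1406).
    """
    if len(results) != len(frame_order):
        raise ValueError("results and frame_order must have equal length")
    ordered = [None] * len(results)
    for result, input_index in zip(results, frame_order):
        ordered[input_index] = result
    return ordered


# Example: frames 0, 3 and 6 are processed first, then the skipped frames.
processing_order = [0, 3, 6, 1, 2, 4, 5]
results = ["shape%d" % i for i in processing_order]
print(restore_input_order(results, processing_order))
```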

【０１８５】フレームの順序は、ユーザが与えても良い
し、物体の動きに応じて適応的に決定しても良い。物体
の動きが検出しやすいフレーム間隔が決定され、物体が
抽出される。すなわち、参照フレームと物体抽出対象と
なる現フレームとの間のフレーム間隔が少なくとも２フ
レーム以上となるように入力フレーム順とは異なる順序
で物体抽出処理が行われるようにフレーム順の制御が行
われる。これにより、入力フレーム順にフレーム間予測
による形状予測や、ＯＲＡＮＤ演算を行う場合に比べ、
予測精度を向上でき、結果的に抽出精度を高めることが
可能となる。ＯＲＡＮＤ法の場合には適切な参照フレー
ムを選択することによって抽出精度を高めることが可能
となるため、ブロックマッチングなどによるフレーム間
予測による形状予測法について特に効果がある。The order of the frames may be given by the user or may be determined adaptively according to the motion of the object. A frame interval at which the motion of the object is easy to detect is determined, and the object is extracted. That is, the frame order is controlled so that object extraction is performed in an order different from the input frame order, with the reference frame and the current frame (the frame subject to object extraction) at least two frames apart. Compared with performing shape prediction by inter-frame prediction or the ORAND operation in input frame order, this improves the prediction accuracy and, as a result, the extraction accuracy. With the ORAND method, extraction accuracy can already be raised by selecting appropriate reference frames, so this frame-order control is particularly effective for shape prediction based on inter-frame prediction such as block matching.

【０１８６】すなわち、フレームの間隔によっては動き
が小さ過ぎたり、複雑過ぎて、フレーム間予測による形
状予測手法では対応できないことがある。従って、例え
ば形状予測の誤差が閾値以下にならない場合は、予測に
用いる抽出済みフレームとの間隔をあけることにより、
予測精度が上がり、結果的に抽出精度が向上する。ま
た、背景に動きがある場合は、参照フレーム候補は抽出
フレームとの背景動きを求め補償するが、背景の動きが
フレームの間隔によっては小さ過ぎたり複雑過ぎたりし
て、背景動き補償が精度良くできない場合がある。この
場合もフレーム間隔をあけることによって動き補償精度
を上げることができる。このようにして抽出フレームの
順序を適応的に制御すれば、より確実に物体の形状を抽
出することが可能になる。That is, depending on the frame interval, the motion may be too small or too complex for the shape prediction method based on inter-frame prediction to handle. Therefore, if, for example, the shape prediction error does not fall to or below a threshold, widening the interval to the already-extracted frame used for prediction raises the prediction accuracy and, as a result, the extraction accuracy. When the background is moving, the background motion between each reference frame candidate and the extracted frame is obtained and compensated; however, depending on the frame interval, the background motion may be too small or too complex for accurate compensation. In this case too, widening the frame interval can raise the motion compensation accuracy. Adaptively controlling the order of the extracted frames in this way makes it possible to extract the shape of the object more reliably.
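
The adaptive widening of the frame interval described above can be sketched as follows. This is only an illustration under stated assumptions: `error_for_gap` is a hypothetical callable returning the shape prediction error obtained with a given interval, and the fallback to the interval with the least error is a choice made for this sketch, not something the specification states.

```python
def choose_frame_gap(error_for_gap, threshold, max_gap, min_gap=1):
    """Return the smallest interval whose prediction error is within threshold.

    If no interval up to max_gap meets the threshold, the interval with the
    smallest error is returned instead (an assumption of this sketch).
    """
    best_gap, best_error = None, float("inf")
    for gap in range(min_gap, max_gap + 1):
        error = error_for_gap(gap)
        if error <= threshold:
            return gap          # first interval that predicts well enough
        if error < best_error:
            best_gap, best_error = gap, error
    return best_gap


# Hypothetical errors: the motion is too small to predict at short intervals.
errors = {1: 9.0, 2: 5.0, 3: 1.5, 4: 1.0}
gap = choose_frame_gap(errors.get, threshold=2.0, max_gap=4)
```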

【０１８７】次に、図２１のブロック図を用いて、本第
３実施形態に係る物体追跡／抽出装置の第６の例を説明
をする。Next, a sixth example of the object tracking / extracting apparatus according to the third embodiment will be described with reference to the block diagram of FIG. 21.

【０１８８】抽出フレーム出力制御部１５０は、画像１
５０１と抽出するフレームの順序１５０５を入力とし、
抽出フレーム１５０２を出力する。フレーム順序制御部
１５１は、ユーザが与えたフレーム順序に関する情報１
５０５を入力とし、フレーム順序１５０６を出力する。
すなわち、フレーム順序制御部１５１は、フレーム間隔
が与えられ、フレームの抽出順序を決定する。複数の物
体追跡／抽出装置１５２は、動画像信号から目的とする
物体の抽出／追跡を行なう物体追跡／抽出方法及び装置
であり、フレーム順序１５０６にしたがって抽出フレー
ム１５０２の入力が制御され、追跡・抽出結果１５０３
を出力する。追跡・抽出結果出力制御部１５３は、追跡
・抽出結果１５０３と、フレーム順序１５０６を入力と
し、それを画像１５０１の順序に並び変えて出力する。The extraction frame output control unit 150 receives the image 1501 and the order 1505 of frames to be extracted, and outputs the extracted frame 1502. The frame order control unit 151 receives the information 1505 on the frame order given by the user and outputs the frame order 1506. That is, the frame order control unit 151 is given a frame interval and determines the frame extraction order. The multiple object tracking / extracting devices 152 are object tracking / extracting methods and devices that extract / track a target object from a moving image signal; the input of the extracted frame 1502 to them is controlled according to the frame order 1506, and they output the tracking / extraction result 1503. The tracking / extraction result output control unit 153 receives the tracking / extraction result 1503 and the frame order 1506, rearranges the results into the order of the image 1501, and outputs them.

【０１８９】飛ばされた間のフレームは、既に抽出され
たフレームから内挿しても良いし、参照フレーム候補の
選び方を変えて同じアルゴリズムで抽出しても良い。The skipped frames may be interpolated from already extracted frames, or may be extracted by the same algorithm by changing the method of selecting reference frame candidates.

【０１９０】ここで、図２５を用いて、図２１の物体の
追跡／抽出装置の処理の例について説明する。Here, an example of the processing of the object tracking / extracting apparatus of FIG. 21 will be described with reference to FIG. 25.

【０１９１】図２５で、斜線で示すフレームは２フレー
ム間隔開けて先に抽出するフレームである。飛ばされた
フレームは第２の物体追跡・抽出装置によって抽出され
る。図２５のように両脇のフレームが抽出された後に、
両脇のフレームの抽出結果から内挿して物体形状を求め
てもよい。また、閾値などのパラメータを変えたり、こ
れら両脇のフレームを参照フレーム候補に加えて、両脇
のフレームと同じ方法で抽出してもよい。In FIG. 25, the hatched frames are the frames extracted first, at intervals of two frames. The skipped frames are extracted by the second object tracking / extracting device. As shown in FIG. 25, after the frames on both sides have been extracted, the object shape may be obtained by interpolating from their extraction results. Alternatively, the skipped frames may be extracted by the same method as the frames on both sides, with parameters such as thresholds changed or with the frames on both sides added to the reference frame candidates.

【０１９２】次に、図２４のブロック図を用いて、物体
追跡・抽出装置の他の構成例を説明する。Next, another configuration example of the object tracking / extracting apparatus will be described with reference to the block diagram of FIG. 24.

【０１９３】スイッチ部ＳＷ１８２は、ユーザが指定し
た背景の情報１８０５を入力とし、背景動き補正部１８
３への抽出フレーム１８０１の入力を制御する。背景動
き補正部１８３は、抽出フレーム１８０１と、既に物体
が抽出されたフレーム１８１１を入力とし、背景動きを
補正したフレーム１８０６を出力する。背景特徴量検出
部１８４は、抽出フレーム１８０１と、背景動きを補正
したフレーム１８０６を入力とし、背景の特徴量１８０
７を出力する。スイッチ部ＳＷ１８７は、背景の特徴量
１８０７を入力とし、参照フレーム選択部１８８への物
体の追跡・抽出結果１８０４の入力を制御する。図形設
定部１８０は、抽出フレーム１８０１と既に物体が抽出
されたフレーム１８１１並びにユーザが設定した初期図
形１８０２を入力とし、図形を設定した抽出フレーム１
８０３を出力する。第二の物体追跡・抽出部１８５は、
図形を設定した抽出フレーム１８０３と既に物体が抽出
されたフレーム１８１１を入力とし、物体追跡・抽出結
果１８０４を出力する。特徴量検出部１８５は、物体追
跡・抽出結果１８０４を入力とし、特徴量１８０８を出
力する。スイッチ部ＳＷ１８６は、特徴量１８０８を入
力とし、参照フレーム選択部への物体追跡・抽出結果１
８０４の入力を制御する。参照フレーム選択部１８８
は、物体追跡・抽出結果１８０４と、既に物体が抽出さ
れたフレーム１８１１を入力とし、少なくとも２つの参
照フレーム１８０９を出力する。The switch unit SW182 receives the background information 1805 specified by the user and controls the input of the extracted frame 1801 to the background motion correction unit 183. The background motion correction unit 183 receives the extracted frame 1801 and the frame 1811 from which an object has already been extracted, and outputs the frame 1806 in which the background motion has been corrected. The background feature amount detection unit 184 receives the extracted frame 1801 and the frame 1806 in which the background motion has been corrected, and outputs the background feature amount 1807. The switch unit SW187 receives the background feature amount 1807 and controls the input of the object tracking / extraction result 1804 to the reference frame selection unit 188. The figure setting unit 180 receives the extracted frame 1801, the frame 1811 from which an object has already been extracted, and the initial figure 1802 set by the user, and outputs the extracted frame 1803 in which the figure has been set. The second object tracking / extracting unit 185 receives the extracted frame 1803 in which the figure has been set and the frame 1811 from which an object has already been extracted, and outputs the object tracking / extraction result 1804. The feature amount detection unit 185 receives the object tracking / extraction result 1804 and outputs the feature amount 1808. The switch unit SW186 receives the feature amount 1808 and controls the input of the object tracking / extraction result 1804 to the reference frame selection unit. The reference frame selection unit 188 receives the object tracking / extraction result 1804 and the frame 1811 from which an object has already been extracted, and outputs at least two reference frames 1809.

【０１９４】第一の物体追跡・抽出部１８９は、少なく
とも２つの参照フレーム１８０９と、抽出フレーム１８
０１を入力とし、物体の追跡・抽出結果１８１０をメモ
リ１９０に出力する。メモリ１９０は、物体の追跡・抽
出結果１８１０を保持する。The first object tracking / extracting unit 189 receives at least two reference frames 1809 and the extracted frame 1801, and outputs the object tracking / extraction result 1810 to the memory 190. The memory 190 holds the object tracking / extraction result 1810.

【０１９５】処理の流れは以下のようになる。The processing flow is as follows.

【０１９６】ユーザが初期フレームにおいて抽出したい
物体を大まかに囲む。以降のフレームの長方形は既に抽
出された物体を囲む長方形を上下左右に数画素広げて設
定する。この長方形をブロックに分割し、抽出済みのフ
レームとマッチングを取って対応する位置に抽出済み物
体の形状を張り付ける。この処理によって得られた物体
形状（予測物体形状）が大まかな物体を表す。予測の精
度がある閾値以下にならない場合、別のフレームから予
測を直してより予測精度を上げるように処理しても良
い。In the initial frame, the user roughly encloses the object to be extracted. In subsequent frames, the rectangle is set by enlarging the rectangle surrounding the already-extracted object by several pixels up, down, left and right. This rectangle is divided into blocks, each block is matched against an already-extracted frame, and the shape of the extracted object is pasted at the corresponding position. The object shape obtained by this processing (the predicted object shape) roughly represents the object. If the prediction error does not fall to or below a certain threshold, the prediction may be redone from another frame to raise the prediction accuracy.

【０１９７】予測精度が良い場合、この予測形状全部又
は一部をそのまま抽出結果として出力する。この方法
は、物体を追跡しつつ、物体も抽出できる。When the prediction accuracy is good, the whole or a part of the predicted shape is output as an extraction result as it is. This method can extract an object while tracking the object.

【０１９８】物体追跡及び物体形状予測の時に行なうブ
ロック化は、長方形を固定ブロックサイズで分割しても
良いし、マッチング閾値によって階層的ブロックマッチ
ングによって行なっても良い。フレームを固定のサイズ
で分割し、物体を含むブロックだけを用いても良い。Blocking performed at the time of object tracking and object shape prediction may be performed by dividing a rectangle by a fixed block size, or by performing hierarchical block matching using a matching threshold. The frame may be divided by a fixed size, and only the block including the object may be used.

【０１９９】予測が悪い場合を考えて、物体予測形状を
数画素分拡張して、予測エラーによる凹凸や穴が修正さ
れる。この方法で全ての参照フレーム候補に予測物体形
状が設定される。物体が抽出される度にその物体を使っ
て先のフレームの物体追跡図形が求め直されそれにより
抽出エラーが防がれる。この追跡図形が物体を囲むよう
に設定された追跡図形とする。To allow for poor prediction, the predicted object shape is expanded by several pixels so that irregularities and holes caused by prediction errors are corrected. In this way a predicted object shape is set for every reference frame candidate. Each time an object is extracted, the object tracking figures of the frames ahead are recomputed from that object, which prevents extraction errors. These tracking figures are the tracking figures set so as to surround the object.
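
The expansion of the predicted object shape by several pixels can be sketched as a binary dilation. The following Python sketch assumes a square neighborhood and a mask stored as a list of rows of 0/1 values; the function name `dilate` and the choice of neighborhood are illustrative assumptions, since the specification does not state how the expansion is implemented.

```python
def dilate(mask, radius):
    """Expand every object pixel of a 0/1 mask by `radius` pixels.

    A square (2 * radius + 1) neighborhood is used, clipped at the image
    border. Small holes and dents narrower than the neighborhood are filled.
    """
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    out[yy][xx] = 1
    return out


# A predicted shape with a one-pixel hole caused by a prediction error.
predicted = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
expanded = dilate(predicted, 1)
```

Dilation also enlarges the outline, which is consistent with the text: the result serves as a figure that safely surrounds the object rather than as the final shape.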

【０２００】以下、抽出フレームより前のフレームにつ
いては物体が既に抽出されており、先のフレームについ
ては物体が抽出されていないものとする。In the following, it is assumed that objects have already been extracted for the frames before the extracted frame and have not yet been extracted for the frames after it.

【０２０１】参照フレームの候補は、一定間隔おきに抽
出するフレームに対して時間的に一定間隔毎にずれた前
後５フレームとする。参照フレームの候補は、例えば、
初期フレームと現フレームより前３フレーム、現フレー
ムより後１フレーム、の合計５フレームのように限定す
る。ただし、前フレームが３フレームない場合はその分
後のフレームの候補を増やし、後フレームが１フレーム
ない場合はその分前４フレームを候補とする。The reference frame candidates are the five frames before and after the frame to be extracted, taken at fixed temporal intervals relative to the frames extracted at regular intervals. The candidates are limited to, for example, a total of five frames: the initial frame, the three frames before the current frame, and the one frame after the current frame. However, if there are fewer than three preceding frames, the number of following candidate frames is increased accordingly, and if there is no following frame, the four preceding frames are used as candidates.
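
The candidate rule in this paragraph can be written out as a short Python sketch. The function name `reference_candidates`, the use of consecutive frame indices (rather than indices spaced at a fixed interval), and the handling of the case where the initial frame is itself the current frame are assumptions made for illustration.

```python
def reference_candidates(current, num_frames, initial=0, n_prev=3, n_next=1):
    """Pick the initial frame plus n_prev preceding and n_next following frames.

    When one side has too few frames, the shortfall is taken from the other
    side, as described in the text.
    """
    before = [i for i in range(current - 1, -1, -1) if i != initial]  # nearest first
    after = [i for i in range(current + 1, num_frames) if i != initial]
    total = n_prev + n_next
    take_prev = min(n_prev, len(before))
    take_next = min(total - take_prev, len(after))   # top up with later frames
    take_prev = min(total - take_next, len(before))  # top up with earlier frames
    head = [] if initial == current else [initial]
    return head + sorted(before[:take_prev] + after[:take_next])


middle = reference_candidates(current=10, num_frames=20)
near_start = reference_candidates(current=2, num_frames=20)
last = reference_candidates(current=19, num_frames=20)
```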

【０２０２】参照フレーム候補の物体の周辺に適当な領
域が設定され、この領域と現フレームとの背景の動きが
検出され、これにより参照フレームの図形内の背景が削
除される。背景の動きを検出する方法として、物体を除
いた全領域で現フレームに対してマッチングを取り、マ
ッチング誤差が最小となる動きベクトルが背景の動きと
判定される。An appropriate area is set around the reference frame candidate object, and the background movement between this area and the current frame is detected, whereby the background in the figure of the reference frame is deleted. As a method of detecting background motion, matching is performed on the current frame in all regions except for the object, and a motion vector that minimizes a matching error is determined as background motion.
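
A minimal sketch of this background motion detection is given below. It assumes a purely translational background, an exhaustive search over a small range, and the mean absolute difference as the matching error; the function name `background_motion` and these choices are illustrative assumptions, not the method fixed by the specification.

```python
def background_motion(ref, cur, ref_object_mask, search=2):
    """Estimate the background shift between a reference frame and the current frame.

    Returns ((dy, dx), error), where (dy, dx) minimizes the mean absolute
    difference between ref[y][x] and cur[y + dy][x + dx] over all pixels that
    are not part of the object in the reference frame.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            total, count = 0, 0
            for y in range(height):
                for x in range(width):
                    yy, xx = y + dy, x + dx
                    if ref_object_mask[y][x]:
                        continue  # object pixels do not vote
                    if not (0 <= yy < height and 0 <= xx < width):
                        continue  # shifted position falls outside the frame
                    total += abs(ref[y][x] - cur[yy][xx])
                    count += 1
            if count:
                error = total / count
                if best is None or error < best[1]:
                    best = ((dy, dx), error)
    return best


# The current frame is the reference frame moved down by one line.
ref = [[y * 10 + x for x in range(6)] for y in range(6)]
cur = [[0] * 6] + [row[:] for row in ref[:5]]
no_object = [[0] * 6 for _ in range(6)]
vector, error = background_motion(ref, cur, no_object)
```

The returned error can serve as the motion vector detection error used to discard unsuitable reference frame candidates.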

【０２０３】背景動き削除時の動きベクトル検出誤差が
大きい参照フレームは候補から外すことにより、背景動
き削除が適当でない場合の抽出エラーを防ぐことができ
る。また、参照フレーム候補が減った場合、新たに参照
フレーム候補を選び直してもよい。新たに付け加えた参
照フレーム候補の図形設定や背景動き削除が行なわれて
いない場合は、新たに図形設定や背景動き削除を行なう
必要がある。By removing reference frames having a large motion vector detection error at the time of background motion removal from candidates, it is possible to prevent an extraction error when background motion removal is not appropriate. When the number of reference frame candidates decreases, a new reference frame candidate may be selected. If the figure setting and the background motion deletion of the newly added reference frame candidate have not been performed, it is necessary to newly perform the figure setting and the background motion deletion.

【０２０４】予め背景の動きがないと分かる場合は、こ
の処理は行なわない。If it is known in advance that there is no background movement, this processing is not performed.

【０２０５】参照フレームの候補から仮の参照フレーム
を設定し、これらフレームの組合せについて、第２実施
形態の式（１）又は式（２）を満たす参照フレームのセ
ットを選ぶ。全参照フレーム候補のどの組合せも式
（１）又は式（２）を満たさなかった場合、Ｏ_i ∩ Ｏ_j
内の画素数が最小のフレームを選ぶのがよい。Tentative reference frames are set from the reference frame candidates, and among the combinations of these frames a set of reference frames that satisfies expression (1) or expression (2) of the second embodiment is selected. If no combination of the reference frame candidates satisfies expression (1) or expression (2), it is preferable to select the frames for which the number of pixels in O_i ∩ O_j is smallest.

【０２０６】また、背景動き削除時の動きベクトル検出
誤差がなるべく小さいフレームを選ぶように、参照フレ
ーム候補の組合せを考慮するほうがよい。具体的には、
式（１）又は式（２）による条件が同じ参照フレームセ
ットがある場合、背景動き削除時の動きベクトル検出誤
差が小さい方のフレームを選ぶなどの方法がある。背景
の動きがない場合は、フレーム間差分が十分検出できる
フレームを優先的に選ぶようにできる。It is also better to consider combinations of reference frame candidates so that frames with as small a motion vector detection error as possible at the time of background motion removal are selected. Specifically, when several reference frame sets satisfy the conditions of expression (1) or expression (2) equally, the set with the smaller motion vector detection error at the time of background motion removal can be selected, for example. When the background does not move, frames from which a sufficient inter-frame difference can be detected can be selected preferentially.

【０２０７】また、物体予測の精度がよく、物体の一部
をそのまま出力する場合、物体予測結果を抽出結果とし
ない領域のみを対象に、式（１）又は式（２）による条
件を満たすフレームを選ぶ。When the object prediction is accurate and part of the object is output as it is, frames satisfying the condition of expression (1) or expression (2) are selected with respect only to the region for which the object prediction result is not used as the extraction result.

【０２０８】以下、２参照フレームが選択されたときの
処理を説明する。The processing when two reference frames are selected will be described below.

【０２０９】参照フレームが選択されると、抽出フレー
ムとのフレーム間差分を求め、設定された図形内のフレ
ーム間差分に注目する。When a reference frame is selected, an inter-frame difference from the extracted frame is obtained, and attention is paid to the inter-frame difference in the set figure.

【０２１０】設定された閾値でフレーム間差分を２値化
する。２値化に用いる閾値は画像に対して一定でもよい
し、背景動き補償の精度に応じてフレーム毎に変えても
良い。制御例としては、背景動き補償の精度が悪けれ
ば、背景に余分な差分が多く発生しているので、２値化
の閾値を大きくする例などがある。また、物体の部分的
な輝度勾配やテクスチャ、エッジ強度に応じて変えても
良い。この制御例として、輝度勾配が少ない領域や、エ
ッジ強度が小さい領域のように比較的平坦な領域は２値
化の閾値を小さくする。更に、ユーザが物体の性質から
閾値を与えても良い。The inter-frame difference is binarized with a set threshold. The threshold used for binarization may be constant over the image, or may be changed from frame to frame according to the accuracy of background motion compensation. As an example of such control, when the accuracy of background motion compensation is poor, many spurious differences arise in the background, so the binarization threshold is raised. The threshold may also be changed according to the local luminance gradient, texture, or edge strength of the object. As an example of this control, the binarization threshold is lowered in relatively flat regions, such as regions with a small luminance gradient or small edge strength. Furthermore, the user may supply a threshold based on the properties of the object.
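
The binarization and one possible per-frame threshold control can be sketched as follows. The function names and, in particular, the linear rule in `frame_threshold` are assumptions made for illustration; the specification only says that the threshold is raised when background motion compensation is poor.

```python
def binarize_difference(cur, ref, threshold):
    """Return a 0/1 image marking pixels whose inter-frame difference exceeds threshold."""
    return [
        [1 if abs(c - r) > threshold else 0 for c, r in zip(cur_row, ref_row)]
        for cur_row, ref_row in zip(cur, ref)
    ]


def frame_threshold(base, compensation_error, gain=1.0):
    """Raise the threshold as the background compensation error grows.

    The linear form base + gain * error is a hypothetical rule.
    """
    return base + gain * compensation_error


cur = [[10, 50], [200, 30]]
ref = [[12, 10], [100, 30]]
binary = binarize_difference(cur, ref, threshold=5)
raised = frame_threshold(base=5, compensation_error=4.0, gain=0.5)
```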

【０２１１】物体追跡図形の外側の画素について、隣接
する背景領域の差分値を持つ画素を背景画素と決定す
る。また、同時に物体追跡図形の内側の画素について
も、隣接する背景領域の差分値をもたない画素を背景画
素でない、と決定する。For pixels outside the object tracking figure, pixels that have the difference value of the adjacent background region are determined to be background pixels. At the same time, for pixels inside the object tracking figure, pixels that do not have the difference value of the adjacent background region are determined not to be background pixels.

【０２１２】フレーム間差分は、物体の静止領域では検
出できない。従って、予測に用いたフレームとのフレー
ム間差分がゼロで、かつ予測に用いたフレームでは物体
内部の画素である場合は、静止領域画素として背景画素
に加えない。The inter-frame difference cannot be detected in a stationary region of the object. Therefore, a pixel whose inter-frame difference from the frame used for prediction is zero, and which lies inside the object in the frame used for prediction, is treated as a still-region pixel and is not added to the background pixels.

【０２１３】この背景画素は、現フレームと一つの参照
フレームとの共通の背景領域となる。この時、ノイズの
影響で背景領域とそれ以外の部分の境界が不自然である
ことがあるので、画像信号に境界を滑らかにするフィル
ターや、余分なノイズ領域を削除するフィルターをかけ
てもよい。These background pixels form the background region common to the current frame and one reference frame. At this point the boundary between the background region and the rest may look unnatural because of noise, so a filter that smooths the boundary or a filter that removes spurious noise regions may be applied to the image signal.

【０２１４】各々の参照フレームに対して共通の背景領
域が求まると、二つの共通背景領域に含まれない領域が
検出され、それが物体領域として抽出される。先に予測
した物体の形状を用いない部分に対しては抽出結果が出
力され、物体全体の形状が抽出される。共通の背景から
求めた形状を用いる部分と先に予測した物体形状を用い
る部分の整合性が取れない場合、最後にフィルターをか
けて得た出力結果は見ために良いものにできる。Once a common background region has been obtained for each reference frame, the region not contained in either of the two common background regions is detected and extracted as the object region. For the portion where the previously predicted object shape is not used, this extraction result is output, and the shape of the whole object is thereby extracted. If the portion using the shape obtained from the common backgrounds and the portion using the previously predicted object shape are not consistent with each other, a filter can be applied at the end so that the output looks natural.
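
The combination of the two common background regions can be expressed compactly. In the sketch below each common background region is a 0/1 mask (1 = background), and the function name `object_region` is an assumption made for illustration.

```python
def object_region(background_a, background_b):
    """Mark as object (1) every pixel that lies in neither common background region."""
    return [
        [0 if (a or b) else 1 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(background_a, background_b)
    ]


# Common background with the first and with the second reference frame.
background_1 = [[1, 0, 0, 1]]
background_2 = [[1, 1, 0, 0]]
extracted = object_region(background_1, background_2)
```

Equivalently, the object region is the intersection of the two object region candidates that remain after each background region is removed.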

【０２１５】最後に抽出順が入力フレームの順序に置き
換えて抽出された物体領域が出力する。Finally, the extraction order is converted back to the input frame order, and the extracted object regions are output.

【０２１６】本発明のような物体の形状を抽出する方法
及び装置は、現在標準化が固まりつつあるＭＰＥＧ−４
のオブジェクト符号化の入力手段として用いることがで
きる。このＭＰＥＧ−４と物体抽出の応用例として、物
体形状をウインドウ形式とする表示システムがある。こ
のような表示システムは、多地点会議システムに有効で
ある。限られた大きさのディスプレイにテキスト資料
と、各地点で会議に参加する人物を四角いウインドウで
表示するよりも、図２６のように人物は人物の形状で表
示することにより、省スペース化できる。ＭＰＥＧ−４
の機能を使えば、発言中の人物だけを大きくしたり、発
言していない人物を半透明にしたりでき、システムの使
用感がよくなる。The method and apparatus for extracting the shape of an object according to the present invention can be used as input means for MPEG-4 object coding, whose standardization is currently being finalized. One application of MPEG-4 combined with object extraction is a display system in which the object shape itself serves as the window. Such a display system is effective for a multipoint conference system. Compared with showing text material and the participants at each site in rectangular windows on a display of limited size, showing each person in the shape of the person, as in FIG. 26, saves space. Using the functions of MPEG-4, only the person who is speaking can be enlarged, or people who are not speaking can be made translucent, which improves the usability of the system.

【０２１７】以上説明したように、本第３実施形態によ
れば、画像の性質に応じて方法及び装置で物体を選ぶこ
とによって、不必要な処理を省き、安定な抽出精度が得
られる。また、時間順という制約を外すことによって物
体の動きによらずに十分な抽出精度が得ることができ
る。As described above, according to the third embodiment, by selecting an object by a method and an apparatus according to the properties of an image, unnecessary processing can be omitted and stable extraction accuracy can be obtained. Further, by removing the constraint of the time order, sufficient extraction accuracy can be obtained regardless of the motion of the object.

【０２１８】また、本第３実施形態は、第１実施形態及
び第２実施形態の性能を改善するものであり、第１実施
形態及び第２実施形態の各構成と第３実施形態で説明し
た構成とを適宜組み合わせて使用することもできる。The third embodiment improves the performance of the first and second embodiments, and the configurations of the first and second embodiments can be used in appropriate combination with the configurations described in the third embodiment.

【０２１９】図２７には、本発明の第４実施形態に係る
物体抽出装置の第１の構成例が示されている。FIG. 27 shows a first configuration example of an object extraction device according to the fourth embodiment of the present invention.

【０２２０】外部のカメラで撮像されたり、ビデオテー
プ、ビデオディスクなどの蓄積媒体から読み出されたり
した後に、本物体抽出装置に入力されるテクスチャ画像
２２１は、記録装置２２２、スイッチ部２２３、動き補
償による物体抽出回路２２４に入力される。記録装置２
２２は、入力されたテクスチャ画像２２１を保持するも
のである。例えば、パソコンなどで用いられているハー
ドディスク、光磁気ディスクなどである。記録装置２２
２は後にテクスチャ画像２２１を再び用いるために必要
であり、テクスチャ画像２２１が外部の蓄積媒体に記録
されていた画像である場合は、記録装置２２２を別に用
意する必要はなく、その蓄積媒体が記録装置２２２とし
て用いられる。この際は、記録装置２２２にテクスチャ
画像２２１を入力しなおす必要はない。テクスチャ画像
は、例えば、各画素の輝度（Ｙ）を０〜２５５の値で表
した画素をラスタ順序（画像の左上の画素から右方向
へ、上のラインから下のラインへの順序）で並べて形成
され、一般に画像信号と呼ばれている。後に述べるシェ
イプ画像と区別するために、ここではテクスチャ画像と
呼ぶことにする。テクスチャ画像としては、輝度以外に
も、色差（Ｕ，Ｖなど）、色（Ｒ，Ｇ，Ｂなど）が用い
られても良い。A texture image 221, captured by an external camera or read from a storage medium such as a video tape or video disk and then input to the object extraction device, is supplied to a recording device 222, a switch unit 223, and an object extraction circuit 224 based on motion compensation. The recording device 222 holds the input texture image 221; it is, for example, a hard disk or magneto-optical disk of the kind used in personal computers. The recording device 222 is needed so that the texture image 221 can be used again later. If the texture image 221 was recorded on an external storage medium, there is no need to prepare a separate recording device 222; that storage medium is used as the recording device 222, and the texture image 221 need not be input to the recording device 222 again. A texture image is formed, for example, by arranging pixels whose luminance (Y) is expressed as a value from 0 to 255 in raster order (from the upper-left pixel of the image rightward, and from the top line downward), and is generally called an image signal. It is called a texture image here to distinguish it from the shape image described later. Besides luminance, color difference (U, V, etc.) or color (R, G, B, etc.) may be used for the texture image.

【０２２１】一方、最初のフレームにおいて、操作者が
抽出したい物体を別途抽出しておいたシェイプ画像２２
５が、動き補償による物体抽出回路２２４に入力され
る。シェイプ画像は、例えば、物体に属する画素の画素
値を“２５５”、それ以外の画素の画素値を“０”で表
した画素をテクスチャ画像と同様にラスタ順序で並べて
生成される。Meanwhile, for the first frame, a shape image 225 in which the operator has separately extracted the object to be extracted is input to the object extraction circuit 224 based on motion compensation. The shape image is generated, for example, by arranging pixels in raster order, in the same way as the texture image, with the pixel value “255” for pixels belonging to the object and “0” for all other pixels.

【０２２２】ここで、最初のフレームのシェイプ画像２
５を生成する実施例を図３４などを用いて詳しく説明す
る。An example of generating the shape image 25 of the first frame will now be described in detail with reference to FIG. 34 and other figures.

【０２２３】図３４では、省略しているが、背景や前景
にも図柄があり、そのうちで、家の形をした物体２２６
を抽出したいとする。操作者は、モニタに表示された画
像２２７に対して、物体２２６の輪郭をマウスやペンで
なぞる。その輪郭線の内側の画素に“２５５”、外側の
画素に“０”を代入して得た画像をシェイプ画像とす
る。操作者が細心の注意をはらって輪郭線を描けば、こ
のシェイプ画像の精度は高いものになるが、ある程度精
度が低い場合でも、以下の方法を用いれば、精度を上げ
ることができる。Although omitted from FIG. 34, the background and foreground also contain patterns; suppose that, among them, the house-shaped object 226 is to be extracted. The operator traces the contour of the object 226 with a mouse or pen on the image 227 displayed on the monitor. The image obtained by assigning “255” to pixels inside the traced contour and “0” to pixels outside it is used as the shape image. If the operator draws the contour with great care, this shape image will be accurate; however, even if its accuracy is somewhat low, it can be improved by the following method.

【０２２４】図３５には、操作者によって描かれた線２
２８と、物体２２６の輪郭線２２９が示されている。こ
の段階では、輪郭線２２９の正しい位置はもちろん抽出
されていないが、線２２８との位置関係を表すために輪
郭線２２９が示している。FIG. 35 shows a line 228 drawn by the operator and the contour 229 of the object 226. At this stage the correct position of the contour 229 has of course not yet been extracted, but it is shown in order to illustrate its positional relationship to the line 228.

【０２２５】まず、輪郭線２２８を含むようにブロック
が設定される。具体的には、画面をラスタ順でスキャン
し、輪郭線２２８があった時、つまり、輪郭線２２８の
シェイプ画像において、隣接する画素値に差があった
時、その画素を中心にして所定のサイズのブロックを設
ける。この際、既に設定したブロックと今回のブロック
が重なる場合には、今回のブロック設定は行わずに、ス
キャンを進めるようにすると、図３６のように互いに重
なりがなく、なおかつ接するようにブロックが設定でき
る。しかし、これだけでは、部分２３０，２３１，２３
２がブロックに入っていない。そこで、もう一度スキャ
ンを行い、ブロックに含まれない輪郭線があった時、や
はり、その画素を中心にしてブロックが設けられる。但
し、２度目のスキャンの時には、今回のブロックが既に
設定したブロックと重なる部分があっても、中心とする
画素が既に設定したブロックに含まれない限り、今回の
ブロックの設定を行う。図３７において斜線で示すブロ
ック２３３，２３４，２３５，２３６が２スキャン目で
設定されたブロックである。ブロックサイズは、固定に
してもよいが、輪郭線２２８によって囲まれる画素数が
多い場合には大きく、その画素数が少ない場合には小さ
く、輪郭線２２８の凸凹が少ない場合には大きく、凸凹
が多い場合には小さく、あるいは、画像の図柄が平坦な
場合には大きく、図柄が細かい場合には小さく、設定し
てもよい。First, blocks are set so as to contain the contour line 228. Specifically, the screen is scanned in raster order, and when the contour line 228 is encountered, that is, when adjacent pixel values differ in the shape image of the contour line 228, a block of a predetermined size is placed centered on that pixel. If the new block would overlap a block already set, it is not set and the scan continues; in this way the blocks can be set so that they touch without overlapping, as in FIG. 36. With this alone, however, the portions 230, 231, and 232 are not contained in any block. A second scan is therefore performed, and whenever a contour pixel not contained in any block is found, a block is again placed centered on that pixel. In the second scan, the block is set even if it partly overlaps blocks already set, as long as its center pixel is not contained in a block already set. The hatched blocks 233, 234, 235, and 236 in FIG. 37 are the blocks set in the second scan. The block size may be fixed, or it may be set larger when the number of pixels enclosed by the contour line 228 is large and smaller when that number is small; larger when the contour line 228 has few irregularities and smaller when it has many; or larger when the image pattern is flat and smaller when the pattern is fine.
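
The two-scan block setting can be sketched in Python as follows. The sketch uses a fixed block size, treats a pixel as a contour pixel when it differs from its right or lower neighbor, and ignores clipping at the screen edge; these choices and all function names are assumptions made for illustration.

```python
def is_contour(shape, y, x):
    """True if the pixel differs from its right or lower neighbor in the shape image."""
    height, width = len(shape), len(shape[0])
    for yy, xx in ((y, x + 1), (y + 1, x)):
        if yy < height and xx < width and shape[yy][xx] != shape[y][x]:
            return True
    return False


def overlaps(a, b):
    """True if two equal-sized blocks (top, left, size) share at least one pixel."""
    return abs(a[0] - b[0]) < a[2] and abs(a[1] - b[1]) < a[2]


def covers(block, y, x):
    top, left, size = block
    return top <= y < top + size and left <= x < left + size


def set_blocks(shape, size):
    """Return (blocks, n_first): all blocks and how many came from the first scan."""
    height, width = len(shape), len(shape[0])
    contour = [(y, x) for y in range(height) for x in range(width)
               if is_contour(shape, y, x)]          # raster order
    blocks = []
    for y, x in contour:                            # first scan: no overlap allowed
        candidate = (y - size // 2, x - size // 2, size)
        if not any(overlaps(candidate, b) for b in blocks):
            blocks.append(candidate)
    n_first = len(blocks)
    for y, x in contour:                            # second scan: cover the rest
        if not any(covers(b, y, x) for b in blocks):
            blocks.append((y - size // 2, x - size // 2, size))
    return blocks, n_first


# A 6 x 6 square object in a 12 x 12 shape image.
shape = [[255 if 3 <= y <= 8 and 3 <= x <= 8 else 0 for x in range(12)]
         for y in range(12)]
blocks, n_first = set_blocks(shape, 4)
```

After the second scan every contour pixel lies in at least one block, while the blocks from the first scan remain mutually non-overlapping.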

【０２２６】画面の端では、普通にブロックを設定する
と画面からはみだしてしまうことがある。そういう場合
は、そのブロックだけ、画面からはみ出さないようにブ
ロックの端を切って長方形のブロックにする。この場合
は相似ブロックも長方形とする。At the edge of the screen, a block set in the usual way may extend beyond the screen. In that case, for that block only, the part that would extend beyond the screen is cut off, giving a rectangular block. The similar block is then also made rectangular.

【０２２７】以上がシェイプ画像におけるブロックの設
定方法である。The above is the method of setting blocks in a shape image.

【０２２８】次に、ブロック毎に、その相似なブロック
をテクスチャ画像を用いて探索する。ここで、相似と
は、ブロックサイズが異なるブロック同士で、一方のブ
ロックサイズを他方と同じになるように、拡大あるいは
縮小した時に、対応する画素の画素値がほぼ等しくなる
ことをいう。例えば、図３８のブロック２３７に対して
は、ブロック２３８が、テクスチャ画像の図柄が相似に
なる。同様に、ブロック２３９に対しては、ブロック２
４０が、ブロック２４１に対してはブロック２４２が、
相似である。本実施形態では、相似ブロックは、輪郭線
上に設定したブロックよりも大きくする。また、相似ブ
ロックは、画面全体を探索するのではなく、例えば、図
３９に示す様に、ブロック２４３の近くのブロック２４
４，２４５，２４６，２４７を四隅とするある一定の範
囲内で探索すれば十分である。図３９は各ブロックの中
心を起点におき、ブロック２４３の起点を用いて、ブロ
ック２４４，２４５，２４６，２４７の起点を所定の画
素幅だけ、上下方向と左右方向に動かした場合である。
起点をブロックの左上角においた場合を図４０に示す。Next, for each block, a similar block is searched for using the texture image. Here, “similar” means that, for blocks of different sizes, when one is enlarged or reduced to the same size as the other, the pixel values of corresponding pixels are nearly equal. For example, for block 237 in FIG. 38, block 238 has a similar texture pattern. Likewise, block 240 is similar to block 239, and block 242 is similar to block 241. In the present embodiment, the similar block is made larger than the block set on the contour line. The similar block need not be searched for over the entire screen; it is sufficient to search within a certain range, for example the range whose four corners are the blocks 244, 245, 246, and 247 near the block 243, as shown in FIG. 39. FIG. 39 shows the case where the origin of each block is placed at its center and the origins of the blocks 244, 245, 246, and 247 are obtained by moving the origin of the block 243 by a predetermined pixel width in the vertical and horizontal directions. FIG. 40 shows the case where the origin is placed at the upper-left corner of the block.

【０２２９】探索範囲内でも、一部が画面からはみ出す
相似ブロックは、探索の対象から外すのであるが、ブロ
ックが画面の端にあると、探索範囲にある全ての相似ブ
ロックが探索の対象から外れてしまうことがある。そう
いう場合には、画面の端のブロックについては、探索範
囲を画面の内側にずらして対応する。Even within the search range, similar blocks that partially protrude from the screen are excluded from the search. However, if the block is at the edge of the screen, all similar blocks within the search range may be excluded. In such a case, for a block at the edge of the screen, the search range is shifted toward the inside of the screen.

【０２３０】相似ブロックの探索は、多段階探索を行う
と、演算量を少なくできる。多段階探索とは、例えば１
画素や半画素ずつ起点をずらしながら、探索範囲全体を
探索するのではなく、初めに、とびとびの位置の起点で
誤差を調べる。次に、その中で誤差が小さかった起点の
周囲だけを少し細かく起点を動かして誤差を調べるとい
うことを繰り返しながら、相似ブロックの位置をしぼり
こんでいく方法である。Performing the search for similar blocks as a multi-step search reduces the amount of computation. In a multi-step search, instead of searching the entire search range while shifting the starting point by, for example, one pixel or half a pixel at a time, the error is first examined at starting points spaced at discrete intervals. Next, the starting point is moved in slightly finer steps only around the starting points where the error was small, and the error is examined again. This is repeated to narrow down the position of the similar block.
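The multi-step search described above can be sketched as follows. This is only a minimal illustration, not the circuit of the embodiment: it assumes a grayscale texture stored as a list of rows, a similar block exactly twice the block size, reduction by 2×2 averaging, and a sum of absolute differences as the error. The function names, the `radius` and `coarse_step` parameters, and the choice to start the search from the concentric enlarged block are all assumptions made for the example.

```python
def reduce_half(img, top, left, size):
    """Shrink the (2*size x 2*size) region at (top, left) to size x size by 2x2 averaging."""
    return [[(img[top + 2 * r][left + 2 * c] + img[top + 2 * r][left + 2 * c + 1]
              + img[top + 2 * r + 1][left + 2 * c] + img[top + 2 * r + 1][left + 2 * c + 1]) / 4.0
             for c in range(size)] for r in range(size)]


def sad(img, top, left, size, reduced):
    """Sum of absolute differences between a block of img and a reduced candidate."""
    return sum(abs(img[top + r][left + c] - reduced[r][c])
               for r in range(size) for c in range(size))


def multistep_search(img, top, left, size, radius, coarse_step=4):
    """Return (error, sim_top, sim_left) of the best 2x similar block, or None."""
    height, width = len(img), len(img[0])
    big = 2 * size
    # Start from the enlarged block that is concentric with the block (an assumption).
    center = (top - size // 2, left - size // 2)
    best = None
    step, reach = coarse_step, radius
    while step >= 1:
        for dy in range(-reach, reach + 1, step):
            for dx in range(-reach, reach + 1, step):
                t, l = center[0] + dy, center[1] + dx
                # Candidates that protrude from the screen are excluded.
                if t < 0 or l < 0 or t + big > height or l + big > width:
                    continue
                err = sad(img, top, left, size, reduce_half(img, t, l, size))
                if best is None or err < best[0]:
                    best = (err, t, l)
        if best is None:
            return None
        # Next stage: look only around the best origin so far, with a finer step.
        center = (best[1], best[2])
        reach = step
        step //= 2
    return best
```

With `radius=8` and `coarse_step=4`, each stage evaluates at most 25 origins instead of the 289 origins of an exhaustive one-pixel search over the same range. Like any coarse-to-fine search, it can miss the global minimum when the error surface is not smooth.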

【０２３１】相似ブロックの探索において、相似ブロッ
クの縮小処理を毎回行うと、処理時間が多く必要であ
る。そこで、予め画像全体を縮小したものを生成し、別
のメモリに保持しておけば、相似ブロックに対応する部
分のデータをそのメモリから読み出すだけで済む。In the search for similar blocks, if processing for reducing similar blocks is performed every time, a long processing time is required. Therefore, if a reduced image of the entire image is generated in advance and stored in another memory, it is only necessary to read out the data corresponding to the similar block from the memory.

【０２３２】図３８では３つのブロック２３７，２３
９，２４１についてだけ、相似ブロックを示している
が、実際には、図３７で示した全てのブロックに対して
相似ブロックを求める。以上が相似ブロックの探索方法
である。相似ブロックの探索はシェイプ画像ではなく、
テクスチャ画像を用いることが肝要である。画面内で相
似ブロックをブロックに写像する一次変換を考えた時
に、テクスチャ画像の輪郭線は、この一次変換において
不変である。FIG. 38 shows similar blocks for only three blocks, 237, 239, and 241, but in practice a similar block is obtained for every block shown in FIG. 37. The above is the method of searching for similar blocks. It is essential that the search for similar blocks use the texture image, not the shape image. When the linear transformation that maps a similar block onto its block within the screen is considered, the contour of the texture image is invariant under this transformation.

【０２３３】次に、各ブロックとその相似ブロックの位
置関係を用いて、シェイプ画像の輪郭がテクスチャ画像
の輪郭に合うように補正する方法を説明する。Next, a method of correcting the contour of the shape image so as to match the contour of the texture image using the positional relationship between each block and its similar blocks will be described.

【０２３４】図４１において、輪郭線２２８が操作者に
よって描かれた線である。この線が、正しい輪郭線２２
９に近づけばよい。そのために、シェイプ画像の相似ブ
ロック２３８の部分を読み出し、それをブロック２３７
と同じサイズに縮小したもので、シェイプ画像のブロッ
ク２３７の部分を置き換える。この操作には、輪郭線
を、相似ブロックからブロックへの一次変換の不動点を
含む不変集合に近づける性質があるので、輪郭線２２８
は輪郭線２２９に近づく。相似ブロックの一辺がブロッ
クの一辺の２倍の長さの時、１回の置き換えで、輪郭線
２２８と正しい輪郭線２２９の隔たりは概して、１／２
になる。この置き換えを全てのブロックに対して１回行
った結果が図４２の輪郭線２４８である。このブロック
の置き換えを繰り返せば輪郭線２４８は、正しい輪郭線
にさらに近づき、やがて、図４３に示すように、正しい
輪郭線に一致する。実際には、２本の輪郭線のずれが画
素間距離よりも小さい状態は意味がないので適当な回数
で置き換えを終了する。本手法は、シェイプ画像で設定
したＮ×Ｎ画素のブロックにテクスチャ画像での輪郭線
が含まれる時に有効なのであるが、その場合、シェイプ
画像の輪郭線とテクスチャ画像の輪郭線の距離は最大で
およそＮ／２である。相似ブロックの一辺の長さが、ブ
ロックの一辺の長さのＡ倍とした時、１回の置き換えに
つき、２本の輪郭線の距離は１／Ａになるのであるか
ら、この距離が１画素よりも短くなることを式で表す
と、置き換え回数をｘとして、（Ｎ／２）×（１／Ａ）＾ｘ＜１となる。ここで＾はべき乗を表し、上式では（１／Ａ）
をｘ回乗ずるという意味である。上式から、ｘ＞ｌｏｇ（２／Ｎ）／ｌｏｇ（１／Ａ）となる。例えばＮ＝８，Ａ＝２の時は、ｘ＞２であり、置き換え回数は３回で十分である。In FIG. 41, the contour 228 is the line drawn by the operator. This line should be brought close to the correct contour 229. To do so, the portion of the shape image in similar block 238 is read out, reduced to the same size as block 237, and used to replace the portion of the shape image in block 237. This operation has the property of bringing the contour closer to an invariant set containing the fixed point of the linear transformation from the similar block to the block, so the contour 228 approaches the contour 229. When one side of the similar block is twice as long as one side of the block, one replacement generally reduces the gap between the contour 228 and the correct contour 229 to 1/2. The result of performing this replacement once for all blocks is the contour 248 in FIG. 42. If the block replacement is repeated, the contour 248 comes still closer to the correct contour and eventually coincides with it, as shown in FIG. 43. In practice, a state in which the displacement between the two contours is smaller than the inter-pixel distance is meaningless, so the replacement is terminated after an appropriate number of iterations. This method is effective when an N × N pixel block set on the shape image contains the contour of the texture image; in that case, the distance between the contour of the shape image and the contour of the texture image is at most about N/2. If one side of the similar block is A times one side of the block, the distance between the two contours is multiplied by 1/A at each replacement, so the condition that this distance becomes shorter than one pixel after x replacements is (N/2) × (1/A)^x < 1. Here ^ denotes exponentiation, that is, (1/A) is multiplied x times. From this expression, x > log(2/N) / log(1/A). For example, when N = 8 and A = 2, x > 2, so three replacements are sufficient.
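The replacement count implied by the inequality above can be checked with a short sketch. The function name and its argument names are hypothetical; the loop simply divides the initial gap N/2 by A until it falls below one pixel, which avoids floating-point logarithms.

```python
def replacements_needed(block_size, scale):
    """Smallest x such that (block_size / 2) * (1 / scale) ** x < 1."""
    if scale <= 1:
        raise ValueError("the similar block must be larger than the block")
    distance = block_size / 2.0   # worst-case gap between the two contours
    count = 0
    while distance >= 1.0:
        distance /= scale         # each replacement shrinks the gap by 1/A
        count += 1
    return count
```

For N = 8 and A = 2 this returns 3, matching the example in the text.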

【０２３５】この物体抽出装置のブロック図を図３０に
示す。まず、操作者によって入力されるシェイプ画像２
４９が、シェイプメモリ２５０に記録される。シェイプ
メモリ２５０においては、図３６，３７を用いて前述し
た様にブロックが設定される。一方、テクスチャ画像２
５１は、テクスチャメモリ２５２に記録される。テクス
チャメモリ２５２からは、シェイプメモリ２５０から送
られる、ブロックの位置情報２５３を参照して、ブロッ
クのテクスチャ画像２５４が探索回路２５５に送られ
る。同時に、図３９や図４０を用いて説明した様に相似
ブロックの候補もテクスチャメモリ２５２から、探索回
路２５５に送られる。探索回路２５５では、相似ブロッ
クの各候補を縮小した後に、ブロックとの誤差を計算
し、その誤差が最小となったものを相似ブロックとして
決定する。誤差としては輝度値の差分の絶対値和や、そ
れに色差の差分の絶対値和を加えたものなどが考えられ
る。輝度だけに比べて、色差も用いると、演算量は多く
なるが、物体の輪郭において輝度の段差が小さくても、
色差の段差が大きい場合に正しく相似ブロックを決定で
きるので精度が向上する。相似ブロックの位置の情報２
５６は縮小変換回路２５７に送られる。縮小変換回路２
５７には、シェイプメモリ２５０から、相似ブロックの
シェイプ画像２５８も送られる。縮小変換回路２５７で
は、相似ブロックのシェイプ画像が縮小され、その縮小
された相似ブロックは輪郭線が補正されたシェイプ画像
２５９として、シェイプメモリ２５０に返され、対応す
るブロックのシェイプ画像が上書きされる。このシェイ
プメモリ２５０の置き換えが所定の回数に達した時は、
補正されたシェイプ画像２５９は外部に出力される。シ
ェイプメモリ２５０の書き換えは、ブロック毎に逐次上
書きしても良いし、メモリを２画面分用意して、一方か
ら他方へ、初めに画面全体のシェイプ画像をコピーした
後に、輪郭部分のブロックは相似ブロックを縮小したも
ので置き換えるようにしても良い。FIG. 30 is a block diagram of this object extraction apparatus. First, a shape image 249 input by the operator is recorded in a shape memory 250. In the shape memory 250, blocks are set as described above with reference to FIGS. 36 and 37. Meanwhile, a texture image 251 is recorded in a texture memory 252. The texture memory 252 refers to block position information 253 sent from the shape memory 250 and sends the texture image 254 of the block to a search circuit 255. At the same time, the similar block candidates are also sent from the texture memory 252 to the search circuit 255, as described with reference to FIGS. 39 and 40. The search circuit 255 reduces each similar block candidate, calculates its error with respect to the block, and determines the candidate with the minimum error as the similar block. The error may be, for example, the sum of absolute differences of the luminance values, or that sum plus the sum of absolute differences of the color differences. Using the color differences in addition to the luminance increases the amount of computation, but the similar block can be determined correctly when the color-difference step at the object contour is large even if the luminance step there is small, so accuracy improves. Similar block position information 256 is sent to a reduction conversion circuit 257. The shape image 258 of the similar block is also sent from the shape memory 250 to the reduction conversion circuit 257. The reduction conversion circuit 257 reduces the shape image of the similar block, and the reduced similar block is returned to the shape memory 250 as a contour-corrected shape image 259, overwriting the shape image of the corresponding block. When the number of replacements of the shape memory 250 reaches a predetermined number, the corrected shape image 259 is output to the outside. The shape memory 250 may be rewritten by overwriting block by block in sequence; alternatively, memory for two screens may be prepared, the shape image of the entire screen first copied from one to the other, and the blocks on the contour then replaced with the reduced similar blocks.
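The error measure used by the search circuit can be sketched as below. The dictionary layout with `y`, `cb`, and `cr` planes, the function names, and the assumption that the chroma planes have the same resolution as the luminance plane are illustrative choices, not details given in the text.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized 2-D lists."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def block_error(block, candidate, use_chroma=True):
    """Error between a block and an already reduced similar block candidate.

    Both arguments are dicts with 'y', 'cb' and 'cr' planes of equal size
    (an assumed layout). With use_chroma=False only luminance is compared.
    """
    error = sad(block["y"], candidate["y"])
    if use_chroma:
        # Adding the color-difference terms costs more computation but
        # separates candidates whose luminance is identical.
        error += sad(block["cb"], candidate["cb"]) + sad(block["cr"], candidate["cr"])
    return error
```

When the luminance is flat across the contour but the color difference has a step, the luminance-only error cannot distinguish a matching candidate from a non-matching one, whereas the combined error can.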

【０２３６】この物体抽出方法を図４８のフローチャー
トを参照して説明する。This object extraction method will be described with reference to the flowchart in FIG. 48.

【０２３７】（フレーム内の縮小ブロックマッチングに
よる物体抽出方法）ステップＳ３１では、シェイプデー
タの輪郭部分にブロックが設定される。ステップＳ３２
では、現処理ブロックと画像データの図柄が相似である
相似ブロックが同じ画像データから見つけられる。ステ
ップＳ３３では、現処理ブロックのシェイプデータを相
似ブロックのシェイプデータが縮小したデータで置き換
えられる。(Object extraction method using reduced block matching within a frame) In step S31, blocks are set on the contour portion of the shape data. In step S32, a similar block whose image-data pattern is similar to that of the current processing block is found in the same image data. In step S33, the shape data of the current processing block is replaced with data obtained by reducing the shape data of the similar block.

【０２３８】ステップＳ３４で処理済みブロック数が所
定の数に達したらステップ３５に進む、そうでない場合
は次のブロックに処理対象を進めてステップ３２に戻
る。If the number of processed blocks has reached the predetermined number in step S34, the flow advances to step S35. Otherwise, the processing target advances to the next block and the flow returns to step S32.

【０２３９】ステップＳ３５では置き換えの繰り返し回
数が所定の数に達したらステップ３６に進む、そうでな
い場合は、置き換えられたシェイプデータを処理対象と
してステップ３１に戻る。ステップＳ３６では、置き換
えを繰り返されたシェイプデータが物体領域として出力
される。In step S35, if the number of repetitions of replacement reaches a predetermined number, the flow advances to step S36. Otherwise, the flow returns to step S31 with the replaced shape data as a processing target. In step S36, the shape data subjected to repeated replacement is output as an object area.

【０２４０】この方法はブロックのエッジと相似ブロッ
クのエッジが合った場合に効果がある。従って、ブロッ
クに複数のエッジがある場合には、エッジが正しく合わ
ないことがあるので、そういうブロックについては置き
換えをせずに入力されたままのエッジを保持する。具体
的には、ブロックのシェイプ画像を左右方向と上下方向
に各ラインをスキャンし、１つのラインで“０”から
“２５５”へ、あるいは“２５５”から“０”に変化す
る点が２つ以上あるラインが所定の数以上あるブロック
は置き換えをしない。また、物体と背景の境界であって
も部分によっては、輝度などが平坦な場合がある。この
ような場合もエッジ補正の効果が期待できないのでテク
スチャ画像の分散が所定値以下のブロックについても置
き換えをせずに、入力されたままのエッジを保持する。This method is effective when the edge in a block matches the edge in its similar block. Accordingly, when a block contains a plurality of edges, the edges may not match correctly, so such a block is not replaced and the edge is kept as input. Specifically, each line of the block's shape image is scanned in the horizontal and vertical directions, and a block is not replaced if it has at least a predetermined number of lines containing two or more points at which the value changes from "0" to "255" or from "255" to "0". Furthermore, even on the boundary between the object and the background, the luminance and the like may be flat in some portions. Edge correction cannot be expected to be effective in such a case either, so a block in which the variance of the texture image is equal to or less than a predetermined value is likewise not replaced, and the edge is kept as input.
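The two exclusion tests just described can be sketched as follows. The function names and the parameters `max_lines` and `min_variance` stand in for the "predetermined" values, which the text does not specify.

```python
def transitions(line):
    """Number of 0 <-> 255 changes along one scan line."""
    return sum(1 for a, b in zip(line, line[1:]) if a != b)


def has_multiple_edges(shape_block, max_lines):
    """True if at least max_lines rows or columns contain two or more transitions."""
    rows = [list(row) for row in shape_block]
    cols = [list(col) for col in zip(*shape_block)]
    busy = sum(1 for line in rows + cols if transitions(line) >= 2)
    return busy >= max_lines


def variance(texture_block):
    """Population variance of the pixel values in a texture block."""
    values = [v for row in texture_block for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def should_replace(shape_block, texture_block, max_lines, min_variance):
    """Replace only blocks with a single edge and a non-flat texture."""
    if has_multiple_edges(shape_block, max_lines):
        return False
    return variance(texture_block) > min_variance
```

A block for which `should_replace` returns `False` keeps the edge exactly as it was input.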

【０２４１】相似ブロックの誤差が所定値よりも小さく
ならない場合、縮小をあきらめ、同じサイズで相似ブロ
ックを求めても良い。この際、自分とはなるべく重なら
ないように相似ブロックを選ぶ。縮小を行わないブロッ
クだけでは、エッジが補正される効果はないが、縮小を
行うことによってエッジが補正されたブロックから、そ
の補正されたエッジをコピーすることで、縮小を行わな
いブロックについても、間接的にエッジが補正される。If the error of the similar block does not fall below a predetermined value, reduction may be abandoned and a similar block of the same size obtained instead. In this case, the similar block is chosen so as to overlap the block itself as little as possible. A block for which no reduction is performed does not by itself have an edge-correcting effect; however, by copying the corrected edge from a block whose edge has been corrected through reduction, the edge of the unreduced block is also corrected indirectly.

【０２４２】図４８に示したフローチャートは相似ブロ
ックを見つけた直後にシェイプ画像の置き換えを行う例
であったが、全ブロックの相似ブロックの位置情報を保
持するようにすることで、初めに、相似ブロックの探索
を全ブロックについて行い、次に、シェイプ画像の置き
換えを全ブロックについて行う方法を図５０のフローチ
ャートを参照して説明する。The flowchart shown in FIG. 48 is an example in which the shape image is replaced immediately after a similar block is found. Next, a method in which the position information of the similar blocks of all blocks is retained, so that the similar-block search is first performed for all blocks and the shape image replacement is then performed for all blocks, will be described with reference to the flowchart of FIG. 50.

【０２４３】この例では、１回の相似ブロックの探索に
対してシェイプ画像の置き換えが複数回反復できる。In this example, the replacement of the shape image can be repeated a plurality of times for one similar block search.

【０２４４】ステップＳ４１では、シェイプデータの輪
郭部分にブロックが設定される。ステッブＳ４２では、
現処理ブロックと画像データの図柄が相似である相似ブ
ロックが同じ画像データ内から見つけられる。ステップ
Ｓ４３では、全てのブロックについて相似ブロックを見
つける処理が終わったとき、つまり、処理済みブロック
数が所定の数に達したときにはステップＳ４４に進む。
そうでない場合はステップＳ４２に戻る。ステップＳ４
４では、現処理ブロックのシェイプデータを相似ブロッ
クのシェイプデータを縮小したもので置き換える。In step S41, blocks are set on the contour portion of the shape data. In step S42, a similar block whose image-data pattern is similar to that of the current processing block is found in the same image data. In step S43, when the process of finding similar blocks has been completed for all blocks, that is, when the number of processed blocks has reached a predetermined number, the process proceeds to step S44. Otherwise, the process returns to step S42. In step S44, the shape data of the current processing block is replaced with a reduced version of the shape data of the similar block.

【０２４５】ステップＳ４５では、全てのブロックにつ
いて置き換える処理が終わったとき、つまり、処理済み
ブロック数が所定の数に違したときにはステップＳ４６
に進む。そうでない場合はステップＳ４４に戻る。ステ
ップＳ４６では全ブロックの置き換え回数が所定の回数
に達した場合はＳ４７に進む。そうでない場合はＳ４４
に戻る。ステップＳ４７では、置き換え変換を繰り返さ
れたシェイプデータが物体領域として出力される。In step S45, when the replacement process has been completed for all blocks, that is, when the number of processed blocks has reached a predetermined number, the process proceeds to step S46. Otherwise, the process returns to step S44. In step S46, if the number of replacements of all blocks has reached a predetermined number, the process proceeds to step S47. Otherwise, the process returns to step S44. In step S47, the shape data that has undergone the repeated replacement is output as the object region.
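Steps S44 to S47 can be sketched as below, assuming that the block and similar block positions (`pairs`) have already been found from the texture image in steps S42 and S43. The sketch assumes 0/255 shape data stored as a list of rows, similar blocks exactly twice the block size, a strict-majority rule for the 2×2 reduction, and the two-buffer variant of the rewrite; all names are hypothetical.

```python
def reduce_shape(shape, top, left, size):
    """Shrink a (2*size x 2*size) region of 0/255 shape data to size x size."""
    reduced = []
    for r in range(size):
        row = []
        for c in range(size):
            y, x = top + 2 * r, left + 2 * c
            total = shape[y][x] + shape[y][x + 1] + shape[y + 1][x] + shape[y + 1][x + 1]
            # Object only if more than half of the four source pixels are object.
            row.append(255 if total > 2 * 255 else 0)
        reduced.append(row)
    return reduced


def correct_contour(shape, pairs, size, iterations):
    """Repeat the replacement of every block `iterations` times (S44 to S46).

    pairs: list of ((block_top, block_left), (similar_top, similar_left)).
    """
    for _ in range(iterations):
        # Second buffer: start from a copy of the whole screen.
        result = [row[:] for row in shape]
        for (block_top, block_left), (sim_top, sim_left) in pairs:
            reduced = reduce_shape(shape, sim_top, sim_left, size)
            for r in range(size):
                result[block_top + r][block_left:block_left + size] = reduced[r]
        shape = result
    return shape  # S47: output as the object region
```

Because the similar block positions are kept, the search cost is paid once while the replacement can be iterated several times. With a single block, only the rows near the fixed point of the mapping converge; in practice the blocks cover the whole contour, so neighboring blocks correct one another's source regions.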

【０２４６】次にエッジ補正の精度を上げることができ
るブロックの設定方法を説明する。Next, a description will be given of a method of setting a block capable of improving the accuracy of edge correction.

【０２４７】前述したようにシェイブ画像の輪郭線の周
囲にブロックを設定する方法では、図５１（ａ）に示さ
れるように、正しい輪郭線３０１の一部がブロックに含
まれなくなることがある。ここで、シェイプ画像の輪郭
線３０２は太い線で示してある。仮に輪郭線の右下側が
物体で左上側が背景だとすると、本当は背景である部分
３０３は物体と誤って設定されているにもかかわらず、
ブロックに含まれないために修正される可能性がない。
このようにブロックと正しい輪郭線の間に隙間がある
と、正しく補正されない。As described above, with the method of setting blocks around the contour of the shape image, part of the correct contour 301 may fall outside the blocks, as shown in FIG. 51(a). Here, the contour 302 of the shape image is drawn as a thick line. Assuming that the lower right side of the contour is the object and the upper left side is the background, a portion 303 that is actually background has been erroneously set as object, yet it cannot be corrected because it is not contained in any block. When there is such a gap between the blocks and the correct contour, the contour is not corrected properly.

【０２４８】ブロックと正しい輪郭線の隙間を小さくす
るには、図５１（ｂ）に示したようにブロックをある程
度重なり合わせる方法がある。こうすると、ブロックの
数が増えるので、演算量は増加するが隙間３０４は小さ
くなる。従って抽出の精度は向上する。しかし、この例
では、まだ隙間は完全にはなくならない。To reduce the gap between the blocks and the correct contour, the blocks can be made to overlap one another to some extent, as shown in FIG. 51(b). This increases the number of blocks and hence the amount of computation, but the gap 304 becomes smaller, so the extraction accuracy improves. In this example, however, the gap is still not completely eliminated.

【０２４９】隙間を小さくするには、図５１（ｃ）に示
したようにブロックサイズを大きくすることも有効であ
る。この例では、前述したブロックの重ねあわせを併用
した。これにより、この例では隙間が完全になくなる。Increasing the block size, as shown in FIG. 51(c), is also effective in reducing the gap. In this example, it is combined with the block overlapping described above. As a result, the gap is completely eliminated in this example.

【０２５０】このように、輪郭線の補正可能な範囲を広
げるにはブロックサイズを大きくすることが有効であ
る。しかし、ブロックサイズが大きすぎると、ブロック
に含まれる輪郭線の形状が複雑になり、相似ブロツクが
見つかりにくくなる。その例が図５２に示されている。Thus, increasing the block size is effective in widening the range over which the contour can be corrected. However, if the block size is too large, the shape of the contour contained in a block becomes complicated and a similar block becomes difficult to find. An example is shown in FIG. 52.

【０２５１】図５２（ａ）では、斜線の部分３０５が物
体領域、白色の部分３０６が背景領域を表す。与えられ
たシェイプ画像の輪郭線３０７は黒線で示されている。
このように、シェイプ画像の輪郭線３０７は正しい輪郭
線と大きく隔たっており、また、正しい輪郭線には凹凸
がある。これに対して、前に説明した方法とは異なる方
法でブロックを配置した結果が図５２（ｂ）に示されて
いる。ここでは、まず、互いに重ならず、かつ、隙間が
ないような矩形ブロックで画像が分割される。ブロック
毎にテクスチャ画像での分散が計算され、分散が所定値
よりも小さいブロックはその設定を解消した。従って図
５２（ｂ）では、分散が所定値以上のブロックだけが残
っている。これらのブロック毎に相似ブロックを求める
のであるが、例えばブロック３０８の近くにこれを縦横
２倍にした図柄は存在しないし、他の多くのブロックに
ついても同様である。従って、誤差最小な部分を相似ブ
ロックとして選択はするものの、その位置関係を用いて
シェィブ画像の置き換え変換を反復しても図５２（ｃ）
に示す通り正しい輪郭線には合致しない。ただ、図５２
（ａ）のシェイプ画像の輪郭線３０７と比較して、エッ
ジ補正後の図５２（ｃ）のシェイプ画像の輪郭線３０９
は、テクスチャ画像の輪郭線の大まかな凹凸（左と右に
山があってその間に谷があるという程度のもの）は反映
されている。この例で仮にブロックサィズを小さくする
と、この大まかな補正さえされなくなってしまう。In FIG. 52(a), the hatched portion 305 represents the object region and the white portion 306 represents the background region. The contour 307 of the given shape image is drawn as a black line. As shown, the contour 307 of the shape image is far from the correct contour, and the correct contour has irregularities. FIG. 52(b) shows the result of arranging blocks by a method different from the one described earlier. Here, the image is first divided into rectangular blocks that neither overlap one another nor leave gaps. The variance of the texture image is calculated for each block, and blocks whose variance is smaller than a predetermined value are discarded. Therefore, only blocks whose variance is equal to or greater than the predetermined value remain in FIG. 52(b). A similar block is obtained for each of these blocks, but, for example, no pattern that is block 308 doubled vertically and horizontally exists near block 308, and the same holds for many other blocks. Consequently, although the portion with the minimum error is selected as the similar block, repeating the replacement conversion of the shape image using that positional relationship does not yield the correct contour, as shown in FIG. 52(c). Nevertheless, compared with the contour 307 of the shape image in FIG. 52(a), the contour 309 of the shape image after edge correction in FIG. 52(c) does reflect the rough irregularities of the contour of the texture image (to the extent that there are peaks on the left and right with a valley between them). If the block size were reduced in this example, even this rough correction would not be achieved.

【０２５２】このように、補正の範囲を広げるためにブ
ロックサイズを大きくすると、ブロックに含まれる輪郭
線の形状が複雑になり、相似ブロックが見つかりにくく
なることがある。その結果エッジの補正が大まかにしか
されなくなる。このような場合には、ブロックのサイズ
を初めは大きなサイズでエッジ補正を行い、その結果に
対して、再度ブロックサイズを小さくしてエッジ補正を
行うと、補正の精度が向上する。図５２（ｃ）に対し
て、ブロックサイズを縦横１／２にして補正を再度行
い、さらに１／４にして補正を行クた結果が図５２
（ｄ）に示される。このように、ブロックサイズを次第
に小さくしながら補正を繰り返せば補正の精度を向上で
きる。Thus, when the block size is increased to widen the correction range, the shape of the contour contained in a block becomes complicated, and similar blocks may become difficult to find. As a result, the edge is corrected only roughly. In such a case, correction accuracy improves if edge correction is first performed with a large block size and then performed again on the result with a smaller block size. FIG. 52(d) shows the result of performing the correction again on FIG. 52(c) with the block size reduced to 1/2 vertically and horizontally, and then once more with it reduced to 1/4. In this way, the correction accuracy can be improved by repeating the correction while gradually reducing the block size.

【０２５３】ブロックサイズを次第に小さくする方法を
図５３のフローチャートを参照して説明する。A method for gradually reducing the block size will be described with reference to the flowchart in FIG. 53.

【０２５４】ステップＳ５１ではブロックサイズｂ＝Ａ
と設定する。ステップＳ５２では、図４８または図５０
に示したエッジ補正と同様なエッジ補正を行う。ステッ
プＳ５３では、ｂを観察し、ｂがＺ（＜Ａ）より小さく
なると、この処理は終了する。ｂがｚ以上の場合にはス
テップＳ５４に進む。ステップＳ５４でブロックサイズ
ｂを半分にしてＳ５２に進む。In step S51, the block size is set to b = A. In step S52, edge correction similar to that shown in FIG. 48 or FIG. 50 is performed. In step S53, b is checked; if b is smaller than Z (< A), the processing ends. If b is equal to or larger than Z, the process proceeds to step S54. In step S54, the block size b is halved, and the process returns to step S52.
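The loop of FIG. 53 can be written as a short driver. This is a sketch that follows steps S51 to S54 literally; `correct` is a hypothetical callback standing in for the edge correction of FIG. 48 or FIG. 50, and the guard on `min_size` is an added safety check, not part of the flowchart.

```python
def coarse_to_fine(shape, start_size, min_size, correct):
    """Repeat edge correction while halving the block size.

    correct(shape, block_size) must return the corrected shape data.
    """
    if min_size < 2:
        raise ValueError("min_size must be at least 2 so the block size never reaches 0")
    b = start_size                  # S51: b = A
    while True:
        shape = correct(shape, b)   # S52: edge correction with block size b
        if b < min_size:            # S53: stop once b has dropped below Z
            return shape
        b //= 2                     # S54: halve the block size
```

Read literally, the flowchart performs its last correction at the first block size below Z; for A = 32 and Z = 8 the sizes used are 32, 16, 8, and 4.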

【０２５５】以上、ブロックサイズを初めは大きめに
し、次第に小さくしながら補正を繰り返すことで補正の
精度を向上する例を示した。As described above, an example has been described in which the correction accuracy is improved by increasing the block size at first and repeating the correction while gradually reducing the block size.

【０２５６】図５４（ａ）に、ブロックを４５度傾ける
ことで、ブロックと正しい輪郭線の間に隙間をできにく
くする例が示されている。このように、輪郭線が斜めの
場合にはブロックサイズを図５１（ｃ）ほど大きくしな
くても、ブロックを傾ければ正しい輪郭線を覆うことが
できる。また、この例では、ブロックの重なりを無くし
ても図５４（ｂ）のように正しい輪郭線を覆える。この
ように、シェイプ画像の輪郭線と同じ向きにブロックの
辺を傾けることで、ブロックと正しい輪郭線の間に隙間
を生じにくくすることができる。具体的には、アルフア
画像の輪郭線の傾きを検知しそれが水平か垂直に近い場
合にはブロツクの向きは図５１（ｃ）のようにし、そう
でない場合にはブロックの傾きは図５４（ｂ）のように
する。水平や垂直に近いという判断はしきい値との比較
で行う。FIG. 54(a) shows an example in which the blocks are tilted by 45 degrees so that gaps are less likely to form between the blocks and the correct contour. When the contour is oblique in this way, tilting the blocks allows the correct contour to be covered without making the block size as large as in FIG. 51(c). In this example, the correct contour can also be covered even when the overlap between blocks is eliminated, as shown in FIG. 54(b). Thus, by tilting the sides of the blocks in the same direction as the contour of the shape image, gaps between the blocks and the correct contour can be made less likely to occur. Specifically, the inclination of the contour of the alpha image is detected; if it is close to horizontal or vertical, the blocks are oriented as in FIG. 51(c), and otherwise the blocks are tilted as in FIG. 54(b). Whether the contour is close to horizontal or vertical is determined by comparison with a threshold value.

【０２５７】以上が、最初のフレームの物体抽出処理で
ある。これは、必ずしも動画像の最初のフレームだけで
はなく、静止画像一般に用いることができる手法であ
る。なお、置き換えを１回行ったシェイプ画像に対し
て、ブロックを設定しなおし、その相似ブロックを求め
なおして、２回目の置き換えを行うというように、置き
換えの度にブロック設定と相似ブロックの探索を行え
ば、演算量は増えるが、より補正の効果が得られる。The above is the object extraction processing for the first frame. This technique is not limited to the first frame of a moving image and can be applied to still images in general. Note that if block setting and the similar-block search are performed anew at each replacement, that is, blocks are set again on the shape image after one replacement, their similar blocks are obtained again, and the second replacement is then performed, the amount of computation increases but a greater correction effect is obtained.

【０２５８】また、相似ブロックはブロックのなるべく
近い部分から選ばれるのが好ましいので、相似ブロック
を探す範囲をブロックサイズによって切り換えるとよ
い。即ち、ブロックサイズが大きい場合には、相似ブロ
ックを探す範囲を広くし、ブロックサイズが小さい場合
には、相似ブロックを探す範囲を狭くする。It is preferable that the similar block is selected from a portion as close as possible to the block. Therefore, the range in which the similar block is searched may be switched according to the block size. That is, when the block size is large, the range for searching for similar blocks is widened, and when the block size is small, the range for searching for similar blocks is narrowed.

【０２５９】また、本手法では、シェイプデータの置き
換えの過程で、シェイプデータに小さい穴や、孤立した
小領域が誤差として出現することがある。そこで、ステ
ップＳ３４，Ｓ３５，Ｓ３６，Ｓ４５，Ｓ４６，Ｓ４
７，Ｓ５３の前などで、シェイプデータから、小さい穴
や、孤立した小領域を除くことにより、補正の精度を向
上することができる。小さい穴や、孤立した小領域を除
くには、例えば、画像解析ハンドブック（高木、下田監
修、東京大学出版会、初版１９９１年１月）５７５〜５
７６頁に記載されている膨張と収縮を組み合わせた処理
や、６７７頁に記載された多数決フィルタなどを用い
る。In this method, small holes or small isolated regions may appear as errors in the shape data during the replacement process. Correction accuracy can therefore be improved by removing small holes and small isolated regions from the shape data, for example before steps S34, S35, S36, S45, S46, S47, and S53. To remove small holes and small isolated regions, one can use, for example, the processing that combines dilation and erosion described on pages 575 to 576 of the Image Analysis Handbook (supervised by Takagi and Shimoda, University of Tokyo Press, first edition, January 1991), or the majority filter described on page 677 of the same book.
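As an illustration of this clean-up step, the following is a generic 3×3 majority filter for 0/255 shape data. It is a common textbook formulation written for this example, not a transcription of the filter in the cited handbook; the window size and the handling of border pixels (only the neighbors that exist are counted) are assumptions.

```python
def majority_filter(shape):
    """Set each pixel to the majority value (0 or 255) of its 3x3 neighborhood."""
    height, width = len(shape), len(shape[0])
    out = [row[:] for row in shape]
    for y in range(height):
        for x in range(width):
            votes = total = 0
            for ny in range(max(0, y - 1), min(height, y + 2)):
                for nx in range(max(0, x - 1), min(width, x + 2)):
                    total += 1
                    if shape[ny][nx] == 255:
                        votes += 1
            # Object only if object pixels form a strict majority of the window.
            out[y][x] = 255 if 2 * votes > total else 0
    return out
```

A single-pixel hole inside the object or a single-pixel speck in the background is removed in one pass; larger defects would need a larger window or repeated passes.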

【０２６０】また、ブロックは図４９に示すように、よ
り簡易的に設定しても良い。すなわち、画面を単純にブ
ロック分割し、そのうちブロック２２００など、輪郭線
２２８を含むブロックについてのみ、相似ブロックの探
索や置き換えの処理を行う。Also, the blocks may be set more simply as shown in FIG. That is, the screen is simply divided into blocks, and similar blocks are searched or replaced only for blocks including the outline 228, such as the block 2200.

【０２６１】また、与えられるテクスチャ画像が予めフ
ラクタル符号化（特公平０８−３２９２５５号公報「画
像の領域分割方法及び装置」）によって圧縮されている
のであれば、その圧縮データに、各ブロックの相似ブロ
ックの情報が含まれている。従って、輪郭線２２８を含
むブロックの相似ブロックとしては、圧縮データを流用
すれば、改めて相似ブロックを探索する必要はない。If the given texture image has been compressed in advance by fractal coding (Japanese Patent Publication No. 08-329255, "Method and Apparatus for Image Region Division"), the compressed data contains information on the similar block of each block. Therefore, by reusing the compressed data for the similar blocks of the blocks containing the contour 228, there is no need to search for similar blocks anew.

【０２６２】図２７に戻り、動画から物体を抽出する物
体抽出装置の説明を続ける。Returning to FIG. 27, description of the object extracting apparatus for extracting an object from a moving image will be continued.

【０２６３】動き補償による物体抽出回路２４２では、
テクスチャ画像２２１から検出される動きベクトルを用
いながら、最初のフレームのシェイプ画像２５を元にし
て、２フレーム目以降のフレームのシェイプ画像２６０
を生成する。The object extraction circuit 224 using motion compensation generates shape images 260 for the second and subsequent frames on the basis of the shape image 225 of the first frame, using motion vectors detected from the texture image 221.

【０２６４】図２９に動き補償による物体抽出回路２２
４の例が示される。最初のフレームのシェイプ画像２２
５が、シェイプメモリ２６１に記録される。シェイプメ
モリ２６１においては、図４５のフレーム２６２に示し
た様に、画面全体にブロックが設定される。一方、テク
スチャ画像２２１は、動き推定回路２６４に送られ、ま
た、テクスチャメモリ２６３に記録される。テクスチャ
メモリ２６３からは、１フレーム前のテクスチャ画像２
６５が動き推定回路２６４に送られる。動き推定回路２
６４では、現処理フレームのブロック毎に、１フレーム
前のフレーム内から誤差が最小となる参照ブロックを見
つける。図４５に、ブロック２６７と、１フレーム前の
フレーム２６６から選ばれた参照ブロック２６８の例が
示される。ここで、誤差が所定のしきい値よりも小さく
なるのであれば、参照ブロックはブロックよりも大きく
する。ブロック２６９と縦横２倍の大きさの参照ブロッ
ク７０の例も図４５に示す。FIG. 29 shows an example of the object extraction circuit 224 using motion compensation. The shape image 225 of the first frame is recorded in a shape memory 261. In the shape memory 261, blocks are set over the entire screen, as shown in frame 262 of FIG. 45. Meanwhile, the texture image 221 is sent to a motion estimation circuit 264 and is also recorded in a texture memory 263. The texture image 265 of the preceding frame is sent from the texture memory 263 to the motion estimation circuit 264. For each block of the current processing frame, the motion estimation circuit 264 finds the reference block with the minimum error in the preceding frame. FIG. 45 shows an example of a block 267 and a reference block 268 selected from the preceding frame 266. Here, if the error falls below a predetermined threshold, the reference block is made larger than the block. FIG. 45 also shows an example of a block 269 and a reference block 70 twice its size vertically and horizontally.

【０２６５】図２９に戻り、参照ブロックの位置の情報
２７１は動き補償回路２７２に送られる。動き補償回路
２７２には、シェイプメモリ２６１から、参照ブロック
のシェイプ画像２７３も送られる。動き補償回路２７２
では、参照ブロックの大きさがブロックと同じ場合は、
そのまま、参照ブロックの大きさがブロックよりも大き
い場合は、参照ブロックのシェイプ画像が縮小され、そ
の参照ブロックのシェイプ画像は現処理フレームのシェ
イプ画像２６０として出力される。また、次のフレーム
に備えて、現処理フレームのシェイプ画像２６０はシェ
イプメモリ２６１に送られて、画面全体のシェイプ画像
が上書きされる。Returning to FIG. 29, reference block position information 271 is sent to a motion compensation circuit 272. The shape image 273 of the reference block is also sent from the shape memory 261 to the motion compensation circuit 272. In the motion compensation circuit 272, the shape image of the reference block is used as is when the reference block is the same size as the block and is reduced when the reference block is larger than the block; the resulting shape image of the reference block is output as the shape image 260 of the current processing frame. In preparation for the next frame, the shape image 260 of the current processing frame is also sent to the shape memory 261, where it overwrites the shape image of the entire screen.
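The same-size case of this motion compensation can be sketched as follows. The sketch assumes grayscale textures and 0/255 shape data stored as lists of rows, image dimensions that are multiples of the block size, an exhaustive search within `radius` pixels, and a sum of absolute differences as the error; the enlarged reference block variant is omitted, and all names are hypothetical.

```python
def block_sad(cur, prev, top, left, ref_top, ref_left, size):
    """Sum of absolute differences between a current block and a reference block."""
    return sum(abs(cur[top + r][left + c] - prev[ref_top + r][ref_left + c])
               for r in range(size) for c in range(size))


def find_reference(cur, prev, top, left, size, radius):
    """Position of the minimum-error reference block in the preceding frame."""
    height, width = len(prev), len(prev[0])
    best = None
    for ref_top in range(max(0, top - radius), min(height - size, top + radius) + 1):
        for ref_left in range(max(0, left - radius), min(width - size, left + radius) + 1):
            err = block_sad(cur, prev, top, left, ref_top, ref_left, size)
            if best is None or err < best[0]:
                best = (err, ref_top, ref_left)
    return best[1], best[2]


def propagate_shape(cur_tex, prev_tex, prev_shape, size, radius):
    """Build the current frame's shape data from the preceding frame's shape data."""
    height, width = len(cur_tex), len(cur_tex[0])
    shape = [[0] * width for _ in range(height)]
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            ref_top, ref_left = find_reference(cur_tex, prev_tex, top, left, size, radius)
            # Copy the reference block's shape data into the current block.
            for r in range(size):
                shape[top + r][left:left + size] = prev_shape[ref_top + r][ref_left:ref_left + size]
    return shape
```

The returned shape data would then overwrite the stored shape data so that it serves as the reference for the next frame.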

【０２６６】参照ブロックがブロックよりも大きい場
合、先に図４１，４２を用いて説明したのと同様に、輪
郭線が正しい位置からずれていた場合に補正する効果が
ある。従って、与えられる最初のフレームのシェイプ画
像に続く、動画シーケンスの全てのフレームにおいて、
物体が高い精度で抽出される。従来手法のように、動画
シーケンスの最初の方や、物体の動きが小さい時に、精
度が悪いという不具合はない。When the reference block is larger than the block, the contour is corrected if it has deviated from the correct position, in the same way as described earlier with reference to FIGS. 41 and 42. Therefore, the object is extracted with high accuracy in every frame of the moving image sequence following the given shape image of the first frame. The drawback of the conventional methods, namely poor accuracy at the beginning of the moving image sequence or when the motion of the object is small, does not arise.

【０２６７】フレーム間の動き補償による物体抽出を図
４７のフローチャートを参照して説明する。Object extraction by motion compensation between frames will be described with reference to the flowchart in FIG. 47.

【０２６８】ステップＳ２１で現処理フレームがブロッ
クに分割される。ステップＳ２２では現処理ブロックと
画像データの図柄が相似であり、かつ、現処理ブロック
よりも大きい参照ブロックを各フレームあるいは、既に
シェイプデータを求めたフレーム内から見つける。ステ
ップＳ２３では参照ブロックのシェイプデータを切り出
して縮小したサブブロックを現処理ブロックに貼り付け
る。In step S21, the current processing frame is divided into blocks. In step S22, a reference block whose image-data pattern is similar to that of the current processing block and which is larger than the current processing block is found in each frame or in a frame for which shape data has already been obtained. In step S23, a sub-block obtained by cutting out and reducing the shape data of the reference block is pasted onto the current processing block.

【０２６９】ステップＳ２４では処理済みブロック数が
所定の数に達したらステップ２５に進む、そうでない場
合は次のブロックに処理対象を進めてステップ２２に戻
る。ステップＳ２５では貼り合わされたシェイプデータ
を物体領域として出力する。In step S24, when the number of processed blocks reaches a predetermined number, the flow advances to step 25. Otherwise, the processing target advances to the next block and the flow returns to step S22. In step S25, the combined shape data is output as an object area.

【０２７０】ここで、各フレームとは、本実施例では最
初のフレームであり、予めシェイプ画像が与えられるフ
レームのことである。また、参照ブロックは必ずしも１
フレーム前のフレームでなくても、ここで述べたよう
に、既にシェイプ画像が求まっているフレームならよ
い。Here, "each frame" refers, in this embodiment, to the first frame, that is, the frame for which a shape image is given in advance. The reference block need not come from the immediately preceding frame; as stated here, any frame for which a shape image has already been obtained may be used.

【０２７１】以上が動き補償を用いた物体抽出の説明で
ある。物体抽出回路２２４としては、以上で説明した方
法の他に、先に出願した、特開平１０−００１８４７
「動画像の物体追跡／抽出装置」にあるフレーム間差分
画像を用いる方法などもある。The above is the description of object extraction using motion compensation. Besides the method described above, the object extraction circuit 224 may use, for example, the method using inter-frame difference images described in the previously filed Japanese Patent Application Laid-Open No. 10-001847, "moving image object tracking / extracting device".

【０２７２】図２７に戻り、動画から物体を抽出する物
体抽出装置の実施例の説明を続ける。Returning to FIG. 27, description of the embodiment of the object extracting apparatus for extracting an object from a moving image will be continued.

【０２７３】シェイプ画像２６０はスイッチ部２２３と
スイッチ部２８１に送られる。スイッチ部２２３では、
シェイプ画像２６０が“０”（背景）の時には、テクス
チャ画像２２１が、背景メモリ２７４に送られ、記録さ
れる。シェイプ画像２６０が“２５５”（物体）の時に
は、テクスチャ画像２２１は、背景メモリ２７４には送
られない。これをいくつかのフレームに対して行い、そ
のシェイプ画像２６０がある程度正確であれば、物体を
含まない、背景部分だけの画像が、背景メモリ２７４に
生成される。The shape image 260 is sent to a switch unit 223 and a switch unit 281. In the switch unit 223, where the shape image 260 is "0" (background), the texture image 221 is sent to a background memory 274 and recorded. Where the shape image 260 is "255" (object), the texture image 221 is not sent to the background memory 274. When this is done for a number of frames and the shape images 260 are reasonably accurate, an image consisting only of the background, without the object, is generated in the background memory 274.
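The behavior of the switch unit 223 and the background memory 274 can be sketched per pixel as below. The function name, the list-of-rows layout, and the use of `None` to mark background pixels that have not yet been observed are assumptions made for the example.

```python
def update_background(background, texture, shape):
    """Record texture pixels into the background memory where shape is 0.

    background: 2-D list updated in place; None marks pixels not yet observed.
    texture:    2-D list with the current frame's pixel values.
    shape:      2-D list of 0 (background) / 255 (object) for the same frame.
    """
    for y, row in enumerate(shape):
        for x, label in enumerate(row):
            if label == 0:
                # Background pixel: pass the texture through to the memory.
                background[y][x] = texture[y][x]
            # Object pixel: nothing is written, the stored value is kept.
    return background
```

Calling this for successive frames fills in the regions uncovered as the object moves; pixels the object never leaves remain unobserved.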

【０２７４】次に、記録装置２２２から、テクスチャ画
像２７５が再度最初のフレームから順に読み出され、あ
るいは、操作者が指定する物体を抽出したいフレームだ
けが読み出され差分回路２７６に入力される。同時に、
背景メモリ２７４から、背景画像２７７が読み出され、
差分回路２７６に入力される。差分回路２７６では、テ
クスチャ画像２７５と背景画像２７７の、互いに画面内
で同じ位置にある画素同士の差分値２７８が求められ、
背景画像を用いた物体抽出回路２７９に入力される。物
体抽出回路２７９では、シェイプ画像２８０が生成され
るのであるが、これは、差分値２７８の絶対値が予め定
めるしきい値よりも大きい画素は、物体に属するとして
画素値を“２５５”とし、そうでない画素は、背景に属
するとして画素値“０”とすることで生成される。テク
スチャ画像として、輝度だけでなく、色差や色も用いる
場合は、各信号の差分の絶対値の和をしきい値と比較し
て物体か背景かが決定される。あるいは、輝度や色差毎
に別々にしきい値を定めて、輝度、色差のいずれかにお
いて、差分の絶対値がそのしきい値よりも大きい場合に
物体、そうでない場合に背景とが判定される。このよう
にして生成されたシェイプ画像２８０がスイッチ部２８
１に送られる。また、操作者によって決定される選択信
号２８２が外部からスイッチ部２８１に入力され、この
選択信号２８２によって、シェイプ画像２６０とシェイ
プ画像２８０のうちのいずれかが選択され、シェイプ画
像２８３として外部に出力される。操作者は、シェイプ
画像２６０とシェイプ画像２８０を各々、ディスプレイ
などに表示し、正確な方を選択する。あるいは、シェイ
プ画像２６０が生成された段階でそれを表示し、その精
度が満足するものでなかった場合に、シェイプ画像２８
０を生成し、シェイプ画像２６０の精度が満足するもの
であった場合には、シェイプ画像２８０を生成せずに、
シェイプ画像２６０をシェイプ画像２８３として外部に
出力するようにすれば、処理時間を節約できる。選択
は、フレーム毎に行ってもよいし、動画像シーケンス毎
に行ってもよい。Next, the texture image 275 is read again from the recording device 222, either in order from the first frame or only for the frames designated by the operator as those from which the object is to be extracted, and is input to the difference circuit 276. At the same time, the background image 277 is read from the background memory 274 and input to the difference circuit 276. The difference circuit 276 computes the difference value 278 between pixels at the same position in the texture image 275 and the background image 277, and the result is input to the object extraction circuit 279, which uses the background image. The object extraction circuit 279 generates the shape image 280 as follows: a pixel for which the absolute value of the difference value 278 exceeds a predetermined threshold is regarded as belonging to the object and given the pixel value “255”; any other pixel is regarded as belonging to the background and given the pixel value “0”. When color difference or color is used in addition to luminance as the texture image, the sum of the absolute differences of the individual signals is compared with the threshold to decide between object and background. Alternatively, separate thresholds are set for luminance and for color difference, and a pixel is judged to be object if the absolute difference exceeds its threshold in any of the luminance or color-difference signals, and background otherwise. The shape image 280 generated in this way is sent to the switch unit 281. A selection signal 282 determined by the operator is also input to the switch unit 281 from outside; according to this selection signal 282, either the shape image 260 or the shape image 280 is selected and output to the outside as the shape image 283. The operator displays the shape image 260 and the shape image 280 on a display or the like and selects the more accurate one. Alternatively, processing time can be saved by displaying the shape image 260 as soon as it is generated, generating the shape image 280 only if its accuracy is unsatisfactory, and, if the accuracy of the shape image 260 is satisfactory, outputting it as the shape image 283 without generating the shape image 280. The selection may be made for each frame or for each moving image sequence.
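
The thresholding performed by the object extraction circuit 279 can be illustrated with a minimal sketch. It assumes images are plain nested lists; the function names, the list-based image layout and the threshold values are illustrative assumptions and do not appear in the specification.

```python
def extract_shape(texture, background, threshold):
    """Luminance only: 255 (object) where |texture - background| > threshold, else 0."""
    return [
        [255 if abs(t - b) > threshold else 0 for t, b in zip(t_row, b_row)]
        for t_row, b_row in zip(texture, background)
    ]


def extract_shape_sum(texture, background, threshold):
    """Pixels are (Y, U, V) tuples: compare the sum of absolute differences
    over all signals with a single threshold."""
    return [
        [255 if sum(abs(tc - bc) for tc, bc in zip(t, b)) > threshold else 0
         for t, b in zip(t_row, b_row)]
        for t_row, b_row in zip(texture, background)
    ]


def extract_shape_per_signal(texture, background, thresholds):
    """Pixels are (Y, U, V) tuples: object if ANY signal exceeds its own threshold."""
    return [
        [255 if any(abs(tc - bc) > th for tc, bc, th in zip(t, b, thresholds)) else 0
         for t, b in zip(t_row, b_row)]
        for t_row, b_row in zip(texture, background)
    ]
```

The three functions correspond to the luminance-only case, the summed-difference case and the per-signal-threshold case described above.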

【０２７５】図２７の物体抽出装置に対応する物体抽出
方法を図４６のフローチャートを参照して説明する。An object extraction method corresponding to the object extraction device of FIG. 27 will be described with reference to the flowchart of FIG. 46.

【０２７６】（背景画像を用いる物体抽出方法）ステッ
プＳ１１では与えられる各フレームにおけるシェイプデ
ータを動き補償することにより各フレームのシェイプデ
ータが生成される。ステップＳ１２ではシェイプデータ
によって決定される背景領域の画像データが背景画像と
してメモリに記憶される。(Object Extraction Method Using Background Image) In step S11, the shape data of each frame is generated by motion-compensating the shape data of the given frame. In step S12, the image data of the background area determined by the shape data is stored in the memory as a background image.

【０２７７】ステップＳ１３では処理済みフレーム数が
所定の数に達したらステップ１４に進む、そうでない場
合は次のフレームに処理対象を進めてステップ１１に戻
る。ステップＳ１４では画像データと背景画像との差分
の絶対値が大きい画素を物体領域とし、そうでない画素
を背景領域とする。In step S13, when the number of processed frames reaches a predetermined number, the flow advances to step S14. Otherwise, the processing target advances to the next frame and the flow returns to step S11. In step S14, a pixel having a large absolute value of a difference between the image data and the background image is set as an object region, and a pixel other than that is set as a background region.

【０２７８】本実施形態においては、例えば撮像するカ
メラに動きがあると背景が動く。この場合は、前のフレ
ームからの背景全体の動き（グローバル動きベクトル）
を検出し、１スキャン目ではグローバル動きベクトルの
分だけ前のフレームからずらして背景メモリに記録し、
２スキャン目では、グローバル動きベクトルの分だけ前
のフレームからずらした部分を背景メモリから読み出
す。１スキャン目で検出したグローバル動きベクトルを
メモりに記録しておき、２スキャン目では、それを読み
出して用いれば、グローバル動きベクトルを求める時間
を節約できる。また、カメラが固定していることなどか
ら、背景が静止していることが既知の場合は、操作者が
スイッチを切り替えることなどによって、グローバル動
きベクトルの検出を行わないようにして、グローバル動
きベクトルは常にゼロにするようにすれば、処理時間は
さらに節約できる。グローバル動きベクトルを半画素精
度で求める時は、背景メモリは、入力される画像の縦横
とも２倍の画素密度とする。すなわち、入力画像の画素
値は１画素おきに背景メモリに書き込まれる。例えば次
のフレームでは背景が横方向に０．５画素動いていた場
合には、先に書き込まれた画素の間にやはり１画素おき
に画素値が書き込まれる。このようにすると、１スキャ
ン目が終了した時点で、背景画像に一度も書き込まれな
い画素ができることがある。その場合は、周囲の書き込
まれた画素から内挿してその隙間を埋める。In the present embodiment, the background moves if, for example, the camera capturing the images moves. In this case, the motion of the entire background relative to the previous frame (the global motion vector) is detected. In the first scan, the frame is recorded in the background memory shifted from the previous frame by the global motion vector; in the second scan, the portion shifted from the previous frame by the global motion vector is read from the background memory. If the global motion vectors detected in the first scan are stored in memory and read back for use in the second scan, the time needed to compute them again is saved. If the background is known to be stationary, for example because the camera is fixed, the operator can, by setting a switch or the like, disable global motion vector detection so that the global motion vector is always zero, which saves further processing time. When the global motion vector is obtained with half-pixel accuracy, the background memory is given twice the pixel density of the input image both vertically and horizontally. That is, the pixel values of the input image are written to every other pixel of the background memory. If, for example, the background has moved 0.5 pixel horizontally in the next frame, its pixel values are likewise written to every other pixel, between the pixels written earlier. With this scheme, some pixels of the background image may never have been written by the end of the first scan. In that case, the gaps are filled by interpolating from the surrounding written pixels.
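
The double-density background memory for half-pixel global motion can be sketched as follows. This is only an illustration under stated assumptions: the accumulated displacement is passed in half-pixel units as `offset_half`, its sign convention is arbitrary, gaps are filled with the mean of the written 8-neighbours in a single pass, and all names are hypothetical. The specification itself only says that gaps are interpolated from surrounding written pixels.

```python
UNWRITTEN = None  # marker for background-memory cells never written in scan 1


def make_background_memory(height, width):
    """Background memory with twice the pixel density in both directions."""
    return [[UNWRITTEN] * (2 * width) for _ in range(2 * height)]


def write_frame(memory, frame, shape, offset_half):
    """Write background pixels (shape == 0) at every other memory cell,
    displaced by offset_half = (dx, dy) given in half-pixel units."""
    dx, dy = offset_half
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if shape[y][x] != 0:
                continue  # object pixels are not recorded
            my, mx = 2 * y + dy, 2 * x + dx
            if 0 <= my < len(memory) and 0 <= mx < len(memory[0]):
                memory[my][mx] = value


def fill_gaps(memory):
    """Return a copy in which each unwritten cell that has at least one written
    8-neighbour is replaced by the mean of those neighbours."""
    h, w = len(memory), len(memory[0])
    filled = [row[:] for row in memory]
    for y in range(h):
        for x in range(w):
            if memory[y][x] is not UNWRITTEN:
                continue
            neighbours = [
                memory[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if memory[ny][nx] is not UNWRITTEN
            ]
            if neighbours:
                filled[y][x] = sum(neighbours) / len(neighbours)
    return filled
```

A frame with zero offset fills the even columns of a memory row, and a frame displaced by half a pixel horizontally (`offset_half = (1, 0)`) fills the odd columns between them; rows that no frame reaches are left to `fill_gaps`.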

【０２７９】また、半画素の動きベクトルを用いる／用
いないに関わらず、動画シーケンス全体を通じて、一度
も背景領域にならない部分は、１スキャン目を終わって
も背景メモリに画素値が代入されない。このような未定
義の部分は、２スキャン目では、常に物体と判定する。
これは、特に未定義の部分を記録するためのメモリを用
意して、未定義か否かをいちいち判定しなくても、背景
に希にしか出てこないと予想される画素値（Ｙ，Ｕ，
Ｖ）＝（０，０，０）などで、予め背景メモリを初期化
してから１スキャン目を開始すればよい。未定義の画素
にはこの初期画素値が残るので２スキャン目では、自動
的に物体と判定される。Regardless of whether half-pixel motion vectors are used, a portion that never becomes background area at any point in the moving image sequence has no pixel value assigned in the background memory even after the first scan. Such undefined portions are always judged to be object in the second scan. This does not require preparing a separate memory to record the undefined portions and checking every pixel for whether it is undefined. It suffices to initialize the background memory, before starting the first scan, with a pixel value expected to appear only rarely in the background, such as (Y, U, V) = (0, 0, 0). Since this initial value remains in the undefined pixels, they are automatically judged to be object in the second scan.

【０２８０】これまでの説明では、背景メモリを生成す
る時に、既に背景の画素値が代入されている画素につい
ても、背景領域であれば、画素値が上書きされている。
この場合、動画シーケンスの最初の方でも最後の方にで
も背景である部分には、動画シーケンスの最後の方の背
景の画素値が背景メモリに記録される。動画シーケンス
の最初と最後でそういった背景が全く同じ画素値ならば
問題はないが、カメラが非常にゆっくりと動いたり、背
景の明るさが少しずつ変化するなどして、画素値がフレ
ーム間で微少に変動する場合には、動画シーケンスの最
初の方の背景と最後の方の背景とでは、画素値の差が大
きくなるので、この背景メモリを用いると、動画シーケ
ンスの最初の方のフレームで背景部分も物体と誤検出さ
れてしまう。そこで、その前までのフレームでは、一度
も背景領域にはならずに、現処理フレームで初めて背景
領域となった画素についてのみ背景メモリへの書き込み
を行い、既に背景の画素値が代入されている画素の上書
きはしないようにすれば、背景メモリには動画シーケン
スの最初の方の背景が記録されるので、正しく物体が抽
出される。そして、２スキャン目にも、その物体抽出結
果に応じて、現処理フレームの背景領域を背景メモリに
上書きするようにすれば、現処理フレームの直前のフレ
ームの背景と現処理フレームの背景という相関の高い背
景同士を比較することになり、その部分が物体と誤検出
されにくくなる。２スキャン目の上書きは、背景に微少
な変動がある場合に有効なので、操作者が背景の動きは
無しという意味にスイッチを切り替えるなどした場合
は、上書きは行わない。このスイッチは、先のグローバ
ル動きベクトルを行うか行わないかを切り替えるスイッ
チと共通でも構わない。In the description so far, when the background memory is generated, a pixel to which a background pixel value has already been assigned is overwritten whenever it falls in the background area again. In this case, for a portion that is background both early and late in the moving image sequence, the background memory ends up holding the background pixel values from late in the sequence. If such background has exactly the same pixel values at the beginning and end of the sequence, there is no problem. However, if the pixel values vary slightly between frames, for example because the camera moves very slowly or the background brightness changes gradually, the difference between the early background and the late background becomes large, and using this background memory causes background portions in the early frames of the sequence to be erroneously detected as object. Therefore, writing to the background memory is performed only for pixels that have never been background in the preceding frames and become background for the first time in the current frame, and pixels to which a background value has already been assigned are not overwritten. The background memory then holds the background from early in the sequence, so the object is extracted correctly. Then, in the second scan as well, if the background area of the current frame is overwritten into the background memory according to the object extraction result, highly correlated backgrounds, namely the background of the frame immediately preceding the current frame and the background of the current frame, are compared, so that portion is less likely to be erroneously detected as object. Overwriting in the second scan is effective when the background varies slightly, so when the operator sets a switch to indicate that the background does not move, the overwriting is not performed. This switch may be shared with the switch, described earlier, that selects whether global motion vector detection is performed.
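
The write-once policy of the first scan and the optional overwriting in the second scan can be sketched as below. The sketch assumes luminance-only frames as nested lists and, for clarity, tracks written pixels with an explicit `written` mask rather than the rare initial pixel value suggested earlier; function names and the `update` flag (standing in for the operator's switch) are illustrative assumptions.

```python
def first_scan(frames, shapes):
    """Build the background memory, writing each pixel only the first time
    it belongs to the background area (shape == 0)."""
    h, w = len(frames[0]), len(frames[0][0])
    background = [[0] * w for _ in range(h)]
    written = [[False] * w for _ in range(h)]
    for frame, shape in zip(frames, shapes):
        for y in range(h):
            for x in range(w):
                if shape[y][x] == 0 and not written[y][x]:
                    background[y][x] = frame[y][x]
                    written[y][x] = True
    return background, written


def second_scan(frames, background, written, threshold, update=True):
    """Extract a shape image per frame by background difference.
    Never-written pixels are always object. With update=True, pixels judged
    background overwrite the memory so the next frame is compared with a
    recent background. Note: `background` is modified in place."""
    h, w = len(background), len(background[0])
    results = []
    for frame in frames:
        shape = [
            [255 if (not written[y][x]
                     or abs(frame[y][x] - background[y][x]) > threshold) else 0
             for x in range(w)]
            for y in range(h)
        ]
        if update:
            for y in range(h):
                for x in range(w):
                    if shape[y][x] == 0:
                        background[y][x] = frame[y][x]
        results.append(shape)
    return results
```

With a background whose brightness drifts by a small amount per frame, `update=True` keeps the stored background close to the current frame, whereas `update=False` lets the accumulated drift eventually exceed the threshold.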

【０２８１】１スキャン目は背景画像を生成するのが目
的であるから、必ずしも全てのフレームを用いる必要は
ない。１フレームおき、２フレームおきなどとフレーム
を間引いても、ほぼ同じ背景画像が得られ、処理時間は
短くなる。Since the purpose of the first scan is to generate the background image, it is not necessary to use every frame. Even if frames are thinned out, for example by taking only every second or every third frame, almost the same background image is obtained and the processing time is shorter.

【０２８２】背景領域のうち、フレーム間差分がしきい
値以下の画素だけを背景メモリに記録するようにすれ
ば、画面に入り込んでくる他の物体が背景メモリに記録
されずに済む。また、１スキャン目の物体領域が実際よ
りも物体側に誤検出された場合、物体の画素値が背景メ
モリに記録されてしまう。そこで、背景領域でも物体領
域に近い画素は背景メモリに入力しないようにする。If, within the background area, only pixels whose inter-frame difference is at or below a threshold are recorded in the background memory, other objects entering the picture are kept out of the background memory. In addition, if the object area in the first scan is erroneously detected further toward the object side than the actual boundary, pixel values belonging to the object are recorded in the background memory. To prevent this, pixels that are in the background area but close to the object area are not written to the background memory.

【０２８３】観光地で撮影した画像などで、前景の人な
どを除いた背景画像だけが必要な場合は、背景メモリに
記録された背景画像を外部に出力する。When only a background image excluding a person in the foreground and the like is required in an image photographed at a tourist spot or the like, the background image recorded in the background memory is output to the outside.

【０２８４】以上が本実施形態の第１の構成例の説明で
ある。本例によれば、動画シーケンスの初めの部分も最
後の部分と同様に高い抽出精度が得られる。また、物体
の動きが小さかったり、全く動かない場合でも正しく抽
出される。The above is the description of the first configuration example of the present embodiment. According to this example, a high extraction accuracy can be obtained in the first part of the moving image sequence as in the last part. Even if the movement of the object is small or does not move at all, it is correctly extracted.

【０２８５】次に、図２８を用いて、生成されたシェイ
プ画像２８０を修正する例を説明する。シェイプ画像２
８０が生成されるまでは、図２７と同じなので説明を省
略する。Next, an example of correcting the generated shape image 280 will be described with reference to FIG. 28. The processing up to the generation of the shape image 280 is the same as in FIG. 27, so its description is omitted.

【０２８６】シェイプ画像２８０は、背景パレットを用
いるエッジ補正回路２８４に入力される。また、テクス
チャ画像２７５が、背景パレットによるエッジ補正回路
２８４と縮小ブロックマッチングによるエッジ補正回路
２８５に入力される。エッジ補正回路２８４の詳細なブ
ロック図を図３１に示す。The shape image 280 is input to an edge correction circuit 284 that uses a background palette. The texture image 275 is input to the edge correction circuit 284 using the background palette and to an edge correction circuit 285 using reduced block matching. A detailed block diagram of the edge correction circuit 284 is shown in FIG. 31.

【０２８７】図３１において、シェイプ画像２８０は、
補正回路２８６に入力され、同じフレームのテクスチャ
画像２７５は比較回路２８７に入力される。背景パレッ
トを保持するメモリ２８８からは、背景色２８９が読み
出され、比較回路２８７に入力される。ここで、背景パ
レットは、背景部分に存在する輝度（Ｙ）と色差（Ｕ，
Ｖ）の組すなわちベクトルの集まり、（Ｙ１，Ｕ１，Ｖ１）（Ｙ２，Ｕ２，Ｖ２）（Ｙ３，Ｕ３，Ｖ３） …………………… のことで、予め用意される。具体的には、背景パレット
は、最初のフレームにおいて、背景領域に属する画素の
Ｙ，Ｕ，Ｖの組を集めたものである。ここで、例えば
Ｙ，Ｕ，Ｖが各々２５６通りの値をとるとすると、その
組み合わせは膨大な数になり、背景の（Ｙ，Ｕ，Ｖ）の
組み合わせ数も多くなり、後に説明する処理の演算量が
多くなってしまうので、所定のステップサイズでＹ，
Ｕ，Ｖの値を各々量子化することにより、組み合わせ数
を抑制できる。これは、量子化をしない場合は異なるベ
クトル値だったもの同士が、量子化により同じベクトル
値になる場合があるからである。In FIG. 31, the shape image 280 is input to the correction circuit 286, and the texture image 275 of the same frame is input to the comparison circuit 287. A background color 289 is read from the memory 288 holding the background palette and input to the comparison circuit 287. Here, the background palette is a collection of combinations of luminance (Y) and color difference (U, V) present in the background portion, that is, a set of vectors (Y1, U1, V1), (Y2, U2, V2), (Y3, U3, V3), ..., and is prepared in advance. Specifically, the background palette collects the (Y, U, V) combinations of the pixels belonging to the background area in the first frame. If, for example, Y, U and V each take 256 values, the number of possible combinations is enormous, the number of background (Y, U, V) combinations also becomes large, and the amount of computation in the processing described later increases. The number of combinations can be reduced by quantizing each of Y, U and V with a predetermined step size, because vectors that differ without quantization may become identical after quantization.

【０２８８】比較回路２８７では、テクスチャ画像２７
５の各画素のＹ，Ｕ，Ｖが量子化され、そのベクトルが
メモリ２８８から順次送られてくる背景パレットに登録
されているベクトル、すなわち背景色２８９のいずれか
と一致するかどうかが調べられる。画素毎に、その画素
の色が背景色かどうかの比較結果２９０が、比較回路２
８７から補正回路２８６へ送られる。補正回路２８６で
は、シェイプ画像２８０のある画素の画素値が“２５
５”（物体）であるにもかかわらず、比較結果２９０が
背景色であった場合に、その画素の画素値を“０”（背
景）に置き換えて、補正されたシェイプ画像２９１とし
て出力する。この処理により、シェイプ画像２８０にお
いて物体領域が背景領域にはみ出して誤抽出されていた
場合に、その背景領域を正しく分離できる。ただ、背景
と物体に共通の色があり、背景パレットに物体の色も混
じって登録されていると、物体のその色の部分もが背景
と判定されてしまう。そこで、最初のフレームでは、先
に説明したパレットを背景の仮のパレットとしておき、
同様の方法で最初のフレームの物体のパレットも作る。
次に、背景の仮のパレットの中で物体のパレットにも含
まれる色については、背景の仮のパレットから除き、残
ったものを背景パレットとする。これにより、物体の一
部が背景になってしまう不具合を回避できる。In the comparison circuit 287, Y, U and V of each pixel of the texture image 275 are quantized, and the resulting vector is checked against the vectors registered in the background palette, that is, the background colors 289 sent one after another from the memory 288, to see whether it matches any of them. For each pixel, a comparison result 290 indicating whether the color of the pixel is a background color is sent from the comparison circuit 287 to the correction circuit 286. When a pixel of the shape image 280 has the pixel value “255” (object) but the comparison result 290 indicates a background color, the correction circuit 286 replaces the pixel value with “0” (background) and outputs the result as the corrected shape image 291. With this processing, when the object area in the shape image 280 has been erroneously extracted so that it spills over into the background area, that background area can be correctly separated. However, if the background and the object share a color and the background palette therefore also contains object colors, the parts of the object having that color are also judged to be background. To avoid this, the palette described above is treated, in the first frame, as a provisional background palette, and a palette for the object in the first frame is created in the same way. Colors in the provisional background palette that are also contained in the object palette are then removed from it, and what remains is used as the background palette. This avoids the problem of part of the object being turned into background.
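
The palette construction and the correction performed by circuits 286 and 287 can be sketched as follows. The sketch assumes pixels are (Y, U, V) integer tuples, uses integer division as the quantizer, and represents a palette as a Python set; these choices and all function names are illustrative assumptions, not details given in the specification.

```python
def quantize(pixel, step):
    """Quantize a (Y, U, V) tuple with a uniform step size."""
    return tuple(c // step for c in pixel)


def build_palette(texture, shape, label, step):
    """Collect the quantized colors of all pixels whose shape value equals label."""
    return {
        quantize(p, step)
        for t_row, s_row in zip(texture, shape)
        for p, s in zip(t_row, s_row)
        if s == label
    }


def build_background_palette(texture, shape, step):
    """Provisional background palette minus the colors also found in the object."""
    provisional = build_palette(texture, shape, 0, step)
    object_palette = build_palette(texture, shape, 255, step)
    return provisional - object_palette


def correct_shape(texture, shape, palette, step):
    """Turn object pixels (255) whose quantized color is a background color into 0."""
    return [
        [0 if (s == 255 and quantize(p, step) in palette) else s
         for p, s in zip(t_row, s_row)]
        for t_row, s_row in zip(texture, shape)
    ]
```

`build_background_palette` would be called once on the first frame and its given shape image, and `correct_shape` on each later frame with the shape image produced by background differencing.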

【０２８９】また、最初のフレームで与えられるシェイ
プ画像に誤差がある場合を考慮し、シェイプ画像のエッ
ジの近傍の画素は、パレットの生成に用いないようにし
ても良い。また、各ベクトルの出現頻度を数え、頻度が
所定値以下のベクトルはパレットに登録しないようにし
ても良い。量子化ステップサイズを小さくしすぎると、
処理時間が多くなったり、背景色に非常に似た色でもベ
クトル値がわずかに異なるために背景と判定されなかっ
たりし、逆に量子化ステップサイズを大きくしすぎる
と、背景と物体に共通するベクトルばかりになってしま
う。そこで、最初のフレームに対して、いくつかの量子
化ステップサイズを試し、与えられるシェイプ画像の様
に背景色と物体色が分離される量子化ステップサイズが
選ばれる。To allow for errors in the shape image given for the first frame, pixels near the edges of the shape image may be excluded from palette generation. The frequency of occurrence of each vector may also be counted, and vectors whose frequency is at or below a predetermined value may be left out of the palette. If the quantization step size is too small, processing time increases, and colors very similar to a background color may fail to be judged as background because their vector values differ slightly. Conversely, if the quantization step size is too large, almost all vectors become common to the background and the object. Therefore, several quantization step sizes are tried on the first frame, and the one that separates background colors from object colors in the way the given shape image indicates is selected.

【０２９０】また、途中から新しい色が背景や物体に現
れることがあるので、途中のフレームで背景パレットを
作りなおしても良い。Since new colors may appear in the background or the object partway through the sequence, the background palette may be regenerated at an intermediate frame.

【０２９１】図２８に戻り、シェイプ画像２９１は、エ
ッジ補正回路２８５に入力される。エッジ補正回路２８
５は、先に説明した図３０の回路において、シェイプ画
像２４９をシェイプ画像２９１、テクスチャ画像２５１
をテクスチャ画像２７５とする回路と同じであるので、
説明は省略するが、シェイプ画像のエッジがテクスチャ
画像のエッジに合うようにシェイプ画像の補正を行う。
補正されたシェイプ画像２９２はスイッチ部２８１に送
られる。スイッチ部２８１からは、シェイプ画像２９２
とシェイプ画像２６０のうちから選択されたシェイプ画
像２９３が出力される。Returning to FIG. 28, the shape image 291 is input to the edge correction circuit 285. The edge correction circuit 285 is the same as the circuit of FIG. 30 described earlier, with the shape image 249 replaced by the shape image 291 and the texture image 251 replaced by the texture image 275, so its description is omitted; it corrects the shape image so that the edges of the shape image match the edges of the texture image. The corrected shape image 292 is sent to the switch unit 281. The switch unit 281 outputs a shape image 293 selected from the shape image 292 and the shape image 260.

【０２９２】本例では、エッジ補正回路を物体抽出回路
２７９の後段に設けたが、物体抽出回路２２４の後段に
設ければ、シェイプ画像２６０の精度を向上できる。In this example, the edge correction circuit is provided after the object extraction circuit 279. However, if it is provided after the object extraction circuit 224, the accuracy of the shape image 260 can be improved.

【０２９３】また、エッジ補正によって、抽出精度がか
えって悪化する場合も希にある。そういう時に、悪化し
たシェイプ画像２９２が出力されてしまわないように、
図２８において、シェイプ画像２８０やシェイプ画像２
９１もスイッチ部２８１に入力すれば、エッジ補正を行
わないシェイプ画像２８０や、背景パレットによるエッ
ジ補正だけを施したシェイプ画像２９１を選択すること
も可能となる。In rare cases, edge correction actually degrades the extraction accuracy. To keep a degraded shape image 292 from being output in such cases, the shape image 280 and the shape image 291 can also be input to the switch unit 281 in FIG. 28, which makes it possible to select the shape image 280, with no edge correction, or the shape image 291, with only the background-palette edge correction applied.

【０２９４】図４４は、背景パレットに登録された背景
色の画素をクロスハッチで示しており、先に図３０や図
２９を用いて説明した相似ブロックの探索の時に、図４
４の情報を用いると、輪郭抽出の精度をさらに高めるこ
とができる。背景に図柄がある場合に、物体と背景のエ
ッジではなく、背景の図柄のエッジに沿うように相似ブ
ロックが選ばれてしまうことがある。このような場合、
ブロックと相似ブロックを縮小したブロックとの誤差を
求める時に、対応画素がいずれも背景色同士の時は、そ
の画素の誤差は、計算に含めないようにすると、背景の
図柄のエッジがずれていても、誤差が発生せず、従っ
て、物体と背景のエッジが合うように相似ブロックが正
しく選択される。FIG. 44 shows, by cross-hatching, the pixels whose colors are background colors registered in the background palette. Using the information of FIG. 44 in the similar-block search described earlier with reference to FIGS. 30 and 29 further improves the accuracy of contour extraction. When the background contains a pattern, a similar block may be selected that follows an edge of the background pattern rather than the edge between the object and the background. In such a case, when computing the error between a block and the reduced version of a similar block, the error of any pixel for which both corresponding pixels have background colors is left out of the calculation. Then no error arises even if the edges of the background pattern are misaligned, and the similar block is therefore correctly selected so that the edge between the object and the background is aligned.
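
The masked error computation can be sketched as follows. The sketch assumes that the block and the reduced similar block are already the same size, that the background-color flags of FIG. 44 are available as boolean arrays, and that the error measure is a sum of absolute differences; the error measure and all names are assumptions made for illustration.

```python
def masked_block_error(block, reduced, block_is_bg, reduced_is_bg):
    """Sum of absolute differences between a block and a reduced similar block,
    skipping positions where BOTH pixels have a background-palette color."""
    error = 0
    for y, (b_row, r_row) in enumerate(zip(block, reduced)):
        for x, (b, r) in enumerate(zip(b_row, r_row)):
            if block_is_bg[y][x] and reduced_is_bg[y][x]:
                continue  # background pattern mismatch is ignored
            error += abs(b - r)
    return error
```

A candidate similar block whose only mismatches lie inside the background pattern then scores the same as a perfect match, so the search is driven by the object-background edge.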

【０２９５】図３２は、本実施形態の物体抽出装置２９
４を組み込んだ画像合成装置の例である。テクスチャ画
像２９５はスイッチ部２９６と物体抽出装置２９４に入
力され、最初のフレームのシェイプ画像２１００は物体
抽出装置２９４に入力される。物体抽出装置２９４は、
図２７や図２８で構成されており、各フレームのシェイ
プ画像２９７が生成され、スイッチ部２９６に送られ
る。一方、記録回路２９８には、予め合成用背景画像２
９９が保持されており、現処理フレームの背景画像２９
９が記録回路２９８から読み出され、スイッチ部２９６
に送られる。スイッチ部２９６では、シェイプ画像の画
素値が“２５５”（物体）の画素ではテクスチャ画像２
９５が選択されて合成画像２１０１として出力され、シ
ェイプ画像の画素値が“０”（背景）の画素では、背景
画像２９９が選択されて合成画像２１０１として出力さ
れる。これにより、背景画像２９９の前景にテクスチャ
画像２９５内の物体を合成した画像が生成される。FIG. 32 shows an example of an image synthesizing apparatus incorporating the object extraction device 294 of this embodiment. The texture image 295 is input to the switch unit 296 and the object extraction device 294, and the shape image 2100 of the first frame is input to the object extraction device 294. The object extraction device 294 is configured as in FIG. 27 or FIG. 28; it generates the shape image 297 of each frame and sends it to the switch unit 296. Meanwhile, the recording circuit 298 holds background images 299 for synthesis in advance; the background image 299 for the current frame is read from the recording circuit 298 and sent to the switch unit 296. In the switch unit 296, for pixels where the shape image has the pixel value “255” (object), the texture image 295 is selected and output as the composite image 2101; for pixels where the shape image has the pixel value “0” (background), the background image 299 is selected and output as the composite image 2101. This produces an image in which the object in the texture image 295 is composited in the foreground of the background image 299.
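
The per-pixel selection in the switch unit 296 amounts to the following sketch, which assumes all three images are nested lists of the same size; the function name is illustrative.

```python
def composite(texture, shape, background):
    """Select the texture pixel where shape == 255 (object) and the
    synthesis background pixel elsewhere."""
    return [
        [t if s == 255 else b for t, s, b in zip(t_row, s_row, b_row)]
        for t_row, s_row, b_row in zip(texture, shape, background)
    ]
```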

【０２９６】図３３はエッジ補正を行う別の例を示す。
図３３のように設定されたブロックのうちの一つが、図
３３のブロック２１０２であるとする。輪郭線を境界に
して、ブロックは物体領域と背景領域に分けられてい
る。この輪郭を左右方向にずらして得られたブロックが
２１０３，２１０４，２１０５，２１０６である。それ
ぞれずらす幅と向きが異なる。文献：福井「領域間の分
離度に基づく物体輪郭抽出」（電子情報通信学会論文
誌、Ｄ−II、Ｖｏｌ．Ｊ８０−Ｄ−II、Ｎｏ．６、ｐ
ｐ．１４０６−１４１４、１９９７年６月）の１４０８
ページに記述されている分離度を各々の輪郭線について
求め、ブロック２１０２〜２１０６のうちで分離度が最
も高い輪郭線を採用する。これにより、シェイプ画像の
輪郭がテクスチャ画像のエッジに合う。FIG. 33 shows another example of edge correction. Suppose that one of the blocks set as shown in FIG. 33 is the block 2102 in FIG. 33. The block is divided into an object area and a background area by the contour line. Blocks 2103, 2104, 2105 and 2106 are obtained by shifting this contour in the left-right direction, each with a different shift width and direction. The degree of separation described on page 1408 of Fukui, “Object Contour Extraction Based on Degree of Separation between Regions” (Transactions of the Institute of Electronics, Information and Communication Engineers, D-II, Vol. J80-D-II, No. 6, pp. 1406-1414, June 1997) is computed for each contour line, and the contour line with the highest degree of separation among the blocks 2102 to 2106 is adopted. As a result, the contour of the shape image is aligned with the edge of the texture image.
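
The selection among shifted contours can be sketched as follows. The specification only cites the paper for the degree of separation; the sketch assumes the commonly used definition, the ratio of between-class variance to total variance of the pixel values on the two sides of the contour. It further assumes the contour is stored as one boundary column per block row with the object on the left; these conventions and all names are illustrative assumptions.

```python
def separability(values_a, values_b):
    """Assumed degree of separation: between-class variance / total variance
    (0.0 when all values are equal)."""
    all_values = values_a + values_b
    mean = sum(all_values) / len(all_values)
    total = sum((v - mean) ** 2 for v in all_values)
    if total == 0:
        return 0.0
    mean_a = sum(values_a) / len(values_a)
    mean_b = sum(values_b) / len(values_b)
    between = (len(values_a) * (mean_a - mean) ** 2
               + len(values_b) * (mean_b - mean) ** 2)
    return between / total


def split_by_contour(block, contour, shift):
    """Pixels left of (contour[y] + shift) are object, the rest background."""
    inside, outside = [], []
    for row, boundary in zip(block, contour):
        for x, value in enumerate(row):
            (inside if x < boundary + shift else outside).append(value)
    return inside, outside


def best_shift(block, contour, shifts=(-2, -1, 0, 1, 2)):
    """Return the horizontal shift whose contour gives the highest separability."""
    def score(shift):
        inside, outside = split_by_contour(block, contour, shift)
        if not inside or not outside:
            return 0.0  # degenerate split: one side is empty
        return separability(inside, outside)
    return max(shifts, key=score)
```

For a block whose true edge lies one pixel to the right of the given contour, `best_shift` returns 1, which corresponds to adopting the shifted contour in place of the original one.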

【０２９７】以上述べてきたように、本第４実施形態に
よれば、動画シーケンスの初めの部分も最後の部分と同
様に高い抽出精度が得られる。また、物体の動きが小さ
かったり、全く動かない場合でも正しく抽出される。さ
らに、現処理ブロックよりも大きい相似ブロックのシェ
イプデータを縮小して張り付けることにより、シェイプ
データで与えられる物体領域の輪郭線がずれていてもそ
れを正しい位置に補正することが可能となり、物体領域
の輪郭を大まかになぞったものをシェイプデータとして
与えるだけで、以降の入力フレーム全てにおいて物体領
域を高い精度で抽出することが可能となる。As described above, according to the fourth embodiment, a high extraction accuracy can be obtained in the first part of the moving image sequence as in the last part. Even if the movement of the object is small or does not move at all, it is correctly extracted. Furthermore, by shrinking and pasting the shape data of a similar block larger than the current processing block, even if the contour line of the object area given by the shape data is shifted, it can be corrected to the correct position, By simply giving a rough outline of the region as shape data, the object region can be extracted with high accuracy in all subsequent input frames.

【０２９８】なお、以上の第１乃至第４実施形態は適宜
組み合わせて利用することもできる。また、第１乃至第
４実施形態の物体抽出方法の手順はすべてソフトウェア
によって実現することもでき、この場合には、その手順
を実行するコンピュータプログラムを記録媒体を介して
通常のコンピュータに導入するだけで、第１乃至第４実
施形態と同様の効果を得ることができる。Note that the above first to fourth embodiments can be used in appropriate combinations. Further, all the procedures of the object extraction method of the first to fourth embodiments can be realized by software. In this case, a computer program for executing the procedure is simply introduced into a normal computer via a recording medium. Thus, the same effect as in the first to fourth embodiments can be obtained.

【０２９９】[0299]

【発明の効果】以上のように、本発明によれば、目的と
する物体を囲む図形を用いてその物体を追跡することに
より、目的の物体以外の周囲の余分な動きに影響を受け
ずに、目的の物体を精度良く抽出・追跡することが可能
となる。As described above, according to the present invention, by tracking a target object using a figure surrounding the target object, the target object is not affected by extra movement around the target object. In addition, it is possible to accurately extract and track a target object.

【０３００】また、入力画像によらずに高い抽出精度を
得ることが可能となる。さらに、動画シーケンスの初め
の部分も最後の部分と同様に高い抽出精度が得られる。
また、物体の動きが小さかったり、全く動かない場合で
も正しく抽出される。In addition, high extraction accuracy can be obtained regardless of the input image. Furthermore, the beginning of the moving image sequence is extracted with the same high accuracy as the end. The object is also extracted correctly when its motion is small or when it does not move at all.

[Brief description of the drawings]

【図１】本発明の第１実施形態に係る動画像の物体追跡
／抽出装置の基本構成を示すブロック図。FIG. 1 is a block diagram showing a basic configuration of a moving image object tracking / extracting apparatus according to a first embodiment of the present invention.

【図２】同実施形態の物体追跡／抽出装置の第１の構成
例を示すブロック図。FIG. 2 is a block diagram showing a first configuration example of the object tracking / extracting apparatus according to the embodiment.

【図３】同実施形態の物体追跡／抽出装置の第２の構成
例を示すブロック図。FIG. 3 is a block diagram showing a second configuration example of the object tracking / extracting apparatus according to the embodiment.

【図４】同実施形態の物体追跡／抽出装置に設けられた
背景領域決定部の具体的な構成の一例を示すブロック
図。FIG. 4 is an exemplary block diagram showing an example of a specific configuration of a background area determining unit provided in the object tracking / extracting apparatus of the embodiment.

【図５】同実施形態の物体追跡／抽出装置に設けられた
図形設定部の具体的な構成の一例を示すブロック図。FIG. 5 is an exemplary block diagram showing an example of a specific configuration of a graphic setting unit provided in the object tracking / extracting apparatus of the embodiment.

【図６】同実施形態の物体追跡／抽出装置に設けられた
背景動き削除部の具体的な構成の一例を示すブロック
図。FIG. 6 is an exemplary block diagram showing an example of a specific configuration of a background motion deleting unit provided in the object tracking / extracting apparatus of the embodiment.

【図７】同実施形態の物体追跡／抽出装置に設けられた
背景動き削除部で使用される代表背景領域の一例を示す
図。FIG. 7 is an exemplary view showing an example of a representative background area used in a background motion deletion unit provided in the object tracking / extracting apparatus of the embodiment.

【図８】同実施形態の物体追跡／抽出装置の動作を説明
するための図。FIG. 8 is an exemplary view for explaining the operation of the object tracking / extracting device of the embodiment.

【図９】本発明の第２実施形態に係る第１の動画像物体
追跡／抽出装置を表すブロック図。FIG. 9 is a block diagram showing a first moving image object tracking / extracting device according to a second embodiment of the present invention.

【図１０】同第２実施形態に係る第２の動画像物体追跡
／抽出装置を表すブロック図。FIG. 10 is a block diagram showing a second moving image object tracking / extracting device according to the second embodiment.

【図１１】同第２実施形態に係る第３の動画像の物体追
跡／抽出装置を表すブロック図。FIG. 11 is a block diagram showing a third moving image object tracking / extracting apparatus according to the second embodiment.

【図１２】同第２実施形態に係る第４の動画像の物体追
跡／抽出装置を表すブロック図。FIG. 12 is a block diagram showing a fourth moving image object tracking / extracting apparatus according to the second embodiment.

【図１３】同第２実施形態の物体追跡／抽出装置で用い
られる物体予測の方法を説明するための図。FIG. 13 is a view for explaining an object prediction method used in the object tracking / extracting apparatus according to the second embodiment.

【図１４】同第２実施形態の物体追跡／抽出装置で用い
られる参照フレーム選択方法を説明するための図。FIG. 14 is an exemplary view for explaining a reference frame selection method used in the object tracking / extracting apparatus according to the second embodiment.

【図１５】同第２実施形態の物体追跡／抽出装置におい
て第１の物体追跡／抽出部と第２の物体抽出部を切り替
えて物体を抽出した結果の例を表す図。FIG. 15 is a diagram showing an example of a result of extracting an object by switching between a first object tracking / extracting unit and a second object extracting unit in the object tracking / extracting apparatus of the second embodiment.

【図１６】同第２実施形態の物体追跡／抽出装置を用い
た動画像の物体追跡／抽出処理の流れを説明する図。FIG. 16 is an exemplary view for explaining the flow of object tracking / extraction processing of a moving image using the object tracking / extracting apparatus of the second embodiment.

【図１７】本発明の第３実施形態に係る第１の動画像物
体追跡／抽出装置を表すブロック図。FIG. 17 is a block diagram illustrating a first moving image object tracking / extracting apparatus according to a third embodiment of the present invention.

【図１８】同第３実施形態に係る第２の動画像物体追跡
／抽出装置を表すブロック図。FIG. 18 is a block diagram showing a second moving image object tracking / extracting device according to the third embodiment.

【図１９】同第３実施形態に係る第３の動画像物体追跡
／抽出装置を表すブロック図。FIG. 19 is a block diagram showing a third moving image object tracking / extracting device according to the third embodiment.

【図２０】同第３実施形態に係る第５の動画像物体追跡
／抽出装置を表すブロック図。FIG. 20 is a block diagram showing a fifth moving image object tracking / extracting device according to the third embodiment.

【図２１】同第３実施形態に係る第６の動画像物体追跡
／抽出装置を表すブロック図。FIG. 21 is a block diagram showing a sixth moving image object tracking / extracting device according to the third embodiment.

【図２２】同第３実施形態に係る第４の動画像物体追跡
／抽出装置を表すブロック図。FIG. 22 is a block diagram showing a fourth moving image object tracking / extracting device according to the third embodiment.

【図２３】同第３実施形態に係る動画像物体追跡／抽出
装置の他の構成例を表すブロック図。FIG. 23 is a block diagram showing another configuration example of the moving image object tracking / extracting apparatus according to the third embodiment.

【図２４】同第３実施形態に係る動画像物体追跡／抽出
装置のさらに他の構成例を表すブロック図。FIG. 24 is a block diagram showing still another configuration example of the moving image object tracking / extracting device according to the third embodiment.

【図２５】同第３実施形態に係る動画像物体追跡／抽出
装置に適用されるフレーム順序制御による抽出フレーム
順の例を説明する図。FIG. 25 is an exemplary view for explaining an example of an extracted frame order by frame order control applied to the moving image object tracking / extracting apparatus according to the third embodiment.

【図２６】同第３実施形態に係る動画像物体追跡／抽出
装置の応用例を表す図。FIG. 26 is a diagram showing an application example of the moving image object tracking / extracting device according to the third embodiment.

FIG. 27 is a block diagram showing an object extraction device according to a fourth embodiment of the present invention.

FIG. 28 is a block diagram showing a configuration example in which edge correction processing is applied to the object extraction device according to the fourth embodiment.

FIG. 29 is a block diagram showing a configuration example of a motion compensation unit applied to the object extraction device according to the fourth embodiment.

FIG. 30 is a block diagram showing a configuration example of an object extraction unit based on reduced block matching applied to the object extraction device according to the fourth embodiment.

FIG. 31 is a diagram showing an edge correction circuit based on a background palette used in the object extraction device according to the fourth embodiment.

FIG. 32 is a diagram showing an image synthesizing device applied to the object extraction device according to the fourth embodiment.

FIG. 33 is a view for explaining the principle of edge correction using the degree of separation used in the object extraction device according to the fourth embodiment.

FIG. 34 is a view showing the entire image processed by the object extraction device according to the fourth embodiment.

FIG. 35 is a view showing a contour line drawn by an operator, used in the fourth embodiment.

FIG. 36 is a view showing block setting (first scan) used in the fourth embodiment.

FIG. 37 is a view showing block setting (second scan) used in the fourth embodiment.

FIG. 38 is a view for explaining similar blocks used in the fourth embodiment.

FIG. 39 is a view for explaining a similar-block search range used in the fourth embodiment.

FIG. 40 is a view for explaining another example of a similar-block search range used in the fourth embodiment.

FIG. 41 is a view showing a shape image before replacement conversion, used in the fourth embodiment.

FIG. 42 is a view showing a shape image after replacement conversion, used in the fourth embodiment.

FIG. 43 is a view showing a contour line extracted in the fourth embodiment.

FIG. 44 is a diagram showing a background-color portion extracted in the fourth embodiment.

FIG. 45 is a view for explaining motion compensation used in the fourth embodiment.

FIG. 46 is a flowchart of an object extraction method using a background image, used in the fourth embodiment.

FIG. 47 is a flowchart of an object extraction method by motion compensation, used in the fourth embodiment.

FIG. 48 is a flowchart of an object extraction method using intra-frame reduced block matching, used in the fourth embodiment.

FIG. 49 is a view showing another example of block setting used in the fourth embodiment.

FIG. 50 is a flowchart for explaining edge correction.

FIG. 51 is a diagram showing an example of block setting.

FIG. 52 is a view showing a process of searching for a contour line of an object area.

FIG. 53 is a flowchart illustrating a method of gradually reducing the block size.

FIG. 54 is a view showing another example of block setting.

[Explanation of symbols]

1 … initial figure setting unit
2 … object tracking/extraction unit
11 … figure setting unit
12 … background area determination unit
13 … object extraction unit
21 … background motion deletion unit
22 … figure setting unit
23 … background area determination unit
24 … object extraction unit
31 … change amount detection unit
32 … representative region determination unit
33 … background change amount determination unit
34 … representative region background determination unit
35 … background region determination unit
36 … shape prediction unit
37 … stationary object region determination unit
41 … separation unit
42 … motion detection unit
43 … division determination unit
44 … figure determination unit
51 … background representative area setting unit
52 … motion detection unit
53 … motion compensation unit
61 … figure setting unit
62 … multiple object tracking/extraction unit
70 … figure setting unit
71 … second object tracking/extraction unit
72 … reference frame selection unit
73 … first object tracking/extraction unit
111 … feature amount extraction unit
141 … frame order control unit
224 … object extraction unit
279 … object extraction unit
284 … edge correction unit
285 … edge correction unit

Continuation of front page

(72) Inventor: Takashi Ida, c/o Toshiba Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa

F-terms (reference): 5L096 BA18 FA18 FA34 FA46 GA19 GA51 HA03 HA08 JA03

Claims

[Claims]

1. An object extraction device for moving images, comprising: background area determination means for determining, based on a difference between a current frame of a moving image signal from which a target object is to be extracted and a first reference frame temporally different from the current frame, a first background area common to the current frame and the first reference frame, and for determining, based on a difference between the current frame and a second reference frame temporally different from the current frame, a second background area common to the current frame and the second reference frame; and means for extracting, as an object area, an area in the image of the current frame that belongs to neither the first background area nor the second background area.
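
The two-difference extraction recited above can be illustrated with a minimal sketch. It assumes grayscale frames stored as lists of rows and a single fixed threshold; the function names `background_mask` and `extract_object` and the threshold value are hypothetical and do not come from the specification.

```python
def background_mask(cur, ref, threshold):
    """True where |cur - ref| <= threshold, i.e. background common to both frames."""
    return [[abs(c - r) <= threshold for c, r in zip(crow, rrow)]
            for crow, rrow in zip(cur, ref)]


def extract_object(cur, ref1, ref2, threshold=10):
    """Object area = pixels belonging to neither common background area."""
    bg1 = background_mask(cur, ref1, threshold)  # first background area
    bg2 = background_mask(cur, ref2, threshold)  # second background area
    return [[not (a or b) for a, b in zip(row1, row2)]
            for row1, row2 in zip(bg1, bg2)]


# A one-pixel bright object moving right by one pixel per frame.
prev_frame = [[0, 200, 0, 0, 0]]
cur_frame = [[0, 0, 200, 0, 0]]
next_frame = [[0, 0, 0, 200, 0]]
object_area = extract_object(cur_frame, prev_frame, next_frame)
```

Each single difference also flags the position the object occupies in the reference frame; requiring a pixel to be outside both background areas keeps only the position in the current frame.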

2. The object extraction device according to claim 1, further comprising stationary object determining means which, for a pixel having a small difference between the current frame and the reference frame, uses an object shape of the reference frame given in advance, determines the pixel of the current frame to be an object area when the corresponding pixel of the reference frame belongs to an object area, and determines the pixel of the current frame to be a background area when the corresponding pixel of the reference frame belongs to a background area.
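
As a rough illustration of the stationary-object rule, the sketch below copies the reference frame's label for every pixel whose inter-frame difference is small. The name `resolve_static_pixels`, the threshold, and the use of `None` for pixels that changed (and are therefore left to the difference-based extraction) are assumptions made for the example.

```python
def resolve_static_pixels(cur, ref, ref_shape, threshold=10):
    """Label unchanged pixels from the reference shape.

    Returns True (object), False (background), or None where the pixel
    changed between the frames and this rule does not apply.
    """
    labels = []
    for crow, rrow, srow in zip(cur, ref, ref_shape):
        labels.append([s if abs(c - r) <= threshold else None
                       for c, r, s in zip(crow, rrow, srow)])
    return labels


cur_frame = [[50, 50, 0, 90]]
ref_frame = [[52, 50, 0, 0]]
ref_shape = [[True, True, False, False]]  # object shape of the reference frame
labels = resolve_static_pixels(cur_frame, ref_frame, ref_shape)
```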

3. The object extraction device according to claim 2, wherein, as the object shape given in advance, the extracted object shape is used when the object shape of the reference frame has already been extracted, and, when the object shape of the reference frame has not yet been extracted, a shape of the object in the reference frame is generated by a block matching method from a frame whose object shape has already been extracted, and the generated shape is used.

4. The object extraction device according to claim 1, further comprising background correction means for correcting the motion of the background of the reference frame or the current frame so that the motion of the background between each of the first and second reference frames and the current frame becomes relatively zero.

5. The object extracting apparatus according to claim 1, wherein the background area determining means includes means for determining the common background area using a predetermined threshold value.

6. The object extraction device according to claim 5, further comprising means for measuring the magnitude of the difference for the current frame and setting the predetermined threshold value large when the difference is large and small when the difference is small.

7. The object extraction device according to claim 5, further comprising means for dividing the current frame into a plurality of regions, measuring the magnitude of the difference in each of the divided regions, and setting the predetermined threshold value large for a region where the difference is large and small for a region where the difference is small.
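
One way to picture the region-wise threshold setting is the sketch below. The claims only require that a larger difference yield a larger threshold; the specific rule used here (a fraction of the mean absolute difference in each square block, with a lower bound) and the names `region_thresholds`, `scale`, and `floor` are assumptions chosen for illustration.

```python
def region_thresholds(diff, block, scale=0.5, floor=4):
    """Return {(top, left): threshold} for each block of a difference image.

    The threshold grows with the mean absolute difference measured in the
    block, so noisy regions get a larger threshold than quiet ones.
    """
    height, width = len(diff), len(diff[0])
    thresholds = {}
    for top in range(0, height, block):
        for left in range(0, width, block):
            values = [abs(diff[y][x])
                      for y in range(top, min(top + block, height))
                      for x in range(left, min(left + block, width))]
            thresholds[(top, left)] = max(floor, scale * sum(values) / len(values))
    return thresholds


# Left half is quiet, right half has a large inter-frame difference.
difference_image = [[0, 0, 40, 40],
                    [0, 0, 40, 40]]
thresholds = region_thresholds(difference_image, block=2)
```

Using a single block that covers the whole frame reduces this to the frame-wide setting of claim 6.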

8. The object extraction device according to claim 1, further comprising: prediction means for predicting the position or shape of the object on the current frame from a frame in which an object area has already been extracted; and means for selecting the first and second reference frames to be used by the background area determination means, based on the position or shape of the object on the current frame predicted by the prediction means.

9. The object extraction device according to claim 1, further comprising: initial figure setting means for setting a figure surrounding the target object in an initial frame of the moving image signal; and figure setting means for setting, for each input frame of the moving image signal, a figure in the input frame that surrounds the area of the input frame corresponding to the in-figure image of a reference frame temporally different from the input frame, based on the correlation between the input frame and the in-figure image of the reference frame, wherein an area of the in-figure image that belongs to neither the first background area nor the second background area is extracted as an object area.

10. The object extracting apparatus according to claim 9, wherein the initial figure setting means sets a figure surrounding the target moving object based on an external input.

11. An object extraction device comprising: first object extraction means including initial figure setting means for setting a figure surrounding a target object in an initial frame of a moving image signal, figure setting means for setting, for each input frame of the moving image signal, a figure in the input frame that surrounds the area of the input frame corresponding to the in-figure image of a reference frame temporally different from the input frame, based on the correlation between the input frame and the in-figure image of the reference frame, background area determining means for determining a first background area common to a current frame from which an object is to be extracted and a first reference frame temporally different from the current frame, based on a difference between the current frame and the first reference frame, and for determining a second background area common to the current frame and a second reference frame temporally different from the current frame, based on a difference between the current frame and the second reference frame, and means for extracting, as an object area, an area of the in-figure image of the current frame that belongs to neither the first background area nor the second background area; second object extraction means for extracting an object area by a method different from that of the first object extraction means; and means for selectively switching between the first and second object extraction means.

12. The object extraction device according to claim 11, further comprising means for extracting a feature amount of the image from at least a part of the current frame from which an object is to be extracted, wherein the switching means selectively switches between the first and second object extraction means based on the extracted feature amount.

13. The object extraction device according to claim 11, further comprising means for predicting, using a frame from which an object area has already been extracted as a reference frame, the position or shape of the object on the current frame from which an object is to be extracted, and for extracting an object area from the current frame based on the prediction result.

14. The object extraction device according to claim 13, wherein the first and second object extraction means are selectively switched in units of blocks within a frame based on the prediction error amount, so that the extraction result of the second object extraction means is used as the object area when the prediction error of the second object extraction means is within a predetermined range, and the extraction result of the first object extraction means is used as the object area when the prediction error exceeds the predetermined range.

15. The object extraction device according to claim 11, wherein the second object extraction means performs inter-frame prediction in an order different from the input frame order, so that the frame interval between a reference frame and the current frame from which an object is to be extracted is at least a predetermined number of frames.

16. An object extraction device comprising: means for reading moving image data from a storage device in which the moving image data is recorded, and generating shape data for each frame of the read moving image data by performing motion compensation on shape data representing an object area on a predetermined frame among the plurality of frames constituting the moving image data; means for generating a background image of the moving image data by sequentially overwriting, onto a background memory, the image data of the background area of each frame determined by the generated shape data; and means for reading the moving image data again from the storage device, obtaining, for each pixel of each frame of the read moving image data, the difference from the corresponding pixel of the background image accumulated in the background memory, and determining pixels whose absolute difference is larger than a predetermined threshold value to be the object area.
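
The two passes over the stored sequence can be sketched as follows. The example assumes the per-frame shape data has already been produced (the motion compensation step is not shown), represents frames as lists of rows, and treats pixels that were never observed as background as object pixels; that last choice and the names `build_background` and `extract_with_background` are assumptions made for illustration.

```python
def build_background(frames, shapes):
    """First pass: overwrite background pixels of every frame into one image.

    shapes[i][y][x] is True where frame i shows the object. Pixels never
    seen as background stay None.
    """
    height, width = len(frames[0]), len(frames[0][0])
    background = [[None] * width for _ in range(height)]
    for frame, shape in zip(frames, shapes):
        for y in range(height):
            for x in range(width):
                if not shape[y][x]:
                    background[y][x] = frame[y][x]
    return background


def extract_with_background(frame, background, threshold=10):
    """Second pass: object where |frame - background| exceeds the threshold."""
    return [[b is None or abs(p - b) > threshold
             for p, b in zip(prow, brow)]
            for prow, brow in zip(frame, background)]


frames = [[[200, 10, 10]], [[10, 200, 10]]]
shapes = [[[True, False, False]], [[False, True, False]]]
background = build_background(frames, shapes)
masks = [extract_with_background(f, background) for f in frames]
```

Because the object moves, each pixel it covers in one frame is uncovered in another, which is what lets the background memory fill in completely.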

17. The object extraction device according to claim 16, further comprising means for selectively outputting, as the object extraction result, either the object area determined from the shape data of each frame or the object area determined from the absolute value of the difference from the background image.

18. An object extraction device comprising: image input means for inputting moving image data and shape data representing an object area on a predetermined frame among the plurality of frames constituting the moving image data; means for dividing a current processing frame into a plurality of blocks; means for searching a reference frame, for each of the blocks, for a similar block whose image data pattern is similar to, and whose area is larger than, that of the current processing block; means for attaching, to each block of the current processing frame, shape data obtained by cutting out the shape data of the similar block from the reference frame and reducing it; and means for outputting the attached shape data as the shape data of the current processing frame.
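
A minimal sketch of the reduced block matching idea follows. It assumes the similar block is exactly twice the size of the current block, that reduction is plain 2:1 subsampling, that similarity is measured by the sum of absolute differences, and that the search is exhaustive over the reference frame; none of these specifics, nor the helper names, are taken from the claims.

```python
def crop(image, top, left, size):
    """Cut a size x size block out of an image stored as a list of rows."""
    return [row[left:left + size] for row in image[top:top + size]]


def reduce_by_two(block):
    """Shrink a 2n x 2n block to n x n by keeping every second sample."""
    return [row[::2] for row in block[::2]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def find_similar_block(cur_block, ref_image, size):
    """Position of the 2*size block in ref_image that best matches cur_block."""
    big = 2 * size
    best = None
    for top in range(len(ref_image) - big + 1):
        for left in range(len(ref_image[0]) - big + 1):
            err = sad(cur_block, reduce_by_two(crop(ref_image, top, left, big)))
            if best is None or err < best[0]:
                best = (err, top, left)
    return best[1], best[2]


def transfer_shape(cur_block, ref_image, ref_shape, size):
    """Shape data for cur_block: the similar block's shape data, reduced."""
    top, left = find_similar_block(cur_block, ref_image, size)
    return reduce_by_two(crop(ref_shape, top, left, 2 * size))


ref_image = [[1, 0, 5, 0],
             [0, 0, 0, 0]]
ref_shape = [[False, False, True, True],
             [False, False, True, True]]
position = find_similar_block([[5]], ref_image, size=1)
shape = transfer_shape([[5]], ref_image, ref_shape, size=1)
```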

19. An object extraction device comprising: image input means for inputting image data and shape data representing an object area of the image; means for setting blocks on the contour portion of the shape data and searching the same image, for each block, for a similar block whose image data pattern is similar to, and which is larger than, the block; means for replacing the shape data of each block with shape data obtained by reducing the shape data of the corresponding similar block; means for repeating the replacement a predetermined number of times; and means for outputting the shape data obtained by the repeated replacement as corrected shape data.

20. The object extraction device according to claim 19, further comprising iterative means for repeating, a plurality of times, the search for the similar blocks and the predetermined number of replacements of the shape data, wherein the block size is smaller at the end of the iteration than at its beginning.