JP4201958B2

JP4201958B2 - Moving image object extraction device

Info

Publication number: JP4201958B2
Application number: JP2000125577A
Authority: JP
Inventors: 俊彦三須; 保明金次; 源曽根原; 慎一境田; 文涛鄭
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2000-04-26
Filing date: 2000-04-26
Publication date: 2008-12-24
Anticipated expiration: 2020-04-26
Also published as: JP2001307104A

Description

[0001]
[Technical field of the invention]
The present invention relates to a moving image object extraction device that extracts objects for object-based image coding.
[0002]
[Prior art]
Conventionally, object extraction techniques that require a special setup, such as the background used in chroma keying, or a special sensor, such as a range finder, have been put into practical use.
[0003]
On the other hand, representative techniques for cutting out an object from a natural image acquired by an ordinary camera or the like include the following.
1. Methods that segment an image into regions based on luminance/color information or a vector field
2. Methods that separate the background and the foreground from each other using temporal statistics of luminance values
3. Methods in which an approximate position and contour are supplied manually and the object is cut out based on edge information
[0004]
The region segmentation methods based on luminance/color information or a motion vector field take into account both similarity in color and the like and spatial distance, and segment the image by merging spatially adjacent pixels that are similar in color and the like. Representative algorithms include the K-means algorithm (S. Z. Selim et al., “K-means-type algorithms,” IEEE Trans. PAMI, Vol. 6, No. 1, pp. 81-87 (1984)) and the region growing method (S. W. Zucker, “Region growing: Childhood and adolescence,” Computer Graphics and Image Processing, Vol. 5, pp. 382-399 (1976)).
[0005]
The methods using temporal statistics of luminance values estimate the luminance of the background from the change in luminance over time and extract foreground objects from the difference between that background luminance and the input frame. They are applicable, for example, when the background is fixed, and have the feature that both the background image and the foreground are obtained (Sakaida et al., “A study of a moving object extraction method using background subtraction,” 1999 IEICE General Conference).
[0006]
Among the methods based on edge information, a representative one is the active contour model snakes, which converges a contour onto the edges of an image by iteratively solving an energy minimization problem (Kass et al., “Snakes: Active Contour Models,” Proceedings of First International Conference on Computer Vision, pp. 259-269 (London UK, 1987)).
[0007]
There are also a method that uses dynamic programming to control snakes (Amir et al., “Using Dynamic Programming for solving Variational Problems in Vision,” IEEE Trans. PAMI, Vol. 12, No. 9, pp. 855-867 (1990)) and a speed-up based on the greedy algorithm (Williams et al., “A Fast Algorithm for Active Contours and Curvature Estimation,” CVGIP: Image Understanding, Vol. 55, No. 1, pp. 14-26 (1992)).
[0008]
A method has also been proposed that treats the zero crossings of the luminance Laplacian, the luminance gradient, the luminance value, and so on in an integrated manner and cuts out an object by tracing its contour with Dijkstra's minimum-cost path algorithm, which enables stable extraction in a variety of scenes (Mortensen et al., “Interactive Segmentation with Intelligent Scissors,” Graphical Models and Image Processing, Vol. 60, pp. 349-384 (1998)).
[0009]
As an approach that stabilizes extraction by restricting the extraction target, there is research on facial contour extraction (Yokoyama et al., “Proposal of an active contour model for facial contour extraction,” Trans. IPSJ, Vol. 40, No. 3, pp. 1127-1137 (1999)).
[0010]
In addition, a method that, instead of handling contour edges explicitly, obtains the object region by applying local iterated function systems (LIFS) based on the self-similar structure extracted from the image can accurately cut out not only smooth contours but also sharp corners (Ida et al., “High-precision extraction of object contours using LIFS,” Trans. IEICE D-II, Vol. J82-D-II, No. 8, pp. 1282-1289 (1999)).
[0011]
[Problems to be solved by the invention]
In moving image coding, a high compression ratio can be obtained by adding information on the shape of each object and on its motion, that is, its motion parameters. However, it is difficult to estimate the shape and motion of an object robustly, that is, stably, from a relatively small number of frames.
[0012]
An object of the present invention is to provide a moving image object extraction device capable of robustly estimating the shape and motion of an object with a relatively small number of frames.
[0013]
[Means for Solving the Problems]
The moving image object extraction device according to the present invention estimates, from two frames, a plurality of motions contained in a moving image and extracts an object for each motion. The device comprises: motion parameter acquisition means that applies a generalized Hough transform to all motion vectors of the two-frame moving image and, from among a plurality of candidates for the motion parameters of each object over one frame interval (any of enlargement and/or reduction, vertical translation, horizontal translation, and rotation), extracts the plurality of motion parameters at which the solution spaces intersect when a plurality of objects with mutually different motions exist; current frame prediction means that predicts the current frame of the moving image by performing motion estimation from the extracted motion parameters and the previous frame of the moving image; difference detection means that detects the difference between the current frame predicted by the frame prediction means and the actual current frame; and object classification means that classifies and outputs the objects of the moving image based on the difference. The object classification means calculates, for each pixel, the motion parameters that minimize the absolute value of the difference as the motion of that pixel, classifies the calculated per-pixel motions, and extracts a plurality of objects each consisting of the classified pixels.
[0014]
According to the present invention, the shape and motion parameters of an object are obtained using only a relatively small number of frames, that is, only two frames, a previous frame and a current frame. By expressing the shape and motion parameters of these objects, a moving image having a plurality of objects can be efficiently encoded. According to the present invention, since the motion of an object unit, not a pixel unit, is estimated, the shape of the object and its motion can be estimated robustly with a relatively small number of frames.
[0015]
In this specification, a motion vector is a two-dimensional vector quantity obtained by determining which pixel one frame earlier corresponds to a pixel of interest in the frame and expressing the amount of movement between them. The set of such motion vectors obtained at a predetermined interval over the entire frame is called a motion vector field.
[0016]
Preferably, the motion parameter acquisition means acquires the motion parameters by applying a generalized Hough transform to the motion vectors. The generalized Hough transform is a parameter estimation technique in which a vote is cast for every parameter candidate that could have generated the observed information, and the parameter on which the votes concentrate is taken as the estimate. When a plurality of motions coexist in the screen, votes concentrate at a plurality of points in the parameter space, so the motions can be estimated by searching for those points one after another.
[0018]
As the motion parameter, for example, an affine parameter and / or one or more independent parameters in the affine parameter space are used.
[0019]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described in detail with reference to the drawings.
FIG. 1 is a diagram showing an embodiment of the moving image object extraction device according to the present invention. This device uses the Hough transform to map a motion vector field into a motion parameter space, thereby estimating a plurality of motions contained in a moving image and extracting an object for each motion.
[0020]
The moving image object extraction device shown in FIG. 1 comprises: a motion vector field calculation circuit 1 that calculates a motion vector field from motion vectors (u(x, y), v(x, y)); a Hough transform circuit 2, serving as the motion parameter acquisition means, that applies a Hough transform to the motion vectors (u(x, y), v(x, y)) to acquire the motion parameters [Outside 1]; a prediction unit 3 that predicts the current frame; an error detection unit 4 that detects the error between the predicted current frame and the actual current frame; a motion classification circuit 5 that classifies and extracts objects by motion; and a delay circuit 6 that delays the input moving image signal by one frame.
[0021]
The prediction unit 3 includes motion compensation circuits 3-1, 3-2, 3-3, and 3-4 that compensate for motion corresponding to each motion parameter, and the error detection unit 4 corresponds to each motion parameter. Difference circuits 4-1, 4-2, 4-3, and 4-4. The number of motion compensation circuits and difference circuits is the same as the number of extracted objects (M). In this embodiment, it is assumed that the motion of each object represented by a linear combination of four parameters is observed as a motion vector field, as will be described later. In the present embodiment, for example, horizontal translation, vertical translation, rotation, and enlargement are used as the four parameters.
[0022]
The operation of this embodiment will now be described. First, a moving image signal is input to the motion vector field calculation circuit 1, the error detection unit 4, and the delay circuit 6. The input moving image is a signal whose unit is one screen of H horizontal pixels by V vertical pixels (H and V are both natural numbers) and which is updated at a time interval T (T is a real number). In this embodiment, one screen of H × V pixels is called a frame, and in FIG. 1 the (n-1)-th frame (previous frame) and the n-th frame (current frame) are denoted I^(n-1)(x, y) and I^(n)(x, y), respectively (n is a natural number).
[0023]
The motion vector field calculation circuit 1 calculates a motion vector field from the input moving image. Any algorithm may be used to calculate the motion vector field, such as block matching or the luminance gradient method. The vector field need not be obtained for every pixel; it is sufficient to obtain it at a suitable interval, for example at representative points spaced 4 × 4 pixels apart.
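As an illustration of one of the algorithms named above, the following is a minimal sketch of exhaustive block matching on a sparse grid. It is not the circuit described in this specification: the function name, the sum-of-absolute-differences cost, the tie-break toward small motion, the search range, and the use of a top-left origin (rather than the image center) are all assumptions made for the example.

```python
def block_matching(prev, curr, step=4, block=4, search=2):
    """Estimate a sparse motion vector field by exhaustive block matching.

    prev, curr: 2-D lists of luminance values with identical dimensions.
    Returns {(x, y): (u, v)} for block origins sampled every `step` pixels,
    where (u, v) is the displacement from the previous to the current frame.
    Coordinates use a top-left origin (an assumption of this sketch).
    """
    height, width = len(curr), len(curr[0])
    field = {}
    for y in range(0, height - block + 1, step):
        for x in range(0, width - block + 1, step):
            best = None
            for dv in range(-search, search + 1):
                for du in range(-search, search + 1):
                    # Where this block would have been one frame earlier.
                    px, py = x - du, y - dv
                    if px < 0 or py < 0 or px + block > width or py + block > height:
                        continue
                    # Sum of absolute differences between the two blocks.
                    sad = sum(
                        abs(curr[y + j][x + i] - prev[py + j][px + i])
                        for j in range(block)
                        for i in range(block)
                    )
                    # Prefer the smaller SAD; break ties toward smaller motion.
                    key = (sad, abs(du) + abs(dv))
                    if best is None or key < best[0]:
                        best = (key, (du, dv))
            field[(x, y)] = best[1]
    return field
```

With `step=4` the field is sampled at 4 × 4 pixel intervals, matching the example spacing mentioned in the text.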
[0024]
Thereafter, the Hough transform circuit 2 acquires motion parameters using all motion vectors constituting the motion vector field calculated in this way. In the present embodiment, the motion parameters to be acquired are four two-dimensional motions of vertical translation, horizontal translation, rotation, and enlargement.
[0025]
Here, the principle and calculation procedure of the Hough transform are described for the motion of an object within the frame image over one frame interval.
Let the motion vector at image coordinates (x, y), with the image center as the origin, be (u(x, y), v(x, y)), and let the horizontal translation of the object over one frame interval be ξ [pixels], the vertical translation η [pixels], the rotation φ, and the enlargement μ. The rotation is expressed as the slope (tangent) of the angle; that is, tan⁻¹ φ is the angle in radians. The enlargement μ is expressed as the magnification minus 1; that is, 1 + μ is the magnification. In this case, the motion vector field generated by this object satisfies the following equations.
[Equation 1]
u(x, y) = μx − φy + ξ
v(x, y) = φx + μy + η
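The four-parameter model above can be evaluated directly. The sketch below computes the motion vector at a point and samples the field on a grid with the image center as the origin; the function names and the grid spacing are illustrative assumptions, not part of the specification.

```python
def motion_vector(x, y, xi, eta, phi, mu):
    """Motion vector (u, v) at image coordinates (x, y) for the parameters
    xi (horizontal translation), eta (vertical translation),
    phi (rotation, as a slope) and mu (magnification minus 1)."""
    u = mu * x - phi * y + xi
    v = phi * x + mu * y + eta
    return u, v


def motion_vector_field(width, height, xi, eta, phi, mu, step=4):
    """Sample the model on a grid, taking the image centre as the origin."""
    field = {}
    for row in range(0, height, step):
        for col in range(0, width, step):
            x = col - width / 2
            y = row - height / 2
            field[(x, y)] = motion_vector(x, y, xi, eta, phi, mu)
    return field
```

For a pure translation (φ = μ = 0) every sampled vector equals (ξ, η); for a pure enlargement the vectors point away from the image center and grow with distance from it.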
[0026]
Consider obtaining the motion parameters ξ, η, φ, and μ from the observed motion vector field. When the motion vector at image coordinates (x₀, y₀) is (u₀, v₀), there are infinitely many candidates for the motion parameters ξ, η, φ, and μ that can produce this combination of the four observed quantities x₀, y₀, u₀, and v₀, but all of them satisfy the following equations.
[Equation 2]
ξ − y₀φ + x₀μ − u₀ = 0
η + x₀φ + y₀μ − v₀ = 0
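Because Equation 2 gives two linear constraints on four unknowns, each observation leaves a two-dimensional family of candidates: once φ and μ are chosen, ξ and η follow. The sketch below makes that explicit; the function names are hypothetical.

```python
def translation_candidates(x0, y0, u0, v0, phi, mu):
    """Solve Equation 2 for (xi, eta), given one observed motion vector
    (u0, v0) at (x0, y0) and a chosen pair (phi, mu)."""
    xi = u0 + y0 * phi - x0 * mu
    eta = v0 - x0 * phi - y0 * mu
    return xi, eta


def residuals(x0, y0, u0, v0, xi, eta, phi, mu):
    """Left-hand sides of Equation 2; both are zero for a consistent candidate."""
    r1 = xi - y0 * phi + x0 * mu - u0
    r2 = eta + x0 * phi + y0 * mu - v0
    return r1, r2
```

Every (φ, μ) pair therefore yields exactly one consistent (ξ, η), which is what makes the voting procedure tractable: only the φ and μ axes need to be swept for each motion vector.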
[0027]
When the solution space of Equation 2 is computed for every motion vector, many solution spaces intersect, in the space spanned by the motion parameters ξ, η, φ, and μ, at the point corresponding to the motion parameters of the object. When a plurality of objects with mutually different motions exist, many solution spaces intersect at each of the corresponding points. The Hough transform is performed to extract these intersection points.
[0028]
Although this description uses four-dimensional motion parameters for vertical translation, horizontal translation, rotation, and enlargement, the formulation can also be made with more or fewer motion parameters as needed, for example by omitting rotation or enlargement, or by extending the model to a full affine transformation.
[0029]
Next, an implementation of the Hough transform circuit 2 will be described.
First, the parameter space is discretized and represented as a four-dimensional array. All elements of the four-dimensional array are initialized to 0. Next, for each motion vector (u₀(x₀, y₀), v₀(x₀, y₀)) constituting the motion vector field, a positive value (for example, 1) is added to the elements of the four-dimensional array corresponding to all candidate solution points (ξ, η, φ, μ) that can produce the combination x₀, y₀, u₀, v₀. This operation is called voting.
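The voting step can be sketched as follows. This is an illustration under stated assumptions rather than the circuit itself: a sparse `collections.Counter` stands in for the dense four-dimensional array, φ and μ are swept over caller-supplied lists of discrete values, and ξ and η are quantized with a hypothetical bin width `trans_step`.

```python
from collections import Counter


def hough_vote(vectors, phi_values, mu_values, trans_step=1.0):
    """Accumulate votes in a discretized (xi, eta, phi, mu) space.

    vectors: iterable of (x0, y0, u0, v0) observations.
    Returns a Counter keyed by (xi_bin, eta_bin, phi_index, mu_index).
    """
    acc = Counter()  # missing keys count as 0, like a zero-initialized array
    for x0, y0, u0, v0 in vectors:
        for k, phi in enumerate(phi_values):
            for l, mu in enumerate(mu_values):
                # Equation 2 fixes (xi, eta) once (phi, mu) is chosen.
                xi = u0 + y0 * phi - x0 * mu
                eta = v0 - x0 * phi - y0 * mu
                key = (round(xi / trans_step), round(eta / trans_step), k, l)
                acc[key] += 1  # cast one vote
    return acc
```

Vectors generated by the same object all vote for the bin containing that object's true parameters, while their votes for other (φ, μ) choices scatter over different (ξ, η) bins.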
[0030]
After votes have been cast for all motion vectors, the elements on which votes have concentrated are searched for. The motion parameters corresponding to such an element are the motion of an object. When a plurality of objects with mutually different motions exist, votes concentrate on a plurality of elements, and the motions of the plurality of objects can be detected from those elements. In the following description, the number of objects is M and the motion parameters of the i-th object are denoted [Outside 2], where i = 1, 2, ..., M.
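One way to search for several vote concentrations in turn is to take the strongest bin, discard its immediate neighbors, and repeat. The specification does not state how the search is carried out, so the neighborhood suppression, the `min_votes` threshold, and the function name below are assumptions of this sketch.

```python
def find_peaks(acc, max_objects, min_votes=1, radius=1):
    """Pick up to `max_objects` vote concentrations from an accumulator.

    acc: mapping from a 4-tuple of bin indices to a vote count.
    Returns a list of (bin, votes), strongest first.
    """
    remaining = dict(acc)
    peaks = []
    while remaining and len(peaks) < max_objects:
        best = max(remaining, key=remaining.get)
        if remaining[best] < min_votes:
            break
        peaks.append((best, remaining[best]))
        # Drop the peak and every bin within `radius` of it on all four axes,
        # so that one object is not reported twice.
        remaining = {
            k: n
            for k, n in remaining.items()
            if max(abs(a - b) for a, b in zip(k, best)) > radius
        }
    return peaks
```

The number of peaks returned plays the role of M, and each peak's bin indices map back to one set of motion parameters.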
[0031]
Next, for each of the motion parameters obtained by the Hough transform circuit 2, the corresponding motion compensation circuit 3-1, 3-2, 3-3, or 3-4 performs motion compensation. According to the motion parameters [Outside 3], each of these circuits vertically translates, horizontally translates, rotates, or enlarges the previous frame I^(n-1)(x, y).
[0032]
The region in which the result of this vertical translation, horizontal translation, rotation, and enlargement coincides with the current frame I^(n)(x, y) is the region belonging to the corresponding motion parameters. Regions that do not coincide are regions having a different motion. Denoting by [Outside 4] the result of motion compensation with the motion parameters [Outside 3], it can be expressed as
[Equation 3]
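Motion compensation with one set of parameters can be sketched as a backward warp of the previous frame. The exact form of Equation 3 is not reproduced here, so the following conventions are assumptions of this sketch: the motion vector of Equation 1 is evaluated at each pixel of the predicted frame, the source pixel is taken at (x − u, y − v) in the previous frame, nearest-neighbor sampling is used, and pixels whose source falls outside the frame receive a `fill` value.

```python
def motion_compensate(prev, xi, eta, phi, mu, fill=0):
    """Predict the current frame from `prev` under one motion parameter set.

    prev: 2-D list of luminance values. Coordinates are measured from the
    image centre, as in Equation 1.
    """
    height, width = len(prev), len(prev[0])
    cx, cy = (width - 1) / 2, (height - 1) / 2
    out = [[fill] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            x, y = col - cx, row - cy
            # Motion vector at this pixel (Equation 1).
            u = mu * x - phi * y + xi
            v = phi * x + mu * y + eta
            # Assumed convention: the pixel came from (x - u, y - v).
            src_col = round(x - u + cx)
            src_row = round(y - v + cy)
            if 0 <= src_col < width and 0 <= src_row < height:
                out[row][col] = prev[src_row][src_col]
    return out
```

Running this once per detected parameter set gives M candidate predictions of the current frame, one per motion compensation circuit.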
[0033]
Next, each of the difference circuits 4-1, 4-2, 4-3, and 4-4 calculates the difference D_i(x, y) between [Outside 4], output from the corresponding motion compensation circuit 3-1, 3-2, 3-3, or 3-4, and the current frame I^(n)(x, y) according to
[Equation 4]
[0034]
Finally, the motion classification circuit 5 uses the results D_i(x, y) of the difference circuits 4-1, 4-2, 4-3, and 4-4 to extract and classify objects by motion. Specifically, for each pixel (x, y) of the frame, D_i(x, y) is obtained for all motion parameters [Outside 3], and the motion parameters [Outside 3] that minimize the absolute value |D_i(x, y)| are taken as the motion of that pixel. Let C(x, y) be the result of classifying the frame by motion pixel by pixel. C(x, y) takes the object number i as its value. That is, it is expressed as
[Equation 5]
When |D_i(x, y)| shows no significant difference at a given pixel, as happens when the object has relatively little texture, the classification of that pixel can be left undetermined at this stage and interpolated later from its surroundings. A noise removal filter or the like can also be added to tidy the shape of the object.
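The per-pixel classification described above amounts to choosing, at each pixel, the index i that minimizes |D_i(x, y)|. The sketch below combines the difference and classification steps; the function name, the `margin` parameter used to decide when the difference is not significant, and the use of `None` for undetermined pixels are assumptions of the example.

```python
def classify_pixels(predictions, curr, margin=0):
    """Label each pixel with the object number whose prediction fits best.

    predictions: list of M predicted frames (2-D lists), one per motion.
    curr: the actual current frame (2-D list).
    Returns a 2-D list of labels in 1..M, or None where the two smallest
    absolute differences are within `margin` of each other.
    """
    height, width = len(curr), len(curr[0])
    labels = [[None] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            # |D_i(x, y)| for every candidate motion, smallest first.
            errors = sorted(
                (abs(pred[row][col] - curr[row][col]), i)
                for i, pred in enumerate(predictions, start=1)
            )
            best_error, best_index = errors[0]
            if len(errors) > 1 and errors[1][0] - best_error <= margin:
                continue  # no significant difference: leave undetermined
            labels[row][col] = best_index
    return labels
```

Pixels left as `None` correspond to the undetermined case mentioned in the text and could be filled in afterwards from neighboring labels.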
[0035]
According to the present embodiment, it is possible to robustly acquire the shape and motion parameters of an object using only two frames, the previous frame and the current frame. By expressing the shape and motion parameters of these objects, a moving image having a plurality of objects can be efficiently encoded.
[0036]
The present invention is not limited to the above-described embodiment, and many changes and modifications can be made. For example, the motion parameters can be obtained by a method other than the Hough transform, and a combination of motion parameters can be arbitrarily set.
[Brief description of the drawings]
FIG. 1 is a diagram showing an embodiment of a moving image object extracting apparatus according to the present invention.
[Explanation of symbols]
1 Motion vector field calculation circuit
2 Hough transform circuit
3 Prediction unit
3-1, 3-2, 3-3, ..., 3-M Motion compensation circuits
4 Error detection unit
4-1, 4-2, 4-3, ..., 4-M Difference circuits
5 Motion classification circuit
6 Delay circuit

Claims

A moving image object extraction device that estimates, from two frames, a plurality of motions contained in a moving image and extracts an object for each motion, the device comprising:
motion parameter acquisition means that applies a generalized Hough transform to all motion vectors of the two-frame moving image and, from among a plurality of candidates for the motion parameters of each object over one frame interval, the motion parameters being any of enlargement and/or reduction, vertical translation, horizontal translation, and rotation, extracts the plurality of motion parameters at which the solution spaces intersect when a plurality of objects with mutually different motions exist;
current frame prediction means that predicts the current frame of the moving image by performing motion estimation from the extracted plurality of motion parameters and the previous frame of the moving image;
difference detection means that detects the difference between the current frame predicted by the frame prediction means and the actual current frame; and
object classification means that classifies and outputs each object of the moving image based on the difference,
wherein the object classification means calculates, for each pixel, the motion parameters that minimize the absolute value of the difference as the motion of that pixel, classifies the calculated per-pixel motions, and extracts a plurality of objects each consisting of the classified pixels.

The moving image object extraction device according to claim 1, wherein affine parameters and/or one or more independent parameters in the affine parameter space are used as the motion parameters.