JP3663645B2

JP3663645B2 - Moving image processing apparatus and method

Info

Publication number: JP3663645B2
Application number: JP28565894A
Authority: JP
Inventors: 雅博藤田; 勝之田中; 仁佐藤; 繁有沢
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1994-11-18
Filing date: 1994-11-18
Publication date: 2005-06-22
Anticipated expiration: 2020-06-22
Also published as: JPH08149461A

Description

【０００１】
【産業上の利用分野】
本発明は、たとえば動画像の伝送・記録などに用いて好適な、動画像中の物体の３次元形状モデルを抽出する動画像処理装置、および、その形状モデルを用いて高圧縮率で動画像を符号化および復号化する動画像符号化装置、動画像復号化装置、さらに、その符号化された動画像を記録媒体上に記録する動画像記録装置、および、それを再生する動画像再生装置に関する。
【０００２】
【従来の技術】
動画像系列の中の各物体の３次元モデルを使って動画像を符号化することで、動画像系列を圧縮する方法が提案されている。各物体の３次元形状とその動きがわかれば、元の動画像系列と全く同じ動画像系列が生成できる。
そこで、たとえば画像通信において、送信側と受信側で３次元モデルを共有し、送信側で入力画像の動きの情報を検出し、受信側でその動きの情報から画像合成を行えば、画像が再生できる。この場合、動きの情報のみを伝送すればよいことから超低レートでの画像通信が期待できる。具体的には、顔の３次元構造モデルをワイヤフレームに変形し送信側と受信側で共有し、表情などの特徴のみを伝送して顔画像の合成を行う方法が盛んに試みられている。
【０００３】
しかし、この符号化方法を自然画像に応用する場合には、予め３次元モデルを用意しておくことは不可能であり、与えられた動画像系列から３次元モデルを抽出する必要がある。そのような、動画像系列の中から３次元形状モデルを抽出し、そのモデルを利用して動画像を符号化する方法としては、Hans George Musmann 、Michael Hotter、Jorn Ostermannらによる「OBJECT-ORIENTED analysis coding of moving images．」〔Signal processing:Image Communication 1(1989):117-138,Elsvier SCIENCE PUBLISHERS B.V. 〕に開示されている方法がある。この方法によれば、エッジ部分については動きベクトルを求めそれを利用して奥行きを求め、エッジ以外の部分については奥行きを補間して、３次元形状を推測する。また、輝度情報は、３次元上に属性としてマッピングし、補間された３次元形状のデータと共に送出を行っている。
【０００４】
【発明が解決しようとする課題】
しかし、前述したような方法で３次元形状を推測し、動画像系列を圧縮する方法においては、輝度が急変するエッジ部分さえ動きベクトルを求めることは難しく、動きベクトルから正確な奥行きを推測することは非常に難しかった。したがって、エッジ部分の奥行きを補間して３次元形状を推測しても、正確な３次元形状が求められなかった。また、そのため、実際の形状とのずれがあるため余分な情報が増加して、圧縮率が上げられなかった
さらに、前述したような方法では、３次元上に属性としてマッピングした輝度情報を初期情報として伝送するので初期情報量が多いという問題があった。
【０００５】
したがって、本発明の目的は、動画像系列から忠実度の高い３次元形状モデルを抽出する動画像処理装置を提供することにある。また、その３次元形状モデルを使って高い圧縮率で動画像の符号化が可能な動画像符号化装置、および、それを復号する動画像復号化装置を提供することにある。さらに、その符号化された動画像を記録媒体上に記録する動画像記録装置、および、それを再生する動画像再生装置を提供することにある。
【０００６】
【課題を解決するための手段】
本発明の動画像処理装置においては、３次元物体の表面データではなく、２次元動画像にしたときに視覚的に重要な点の３次元位置、および、その点の分析値を初期値として持ち、物体の移動を検出しながら、３次元位置情報を獲得するようにした。
【０００７】
本発明の第１の観点によれば、入力された連続的な動画像の各フレームの静止画像についてエッジを構成する点を特徴点として検出し、前記エッジを構成する各点の位置と分析値を抽出し、前記特徴点によって構成される前記各フレームの画像を構成する各セグメントの３次元形状情報を得るモデリング手段と、前記モデリング手段により得られた前記各セグメントの３次元形状情報を記憶する記憶手段と、前記連続的な動画像の各フレーム間で該動画像を構成する前記各セグメントが３次元的に動いた量の推定値を、前記記憶手段に記憶されている前フレームの前記各セグメントの３次元形状情報と現フレームの各セグメントの３次元形状情報とから算出する動き推定手段と、前記動き推定手段により推定された前記各セグメントの動いた量により各セグメントを３次元的に移動させた位置と実際の位置の差の２乗の総和が最小になるように、実際の位置から３次元的に移動させた位置の最小自乗推定値を求めることで、前記記憶手段に記憶されている各セグメントの３次元形状情報を更新する更新手段とを有し、
前記モデリング手段は、前記入力された連続的な動画像の各フレームの静止画像を異なる解像度スケールを持つフィルタで分析してエッジを構成する点を特徴として抽出する画像分析手段と、前記連続的な動画像の１フレーム目の静止画像について、前記抽出された特徴点により構成されるセグメントの２次元形状情報を抽出するセグメンテーション手段と、前記各セグメントの２次元形状情報に、所定の奥行き方向の位置情報を付加し、前記各セグメントの３次元形状情報の初期値を生成する３次元情報生成手段とを有し、
前記更新手段は、前記動き推定手段により推定された前記各セグメントの動いた量により各セグメントを３次元的に移動させた結果の各特徴点の位置と、実際の各特徴点との位置の差に基づいて、実際の位置から３次元的に移動させた位置の最小自乗推定値を求めることで、前記記憶手段に記憶されている各セグメントの３次元形状情報を各フレームごとに更新し、
前記連続的な動画像を構成する各セグメントの３次元形状情報を獲得する、動画像処理装置が提供される。
【０００９】
好ましくは、前記更新手段は、前記更新手段は、前記推定された動きにより各セグメントを３次元的に移動させた結果の各特徴点の位置を状態量、当該特徴点の実際の位置を観測量、前記状態量と前記観測量の差をノイズとし、カルマンフィルタにより前記状態量の最小自乗推定値を求めることにより、前記各セグメントの３次元形状情報を更新する。
【００１０】
本発明の第２の観点によれば、上記動画像処理装置と、前記獲得された３次元形状情報と、前記画像分析手段により分析された当該３次元形状情報を構成する各特徴点の分析値とを符号化する初期符号化手段と、前記動き推定手段により推定された動きにより各セグメントを３次元的に移動させた結果の各特徴点の位置と、実際の各特徴点の位置の差を求める差検出手段と、前記動き推定手段により推定された各セグメントの動き推定値と、前記差検出手段により検出された各特徴点の位置の差とを、各フレームごとに符号化する符号化手段とを有し、前記連続的な動画像を符号化する、動画像符号化装置が提供される。
【００１１】
本発明の第３の観点によれば、上記動画像符号化装置と、該動画像符号化装置により符号化された前記連続的な動画像を記録媒体上に記録する記録手段とを有する動画像記録装置が提供される。
【００１２】
本発明の第４の観点によれば、入力された連続的な動画像に関する所定の静止画像が分析され、エッジを構成する点が特徴点として検出され、符号化された前記連続的な動画像を復号する動画像復号装置であって、前記符号化された連続的な動画像を構成し、前記特徴点を構成要素とする各セグメントの３次元形状情報と、該３次元形状情報を構成する前記各特徴点の分析値とを復号化する初期復号化手段と、前記連続的な動画像の各フレームごとに符号化された、前記各セグメントの動き推定値と、前記各セグメントの各特徴点ごとの前記入力された動画像の対応する各特徴点からの変位とを復号化する復号化手段と、前記各セグメントの位置を、前記復号化手段により復号化された各セグメントの動き推定値に基づいて３次元的に移動させる移動手段と、前記移動手段により移動された位置の前記各セグメントを、２次元画面上に投影した投影画像を得る投影手段と、前記投影画像における各セグメントの各特徴点の位置を、前記復号化手段により復号化された各セグメントの各特徴点ごとの前記入力された動画像の対応する各特徴点からの変位に基づいて移動させる変形手段と、前記変形手段により変形された結果の画像と、前記初期復号化手段により復号化された各特徴点の分析値とに基づいて、画像を合成する画像合成手段とを有し、符号化された連続的な動画像を復号化する、動画像復号化装置が提供される。
【００１３】
本発明の第５の観点によれば、記録媒体上に記録された符号化された連続的な動画像信号を読み出す信号読み取り手段と、前記読み出された符号化された連続的な動画像信号を復号化し、前記連続的な動画像を合成する、上記動画像復号化装置とを有する動画像再生装置が提供される。
【００１５】
【作用】
本発明の動画像処理装置においては、２次元動画像にしたときに視覚的に重要な点の３次元位置、および、その点の分析値を特徴点として、物体の移動を検出しながら、３次元形状情報を順次更新しするようにした。したがって、その連続的な動画像系列を通して最も矛盾が無い３次元形状情報が抽出できる。
また、３次元形状情報が精度よく抽出できるので、動画像系列の各フレーム間の動きをそのセグメント全体の動きで表した際に、各特徴点ごとの微小な変位が少なくなり、その分布は変位０を中心に局在化する。その結果、さらに圧縮率が向上する。
【００１６】
本発明によれば、上述した動画像中の物体の３次元形状モデルを抽出する動画像処理装置、および、その形状モデルを用いて高圧縮率で動画像を符号化および復号化する動画像符号化装置、動画像復号化装置、さらに、その符号化された動画像を記録媒体上に記録する動画像記録装置、および、それを再生する動画像再生装置が提供される。
【００１７】
【実施例】
本発明の一実施例の動画像符号化装置について、図１を参照して説明する。
図１は、本発明の一実施例の動画像符号化装置１０の構成を示すブロック図である。
動画像符号化装置１０は、画像分析部１１、分析画像記憶部１２、セグメンテーション部１３、記憶部１４、動き推定・対応探索部１５、投影部１６、誤差検出部１７、カルマンフィルタ１８、および、符号化部１９を有する。
この動画像符号化装置１０は、後述する動画像復号化装置３０と協働して画像処理系を構成する。
【００１８】
本実施例の動画像符号化装置１０は、ＶＴＲなどからの動画像系列より、図示せぬ連続シーケンス検出部で連続した画像データを検出し、その各連続した画像データ系列を符号化し記録する、動画像記録装置である。その記録に際しては、前記連続シーケンスより、そのシーケンスを構成するセグメントの３次元形状情報を抽出する第１のステップと、抽出された３次元形状情報を用いて符号化を行う第２のステップとに分けられる。以下、各部の動作について、前記第１のステップ、第２のステップごとに説明する。
【００１９】
まず、連続的な動画像系列より、その動画像を構成する各セグメントの３次元形状情報を抽出するステップにおける各部の動作について説明する。
ＶＴＲなどから入力された動画像系列は、連続したシーケンスが検出され、画像分析部１１に入力される。
画像分析部１１は、順次入力される各フレームの画像データを分析し、特徴点を抽出し、特徴点の位置と分析値を求める。本実施例においては、入力画像データに対して、異なる解像度スケールを持つ複数のフィルタで画像データの分析を行い、エッジを構成する点を特徴点として検出し、入力画像データを特徴画像データであるエッジの画像データに変換し、そのエッジを構成する各点の位置と分析値を抽出する。
【００２０】
分析画像記憶部１２は、画像分析部１１で分析された連続的な連続シーケンスの特徴点画像を記憶するメモリである。記憶されている各フレームの特徴点の情報は、動き推定・対応探索部１５より順次参照され、また、１フレーム目の特徴点の情報は、セグメンテーション部１３および符号化部１９より参照される。
【００２１】
セグメンテーション部１３は、画像分析部１１より入力された１フレーム目の特徴画像データの特徴点の位置と分析値より、セグメンテーションを行い、この画像データを構成しているセグメント（部分または要素）を抽出し、各セグメント毎の特徴点の情報を記憶部１４に記憶する。
このセグメンテーションは、カラー画像から赤・緑・青・明度・色相・彩度の信号、および、テレビ信号に対応したＹ信号、Ｉ信号、Ｑ信号の合計９種類のセグメントについて特徴を抽出し、その特徴に関するヒストグラムに基づいてセグメンテーションを行う再帰的しきい値処理により行う。
【００２２】
記憶部１４は、各セグメントの各特徴点について、位置情報Ｘ，Ｙ，Ｚと、分析値ｇ、確率共分散行列ｖ、付加情報ａを記憶する記憶手段であり、メモリにより構成される。記憶部１４に記憶されている情報は、入力された連続シーケンスがＳ個のセグメントを有し、各セグメントがＵs 個（ｓ＝１〜Ｓ）のエッジより構成され、その各エッジがＮsu個（ｕ＝１〜Ｕs 、ｓ＝１〜Ｓ）の特徴点より構成される場合、式１のように表される。
【００２３】
【数１】

【００２４】
なお、確率共分散行列ｖsun （ｎ＝１〜Ｎsu、ｕ＝１〜Ｕs 、ｓ＝１〜Ｓ）は、各エッジを構成する点のちらばりであるので、同一のエッジを構成する各特徴点については同一の値が付される。
【００２５】
記憶部１４に記憶されている情報は、まず、１フレーム目についての情報がセグメンテーション部１３より入力され、初期データが生成される。その後、２フレーム目以降の画像データが入力されるごとに、後述するカルマンフィルタ１８によりその内容が更新される。
【００２６】
動き推定・対応探索部１５は、前フレームの画像のエッジ画像の各点の情報｛Ｆsun ｝と現フレームのエッジ位置と分析値から、セグメントの動いた量を推定し、前フレームの各点の情報｛Ｆsun ｝と、現フレームのエッジ画像の特徴点の対応付けを行う。
その方法について具体的に以下に説明する。
まず、図２において、座標系ＸＹＺはカメラ座標系で、座標系の原点はレンズの中心で、光軸は奥行き方向となるＺ軸と一致させているものとする。このような座標系においては、点Ｐの像はＸＹ平面に平行で原点からカメラの焦点距離ｆだけ離れた所に設置された平面に投影されると考えることができる。この投影面上の点Ｐの像の位置がカメラより入力された画像上の画素の位置となる。その投影面に対して、その面のＺ軸との交点を原点とし、Ｘ軸およびＹ軸と平行な座標系ｘｙを設定する。
【００２７】
ＸＹＺ空間内の点Ｐの座標をｐ＝（Ｘp ，Ｙp ，Ｚp ）、点Ｐのｘｙ平面上の像である点Ｑの座標をｑ＝（ｘq ，ｙq ）とすると、点Ｑの座標ｑは式２のように表される。
【００２８】
【数２】

【００２９】
あるセグメントｓ（ｓ＝１〜Ｓ）がＵs 個（ｓ＝１〜Ｓ）のエッジより構成され、その各エッジがＮsu個（ｕ＝１〜Ｕｓ，ｓ＝１〜Ｓ）の点の情報で表され、それら各点の位置はｐsun ＝（Ｘsun ，Ｙsun ，Ｚsun ）（ｎ＝１〜Ｎsu）で表されるとする。
この画像データを構成するセグメントが、相対的にＸ軸周りにΔωｘ、Ｙ軸周りにΔωｙ、Ｚ軸周りにΔωｚ回転し、また、Δｔ＝（Δｔｘ，Δｔｙ，Δｔｚ）だけ平行移動した場合、このセグメントを構成する各点ｐsun の移動量Δｐsun ＝（ΔＸsun ，ΔＹsun ，ΔＺsun ）は、前記各軸周りの回転Δωｘ，Δωｙ，Δωｚ、および，平行移動量Δｔが小さいとすると、式３のようになる。
【００３０】
【数３】

【００３１】
点ｐsun のｘｙ平面上への投影点をｑsun ＝（ｘsun ，ｙsun ）とすると、前記セグメントの移動にともなう投影点ｑsun の移動量Δｑsun ＝（Δｘsun ，Δｙsun ）は式４のようになる。
【００３２】
【数４】

【００３３】
式２と式４より式５が得られる。
【００３４】
【数５】

【００３５】
式３および式５を、Ｎ個の点の内のｍ＝１〜ＭのＭ個に適用すると、式６のようになる。
【００３６】
【数６】

【００３７】
なお、Δｔ^t は行列Δｔの転置行列を示す。
Δｑについては、新たな画像が入力される前に得ていた３次元位置ｐｍの式２による仮想の投影点ｑｍ’に対応する画像上の点が分からないので、図３に示すように、３次元位置情報の仮想の投影像Ｉｐにおいて物体像Ｉｒの投影特徴点ｑｍから最も近い点と仮定する。
Ｍ≧３のとき回転および平行移動量のパラメータΔＣの推定値ΔＣ’は、最小自乗法を用いて式７のように求められる。
【００３８】
【数７】

【００３９】
式７により得られたΔＣ’による３次元位置情報の移動量Δｑsun を式２より計算して、新たに式３により仮想の投影像を作り、同様に近い点を対応点と仮定し、式７の計算を繰り返し、式８のようにしていくと、仮想の投影像と物体像Ｉｒは近づく。
【００４０】
【数８】

【００４１】
この計算を、Σ｜Δｑsun ｜² が予め定めた所定値ε以下になるまで繰り返すことにより、元の画像の３次元位置情報ｐsun に対する新たな画像の対応点ｑsun が求められる。
【００４２】
以上述べたような動き推定・対応探索の方法によれば、物体を剛体と仮定し、回転および平行移動についての６個のパラメータで３次元位置情報を構成する点を拘束することで、個々の点それぞれ独立にではなく、包括的に動き推定・対応探索が行われている。したがって、全体として矛盾のない対応関係が全ての点について得られ、誤対応による３次元位置情報におけるノイズが減る。
【００４３】
投影部１６は、動き推定・対応探索部１５により得られた、各セグメントの平行移動量ｔ、回転移動量ωによって各セグメントの情報｛Ｆsun ｝の位置情報を、３次元空間において平行および回転移動させ、さらに、各セグメントの各点の３次元位置ｐsun ＝（Ｘsun ，Ｙsun ，Ｚsun ）を画像上の位置ｑsun ＝（ｘsun ，ｙsun ）に変換し、得られた画像上の点ｑsun に分析値ｇsun を与える。３次元位置ｐsun から投影点ｑsun への変換は式３により行う。
【００４４】
誤差検出部１７は、投影後の各特徴点の情報｛Ｆsun ｝の画像上での位置ｑsun'と対応する入力画像のエッジ位置ｑsun との差を求め、さらに、その差を量子化し、フラクチュエーション（変動）を求める。
量子化方法としては、一定の適切な量子化ステップ（たとえば１画素幅）による線形な量子化、いくつかの線形でない量子化ステップを設定した非線形量子化、量子化ステップを固定せず、入力される画像の性質により量子化ステップを適宜変える量子化などがあり、要求される伝送レート、画質に応じて、適切な量子化方法を用いれば良い。たとえば、高圧縮率が要求される場合には、量子化ステップを大きくしたり、画像に直線が多く量子化ノイズによる直線の不連続性が目立つ場合は、非線形量子化を行い、フラクチュエーションの小さい部分の量子化を細かくするようにする。
求められたフラクチュエーションは、カルマンフィルタ１８に入力される。
【００４５】
カルマンフィルタ１８は、前の画像における各セグメントの各点の情報｛Ｆsun ｝の３次元位置ｐsun とそれに対応する入力画像のエッジ位置ｑsun から３次元位置ｐsun を更新する。カルマンフィルタはノイズを含むシステムにおいて時系列の観測量から状態量の最小自乗推定値を逐次得ることのできるフィルタである。
ここで、最小自乗推定値とは、公知の最小自乗法に基づいて推定した値を言い、前の画像における各セグメントの各点の情報｛Ｆ sun ｝の３次元位置ｐ sun とそれに対応する入力画像のエッジ位置ｑ sun との誤差の自乗（２乗）の和が最小になるように、３次元位置ｐ sun を推定することを言う。
本実施例において、状態量は３次元位置ｐsun 、観測量である２次元位置ｑsun である。２次元位置ｑsun にはΔｑsun の量子化によるノイズが含まれる。また、動き推定値にもノイズが含まれる。
初期値の平面上の３次元形状｛ｐsun ｝は、カルマンフィルタによりセグメントに動きがあるごとに、実際の３次元形状に近づくように更新されていく。各点の情報｛Ｆsun ｝における確率共分散行列ｖsun はｐsun の確率共分散行列（３×３）でｐsun を更新するのに用いられ、同時に確率共分散行列ｖsun も更新される。
【００４６】
以上の、カルマンフィルタにおける更新を、連続的な動画像の全フレームについて行うと、最終的に記憶部１４には、各セグメントごとの忠実度の高い３次元形状モデルが記憶される。
【００４７】
次に、前記第１のステップにおいて抽出された３次元形状モデルを用いて、この連続的な動画像を符号化する第２のステップについて説明する。
第２のステップにおいても、各部の動作は第１のステップと同じである。しかし、第２のステップにおいては、記憶部１４に記憶されている最終的な各セグメントの３次元形状情報を用いて動き推定・対応探索を行い、セグメントごとの動きを抽出し、各特徴点の実際の位置との差を求める。
【００４８】
したがって、まず、動き推定・対応探索部１５において、記憶部１４に記憶されている各セグメントの３次元形状情報を用いて、分析画像記憶部１２に記憶されている各フレームごとの特徴点の位置より、各セグメントの全体の動きと、各特徴点の対応を求める。その求め方は、前記第１のステップの場合と同一である。ここで求められた動きは投影部１６および符号化部１９に出力される。
【００４９】
そして、投影部１６は、動き推定・対応探索部１５により得られた各セグメントの移動量によって各セグメントの位置情報を、３次元空間において平行および回転移動させ、さらに、各セグメントの各点の３次元位置を、画像上の位置に変換する。
誤差検出部１７は、投影後の各特徴点の画像上での位置と対応する入力画像のエッジ位置との差を求め、さらに、その差を量子化し、フラクチュエーションを求める。求められたフラクチュエーションは、符号化部１９に出力される。
【００５０】
符号化部１９は、入力された動画像系列の情報を符号化し、伝送路に送出する。
符号化部１９は、各連続画像シーケンスについて、まず、記憶部１４に記憶されている３次元形状情報、および、分析値、を符号化する。また、各フレームの画像データについては、各セグメント毎に、動き推定・対応探索部１５より出力される動き推定値と、誤差検出部１７より出力されるフラクチュエーションを符号化し出力する。
符号化された各連続画像シーケンスごとのデータは、たとえば、ＶＴＲなどの画像記録装置に記録される。
【００５１】
このように、本実施例の動画像符号化装置１０によれば、各フレームごとのセグメントの動きを推定しながら、忠実度の高い３次元形状モデルを抽出し、その抽出された３次元形状モデルを参照して、さらに、各フレームごとに各モデルの動き、および、各特徴点の微小な変位を求めている。したがって、各特徴点の微小な変位の情報は分散せず、変位が無い場合を中心に局在化する。その結果符号化の圧縮率を上げることができる。
【００５２】
なお、本実施例においては動画像符号化装置について説明したが、本発明の動画像処理装置は、本実施例の構成において符号化部１９を持たない構成で実現できる。連続的な動画像よりその動画像を構成する各セグメントの３次元形状情報を抽出し、その情報を用いて種々の画像処理を行うような装置、たとえば、特殊効果装置などに本発明を適用する場合には、その符号化部１９を持たない構成の画像処理装置を適宜適用すればよい。
また、本発明の動画像記録装置は、本実施例の構成にさらに、符号化部１９で符号化された結果を記録媒体上に記録する手段を追加することにより実現できる。そのようにすれば、従来の符号化方法よりはるかに高圧縮率で動画像を記録でき、同一の記録媒体に、より長時間の動画像を記録できる動画像記録装置が提供できる。
【００５３】
次に、本発明の一実施例の動画像復号化装置について、図４を参照して説明する。
図４は、本発明の一実施例の動画像復号化装置３０の構成を示すブロック図である。
動画像復号化装置３０は、復号部３１、記憶部３２、動き処理部３３、投影部３４、変形部３５、再合成部３６、および、合成画像記憶部３７より構成される。
本実施例の動画像復号化装置３０は、伝送路より伝送された符号化された動画像系列を展開し、合成し、出力する動画像復号化装置であって、前述した動画像符号化装置１０と協働して画像処理系を構成する。
【００５４】
以下、各部の動作について説明する。
復号部３１は、伝送路より伝送された信号を受信し、復号化して各情報を取り出し、適宜各部に出力する受信手段である。
復号部３１は、まず、連続的な動画像シーケンスを構成する各セグメントの３次元形状情報を受信し、復号し、記憶部３２に記憶する。その際の各セグメントの位置は、その連続的な動画像の１フレーム目での位置で表される。そして、２フレーム目以降については、符号化されたその各セグメントの全体の移動量（グローバルモーション）と各特徴点の細かな動き（フラクチュエーション）を受信し、復号し、グローバルモーションは動き処理部３３に、フラクチュエーションは変形部３５に出力される。
【００５５】
記憶部３２は、連続的な動画像シーケンスを構成する各セグメントの３次元形状情報と、各セグメントの位置情報を記憶するメモリである。記憶部３２は、復号部３１より入力された連続的な動画像シーケンスを構成する各セグメントの３次元形状情報を初期値として記憶し、以後、動き処理部３３によりその位置を各フレームごと更新される。
動き処理部３３は、復号部３１により入力された動き推定値に基づいて、記憶部３２に記憶されている各セグメントを移動させる。移動させた情報は、投影部３４に出力するとともに、記憶部３２の各セグメントの位置情報を更新する。
【００５６】
投影部３４は、動き処理部３３により移動された各セグメントを２次元画像上に投影する。
変形部３５は、投影部３４により投影された画像の各特徴点の位置に対して、復号部３１より入力されたフラクチュエーションを各特徴点に加え、各特徴点の位置を補正する。
再合成部３６は、変形部３５より入力された各特徴点の情報、および、記憶部３２に記憶されている各特徴点の分析値、および、この連続的な動画像のＤＣ成分に基づいて、画像データを復元する。
合成画像記憶部３７は、再合成部３６により復元された画像データを記憶しておくメモリである。合成画像記憶部３７に記憶されている動画像情報は、適宜表示装置などに出力される。
【００５７】
なお、本発明の動画像再生装置は、本実施例の動画像復号化装置の構成にさらに、記録媒体上に記録された信号を読み出す手段を加えることにより実現できる。そのような動画像再生装置においては、高い圧縮率で長時間記録されている動画像系列を再生でき、さらに、各セグメントの移動量を各フレーム間の移動量を細分した値に設定することにより、原画像のフレームには存在しなかったような各セグメントを微小に送った超スローモーション画像などの動画像を生成することができる。
【００５８】
【発明の効果】
本発明によれば、動画像系列から忠実度の高い３次元形状モデルを抽出することのできる動画像処理装置を提供することができた。
したがって、その３次元形状モデルを使って高い圧縮率で動画像の符号化が可能な動画像符号化装置、および、それを復号する動画像復号化装置を提供することができた。
さらに、動画像を記録媒体上に長時間記録することのできる動画像記録装置、および、それを再生する動画像再生装置を提供することができた。
【図面の簡単な説明】
【図１】本発明の一実施例の動画像符号化装置の構成を示すブロック図である。
【図２】図１に示した動画像符号化装置の動き推定・対応探索部の方法を説明する図であり、３次元空間の点Ｐを望む様子を示し座標系の説明をする図である。
【図３】図１に示した動画像符号化装置の動き推定・対応探索の方法を説明する図である。
【図４】本発明の一実施例の動画像復号化装置の構成を示すブロック図である。
【符号の説明】
１０…動画像符号化装置
１１…画像分析部１２…分析画像記憶部
１３…セグメンテーション部１４…記憶部
１５…動き推定・対応探索部１６…投影部
１７…誤差検出部１８…カルマンフィルタ
３０…動画像復号化装置
３１…復号部３２…記憶部
３３…動き処理部３４…投影部
３５…変形部３６…再合成部
３７…合成画像記憶部
[0001]
[Industrial application fields]
  The present invention relates to a moving image processing apparatus that extracts a three-dimensional shape model of an object in a moving image and is suitable for use in, for example, the transmission and recording of moving images; to a moving image encoding apparatus and a moving image decoding apparatus that use the shape model to encode and decode moving images at a high compression ratio; to a moving image recording apparatus that records the encoded moving image on a recording medium; and to a moving image reproducing apparatus that reproduces it.
[0002]
[Prior art]
  A method for compressing a moving image sequence by encoding a moving image using a three-dimensional model of each object in the moving image sequence has been proposed. If the three-dimensional shape of each object and its movement are known, the same moving image sequence as the original moving image sequence can be generated.
  Therefore, in image communication for example, if the transmitting side and the receiving side share a three-dimensional model, the transmitting side detects the motion information of the input image, and the receiving side synthesizes an image from that motion information, the image can be reproduced. In this case, since only the motion information needs to be transmitted, image communication at an ultra-low rate can be expected. Specifically, a method has been actively attempted in which a three-dimensional structural model of a face is turned into a wire frame and shared between the transmitting side and the receiving side, and only features such as facial expressions are transmitted to synthesize the facial image.
[0003]
  However, when this encoding method is applied to natural images, it is impossible to prepare a three-dimensional model in advance, and the three-dimensional model must be extracted from the given moving image sequence. One method for extracting a three-dimensional shape model from a moving image sequence and encoding the moving image with that model is disclosed in "OBJECT-ORIENTED analysis coding of moving images." by Hans George Musmann, Michael Hotter, Jorn Ostermann et al. [Signal Processing: Image Communication 1 (1989): 117-138, Elsevier Science Publishers B.V.]. In this method, motion vectors are obtained for the edge portions and used to obtain depth, and the depth of the portions other than the edges is interpolated to estimate the three-dimensional shape. The luminance information is mapped onto the three-dimensional shape as an attribute and is sent together with the interpolated three-dimensional shape data.
[0004]
[Problems to be solved by the invention]
  However, in a method that estimates the three-dimensional shape as described above and compresses the moving image sequence, it is difficult to obtain motion vectors even at edge portions where the luminance changes sharply, and it is very difficult to estimate accurate depth from the motion vectors. Therefore, even when the depth of the edge portions is interpolated to estimate the three-dimensional shape, an accurate three-dimensional shape cannot be obtained. Moreover, because the estimated shape deviates from the actual shape, extra information is needed and the compression ratio cannot be increased.
  Further, the above-described method has a problem that the amount of initial information is large because luminance information mapped as an attribute on three dimensions is transmitted as initial information.
[0005]
  Therefore, an object of the present invention is to provide a moving image processing apparatus that extracts a high-fidelity three-dimensional shape model from a moving image sequence. Another object is to provide a moving image encoding apparatus capable of encoding a moving image at a high compression ratio using the three-dimensional shape model, and a moving image decoding apparatus that decodes the result. A further object is to provide a moving image recording apparatus that records the encoded moving image on a recording medium, and a moving image reproducing apparatus that reproduces it.
[0006]
[Means for Solving the Problems]
  In the moving image processing apparatus of the present invention, the initial values are not the surface data of a three-dimensional object but the three-dimensional positions of points that are visually important when rendered as a two-dimensional moving image, together with the analysis values of those points, and the three-dimensional position information is acquired while the movement of the object is detected.
[0007]
  According to a first aspect of the present invention, there is provided a moving image processing apparatus comprising: modeling means for detecting, in the still image of each frame of an input continuous moving image, points constituting edges as feature points, extracting the position and analysis value of each point constituting the edges, and obtaining three-dimensional shape information of each segment that constitutes the image of each frame and is composed of the feature points; storage means for storing the three-dimensional shape information of each segment obtained by the modeling means; motion estimation means for calculating an estimate of the amount by which each segment constituting the moving image has moved three-dimensionally between frames of the continuous moving image, from the three-dimensional shape information of each segment of the previous frame stored in the storage means and the three-dimensional shape information of each segment of the current frame; and updating means for updating the three-dimensional shape information of each segment stored in the storage means by obtaining a least-squares estimate of the three-dimensionally moved position from the actual position, such that the sum of the squares of the differences between the actual position and the position to which each segment is moved three-dimensionally by the amount estimated by the motion estimation means is minimized,
  wherein the modeling means includes: image analysis means for analyzing the still image of each frame of the input continuous moving image with filters having different resolution scales and extracting points constituting edges as feature points; segmentation means for extracting, for the still image of the first frame of the continuous moving image, two-dimensional shape information of the segments composed of the extracted feature points; and three-dimensional information generating means for adding predetermined position information in the depth direction to the two-dimensional shape information of each segment to generate an initial value of the three-dimensional shape information of each segment,
  wherein the updating means updates, for each frame, the three-dimensional shape information of each segment stored in the storage means by obtaining the least-squares estimate of the three-dimensionally moved position from the actual position, based on the difference between the position of each feature point resulting from moving each segment three-dimensionally by the amount estimated by the motion estimation means and the actual position of each feature point, and
  whereby the three-dimensional shape information of each segment constituting the continuous moving image is acquired.
[0009]
  Preferably, the updating means takes the position of each feature point resulting from moving each segment three-dimensionally by the estimated motion as a state quantity, the actual position of that feature point as an observation quantity, and the difference between the state quantity and the observation quantity as noise, and updates the three-dimensional shape information of each segment by obtaining the least-squares estimate of the state quantity with a Kalman filter.
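  The update described above can be sketched as a single extended-Kalman measurement step for one feature point. This is only an illustration under stated assumptions: the observation model is taken to be the perspective projection with focal length f, the observation noise is isotropic with variance r, and the names project, kalman_update, p, P, z, and r are hypothetical rather than taken from the specification.

```python
def project(p, f):
    """Perspective projection of a 3-D point onto the image plane (assumed model)."""
    X, Y, Z = p
    return (f * X / Z, f * Y / Z)

def kalman_update(p, P, z, f, r):
    """One measurement update for a single feature point.

    p: predicted 3-D position (state), P: its 3x3 covariance,
    z: observed 2-D edge position, r: assumed observation noise variance.
    """
    X, Y, Z = p
    # Jacobian of the projection with respect to (X, Y, Z).
    H = [[f / Z, 0.0, -f * X / Z**2],
         [0.0, f / Z, -f * Y / Z**2]]
    # P H^T (3x2) and innovation covariance S = H P H^T + r I (2x2).
    PHt = [[sum(P[i][k] * H[j][k] for k in range(3)) for j in range(2)]
           for i in range(3)]
    S = [[sum(H[i][k] * PHt[k][j] for k in range(3)) + (r if i == j else 0.0)
          for j in range(2)] for i in range(2)]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    S_inv = [[S[1][1] / det, -S[0][1] / det],
             [-S[1][0] / det, S[0][0] / det]]
    # Kalman gain K = P H^T S^-1 (3x2).
    K = [[sum(PHt[i][k] * S_inv[k][j] for k in range(2)) for j in range(2)]
         for i in range(3)]
    hx = project(p, f)
    innovation = (z[0] - hx[0], z[1] - hx[1])
    p_new = tuple(p[i] + K[i][0] * innovation[0] + K[i][1] * innovation[1]
                  for i in range(3))
    # Covariance update P_new = (I - K H) P.
    KH = [[sum(K[i][k] * H[k][j] for k in range(2)) for j in range(3)]
          for i in range(3)]
    P_new = [[sum(((1.0 if i == k else 0.0) - KH[i][k]) * P[k][j]
                  for k in range(3)) for j in range(3)] for i in range(3)]
    return p_new, P_new

p0 = (1.0, 0.5, 10.0)
P0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 4.0]]
p1, P1 = kalman_update(p0, P0, z=(0.12, 0.06), f=1.0, r=1e-4)
print(project(p0, 1.0), project(p1, 1.0))
```

  A single two-dimensional observation cannot fix depth by itself; in this sketch depth becomes better constrained only as updates accumulate over frames in which the segment moves.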
[0010]
  According to a second aspect of the present invention, there is provided a moving image encoding apparatus that encodes the continuous moving image, comprising: the above moving image processing apparatus; initial encoding means for encoding the acquired three-dimensional shape information and the analysis values, obtained by the image analysis means, of the feature points constituting that three-dimensional shape information; difference detection means for obtaining the difference between the position of each feature point resulting from moving each segment three-dimensionally by the motion estimated by the motion estimation means and the actual position of each feature point; and encoding means for encoding, for each frame, the motion estimate of each segment estimated by the motion estimation means and the difference in position of each feature point detected by the difference detection means.
[0011]
  According to a third aspect of the present invention, there is provided a moving image recording apparatus comprising the above moving image encoding apparatus and recording means for recording, on a recording medium, the continuous moving image encoded by the moving image encoding apparatus.
[0012]
  According to a fourth aspect of the present invention, there is provided a moving image decoding apparatus for decoding an encoded continuous moving image, the encoding having been performed by analyzing predetermined still images of an input continuous moving image and detecting points constituting edges as feature points, the apparatus comprising: initial decoding means for decoding the three-dimensional shape information of each segment that constitutes the encoded continuous moving image and has the feature points as its components, and the analysis values of the feature points constituting that three-dimensional shape information; decoding means for decoding, for each frame of the continuous moving image, the encoded motion estimate of each segment and the encoded displacement of each feature point of each segment from the corresponding feature point of the input moving image; moving means for moving the position of each segment three-dimensionally based on the motion estimate of each segment decoded by the decoding means; projection means for obtaining a projection image by projecting each segment, at the position to which it has been moved by the moving means, onto a two-dimensional screen; deformation means for moving the position of each feature point of each segment in the projection image based on the displacement, decoded by the decoding means, of that feature point from the corresponding feature point of the input moving image; and image synthesis means for synthesizing an image based on the image resulting from the deformation by the deformation means and the analysis values of the feature points decoded by the initial decoding means.
[0013]
  According to a fifth aspect of the present invention, there is provided a moving image reproducing apparatus comprising: signal reading means for reading an encoded continuous moving image signal recorded on a recording medium; and the above moving image decoding apparatus, which decodes the read encoded continuous moving image signal and synthesizes the continuous moving image.
[0015]
[Action]
  In the moving image processing apparatus of the present invention, the three-dimensional positions of points that are visually important when rendered as a two-dimensional moving image, together with the analysis values of those points, are used as feature points, and the three-dimensional shape information is updated sequentially while the movement of the object is detected. Therefore, the three-dimensional shape information that is most consistent across the continuous moving image sequence can be extracted.
  In addition, because the three-dimensional shape information can be extracted accurately, when the motion between frames of the moving image sequence is represented by the motion of each segment as a whole, the small displacements of the individual feature points become smaller and their distribution is localized around zero displacement. As a result, the compression ratio is further improved.
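  The link between a displacement distribution concentrated around zero and a higher compression ratio can be illustrated with Shannon entropy, which bounds the average number of bits per symbol for an ideal entropy coder. The sketch below is not part of the specification; the two sample lists of quantized displacements are made-up values chosen only to contrast a concentrated distribution with a spread-out one.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Shannon entropy, in bits per symbol, of a sequence of quantized values."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical quantized displacements (in pixels) for eight feature points.
concentrated = [0, 0, 0, 0, 0, 0, 1, -1]   # accurate shape model: mostly zero
spread = [-3, -2, -1, 0, 1, 2, 3, 4]       # inaccurate shape model: all different

print(entropy_bits(concentrated))  # lower entropy, fewer bits needed
print(entropy_bits(spread))        # higher entropy, more bits needed
```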
[0016]
  According to the present invention, there are provided the above-described moving image processing apparatus that extracts a three-dimensional shape model of an object in a moving image; a moving image encoding apparatus and a moving image decoding apparatus that use the shape model to encode and decode moving images at a high compression ratio; a moving image recording apparatus that records the encoded moving image on a recording medium; and a moving image reproducing apparatus that reproduces it.
[0017]
[Embodiments]
  A moving picture coding apparatus according to an embodiment of the present invention will be described with reference to FIG.
  FIG. 1 is a block diagram showing a configuration of a moving image encoding apparatus 10 according to an embodiment of the present invention.
  The moving image encoding device 10 includes an image analysis unit 11, an analysis image storage unit 12, a segmentation unit 13, a storage unit 14, a motion estimation / correspondence search unit 15, a projection unit 16, an error detection unit 17, a Kalman filter 18, and an encoding unit 19.
  This moving image encoding device 10 constitutes an image processing system in cooperation with a moving image decoding device 30 described later.
[0018]
  The moving image encoding apparatus 10 of the present embodiment is a moving image recording apparatus that detects continuous image data in a moving image sequence from a VTR or the like with a continuous sequence detection unit (not shown), and encodes and records each continuous image data sequence. Recording is divided into a first step of extracting, from the continuous sequence, the three-dimensional shape information of the segments constituting the sequence, and a second step of encoding using the extracted three-dimensional shape information. Hereinafter, the operation of each unit will be described for the first step and the second step in turn.
[0019]
  First, the operation of each unit in the step of extracting the three-dimensional shape information of each segment constituting the moving image from the continuous moving image series will be described.
  A continuous sequence is detected from the moving image series input from the VTR or the like, and is input to the image analysis unit 11.
  The image analysis unit 11 analyzes the image data of each sequentially input frame, extracts feature points, and obtains the positions and analysis values of the feature points. In this embodiment, the input image data is analyzed with a plurality of filters having different resolution scales, points constituting edges are detected as feature points, the input image data is converted into edge image data (the feature image data), and the position and analysis value of each point constituting the edges are extracted.
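  The specification does not state which filters are used. As a minimal one-dimensional sketch, the following assumes derivative-of-Gaussian filters at two scales and treats local maxima of the response magnitude as edge points, with the filter response serving as the analysis value. The function names, the scales, and the threshold are all hypothetical.

```python
import math

def gaussian_derivative_kernel(sigma):
    """Odd-symmetric kernel that responds positively to a rising step."""
    radius = int(3 * sigma) + 1
    return [(d / sigma**2) * math.exp(-d * d / (2 * sigma**2))
            for d in range(-radius, radius + 1)]

def filter_response(signal, kernel):
    """Correlate signal with kernel, replicating the border samples."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def detect_edges(signal, sigmas=(1.0, 2.0), threshold=0.5):
    """Return {sigma: [(position, analysis_value), ...]} for each scale."""
    result = {}
    for sigma in sigmas:
        resp = filter_response(signal, gaussian_derivative_kernel(sigma))
        points = []
        for i in range(1, len(resp) - 1):
            mag = abs(resp[i])
            # Keep local maxima of the response magnitude above the threshold.
            if mag > threshold and mag >= abs(resp[i - 1]) and mag > abs(resp[i + 1]):
                points.append((i, resp[i]))
        result[sigma] = points
    return result

step = [0.0] * 10 + [10.0] * 10   # a single luminance step at index 10
print(detect_edges(step))
```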
[0020]
  The analysis image storage unit 12 is a memory that stores feature point images of a continuous sequence analyzed by the image analysis unit 11. The stored feature point information of each frame is sequentially referred to by the motion estimation / correspondence search unit 15, and the feature point information of the first frame is referred to by the segmentation unit 13 and the encoding unit 19.
[0021]
  The segmentation unit 13 performs segmentation based on the positions and analysis values of the feature points of the first-frame feature image data input from the image analysis unit 11, extracts the segments (parts or elements) constituting the image data, and stores the feature point information for each segment in the storage unit 14.
  This segmentation is performed by recursive thresholding: features are extracted from the color image for a total of nine signals (red, green, blue, lightness, hue, and saturation, plus the Y, I, and Q signals corresponding to television signals), and segmentation is performed on the basis of histograms of those features.
[0022]
  The storage unit 14 is storage means, implemented as a memory, that stores, for each feature point of each segment, position information X, Y, Z, an analysis value g, a probability covariance matrix v, and additional information a. When the input continuous sequence has S segments, each segment s is composed of Us edges (s = 1 to S), and each edge is composed of Nsu feature points (u = 1 to Us, s = 1 to S), the information stored in the storage unit 14 is expressed as in Expression 1.
[0023]
[Expression 1]
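  The equation image for Expression 1 is not reproduced in this text. Based only on the fields listed in the preceding paragraph, one plausible form (an assumption, not the published equation) is:

```latex
\{F_{sun}\} = \{(X_{sun},\, Y_{sun},\, Z_{sun},\, g_{sun},\, v_{sun},\, a_{sun})\},
\qquad n = 1,\dots,N_{su},\quad u = 1,\dots,U_s,\quad s = 1,\dots,S
```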

[0024]
  Note that the probability covariance matrix vsun (n = 1 to Nsu, u = 1 to Us, s = 1 to S) represents the scatter of the points constituting each edge, so the same value is assigned to all feature points that belong to the same edge.
[0025]
  As the information stored in the storage unit 14, first, information about the first frame is input from the segmentation unit 13, and initial data is generated. Thereafter, every time image data for the second frame and thereafter is input, the contents are updated by a Kalman filter 18 described later.
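  As a minimal sketch of how the initial data might be built, the following assumes that each first-frame feature point (x, y) is back-projected onto a plane at a predetermined depth z0, and that one covariance matrix is shared by all points of an edge. The names FeaturePoint, init_segment, z0, var_xy, and var_z are hypothetical, and the initial variances are illustrative values only.

```python
from dataclasses import dataclass, field

@dataclass
class FeaturePoint:
    X: float
    Y: float
    Z: float
    g: float                               # analysis value from the image analysis
    v: list                                # 3x3 covariance, shared within an edge
    a: dict = field(default_factory=dict)  # additional information

def init_segment(edges_2d, f, z0, var_xy=1.0, var_z=100.0):
    """Build initial 3-D data for one segment from first-frame edges.

    edges_2d: list of edges, each a list of (x, y, g) image points.
    f: focal length; z0: assumed initial depth given to every point.
    """
    segment = []
    for edge in edges_2d:
        # One covariance per edge; the depth variance is large because Z is a guess.
        v = [[var_xy, 0.0, 0.0], [0.0, var_xy, 0.0], [0.0, 0.0, var_z]]
        segment.append([FeaturePoint(x * z0 / f, y * z0 / f, z0, g, v)
                        for (x, y, g) in edge])
    return segment

edges = [[(0.1, 0.2, 5.0), (0.1, 0.3, 4.0)], [(-0.2, 0.0, 7.0)]]
segment = init_segment(edges, f=1.0, z0=10.0)
print(segment[0][0])
```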
[0026]
  The motion estimation / correspondence search unit 15 estimates the amount of movement of each segment from the information {Fsun} of each point of the edge image of the previous frame and the edge positions and analysis values of the current frame, and associates the information {Fsun} of each point of the previous frame with the feature points of the edge image of the current frame.
  The method will be specifically described below.
  First, in FIG. 2, the coordinate system XYZ is a camera coordinate system, the origin of the coordinate system is the center of the lens, and the optical axis is made to coincide with the Z axis in the depth direction. In such a coordinate system, it can be considered that the image of the point P is projected onto a plane that is parallel to the XY plane and that is located at a focal distance f of the camera from the origin. The position of the image of the point P on the projection plane is the position of the pixel on the image input from the camera. A coordinate system xy parallel to the X axis and the Y axis is set with respect to the projection plane, with the intersection of the plane and the Z axis as the origin.
[0027]
  If the coordinates of the point P in the XYZ space are p = (Xp, Yp, Zp) and the coordinates of the point Q, which is the image of the point P on the xy plane, are q = (xq, yq), the coordinate q of the point Q is expressed as in Equation 2.
[0028]
[Expression 2]
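  The equation image is not reproduced in this text. For the camera geometry described above (origin at the lens center, optical axis along Z, projection plane at focal distance f), the standard perspective projection would read as follows; this is a reconstruction from the surrounding definitions, not the published image:

```latex
q = (x_q,\, y_q) = \left( f\,\frac{X_p}{Z_p},\; f\,\frac{Y_p}{Z_p} \right)
```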

[0029]
  Suppose that a segment s (s = 1 to S) is composed of Us edges, that each edge is represented by the information of Nsu points (u = 1 to Us, s = 1 to S), and that the positions of these points are psun = (Xsun, Ysun, Zsun) (n = 1 to Nsu).
  When a segment constituting this image data rotates relatively by Δωx about the X axis, Δωy about the Y axis, and Δωz about the Z axis, and translates by Δt = (Δtx, Δty, Δtz), the movement Δpsun = (ΔXsun, ΔYsun, ΔZsun) of each point psun constituting the segment is given by Equation 3, provided that the rotations Δωx, Δωy, Δωz about the axes and the translation Δt are small.
[0030]
[Equation 3]
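  The equation image is not reproduced in this text. For small rotations, the usual first-order rigid-motion model is the cross product of the rotation vector with the point plus the translation; written out with the symbols defined above, a form consistent with the text (a reconstruction, not the published image) is:

```latex
\begin{aligned}
\Delta X_{sun} &= \Delta\omega_y\, Z_{sun} - \Delta\omega_z\, Y_{sun} + \Delta t_x \\
\Delta Y_{sun} &= \Delta\omega_z\, X_{sun} - \Delta\omega_x\, Z_{sun} + \Delta t_y \\
\Delta Z_{sun} &= \Delta\omega_x\, Y_{sun} - \Delta\omega_y\, X_{sun} + \Delta t_z
\end{aligned}
```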

[0031]
Assuming that the projection point of the point psun on the xy plane is qsun = (xsun, ysun), the movement amount Δqsun = (Δxsun, Δysun) of the projection point qsun accompanying the movement of the segment is expressed by the following equation (4).
[0032]
[Expression 4]

[0033]
Equation 5 is obtained from Equation 2 and Equation 4.
[0034]
[Equation 5]
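Substituting the small-motion displacement into the first-order change of the perspective projection gives, for one point with image coordinates (x, y) and depth Z, a relation that is linear in the six motion parameters. The form below is a derivation sketch under that reading of Equations 2 to 4 (subscripts sun omitted); the patent's own Equation 5 may be arranged differently:

```latex
\begin{aligned}
\Delta x &= \frac{f}{Z}\,\Delta X - \frac{x}{Z}\,\Delta Z
 = -\frac{xy}{f}\,\Delta\omega_x + \Bigl(f + \frac{x^2}{f}\Bigr)\Delta\omega_y - y\,\Delta\omega_z
   + \frac{f}{Z}\,\Delta t_x - \frac{x}{Z}\,\Delta t_z, \\
\Delta y &= \frac{f}{Z}\,\Delta Y - \frac{y}{Z}\,\Delta Z
 = -\Bigl(f + \frac{y^2}{f}\Bigr)\Delta\omega_x + \frac{xy}{f}\,\Delta\omega_y + x\,\Delta\omega_z
   + \frac{f}{Z}\,\Delta t_y - \frac{y}{Z}\,\Delta t_z.
\end{aligned}
```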

[0035]
When Equation 5 is applied to M points (m = 1 to M) among the N points, Equation 6 is obtained.
[0036]
[Formula 6]

[0037]
  Here, Δt^t denotes the transpose of Δt.
  As for Δq, the point on the image corresponding to the virtual projection point qm′, which is obtained by applying Equation 2 to the three-dimensional position pm acquired before the new image was input, is not known. Therefore, as shown in FIG. 3, the corresponding point is assumed to be the projected feature point qm of the object image Ir that lies closest to the point of the virtual projection image Ip of the three-dimensional position information.
  When M ≧ 3, the estimated value ΔC′ of the rotation and translation parameter ΔC is obtained by the least squares method as in Equation 7.
[0038]
[Expression 7]

[0039]
The movement Δpsun of the three-dimensional position information caused by ΔC′ obtained from Equation 7 is calculated with Equation 3, a new virtual projection image is created with Equation 2, and the nearest point is again assumed to be the corresponding point. When this calculation is repeated as shown in Equation 8, the virtual projection image approaches the object image Ir.
[0040]
[Equation 8]

[0041]
  When this calculation is repeated until Σ|Δqsun|² becomes equal to or less than a predetermined value ε, the corresponding point qsun in the new image is obtained for each three-dimensional position psun of the original image.
[0042]
  In the motion estimation / correspondence search method described above, the object is assumed to be a rigid body, so the points constituting the three-dimensional position information are constrained by six parameters for rotation and translation. Motion estimation and correspondence search are therefore performed for all points together rather than independently for each point. Accordingly, a correspondence that is consistent as a whole is obtained for all points, and noise in the three-dimensional position information due to incorrect correspondences is reduced.
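The iteration described above (project the model, take the nearest edge point as the correspondence, solve for six parameters by least squares, move the model, repeat) can be sketched as follows. This is a minimal illustration, not the patent's implementation: all function names are hypothetical, the coefficient rows come from linearizing the perspective projection under the small-motion assumption, and the least-squares step is assumed to use the normal equations.

```python
def project(p, f):
    """Perspective projection of a camera-frame point onto the image plane."""
    X, Y, Z = p
    return (f * X / Z, f * Y / Z)

def flow_rows(q, Z, f):
    """Rows mapping dC = (wx, wy, wz, tx, ty, tz) to the image shift (dx, dy)."""
    x, y = q
    return ([-x * y / f, f + x * x / f, -y, f / Z, 0.0, -x / Z],
            [-(f + y * y / f), x * y / f, x, 0.0, f / Z, -y / Z])

def solve(a, b):
    """Solve a small dense system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                k = m[r][c] / m[c][c]
                m[r] = [vr - k * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares_step(points, targets, f):
    """Least-squares dC from the normal equations (A^T A) dC = A^T dq."""
    ata = [[0.0] * 6 for _ in range(6)]
    atb = [0.0] * 6
    for p, t in zip(points, targets):
        q = project(p, f)
        residual = (t[0] - q[0], t[1] - q[1])
        for row, d in zip(flow_rows(q, p[2], f), residual):
            for i in range(6):
                atb[i] += row[i] * d
                for j in range(6):
                    ata[i][j] += row[i] * row[j]
    return solve(ata, atb)

def move(p, dc):
    """Small-motion rigid update p + w x p + t."""
    X, Y, Z = p
    wx, wy, wz, tx, ty, tz = dc
    return (X + wy * Z - wz * Y + tx,
            Y + wz * X - wx * Z + ty,
            Z + wx * Y - wy * X + tz)

def nearest(q, edge_points):
    """Edge point closest to the virtual projection point q."""
    return min(edge_points, key=lambda e: (e[0] - q[0]) ** 2 + (e[1] - q[1]) ** 2)

def estimate_motion(points, edge_points, f, eps=1e-10, max_iter=50):
    """Alternate nearest-point matching and least-squares motion updates."""
    for _ in range(max_iter):
        proj = [project(p, f) for p in points]
        targets = [nearest(q, edge_points) for q in proj]
        err = sum((t[0] - q[0]) ** 2 + (t[1] - q[1]) ** 2
                  for q, t in zip(proj, targets))
        if err <= eps:
            break
        dc = least_squares_step(points, targets, f)
        points = [move(p, dc) for p in points]
    return points, targets, err
```

Because all points share the same six parameters, a single wrong nearest-point match is outvoted by the others, which is the consistency property the paragraph above describes. The sketch assumes the inter-frame motion is small compared with the spacing of the edge points.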
[0043]
  The projection unit 16 translates and rotates the position information of the information {Fsun} of each segment in three-dimensional space, based on the translation amount t and the rotation amount ω obtained by the motion estimation / correspondence search unit 15. It then converts the three-dimensional position psun = (Xsun, Ysun, Zsun) of each point of each segment into a position qsun = (xsun, ysun) on the image and gives the analysis value gsun to the resulting image point qsun. The conversion from the three-dimensional position psun to the projection point qsun is performed according to Equation 2.
[0044]
  The error detection unit 17 obtains the difference between the position qsun′ on the image of the information {Fsun} of each feature point after projection and the edge position qsun of the corresponding input image, quantizes the difference, and thereby obtains the fluctuation (fractionation), that is, the quantized residual displacement of each feature point.
  Possible quantization methods include linear quantization with an appropriate fixed quantization step (for example, one pixel width), nonlinear quantization with several nonlinear quantization steps, and adaptive quantization in which the step is not fixed but is changed according to the nature of the input image. An appropriate method may be chosen according to the required transmission rate and image quality. For example, when a high compression ratio is required, the quantization step is increased. When the image contains many straight lines and discontinuities in those lines caused by quantization noise would be noticeable, nonlinear quantization is performed so that small fluctuations are quantized more finely.
  The obtained fluctuation is input to the Kalman filter 18.
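As an illustration of the first two quantization options, the sketch below shows a uniform quantizer with a configurable step and a nonlinear quantizer whose reconstruction levels are denser near zero. The function names and the level table are hypothetical; the patent does not specify concrete levels.

```python
import math

def quantize_linear(d, step=1.0):
    """Uniform quantization: index of the nearest multiple of `step`."""
    return int(math.floor(d / step + 0.5))

def dequantize_linear(k, step=1.0):
    """Reconstruct the displacement from a uniform quantization index."""
    return k * step

# Hypothetical reconstruction levels: finer spacing for small displacements.
LEVELS = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0]

def quantize_nonlinear(d, levels=LEVELS):
    """Signed index of the level whose magnitude is closest to |d|."""
    k = min(range(len(levels)), key=lambda i: abs(levels[i] - abs(d)))
    return -k if d < 0 else k

def dequantize_nonlinear(k, levels=LEVELS):
    """Reconstruct the displacement from a signed level index."""
    return math.copysign(levels[abs(k)], k) if k else 0.0
```

A larger `step` in the uniform quantizer trades accuracy for fewer distinct symbols, which matches the high-compression case; the nonlinear table keeps sub-pixel resolution for small displacements.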
[0045]
  The Kalman filter 18 updates the three-dimensional position psun from the three-dimensional position psun of the information {Fsun} of each segment in the previous image and the corresponding edge position qsun of the input image. The Kalman filter is a filter that can sequentially obtain a least-square estimated value of a state quantity from a time-series observation quantity in a system including noise.
  Here, the least-squares estimate is a value estimated by the known least squares method: the three-dimensional position psun is estimated so that the sum of the squares of the errors between the three-dimensional position psun of the information {Fsun} of each point of each segment in the previous image and the corresponding edge position qsun of the input image is minimized.
  In this embodiment, the state quantity is the three-dimensional position psun and the observation quantity is the two-dimensional position qsun. The two-dimensional position qsun includes noise due to the quantization of Δqsun. The motion estimation value also includes noise.
  The three-dimensional shape {psun}, which initially lies on a plane, is updated by the Kalman filter so as to approach the actual three-dimensional shape each time the segment moves. The covariance matrix vsun held in each point information {Fsun} is the 3 × 3 covariance matrix of psun; it is used to update psun and is itself updated at the same time.
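The update of one point and its 3 × 3 covariance can be sketched as a single measurement step. The patent does not give the filter equations, so the sketch below assumes one common choice: an extended Kalman measurement update linearized around the current estimate, with isotropic observation noise of variance `r`. All names are hypothetical.

```python
def project(p, f):
    """Perspective projection of a camera-frame point."""
    X, Y, Z = p
    return (f * X / Z, f * Y / Z)

def jacobian(p, f):
    """2x3 Jacobian of the perspective projection at p."""
    X, Y, Z = p
    return [[f / Z, 0.0, -f * X / Z ** 2],
            [0.0, f / Z, -f * Y / Z ** 2]]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def kalman_update(p, v, q_obs, f, r):
    """Update state p (3D position) and covariance v from observation q_obs."""
    h = jacobian(p, f)
    ht = transpose(h)
    s = matmul(matmul(h, v), ht)          # innovation covariance (2x2)
    s[0][0] += r
    s[1][1] += r
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    s_inv = [[s[1][1] / det, -s[0][1] / det],
             [-s[1][0] / det, s[0][0] / det]]
    k = matmul(matmul(v, ht), s_inv)      # Kalman gain (3x2)
    q = project(p, f)
    innov = (q_obs[0] - q[0], q_obs[1] - q[1])
    p_new = tuple(pi + k[i][0] * innov[0] + k[i][1] * innov[1]
                  for i, pi in enumerate(p))
    kh = matmul(k, h)
    i_kh = [[(1.0 if i == j else 0.0) - kh[i][j] for j in range(3)]
            for i in range(3)]
    v_new = matmul(i_kh, v)               # covariance shrinks after the update
    return p_new, v_new
```

Starting each point with a large variance along Z reflects the fact that the initial depth is only a placeholder; repeated updates over frames with different segment poses are what allow the depth to converge.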
[0046]
  When the above-described update in the Kalman filter is performed for all frames of the continuous moving image, the storage unit 14 finally stores a high-fidelity three-dimensional shape model for each segment.
[0047]
  Next, a second step of encoding the continuous moving image using the three-dimensional shape model extracted in the first step will be described.
  In the second step, each unit operates in the same way as in the first step. However, the motion estimation / correspondence search is performed using the final three-dimensional shape information of each segment stored in the storage unit 14, the motion of each segment is extracted, and the difference between each feature point and its actual position is obtained.
[0048]
  To this end, the motion estimation / correspondence search unit 15 first obtains the overall movement of each segment and the correspondence of each feature point, using the three-dimensional shape information of each segment stored in the storage unit 14 and the feature point positions of each frame stored in the analysis image storage unit 12. The method is the same as in the first step. The motion obtained here is output to the projection unit 16 and the encoding unit 19.
[0049]
  Then, the projection unit 16 translates and rotates the position information of each segment in three-dimensional space according to the movement amount of each segment obtained by the motion estimation / correspondence search unit 15, and converts the three-dimensional position of each point of each segment into a position on the image.
  The error detection unit 17 obtains the difference between the position of each projected feature point on the image and the edge position of the corresponding input image, quantizes the difference, and obtains the fluctuation. The obtained fluctuation is output to the encoding unit 19.
[0050]
  The encoding unit 19 encodes the input moving image sequence information and sends it to the transmission path.
  For each continuous image sequence, the encoding unit 19 first encodes the three-dimensional shape information and the analysis values stored in the storage unit 14. Then, for each frame of image data, it encodes and outputs, for each segment, the motion estimation value output from the motion estimation / correspondence search unit 15 and the fluctuation output from the error detection unit 17.
  The encoded data for each continuous image sequence is recorded in an image recording device such as a VTR.
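The layout of the encoded data (shape and analysis values once per sequence, then motion and fluctuation per frame and per segment) can be pictured with the following sketch. The class and field names are hypothetical, and JSON serialization is only a stand-in for whatever bitstream coding an actual encoder would use.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Tuple

@dataclass
class SegmentModel:
    """Sent once per sequence: 3D feature points and their analysis values."""
    points: List[Tuple[float, float, float]]
    analysis: List[float]

@dataclass
class SegmentFrame:
    """Sent per frame: six motion parameters and quantized residuals."""
    motion: Tuple[float, float, float, float, float, float]
    fluctuation: List[Tuple[int, int]]

@dataclass
class EncodedSequence:
    models: List[SegmentModel]
    frames: List[List[SegmentFrame]] = field(default_factory=list)

    def add_frame(self, per_segment: List[SegmentFrame]) -> None:
        """Append one frame; it must carry one record per segment."""
        if len(per_segment) != len(self.models):
            raise ValueError("one SegmentFrame per segment is required")
        self.frames.append(per_segment)

    def to_json(self) -> str:
        """Serialize the whole sequence (placeholder for real entropy coding)."""
        return json.dumps(asdict(self))
```

Only six numbers plus small integer residuals are added per segment and frame, which is where the compression gain comes from once the shape model has been transmitted.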
[0051]
  Thus, according to the moving picture coding apparatus 10 of the present embodiment, a high-fidelity three-dimensional shape model is extracted while the motion of each segment is estimated for each frame, and the movement of each model and the minute displacement of each feature point are then obtained for each frame using the extracted three-dimensional shape model. The minute displacements of the feature points are therefore not dispersed but concentrated around zero displacement. As a result, the compression rate of the encoding can be increased.
[0052]
  Although a moving picture encoding apparatus has been described in the present embodiment, the moving picture processing apparatus of the present invention can be realized by omitting the encoding unit 19 from the configuration of the present embodiment. When the present invention is applied to an apparatus that extracts three-dimensional shape information of each segment constituting a moving image from continuous moving images and performs various image processing using that information, for example a special effect device, such a configuration without the encoding unit 19 may be used as appropriate.
  In addition, the moving image recording apparatus of the present invention can be realized by adding means for recording the result encoded by the encoding unit 19 on a recording medium in addition to the configuration of the present embodiment. By doing so, it is possible to provide a moving image recording apparatus that can record moving images at a much higher compression rate than conventional coding methods and can record moving images for a longer time on the same recording medium.
[0053]
  Next, a moving picture decoding apparatus according to an embodiment of the present invention will be described with reference to FIG.
  FIG. 4 is a block diagram showing the configuration of the video decoding device 30 according to an embodiment of the present invention.
  The moving image decoding apparatus 30 includes a decoding unit 31, a storage unit 32, a motion processing unit 33, a projection unit 34, a deformation unit 35, a recombination unit 36, and a composite image storage unit 37.
  The moving picture decoding apparatus 30 according to the present embodiment decompresses and synthesizes an encoded moving picture sequence transmitted over a transmission path and outputs the result. Together with the moving picture encoding apparatus 10 described above, it constitutes an image processing system.
[0054]
  Hereinafter, the operation of each unit will be described.
  The decoding unit 31 is a receiving unit that receives a signal transmitted from the transmission path, decodes it, extracts each piece of information, and outputs it to each unit as appropriate.
  First, the decoding unit 31 receives and decodes the three-dimensional shape information of each segment constituting the continuous moving image sequence and stores it in the storage unit 32. The position of each segment at that time is its position in the first frame of the continuous moving image. For the second and subsequent frames, the decoding unit 31 receives and decodes the encoded overall movement amount (global motion) of each segment and the fine movement (fluctuation) of each feature point, and outputs the global motion to the motion processing unit 33 and the fluctuation to the deformation unit 35.
[0055]
  The storage unit 32 is a memory that stores the three-dimensional shape information of each segment constituting the continuous moving image sequence and the position information of each segment. The storage unit 32 stores, as an initial value, the three-dimensional shape information of each segment input from the decoding unit 31; thereafter, the position is updated for each frame by the motion processing unit 33.
  The motion processing unit 33 moves each segment stored in the storage unit 32 based on the motion estimation value input from the decoding unit 31. The moved information is output to the projection unit 34, and the position information of each segment in the storage unit 32 is updated.
[0056]
  The projection unit 34 projects each segment moved by the motion processing unit 33 onto a two-dimensional image.
  The deformation unit 35 corrects the position of each feature point of the image projected by the projection unit 34 by adding the fluctuation input from the decoding unit 31 to that position.
  The recombination unit 36 restores the image data based on the information of each feature point input from the deformation unit 35, the analysis value of each feature point stored in the storage unit 32, and the DC component of the continuous moving image.
  The composite image storage unit 37 is a memory that stores the image data restored by the recombination unit 36. The moving image information stored in the composite image storage unit 37 is appropriately output to a display device or the like.
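The geometric part of the decoder (motion processing unit 33, projection unit 34, deformation unit 35) can be sketched for one segment and one frame as below. The names are hypothetical, the small-motion model is assumed for the segment movement, and the restoration of pixel values from the analysis values by the recombination unit 36 is not covered.

```python
def move(p, motion):
    """Apply a small rigid motion (wx, wy, wz, tx, ty, tz) to a 3D point."""
    X, Y, Z = p
    wx, wy, wz, tx, ty, tz = motion
    return (X + wy * Z - wz * Y + tx,
            Y + wz * X - wx * Z + ty,
            Z + wx * Y - wy * X + tz)

def project(p, f):
    """Perspective projection onto the image plane at focal distance f."""
    X, Y, Z = p
    return (f * X / Z, f * Y / Z)

def decode_frame(points, motion, fluctuation, f, step=1.0):
    """Return the updated 3D points and the corrected 2D feature positions.

    `fluctuation` holds one quantized (dx, dy) index pair per feature point;
    `step` is the (assumed uniform) quantization step used by the encoder.
    """
    moved = [move(p, motion) for p in points]          # motion processing
    projected = [project(p, f) for p in moved]         # projection
    corrected = [(x + dx * step, y + dy * step)        # deformation
                 for (x, y), (dx, dy) in zip(projected, fluctuation)]
    return moved, corrected
```

The returned 3D points would be written back to storage so that the next frame's motion is applied to the already moved segment.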
[0057]
  Note that the moving picture reproducing apparatus of the present invention can be realized by adding means for reading a signal recorded on a recording medium to the configuration of the moving picture decoding apparatus of the present embodiment. Such a moving image reproducing apparatus can reproduce a long moving image sequence recorded at a high compression rate. Furthermore, by setting the movement amount of each segment to a value obtained by subdividing the movement amount between frames, it can generate moving images such as super slow motion, containing intermediate images that do not exist among the frames of the original sequence and in which each segment advances in minute steps.
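The subdivision of inter-frame motion for slow motion can be sketched as follows. This assumes the small-motion model, under which dividing the six parameters by k and applying the result k times approximates the original motion; the function names are hypothetical.

```python
def move(p, motion):
    """Apply a small rigid motion (wx, wy, wz, tx, ty, tz) to a 3D point."""
    X, Y, Z = p
    wx, wy, wz, tx, ty, tz = motion
    return (X + wy * Z - wz * Y + tx,
            Y + wz * X - wx * Z + ty,
            Z + wx * Y - wy * X + tz)

def subdivide_motion(motion, k):
    """Split one inter-frame motion into k equal small steps."""
    return tuple(m / k for m in motion)

def intermediate_positions(points, motion, k):
    """Return the k point sets reached by applying the subdivided motion."""
    step = subdivide_motion(motion, k)
    frames = []
    for _ in range(k):
        points = [move(p, step) for p in points]
        frames.append(points)
    return frames
```

Each intermediate point set can then be projected and rendered like an ordinary decoded frame, giving k output frames per original frame interval.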
[0058]
[Effects of the invention]
  According to the present invention, it is possible to provide a moving image processing apparatus capable of extracting a high-fidelity three-dimensional shape model from a moving image sequence.
  It is therefore also possible to provide a moving image encoding apparatus capable of encoding a moving image at a high compression rate using the three-dimensional shape model, and a moving image decoding apparatus for decoding it.
  Furthermore, it is possible to provide a moving image recording apparatus capable of recording a moving image on a recording medium for a long time, and a moving image reproducing apparatus for reproducing the recorded moving image.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a moving picture encoding apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the method of the motion estimation / correspondence search unit of the video encoding device shown in FIG. 1, showing a point P in three-dimensional space to explain the coordinate system.
FIG. 3 is a diagram for explaining a motion estimation / correspondence search method of the video encoding device shown in FIG. 1;
FIG. 4 is a block diagram showing a configuration of a moving picture decoding apparatus according to an embodiment of the present invention.
[Explanation of symbols]
10 ... Moving picture encoding apparatus
    11 ... Image analysis unit 12 ... Analysis image storage unit
    13 ... Segmentation unit 14 ... Storage unit
    15 ... Motion estimation / correspondence search unit 16 ... Projection unit
    17 ... Error detection unit 18 ... Kalman filter
30 ... Moving picture decoding apparatus
    31 ... Decoding unit 32 ... Storage unit
    33 ... Motion processing unit 34 ... Projection unit
    35 ... Deformation unit 36 ... Recombination unit
    37 ... Composite image storage unit

Claims

A moving image processing apparatus for acquiring three-dimensional shape information of each segment constituting a continuous moving image, comprising:
Modeling means for detecting points constituting edges as feature points in the still image of each frame of an input continuous moving image, extracting the position and analysis value of each point constituting the edges, and obtaining three-dimensional shape information of each segment that is composed of the feature points and constitutes the image of each frame;
Storage means for storing the three-dimensional shape information of each segment obtained by the modeling means;
Motion estimation means for calculating an estimate of the amount by which each segment constituting the moving image moves three-dimensionally between frames of the continuous moving image, from the three-dimensional shape information of each segment of the previous frame stored in the storage means and the three-dimensional shape information of each segment of the current frame; and
Updating means for updating the three-dimensional shape information of each segment stored in the storage means by obtaining, from the actual position, a least-squares estimate of the three-dimensionally moved position such that the sum of the squares of the differences between the position to which each segment is moved three-dimensionally by the amount of movement estimated by the motion estimation means and the actual position is minimized;
wherein the modeling means includes:
Image analysis means for analyzing the still image of each frame of the input continuous moving image with filters having different resolution scales and extracting the points constituting edges;
Segmentation means for extracting the two-dimensional shape information of each segment constituted by the extracted feature points for the still image of the first frame of the continuous moving image; and
3D information generating means for adding predetermined position information in the depth direction to the two-dimensional shape information of each segment and generating an initial value of the three-dimensional shape information of each segment;
and wherein the updating means detects the difference between the position of each feature point resulting from moving each segment three-dimensionally by the amount of movement estimated by the motion estimation means and the actual position of each feature point, obtains the least-squares estimate of the three-dimensionally moved position from the actual position, and updates the three-dimensional shape information of each segment stored in the storage means for each frame.

The moving image processing apparatus according to claim 1, wherein the updating means takes as the state quantity the position of each feature point resulting from moving each segment three-dimensionally by the estimated motion, takes as the observation quantity the actual position of the feature point, treats the difference between the state quantity and the observation quantity as noise, and updates the three-dimensional shape information of each segment by obtaining a least-squares estimate of the state quantity with a Kalman filter.

A moving image encoding apparatus for encoding the continuous moving image, comprising:
The moving image processing apparatus according to claim 1 or 2;
Initial encoding means for encoding the acquired three-dimensional shape information and the analysis value, analyzed by the image analysis means, of each feature point constituting the three-dimensional shape information;
Difference detection means for obtaining the difference between the position of each feature point resulting from moving each segment three-dimensionally by the motion estimated by the motion estimation means and the actual position of each feature point; and
Encoding means for encoding, for each frame, the motion estimation value of each segment estimated by the motion estimation means and the position difference of each feature point detected by the difference detection means.

A moving image recording apparatus comprising:
The moving image encoding apparatus according to claim 3; and
Recording means for recording the continuous moving image encoded by the moving image encoding apparatus on a recording medium.

A moving image decoding apparatus for decoding a continuous moving image that has been encoded by analyzing the still image of each frame of an input continuous moving image and detecting the points constituting edges as feature points, comprising:
Initial decoding means for decoding the three-dimensional shape information of each segment that constitutes the encoded continuous moving image and has the feature points as constituent elements, and the analysis value of each feature point constituting the three-dimensional shape information;
Decoding means for decoding, for each frame of the continuous moving image, the encoded motion estimation value of each segment and the encoded displacement of each feature point of each segment from the corresponding feature point of the input moving image;
Moving means for moving the position of each segment three-dimensionally based on the motion estimation value of each segment decoded by the decoding means;
Projection means for obtaining a projection image by projecting each segment, at the position to which it was moved by the moving means, onto a two-dimensional screen;
Deformation means for moving the position of each feature point of each segment in the projection image based on the displacement, decoded by the decoding means, of that feature point from the corresponding feature point of the input moving image; and
Image synthesizing means for synthesizing an image based on the image deformed by the deformation means and the analysis value of each feature point decoded by the initial decoding means.

A moving image reproducing apparatus comprising:
Signal reading means for reading an encoded continuous moving image signal recorded on a recording medium; and
The moving image decoding apparatus according to claim 5, which decodes the read encoded moving image signal and synthesizes the continuous moving image.