JP2004240913A

JP2004240913A - Object shape calculation device, object shape calculation method, and object shape calculation program

Info

Publication number: JP2004240913A
Application number: JP2003032073A
Authority: JP
Inventors: Hidenori Takeshima; 竹島秀則; Takashi Ida; 井田孝; Osamu Hori; 堀修; Nobuyuki Matsumoto; 松本信幸
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2003-02-10
Filing date: 2003-02-10
Publication date: 2004-08-26
Anticipated expiration: 2023-02-10
Also published as: JP3929907B2

Abstract

PROBLEM TO BE SOLVED: To obtain temporally smooth and precise shape information from rough shape information of a subject in a moving image.

SOLUTION: This device comprises a three-dimensional image formation part 102 that connects an input moving image, and mask data representing the rough shape of the subject, in the time direction to form a three-dimensional spatio-temporal image and three-dimensional mask data; a child box setting part 103 that sets child boxes in the mask data; an image parent box retrieval part 104 that searches for an image parent box similar to the area on the image corresponding to each child box; and a mask correction part 105 that corrects the mask data by using the areas on the mask data corresponding to each image parent box and each child box.

COPYRIGHT: (C)2004,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、動画像中に映っている所望の物体の形状を求める装置、方法並びにプログラムに関する。
【０００２】
【従来の技術】
物体と背景とが混在する画像から、ある特定の物体（被写体）のみが写っている領域を抽出する物体領域抽出技術は、静止画のみならず動画像の加工及び編集にも用いられる。例えば、動画像中に映っている人物を抽出し、別の背景に合成する等の利用法がある。
【０００３】
画像から被写体が写っている領域を抽出する際には、予め調べておいた被写体と背景との境界（被写体の形状）を用いて、画像中において被写体が存在する領域の画素を抽出する。従って、被写体の形状を高い精度で取得することが重要である。
【０００４】
従来、画像中の被写体の形状を高い精度で取得する技術の１つに次のようなものがある。まず、被写体のおおまかな形状を与える（例えば、被写体の形状を手動で入力する）。そして、被写体の境界部分のもつ自己相似性を利用し、与えられた形状の補正を行うことにより、精度の高い形状を求めるというものである（例えば、特許文献１）。
【０００５】
【特許文献１】
特開２０００−８２１４５公報
【０００６】
【発明が解決しようとする課題】
特許文献１に示された手法では、被写体の形状補正を動画像のフレームごとに独立して行っている。すなわち、特許文献１の手法では、２次元で形状補正を行い、フレーム間における形状の相関を考慮していないため、時間方向に関して滑らかでない形状が得られることがある。
【０００７】
時間方向に関して滑らかでない形状を用いて動画像から被写体を抽出すると、再生の際に被写体の一部あるいは全部がちらついて見えるという問題が起こることがある。
【０００８】
そこで、本発明では、動画像中の対象となる被写体の概略形状から、時間方向に対しても滑らかであり、かつ、高精度な形状を求めるための装置および方法を提供することを目的とする。
【０００９】
【課題を解決するための手段】
上記課題を解決するために、本発明の物体形状算出装置は、３次元画像を入力する画像入力部と、前記３次元画像中で物体が占める概略の領域を表す３次元形状データを入力する形状入力部と、前記３次元形状データを用いて、前記物体と前記物体以外の背景との境界面に沿って、前記形状データの修正処理の単位となる３次元領域であるチャイルドボックスを複数設定するチャイルドボックス設定部と、前記３次元画像中で前記チャイルドボックスと同じ位置にある領域の画素情報と相似し、前記チャイルドボックスより体積の大きい前記３次元画像中の領域である画像用ペアレントボックスを探索する画像用ペアレントボックス探索部と、前記３次元形状データ中で前記チャイルドボックスが示す領域の形状データを、前記３次元形状データ中で前記画像用ペアレントボックスと同じ位置にある領域の形状データを前記チャイルドボックスの大きさに縮小したもので修正する形状データ修正部と、修正した３次元形状データを出力する出力部とを備える。
【００１０】
本発明の物体形状算出方法は、３次元画像データを入力する画像入力ステップと、前記３次元画像において物体が占める概略の領域を表す３次元形状データを入力する形状入力ステップと、前記３次元形状データを用いて、前記物体と前記物体以外の背景との境界面に沿って、前記３次元形状データの修正処理の単位となる領域であるチャイルドボックスを複数設定するチャイルドボックス設定ステップと、前記３次元画像中で前記チャイルドボックスと同じ位置にある領域の画素情報と相似し、前記チャイルドボックスより体積の大きい前記３次元画像中の領域である、画像用ペアレントボックスを探索する画像用ペアレントボックス探索ステップと、前記３次元形状データ中で前記チャイルドボックスが示す領域の形状データを、前記３次元形状データ中で前記画像用ペアレントボックスと同じ位置にある領域の形状データを前記チャイルドボックスの大きさに縮小したもので置き換える置換ステップと、前記３次元形状データを出力する出力ステップとを備える。
【００１１】
本発明の物体形状算出プログラムは、コンピュータに、３次元画像データを入力する画像入力ステップと、前記３次元画像において物体が占める概略の領域を表す３次元形状データを入力する形状入力ステップと、前記３次元形状データを用いて、前記物体と前記物体以外の背景との境界面に沿って、前記３次元形状データの修正処理の単位となる領域であるチャイルドボックスを複数設定するチャイルドボックス設定ステップと、前記３次元画像中で前記チャイルドボックスと同じ位置にある領域の画素情報と相似し、前記チャイルドボックスより体積の大きい前記３次元画像中の領域である、画像用ペアレントボックスを探索する画像用ペアレントボックス探索ステップと、前記３次元形状データ中で前記チャイルドボックスが示す領域の形状データを、前記３次元形状データ中で前記画像用ペアレントボックスと同じ位置にある領域の形状データを前記チャイルドボックスの大きさに縮小したもので置き換える置換ステップと、前記３次元形状データを出力する出力ステップとを実行させる。
【００１２】
また、本発明の物体形状算出プログラムは、コンピュータを、３次元画像を入力する画像入力手段、前記３次元画像中で物体が占める概略の領域を表す３次元形状データを入力する形状入力手段、前記３次元形状データを用いて、前記物体と前記物体以外の背景との境界面に沿って、前記形状データの修正処理の単位となる３次元領域であるチャイルドボックスを複数設定するチャイルドボックス設定手段、前記３次元画像中で前記チャイルドボックスと同じ位置にある領域の画素情報と相似し、前記チャイルドボックスより体積の大きい前記３次元画像中の領域である画像用ペアレントボックスを探索する画像用ペアレントボックス探索手段、前記３次元形状データ中で前記チャイルドボックスが示す領域の形状データを、前記３次元形状データ中で前記画像用ペアレントボックスと同じ位置にある領域の形状データを前記チャイルドボックスの大きさに縮小したもので修正する形状データ修正手段、並びに、修正した３次元形状データを出力する出力手段として機能させる。
【００１３】
【発明の実施の形態】
（第１の実施形態）以下、図面を参照して本発明の第１の実施形態について説明する。
【００１４】
（概要）本実施形態の物体形状算出装置は、動画像内における所望の物体についての形状を求めるのに好適な装置である。
【００１５】
本装置は、形状を求めたい物体（被写体）が写っている動画像と、この動画像の全フレームにおける被写体の概略形状データとを入力として受ける。そして、動画像を用いて概略形状データを修正して高精度な形状データを算出して出力する。
【００１６】
本装置では、概略形状データはビットマップとする。例えば、被写体に相当する画素は１、被写体でない（例えば背景）画素は０という画素値を持った画像である。本装置では動画像の各フレームの画素数と、各フレームの概略形状データの画素数とは等しいものとする。
【００１７】
尚、概略形状データの画素数は画像の画素数と一致していなくても良い。例えば、概略形状データの縦横の画素数を画像の縦横の画素数よりも多く（例えば縦横それぞれ２倍）すれば、形状算出処理の高精度化が期待できる。一方、概略形状データの縦横の画素数を画像の縦横の画素数よりも少なく（例えば縦横それぞれ半分）すれば、形状算出処理の高速化が期待できる。
【００１８】
本装置では、動画像を、各フレームを時間方向に重ねた３次元時空間画像として扱う。すると、被写体はフレームの縦横に加えて時間方向にも広がる３次元物体となる（以下、３Ｄ物体）。各フレームの概略形状データも同様にして時系列的に積み上げて、３次元物体の概略形状データとして扱う（以下、３Ｄ概略形状データ）。
【００１９】
本装置では、３次元画像における３Ｄ物体の表面の自己相似性を利用して、３Ｄ概略形状データを修正することにより、時間方向に関しても滑らかな形状を求める。３次元画像とは、３方向の広がりを持つ画像のことである。例えば（縦、横、奥行き）の３方向の広がりを持つ画像である。（縦、横、時間）の３方向の広がりを持つ３次元時空間画像も３次元画像の一種である。
【００２０】
具体的には次のような処理を行う。まず、３Ｄ概略形状データにおいて、物体と非物体（背景）との境界面に直方体状の領域を複数設定する。これらの直方体状の領域それぞれをチャイルドボックスと呼ぶ。チャイルドボックスを設定する際には、内部に境界面が含まれるようにする。
【００２１】
次に、３次元画像において、３Ｄ概略形状データで設定した各チャイルドボックスと同じ位置にある領域に注目する。これらの領域を「画像用チャイルドボックス」と呼ぶ。
【００２２】
そして、３次元画像において、各画像用チャイルドボックスについて、各画像用チャイルドボックスより体積が大きくて、かつ、内部に含まれる画素情報が各画像用チャイルドボックスと最も相似する直方体状領域である画像用ペアレントボックスを求める。
【００２３】
異なる大きさの２つの領域が「相似する」とは、２つの領域の大きさを揃えた時に、内部に含まれる画素情報の相関が高くなる、ことを意味する。
【００２４】
従って、最も相似する画像用ペアレントボックスを求めるには、例えば、複数の画像用ペアレントボックス候補領域について、各候補領域の大きさを画像用チャイルドボックスの大きさに縮小した上で、画像用チャイルドボックスとの相関を求め、最も相関が高い候補領域を画像用ペアレントボックスとして選択すれば良い。相関については、内部に含まれる画素値の類似度で判定する。
【００２５】
最後に、３Ｄ概略形状データにおいて、３次元画像で探索した各画像用ペアレントボックスと同じ位置にある領域に注目する。これらの領域を「ペアレントボックス」と呼ぶ。そして、３Ｄ概略形状データにおいて、各チャイルドボックスを、それぞれに対応するペアレントボックスをチャイルドボックスのサイズに縮小したもので置換する。置換により３Ｄ概略形状データの境界面が３Ｄ物体の表面に近づく。修正後の３Ｄ概略形状データを３Ｄ形状データと呼ぶ。
【００２６】
前述したように、３Ｄ概略形状データは各フレームの概略形状データを時系列的に積み上げたものである。従って、３Ｄ形状データを時系列的に切り出すことで、各フレームにおける被写体の形状データを得ることができる。
【００２７】
（構成）以下、図面を参照して本実施形態の物体形状算出装置の構成を説明する。
【００２８】
本装置は、以下の構成を備える。
・被写体が写っている動画像の全フレームの画像と、被写体の概略形状を表す全フレーム分のマスクデータとを入力する入力部１０１。
・動画像の全フレーム及び全フレーム分のマスクデータのそれぞれを、時間方向に連結し、３次元時空間画像及び３次元マスクデータを生成する３次元化部１０２。
【００２９】
・３次元マスクデータにおいて、被写体とそれ以外（以下、背景）との境界面に沿って、チャイルドボックスと呼ばれる直方体状領域を設定するマスク用チャイルドボックス設定部１０３。
・３次元時空間画像において、３次元マスクデータで設定したチャイルドボックスと同じ位置にある領域である、画像用チャイルドボックスに注目し、各画像用チャイルドボックスより体積が大きくて、かつ、内部に含まれる画素情報が各画像用チャイルドボックスと最も相似する直方体状領域である画像用ペアレントボックスを探索する画像用ペアレントボックス探索部１０４。
【００３０】
・３次元マスクデータにおいて、３次元時空間画像で探索した画像用ペアレントボックスと同じ位置にある領域である、ペアレントボックスに注目し、３次元マスクデータにおいて、各チャイルドボックスそれぞれを、それぞれに対応するペアレントボックスをチャイルドボックスと同じ大きさに縮小したもので置換して３次元マスクデータを修正するマスク修正部１０５。
・修正された３次元マスクデータを時系列的に並んだデータに戻す２次元化部１０６。
・２次元化されたマスクデータを出力するマスク出力部１０７。
【００３１】
以下、各部について説明する。
【００３２】
（入力部１０１）入力部１０１では、被写体が写っている動画像の全フレームと被写体の概略形状を表すマスクデータの全フレーム分とを入力する。マスクデータは、被写体が存在する領域と存在しない領域とが識別可能であれば、どのように与えても良い。本実施形態では人間が手動で描いた概略形状を用いる。
【００３３】
（３次元化部１０２）３次元化部１０２では、動画像の全フレームと全マスクデータとをそれぞれ時間方向に連結して、それぞれ３次元時空間画像（３Ｄ画像）及び３次元時空間マスクデータ（３Ｄマスクデータ）を生成する。「（時間方向に）連結する」とは、各フレームの２次元画像を時間方向に厚さ△ｔを持った３次元時空間画像とみなして、時間方向に時間順に繋げていくことを意味する。
【００３４】
図２は連結する様子を説明する図である。フレーム２０１−１〜２０１−５には、動いている被写体２０３が写っている。また、各フレームには概略形状２０２が与えられている。尚、図２では説明を簡単にするために概略形状２０２と被写体２０３とが同一画像上にあるように表現してあるが、実際は別々のデータとして存在する。
【００３５】
フレーム２０１−１〜２０１−５の被写体２０３だけを抜き出して表現したのが被写体２０４である。被写体２０４は時間とともに動いているので、フレーム２０１−１〜２０１−５を時間方向に連結して３次元時空間画像にすると、３次元物体画像２０５のように曲がった筒状の３次元画像として表現される（動いていなければ、真っ直ぐな筒状になる）。尚、図２では被写体２０４は画面内で横方向に動いていない（縦方向のみに動いている）ので、３次元物体画像２０５も縦方向（垂直方向）にのみ曲がっている。
【００３６】
概略形状２０２についても同様で、連結を行うと、物体が存在する領域は筒状の３次元画像として表現される。
【００３７】
（チャイルドボックス設定部１０３）マスク用チャイルドボックス設定部１０３では、３Ｄマスクデータにおける被写体と背景との境界面に沿ってチャイルドボックスと呼ばれる直方体状の関心領域を設定する。チャイルドボックス内に境界面が入るように設定を行う。望ましくは、境界面がチャイルドボックスの重心あるいはその近傍を通るようにすると良い。
【００３８】
図３（Ａ）及び（Ｂ）はチャイルドボックスを設定する様子を説明する図である。この図は説明を簡単にするために、３Ｄマスクデータの一部分を時間方向に垂直な面に沿って切り出して表現している。図３（Ａ）に示すように、３Ｄマスクデータ３０１の境界面に沿ってチャイルドボックス３０２を設定する。この時、チャイルドボックスの重心が境界面を通るようにすると良い。そして、図３（Ｂ）に示すように、３Ｄマスクデータ３０１の境界面を埋め尽くすようにチャイルドボックス３０２を複数設定する。
【００３９】
（画像用ペアレントボックス探索部１０４）画像用ペアレントボックス探索部１０４では、３Ｄ画像において、３Ｄマスクデータで設定した各チャイルドボックスと同じ位置にある領域に注目する。これらの領域を画像用チャイルドボックスと呼ぶ。そして、各画像用チャイルドボックスについて、各画像用チャイルドボックスと最も相似する領域であり、かつ、各画像用チャイルドボックスより体積が大きい直方体状の領域である画像用ペアレントボックスの探索を行う。
【００４０】
前述したように、探索にあたっては複数の画像用ペアレントボックス候補領域のうち、画像用チャイルドボックスと最も相似する候補領域を画像用ペアレントボックスとする。具体的には、画像用ペアレントボックス候補領域と画像用チャイルドボックスとの間で内部に含まれる画素の類似度を計算することにより行う。類似度の計算は、例えば、画像用チャイルドボックスを画像用ペアレントボックスの大きさに引き伸ばした上で、画素値の差分絶対値和（ＳＡＤ）若しくは差分２乗和（ＳＳＤ）を求めれば良い。あるいは各候補領域を画像用チャイルドボックスの大きさに縮小して画素値のＳＡＤ若しくはＳＳＤを求めても良い。
【００４１】
画像用ペアレントボックス候補は、画像用チャイルドボックスの重心を中心とする所定の範囲内（例えば１０画素×１０画素×１０画素）に重心を持つものを用いる。
【００４２】
（マスク修正部１０５）マスク修正部１０５では、３Ｄマスクデータにおいて、３Ｄ画像で探索した各画像用ペアレントボックスと同じ位置にある領域に注目する。これらの領域をペアレントボックスと呼ぶ。そして、各ペアレントボックスと各チャイルドボックスとを用いて、３Ｄマスクデータの修正を行う。
【００４３】
図４（Ａ）〜（Ｄ）はマスクデータの修正を説明する図である。尚、説明を簡単にするために図面は２次元で表現してある。
【００４４】
図４（Ａ）は３Ｄマスクデータ４００上にペアレントボックス４０１とチャイルドボックス４０２との設定が終わった段階である。これから３Ｄマスクデータ４００の境界面４０３を修正していく。ここでは仮想的に被写体と背景との真の境界面４０４も表示してある。
【００４５】
ペアレントボックス４０１を設定する位置は、３Ｄ画像においてチャイルドボックス４０２との相似性が最高になる位置になっている。３Ｄ画像における相似性が高いということは、真の境界面４０４に関して相似性が高いことになる。従って、ペアレントボックス４０１とチャイルドボックス４０２とでは、真の境界面４０４の部分がほぼ一致するようになっている。
【００４６】
３Ｄマスクデータ４００の修正は次のように行う。まず、３Ｄマスクデータ４００から、ペアレントボックス４０１で指定される領域を抽出する。図４（Ｂ）はペアレントボックス４０１で指定される領域のみを示した図である。
【００４７】
次に、抽出した領域をチャイルドボックス４０２と同じ大きさに縮小して、置換ボックス４０５を生成する。図４（Ｃ）は置換ボックス４０５を示した図である。
【００４８】
そして、３Ｄマスクデータ４００において、チャイルドボックス４０２で示される領域の画素を置換ボックス４０５の画素で置換する。図４（Ｄ）は置換を行った後の３Ｄマスクデータ４００を示す図である。置換を行うことにより、境界面４０３は真の境界面４０４に接近する。
【００４９】
３Ｄマスクデータに設定した全てのチャイルドボックスに関して、上述した修正を行う。
【００５０】
（２次元化部１０６）２次元化部１０６は、修正した３Ｄマスクデータを時間方向に△ｔ間隔で切り出し、２次元のマスクデータとする。
【００５１】
（マスクデータ出力部１０７）マスクデータ出力部１０７は、２次元のマスクデータを出力する。
【００５２】
（実現手段）本実施形態の物体形状算出装置は、コンピュータで動作させるプログラムとして実現される。すなわち、コンピュータに上述の各部の機能を実現させる物体形状算出プログラムである。尚、本装置の一部あるいは全部を半導体集積回路等のハードウエアとして実現しても良い。
【００５３】
図５は、本実施形態の物体形状算出プログラムを動作させるコンピュータの一例を示す図である。
【００５４】
このコンピュータは中央演算処理装置５０１と、本物体形状算出プログラム及び処理途中のデータの一時記憶を行うメモリ５０２と、本物体形状算出プログラム、処理前のデータや処理後のデータを格納する磁気ディスクドライブ５０３と、光ディスクドライブ５０４を備える。
【００５５】
また、ＬＣＤやＣＲＴ等の表示装置５０８と、表示装置５０８に画像信号を出力するインターフェースである画像出力部５０５と、キーボードやマウス等の入力装置５０９と、入力装置５０９からの入力信号を受けるインターフェースである入力受け付け部５０６とを備える。
【００５６】
また、この他に、外部装置との接続インターフェース（例えばＵＳＢ、ネットワークインターフェースなど）である出入力部５０７を備える。
【００５７】
本実施形態の物体形状算出プログラムは予め磁気ディスクドライブ５０３に格納しておく。そして、物体形状を算出する際に、磁気ディスクドライブ５０３から読み出されてメモリ５０２に格納され、中央演算処理装置５０１において実行される。
【００５８】
実行結果として生成されるマスクデータは磁気ディスクドライブ５０３に格納する。また、物体形状算出の過程において、利用者に処理状況等の情報を提示したり、あるいは何らかの入力を促すためのＧＵＩを適宜画像出力部５０５を介して表示装置５０８に表示させる。
【００５９】
尚、物体形状算出の結果生じるマスクデータは、磁気ディスクドライブ５０３に格納するだけでなく、プロセス間通信の機能（共有メモリ、パイプ等）を用いて、マスクデータを必要とする他のプログラムに出力しても良い。
【００６０】
また、本実施形態では、物体形状算出プログラムは、コンピュータで動作しているＯＳ（ＯｐｅｒａｔｉｎｇＳｙｓｔｅｍ）と協働して動作する。
【００６１】
（動作）以下、図面を参照しながら本実施形態の動作を説明する。図６は本実施形態の物体形状算出装置の動作を説明する図である。
【００６２】
（ステップ６０１）被写体が写っている動画像Ｉ＝｛Ｉ_１、Ｉ_２・・・Ｉ_ｎ｝と、被写体の概略形状のマスクデータＲ＝｛Ｒ_１、Ｒ_２・・・Ｒ_ｎ｝とを入力する。そして、入力された動画像及び概略形状マスクデータを１フレームずつ順次蓄積する。尚、Ｉ_ｋ及びＲ_ｋは、それぞれ１フレーム分のデータを表す。
【００６３】
（ステップ６０２）動画像の各フレームＩ_ｋと各フレームの概略形状マスクデータＲ_ｋとのそれぞれを時間方向について連結して、それぞれ３次元時空間画像Ｉ（３Ｄ画像Ｉ）、３次元時空間マスクデータＲ（３ＤマスクデータＲ）を生成する。
【００６４】
（ステップ６０３）３ＤマスクデータＲにおいて、被写体と背景との境界面に沿ってチャイルドボックスＣ＝｛ｃ（１）、ｃ（２）・・・ｃ（Ｎ）｝を設定する。３Ｄ画像Ｉにおいて、各チャイルドボックスｃ（ｉ）と同じ位置にある領域にも注目し、これらをチャイルドボックスＣ’＝｛ｃ’（１）、ｃ’（２）・・・ｃ’（Ｎ）｝と呼ぶ。
【００６５】
本実施形態では各チャイルドボックスｃ（ｉ）及びｃ’（ｉ）の形状を直方体とし、大きさは縦（３Ｄマスクデータの空間方向の一方）に８画素、横（３Ｄマスクデータの空間方向の他方）に８画素、奥行き（３Ｄマスクデータの時間方向）に４画素としておく。ただし、形状及び大きさはこれ以外のものでも良い。例えば一辺が８画素の立方体でも良い。
【００６６】
各チャイルドボックスｃ（ｉ）及びｃ’（ｉ）は、それぞれの重心が境界面を通るように設定する。また、チャイルドボックスＣは境界面を埋め尽くすように多数設定する。
【００６７】
（ステップ６０４）３Ｄ画像Ｉにおいて、ペアレントボックスＰ’＝｛ｐ’（１）、ｐ’（２）・・・ｐ’（Ｎ）｝を探索して設定する。また、３Ｄマスクデータにおいて、各ペアレントボックスｐ’（ｉ）と同じ位置にある領域にも注目し、これらをペアレントボックスＰ＝｛ｐ（１）、ｐ（２）・・・ｐ（Ｎ）｝と呼ぶ。
【００６８】
３Ｄ画像Ｉにおいて、各チャイルドボックスｃ’（ｉ）のペアレントボックスｐ’（ｉ）を探索する。探索は、各チャイルドボックスｃ’（ｉ）の重心を中心とする所定の範囲内（例えば１０画素×１０画素×１０画素の範囲）で行う。この範囲内に重心を持つペアレントボックスｐ’（ｉ）の候補とチャイルドボックスｃ’（ｉ）との間での相似性を求め、最も相似する候補をペアレントボックスｐ’（ｉ）とする。
【００６９】
相似性の算出は、ペアレントボックスｐ’（ｉ）の候補とチャイルドボックスｃ’（ｉ）との間で内部に含まれる画素の類似度を計算することにより行う。類似度の計算は、例えば、チャイルドボックスｃ’（ｉ）をペアレントボックスｐ’（ｉ）の大きさに拡大した上で、画素値の差分絶対値和（ＳＡＤ）を求めれば良い。
【００７０】
別の類似度の計算としては、絶対値差分和に限らず、画素値の差分２乗和（ＳＳＤ）を用いても良い。いわゆるブロックマッチングを３次元的に行うので、ブロックマッチングにおいて類似度や相関を求める手法を応用することができる。
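The similarity calculation of step 604 (enlarge the child box to the parent box size, then take the SAD or SSD of the pixel values) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the nearest-neighbour enlargement, the integer scale factors, and the names `enlarge_nearest`, `sad`, and `ssd` are assumptions introduced here. Blocks are nested lists indexed as `block[t][y][x]`.

```python
def enlarge_nearest(block, factor):
    """Enlarge block[t][y][x] by integer factors (ft, fy, fx).

    Nearest-neighbour repetition is an assumption; the text only says
    that the child box is enlarged to the parent box size.
    """
    ft, fy, fx = factor
    return [[[block[t // ft][y // fy][x // fx]
              for x in range(len(block[0][0]) * fx)]
             for y in range(len(block[0]) * fy)]
            for t in range(len(block) * ft)]


def _pairs(a, b):
    """Yield corresponding voxel pairs of two equally sized blocks."""
    for frame_a, frame_b in zip(a, b):
        for row_a, row_b in zip(frame_a, frame_b):
            yield from zip(row_a, row_b)


def sad(a, b):
    """Sum of absolute differences (smaller means more similar)."""
    return sum(abs(p - q) for p, q in _pairs(a, b))


def ssd(a, b):
    """Sum of squared differences (smaller means more similar)."""
    return sum((p - q) ** 2 for p, q in _pairs(a, b))


# Toy data: a 1 x 1 x 2 child box compared with a 1 x 1 x 4 parent candidate.
child = [[[0, 10]]]
candidate = [[[0, 2, 9, 10]]]
enlarged = enlarge_nearest(child, (1, 1, 2))
```

With either measure, the candidate with the smallest value would be taken as the most similar parent box.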
【００７１】
ペアレントボックスＰ’とチャイルドボックスＣ’との関係は、ペアレントボックスＰ’の方がチャイルドボックスＣ’よりも体積が大きいことを除いて自由である。チャイルドボックスＣ’の３辺を均等な倍率で拡大したものでなくても良い。本実施形態ではペアレントボックスＰは全て（１６画素×１６画素×８画素）とする。ペアレントボックスＰとチャイルドボックスＣの関係もまた同様である。
【００７２】
例えば、チャイルドボックスＣ’が（８画素×８画素×８画素）の立方体であった時に、ペアレントボックスＰ’が（１２画素×１６画素×２４画素）の直方体であっても良い。あるいは、チャイルドボックスＣが（８画素×８画素×４画素）の直方体であった時に、ペアレントボックスＰが（１２画素×１２画素×４画素）の直方体であっても良い。
【００７３】
（ステップ６０５）３ＤマスクデータＲにおいて、各チャイルドボックスｃ（ｉ）と、これに対応するペアレントボックスｐ（ｉ）とを用いて３ＤマスクデータＲの修正を行う。
【００７４】
まず、３ＤマスクデータＲから各チャイルドボックスｃ（ｉ）に対応するペアレントボックスｐ（ｉ）が示す領域の画素情報を抽出する。次に、抽出した画素情報をチャイルドボックスと同じ大きさに縮小した置換ボックスＲ’を生成する。そして、３Ｄマスクデータにおいてチャイルドボックスが示す領域の画素情報を置換ボックスＲ’の画素情報で置き換えて修正する。
【００７５】
尚、置き換え処理は１回でも良いが、複数回行うとよりマスクデータを修正する効果が高くなる。
【００７６】
置き換え処理を１回行うことにより、真の境界面と３ＤマスクデータＲにおける境界面との誤差は、各チャイルドボックスｃ（ｉ）の大きさに対する、それぞれに対応するペアレントボックスｐ（ｉ）の大きさの比率の逆数倍になる。
【００７７】
誤差が収束するのに必要な置き換え回数、すなわち誤差が１画素以下となる条件は、各チャイルドボックスｃ（ｉ）と対応するペアレントボックスｐ（ｉ）との大きさの比率によって変わる。
【００７８】
各チャイルドボックスｃ（ｉ）の最大の辺の長さをＬとすると、置き換え処理後に残る誤差は、Ｌ／２に各チャイルドボックスｃ（ｉ）に対する、それぞれに対応するペアレントボックスｐ（ｉ）との大きさの比率の逆数を、置き換え処理を行った回数だけ乗じた値になる。この残る誤差が１未満になった時点で収束したとみなせる。
【００７９】
上述したように、本実施形態では各チャイルドボックスｃ（ｉ）は直方体で縦・横・時間方向にそれぞれ８画素・８画素・４画素の大きさであり、対応するペアレントボックスが直方体で縦・横・時間方向にそれぞれ１６画素・１６画素・８画素の大きさであるから、上述の残る誤差が１未満になるには、３回の置き換え処理が必要であることになる。
【００８０】
本実施形態では、上述した残る誤差を収束させるという観点から本ステップの処理を３回繰り返すこととする。
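The repetition count above follows from the stated error model: the initial error is at most L/2, and each replacement multiplies it by the reciprocal of the parent-to-child size ratio. The sketch below reproduces that arithmetic. It assumes that "size ratio" means the ratio of side lengths (16/8 = 2 in this embodiment); the function name `replacements_needed` is hypothetical.

```python
def replacements_needed(longest_side: int, ratio: float) -> int:
    """Smallest k such that (longest_side / 2) * (1 / ratio) ** k < 1."""
    if ratio <= 1:
        raise ValueError("the parent box must be larger than the child box")
    error = longest_side / 2
    k = 0
    while error >= 1:
        error /= ratio
        k += 1
    return k


# First embodiment: child box 8 x 8 x 4 (L = 8), parent box 16 x 16 x 8.
# The bound goes 4 -> 2 -> 1 -> 0.5, so three replacements are needed.
passes = replacements_needed(8, 2)
```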
【００８１】
（ステップ６０６）修正後の３ＤマスクデータＲからマスクデータＲ_ｋを切り出して出力する。
【００８２】
ステップ６０２におけるマスクデータの連結処理と、反対の処理を行う。ステップ６０２では２次元のマスクデータを時間方向に連結したが、このステップでは時間方向に関して分離して２次元のマスクデータとする。
【００８３】
（本実施形態の効果）以上、本実施形態の物体形状算出装置ならば、時間方向についても形状算出を行っているので、動画像として再生した時に従来よりもちらつきを抑制された形状情報を得ることができる。
【００８４】
（第１の変形例）各チャイルドボックスｃ（ｉ）の大きさが大きくなるほど、１度に３ＤマスクデータＲを修正できる範囲が広くなるが、修正の精度は悪くなるという問題がある。
【００８５】
そこで、ステップ６０３〜ステップ６０５の処理を、大きいｃ（ｉ）及びｃ’（ｉ）で実施してから、より小さなｃ（ｉ）及びｃ’（ｉ）で実施し、さらに小さなｃ（ｉ）及びｃ’（ｉ）で実施するということを適宜繰り返せば、より広い範囲に対し効率良く高い精度で３ＤマスクデータＲを修正することができる。
【００８６】
例えば、最初はｃ（ｉ）及びｃ’（ｉ）の大きさを（８画素×８画素×８画素）で実施し、２回目は（４画素×４画素×４画素）で行うことが考えられる。あるいは（１６画素×１６画素×４画素）→（８画素×８画素×４画素）→（４画素×４画素×４画素）で行っても良い。
【００８７】
（第２の変形例）ステップ６０５の処理を複数回行って修正効果を高める旨を説明したが、ステップ６０３からステップ６０５までの処理を複数回（例えば２〜３回）繰り返しても良い。すなわち、チャイルドボックスＣの設定位置を変えて３ＤマスクデータＲの修正を行うということである。
【００８８】
特に、設定されたチャイルドボックスＣの一部（または全部）が真の境界面から外れている場合に有効である。繰り返し時にチャイルドボックスＣを再設定することで、チャイルドボックスＣが真の境界面を通るように設定される可能性が高くなるからである。
【００８９】
（第３の変形例）これまでは、動画像の各フレームを時間方向に連結した３次元時空間画像を扱ってきたが、より一般的に３次元画像を扱うことができる。つまり、今までは（Ｘ方向、Ｙ方向、時間方向）という３次元時空間であったが、（Ｘ方向、Ｙ方向、Ｚ方向）という３次元空間でも物体形状の算出を行える。
【００９０】
３次元画像としては、例えばＣＴやＭＲＩで取得した３次元物体のある方向に対する断面画像の集合を用いても良い。すなわち、入力部１０１で断面画像の集合とマスクデータとを入力し、３次元化部１０２で断面画像を断面に垂直な方向に連結して３次元画像を生成する。
【００９１】
（第４の変形例）ステップ６０４でペアレントボックスＰ’を探索する処理を高速に行うために、３Ｄ画像Ｉを、（ペアレントボックスの大きさ）／（チャイルドボックスの大きさ）倍に拡大した画像を用意しておいても良い。
【００９２】
ステップ６０４では各チャイルドボックスｃ’（ｉ）が示す領域を拡大してペアレントボックス候補が示す領域と比較していたが、これとは逆にペアレントボックス候補が示す領域を各チャイルドボックスｃ’（ｉ）のサイズに縮小して比較しても良い。この場合、３Ｄ画像Ｉを（チャイルドボックスの大きさ）／（ペアレントボックスの大きさ）倍に縮小した画像を用意しておいても良い。
【００９３】
（第５の変形例）概略形状データはベクトル情報であっても良い。すなわち、概略形状データは各フレームにおける被写体の形状を表す線分の集合であっても良い。
【００９４】
この場合、ステップ６０５における置換処理のたびに線分の集合を構成する要素を更新していく。
【００９５】
（第２の実施形態）以下、本発明の第２の実施形態について説明する。基本的な構成は第１の実施形態と同様である。
【００９６】
上述のステップ６０４においてペアレントボックスＰ’を探索する際に、３Ｄ画像Ｉの端の部分において３次元画像の外側を参照したくなることがある。このようなことは、例えば動画像の各フレームの端の方に被写体が写っている場合に起こり易い。
【００９７】
３Ｄ画像Ｉの外側への参照が起こらないように探索範囲を制限することも可能であるが、このような制限を課すとペアレントボックスＰ’とチャイルドボックスＣ’との相似性が低くなり、境界面の精度が悪くなることがある。
【００９８】
特に３次元時空間画像では、開始数フレーム及び終端数フレームの形状が不自然なものになるため、その影響が大きい。
【００９９】
そこで、３Ｄ画像Ｉの各方向について外側に拡張する。拡張した部分の画素については、最も近い端の画素をコピーするパディング処理を行うことにより補う。また、３次元マスクデータＲについても同様にパディング処理を行う。そして、パディング処理を行ってからペアレントボックスＰの探索を行う。
【０１００】
「最も近い端の画素をコピーする」とは、例えば、時間ｔ＝０が始点で時間ｔ＝１００が終点の時空間画像において、時間ｔ＝−１の位置に時間ｔ＝０と同じ画像をコピーする処理のことである。
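The padding described above (copy the nearest edge voxel into the extended region) can be sketched as follows, applied equally to the 3D image and to the 3D mask data. The function names and the choice to pad both ends of every axis by the same amount are assumptions made for the sketch; volumes are nested lists indexed as `volume[t][y][x]`.

```python
def clamp(i, n):
    """Clamp index i into the valid range 0 .. n - 1."""
    return min(max(i, 0), n - 1)


def pad_edge(volume, pad):
    """Extend volume[t][y][x] by pad = (pt, py, px) voxels on every side.

    Each new voxel takes the value of the nearest voxel of the original
    volume, so t = -1 repeats the frame at t = 0, and so on.
    """
    pt, py, px = pad
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    return [[[volume[clamp(t, T)][clamp(y, H)][clamp(x, W)]
              for x in range(-px, W + px)]
             for y in range(-py, H + py)]
            for t in range(-pt, T + pt)]


# Two frames of 1 x 2 pixels, padded by one frame in time and one pixel in x.
volume = [[[1, 2]], [[3, 4]]]
padded = pad_edge(volume, (1, 0, 1))
```

After padding, box coordinates found in the padded volume have to be shifted back by `pad` before being applied to the original data.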
【０１０１】
尚、本実施形態は３次元時空間画像に限らず、一般的な３次元画像においても適用可能である。
【０１０２】
このように、パディングを行った３次元画像及び３次元マスクデータを利用することにより、３次元マスクデータをより精度良く修正して、より正確な物体の形状を求めることができる。
【０１０３】
（第３の実施形態）以下、本発明の第３の実施形態について説明する。
【０１０４】
（概要）これまでは、動画像全体を一度に扱って被写体の形状を求めていた。本実施形態では動画像の数フレーム単位で逐次的に被写体の形状を求めていく。
【０１０５】
これにより、一度に扱うデータ量を減らすことができる。例えば非常に長い時間にわたる動画像を、少ないメモリの計算機で処理するのに有用である。
【０１０６】
図７は、本実施形態における処理の概要を説明する図である。図７は、動画像における被写体７０１のみを時間順に並べた図である。被写体７０１は動いているので、時間の経過とともに位置がずれている。
【０１０７】
本実施形態では、まず動画像の数フレーム分について形状を求める（第１の処理）。それから、次の数フレームに関して形状を求める（第２の処理）。このような動作を繰り返して（第３の処理、第４の処理・・・）、最終的に動画像全体にわたって、被写体の形状を求める。
【０１０８】
（動作）図８は本実施形態の物体形状算出装置の動作を説明する図である。以下の説明では一度にｎフレームずつ扱うものとしている。
【０１０９】
（ステップ８０１）動画像のうち、形状算出処理対象となるｎフレームの先頭位置の初期値を設定する。本実施形態では、第１フレームから処理を行っていく。
【０１１０】
（ステップ８０２）動画像のうち、形状算出処理対象となるｎフレーム分の画像及びマスクデータをメモリ上に読み込む。
【０１１１】
ｎフレーム分の画像を３Ｄ画像Ｉとして扱う。また、ｎフレーム分のマスクデータを３ＤマスクデータＲとして扱う。
【０１１２】
（ステップ８０３）第１の実施形態と同様にして、３ＤマスクデータＲと３Ｄ画像Ｉに対して、チャイルドボックスＣ、Ｃ’を設定する。
【０１１３】
（ステップ８０４）第１の実施形態と同様にして、３Ｄ画像Ｉにおいて、各チャイルドボックスＣ’に対応するペアレントボックスＰ’を探索する。探索したペアレントボックスＰ’と同位置にある３ＤマスクデータＲ上の領域をペアレントボックスＰとする。
【０１１４】
（ステップ８０５）第１の実施形態と同様にして、３ＤマスクデータＲにおいて、チャイルドボックスＣの画素情報をペアレントボックスＰを縮小したもので置換し、３ＤマスクデータＲの修正を行う。
【０１１５】
（ステップ８０６）３ＤマスクデータＲをフレーム単位に分割して出力する。
【０１１６】
（ステップ８０７）形状算出処理対象となるｎフレームの先頭位置をシフトする。
【０１１７】
（ステップ８０８）最終フレームまで形状算出処理が完了した場合は終了する。未完了の場合は次のｎフレームについてステップ８０２から処理を行う。
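The loop of steps 801 to 808 amounts to sliding a window of n frames over the moving image. A minimal sketch of the window bookkeeping is given below. The amount by which the start position is shifted in step 807 is not specified in the text, so `shift` is a parameter assumed here (a shift smaller than n makes consecutive windows overlap); the name `frame_windows` is hypothetical.

```python
def frame_windows(total_frames, n, shift):
    """Yield (first, last) 1-based frame numbers of each processing window."""
    if total_frames < 1:
        raise ValueError("the moving image must contain at least one frame")
    if not 1 <= shift <= n:
        raise ValueError("shift must be between 1 and n")
    first = 1                                    # step 801
    while True:
        last = min(first + n - 1, total_frames)  # step 802: frames to load
        yield first, last                        # steps 803-806 run on this window
        if last == total_frames:                 # step 808: final frame reached
            break
        first += shift                           # step 807


overlapping = list(frame_windows(10, 4, 2))
disjoint = list(frame_windows(10, 4, 4))
```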
【０１１８】
（変形例）ここでは、先頭フレームから時間軸の順方向に対して処理を行う例を示したが、順序はこれに限定されず、例えば逆方向であっても良い。
【０１１９】
また、入力されている動画像及びマスクデータは、パディングしてあっても良い。
【０１２０】
また、ステップ８０３からステップ８０５の処理を、チャイルドボックスＣのサイズを変えながら複数回繰り返しても良い。例えば、最初は大きいチャイルドボックスで処理を行い、チャイルドボックスを小さくしながら繰り返すと良い。
【０１２１】
上述の説明では、ステップ８０２からステップ８０５の処理では、形状算出処理対象フレームが重複するため、同じフレームに対して何度も行うことになる。そこで、ステップ８０５で一度修正したマスクデータに対して、さらに修正を行うことになっても構わない。すなわち、ステップ８０２で読み込むマスクデータは、前にステップ８０５で修正済みのものであっても構わない。マスクデータの精度を向上させる効果が期待できる。
【０１２２】
また、形状算出処理対象フレームが重複するため、同じフレームのマスクデータの修正を何度も行うことになるので、前に配置したチャイルドボックスＣ及びペアレントボックスＰをそのまま用いても良い。このようにすると処理速度の面で有利であるし、３Ｄマスクデータの時間方向の滑らかさの点でも有利である。
【０１２３】
（本実施形態の効果）本実施形態ならば、動画像全体を一度に扱う必要がないので、少ないメモリでも形状算出処理を行うことができる。
【０１２４】
また、逐次的な処理方式なので、リアルタイムに入力される画像に対して、例えばフレーム間差分に基づく方法で被写体の初期形状を与え、その形状を修正しながら、被写体を切り出して出力するというリアルタイム処理に応用できる。
【０１２５】
（第４の実施形態）以下、本発明の第４の実施形態について説明する。
【０１２６】
ここでは、第３の実施形態で説明したマスクデータを逐次的に修正する手法と、第２の実施形態で説明したマスクデータ・画像に対してパディングを行ってから処理を行う手法とを組み合わせた実施形態について説明する。
【０１２７】
（動作）図９は本実施形態における処理の流れを説明する図である。
【０１２８】
（ステップ９０１）まず、動画像、マスクデータ及びボックス設定済みフラグをそれぞれ格納するためのキュー（先入れ先出し）バッファを初期化する。さらに、チャイルドボックスの座標とペアレントボックスの座標とを格納するためのバッファも初期化する。
【０１２９】
本実施形態では動画像、マスクデータ及びボックス設定済みフラグを格納するキューバッファのサイズは、１０フレーム分のデータを格納できるサイズとする。キューバッファのサイズは使用可能なメモリの大きさに応じて適宜変更してよい。
【０１３０】
ボックス設定済みフラグとは、マスクデータのどの部分にチャイルドボックスを設定されているかを確認するためのものである。マスクデータと同じデータ量のキューバッファで、マスクデータにチャイルドボックスを設定したら、それに対応するボックス設定済みフラグの領域のフラグを立てる。
【０１３１】
動画像のキューバッファ及びマスクデータのキューバッファの初期化では、先頭フレームの画像データ及びマスクデータで埋め尽くす処理を行う。これは、先頭フレーム以前における時間方向のパディングを行ったのと等価になる。
【０１３２】
ボックス設定済みフラグのキューバッファは初期化でクリアしておく。また、チャイルドボックスの座標及びペアレントボックスの座標を格納するバッファも初期化でクリアしておく。
【０１３３】
（ステップ９０２）キューバッファに読み込むフレームの番号ｋを初期化する。ここでは先頭フレームから読み込むので番号ｋに１をセットしておく。
【０１３４】
（ステップ９０３）動画像のキューバッファ及びマスクデータのキューバッファに、ｋ番目のフレームの画像及びマスクデータを読み込む。
【０１３５】
もしｋの値が最終フレームの番号を超えている場合は、最終フレームのデータをコピーする。これは、最終フレーム以降における時間方向のパディングを行うのと等価になる。
【０１３６】
（ステップ９０４）フレームｋに対応するマスクデータの被写体と背景との境界面付近にチャイルドボックスが設定されているかを調べ、設定されていない場合はチャイルドボックスを設定する。
【０１３７】
ボックス設定済みフラグのキューバッファにおいて境界面付近に対応する領域のフラグを調べる。フラグがクリアされていれば、チャイルドボックスが未設定である。
【０１３８】
チャイルドボックスが未設定ならば、マスクデータ及び動画像のキューバッファにおいて、チャイルドボックスを以下の条件を満たすように設定する。
・チャイルドボックスの重心がフレームｋの時間方向の中心付近を通る
・チャイルドボックスの重心が境界面の近傍を通る
チャイルドボックスを設定したら、ボックス設定済みフラグのキューバッファにおいて対応する領域のフラグをセットする。
【０１３９】
（ステップ９０５）ステップ９０４で新たに追加されたチャイルドボックスの各々に対し、対応するペアレントボックスの探索を行う。
【０１４０】
（ステップ９０６）全てのチャイルドボックスと、それぞれに対応するペアレントボックスとを用いてマスクデータの修正を行う。
【０１４１】
（ステップ９０７）キューから１フレーム分のマスクデータを取り出す。これは修正後のマスクデータであり、被写体の形状を精度良く表すものである。
【０１４２】
（ステップ９０８）次のフレームに注目するために、ｋに１を加算する。動画像のキューから１フレーム分のデータを削除する。また、削除されたフレームへの参照を含むチャイルドボックス及びペアレントボックスの座標をバッファから削除する。
【０１４３】
（ステップ９０９）処理終了条件の判定を行う。処理終了の条件は、例えば、「ステップ９０７で最終フレームのマスクデータを出力すること」である。この条件は、ｋの値が最後のフレーム番号＋キューバッファに格納可能なフレーム数より大きくなったか否かで判定すればよい。
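The queue bookkeeping of steps 901 to 909 can be checked with a small simulation that tracks frame numbers only. It is a sketch under two assumptions that the text does not state explicitly: the queue holds one extra frame between step 903 and step 907, and the first entries taken out of the queue (as many as the queue size) are copies of the first frame added as padding and are therefore discarded. The name `stream_frames` is hypothetical.

```python
from collections import deque


def stream_frames(num_frames, queue_size):
    """Return the frame numbers emitted by the loop of steps 902 to 909."""
    queue = deque([1] * queue_size)       # step 901: fill with the first frame
    emitted = []
    k = 1                                 # step 902
    while True:
        # Step 903: read frame k; past the end, repeat the last frame.
        queue.append(min(k, num_frames))
        # Steps 904-906 (box setting, parent search, correction) would
        # operate on the frames currently held in the queue.
        out = queue.popleft()             # step 907: oldest frame leaves
        if k > queue_size:                # assumption: earlier pops are padding
            emitted.append(out)
        k += 1                            # step 908
        if k > num_frames + queue_size:   # step 909: termination test
            break
    return emitted


frames_out = stream_frames(5, 3)
```

Under these assumptions every frame is emitted exactly once and in order, and the last frame leaves the queue in the iteration after which k first exceeds the last frame number plus the queue size, which matches the termination test of step 909.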
【０１４４】
（第５の実施形態）以下、本発明の第５の実施形態について説明する。
【０１４５】
（概要）第３の実施形態、第４の実施形態では、全てのフレームについて予め概略形状のマスクデータを与えていた。これに対して、本実施形態は、先頭のフレームについてのみ概略形状のマスクデータを与えるだけで、全体のマスクデータを求めることができるというものである。
【０１４６】
（動作）図１０は、本実施形態の物体形状算出装置の処理の流れを説明する図である。
【０１４７】
（ステップ１００１）まず、動画像、マスクデータ及びボックス設定済みフラグをそれぞれ格納するためのキュー（先入れ先出し）バッファを初期化する。さらに、チャイルドボックスの座標とペアレントボックスの座標とを格納するためのバッファも初期化する。
【０１４８】
本実施形態では動画像、マスクデータ及びボックス設定済みフラグを格納するキューバッファのサイズは、１０フレーム分のデータを格納できるサイズとする。キューバッファのサイズは使用可能なメモリの大きさに応じて適宜変更してよい。
【０１４９】
動画像のキューバッファ及びマスクデータのキューバッファには、先頭フレームの画像データ及びマスクデータで埋め尽くす処理を行っておく。また、ボックス設定済みフラグのキューバッファはクリアしておく。そして、チャイルドボックスの座標及びペアレントボックスの座標を格納するバッファもクリアしておく。
【０１５０】
（ステップ１００２）キューバッファに読み込むフレームの番号ｋを初期化する。ここでは先頭フレームから読み込むので番号ｋに１をセットしておく。
【０１５１】
（ステップ１００３）動画像のキューバッファに、ｋ番目のフレームの画像を読み込む。
【０１５２】
もしｋの値が最終フレームの番号を超えている場合は、最終フレームのデータをコピーする。
【０１５３】
（ステップ１００４）マスクデータのキューバッファに、ｋ番目のフレームのマスクデータを設定する。
【０１５４】
ｋ番目のフレームに関するマスクデータを予測してキューバッファに設定する。予測は、ｋ番目以前で時間的に直近のフレーム（多くの場合はｋ−１番目のフレーム）のマスクデータをコピーすることで行う。
【０１５５】
尚、ｋ番目のフレームのマスクデータが既に存在する場合、例えば概略形状として与えられている場合等、はそれを用いても良い。
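Step 1004 can be sketched as a lookup with a fallback: use the mask of frame k if one was supplied, otherwise copy the mask of the nearest earlier frame. The dictionary representation and the name `predict_mask` are assumptions made for illustration.

```python
def predict_mask(known_masks, k):
    """Return a mask for frame k.

    known_masks maps a frame number to its 2D mask (a list of rows).
    If frame k has no mask, the mask of the closest earlier frame is
    copied, which is usually frame k - 1.
    """
    if k in known_masks:
        return [row[:] for row in known_masks[k]]
    earlier = [j for j in known_masks if j < k]
    if not earlier:
        raise ValueError("no mask is available at or before frame %d" % k)
    return [row[:] for row in known_masks[max(earlier)]]


# Only the first frame has a user-supplied rough mask.
masks = {1: [[0, 1], [1, 1]]}
for k in range(2, 5):
    masks[k] = predict_mask(masks, k)
```

In the full procedure each predicted mask would be corrected (steps 1005 to 1007) before it serves as the prediction for the next frame.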
【０１５６】
（ステップ１００５）フレームｋに対応するマスクデータの被写体と背景との境界面付近にチャイルドボックスが設定されているかを調べ、設定されていない場合はチャイルドボックスを設定する。
【０１５７】
（ステップ１００６）ステップ１００５で新たに追加されたチャイルドボックスの各々に対し、対応するペアレントボックスの探索を行う。
【０１５８】
（ステップ１００７）全てのチャイルドボックスと、それぞれに対応するペアレントボックスとを用いてマスクデータの修正を行う。
【０１５９】
（ステップ１００８）キューから１フレーム分のマスクデータを取り出す。これは修正後のマスクデータであり、被写体の形状を精度良く表すものである。
【０１６０】
（ステップ１００９）次のフレームに注目するために、ｋに１を加算する。動画像のキューから１フレーム分のデータを削除する。また、削除されたフレームへの参照を含むチャイルドボックス及びペアレントボックスの座標をバッファから削除する。
【０１６１】
（ステップ１０１０）処理終了条件の判定を行う。終了していない場合はステップ１００３へ戻る。
【０１６２】
【発明の効果】
以上のように、本発明によれば、動画像中の対象となる被写体の概略形状情報から、時間方向に滑らかでかつ精度の高い形状情報を得ることが可能となる。
【図面の簡単な説明】
【図１】第１の実施形態の物体形状算出装置の構成を説明する図。
【図２】動画像を３次元時空間画像として扱う様子。
【図３】（Ａ）被写体と背景との境界面にチャイルドボックスｃ（ｉ）を設定する様子を説明する図。（Ｂ）境界面を埋め尽くすようにチャイルドボックスｃ（ｉ）を設定する様子を説明する図。
【図４】（Ａ）修正前のマスクデータを説明する図。（Ｂ）マスクデータにおいてペアレントボックスが示す領域を表示した図。（Ｃ）ペアレントボックスが示す領域の画素をチャイルドボックスの大きさに縮小したものを示した図。（Ｄ）修正後のマスクデータを説明する図。
【図５】第１の実施形態の物体形状算出装置を実現するプログラムを動作させるコンピュータの構成を説明する図。
【図６】第１の実施形態の物体形状算出装置の動作を説明する流れ図。
【図７】第３の実施形態における処理の概要を説明する図。
【図８】第３の実施形態における処理の流れを説明する図。
【図９】第４の実施形態における処理の流れを説明する図。
【図１０】第５の実施形態における処理の流れを説明する図。
【符号の説明】
１０１入力部
１０２３次元化部
１０３チャイルドボックス設定部
１０４画像用ペアレントボックス探索部
１０５マスク修正部
１０６２次元化部
１０７マスク出力部
[0001]
[Technical field of the invention]
The present invention relates to an apparatus, a method, and a program for determining a shape of a desired object appearing in a moving image.
[0002]
[Prior art]
An object area extraction technique for extracting an area in which only a certain specific object (subject) is captured from an image in which an object and a background are mixed is used for processing and editing not only a still image but also a moving image. For example, there is a usage method in which a person appearing in a moving image is extracted and combined with another background.
[0003]
When extracting a region where a subject is captured from an image, pixels in a region where the subject is present in the image are extracted using a boundary between the subject and the background (the shape of the subject) that has been checked in advance. Therefore, it is important to acquire the shape of the subject with high accuracy.
[0004]
Conventionally, one technique for acquiring the shape of a subject in an image with high accuracy is as follows. First, a rough shape of the subject is given (for example, the shape of the subject is input manually). Then, the given shape is corrected by using the self-similarity of the boundary portion of the subject to obtain a highly accurate shape (for example, Patent Document 1).
[0005]
[Patent Document 1]
JP 2000-82145 A
[0006]
[Problems to be solved by the invention]
In the method disclosed in Patent Document 1, shape correction of a subject is performed independently for each frame of a moving image. That is, the method of Patent Document 1 corrects the shape in two dimensions and does not consider the correlation of the shape between frames, so the resulting shape may not be smooth in the time direction.
[0007]
If a subject is extracted from a moving image using a shape that is not smooth in the time direction, a problem may occur that part or all of the subject appears to flicker during reproduction.
[0008]
Therefore, an object of the present invention is to provide an apparatus and a method for obtaining, from the approximate shape of a target subject in a moving image, a shape that is highly accurate and also smooth in the time direction.
[0009]
[Means for Solving the Problems]
In order to solve the above problems, an object shape calculation device according to the present invention comprises: an image input unit that inputs a three-dimensional image; a shape input unit that inputs three-dimensional shape data representing the approximate region occupied by an object in the three-dimensional image; a child box setting unit that uses the three-dimensional shape data to set, along the boundary surface between the object and the background other than the object, a plurality of child boxes, each of which is a three-dimensional region serving as a unit of correction processing of the shape data; an image parent box search unit that searches for an image parent box, which is a region in the three-dimensional image that is similar in pixel information to the region at the same position as the child box in the three-dimensional image and that has a larger volume than the child box; a shape data correction unit that corrects the shape data of the region indicated by the child box in the three-dimensional shape data with the shape data of the region at the same position as the image parent box in the three-dimensional shape data, reduced to the size of the child box; and an output unit that outputs the corrected three-dimensional shape data.
[0010]
An object shape calculation method according to the present invention comprises: an image input step of inputting three-dimensional image data; a shape input step of inputting three-dimensional shape data representing the approximate region occupied by an object in the three-dimensional image; a child box setting step of using the three-dimensional shape data to set, along the boundary surface between the object and the background other than the object, a plurality of child boxes, each of which is a region serving as a unit of correction processing of the three-dimensional shape data; an image parent box search step of searching for an image parent box, which is a region in the three-dimensional image that is similar in pixel information to the region at the same position as the child box in the three-dimensional image and that has a larger volume than the child box; a replacement step of replacing the shape data of the region indicated by the child box in the three-dimensional shape data with the shape data of the region at the same position as the image parent box in the three-dimensional shape data, reduced to the size of the child box; and an output step of outputting the three-dimensional shape data.
[0011]
An object shape calculation program according to the present invention causes a computer to execute: an image input step of inputting three-dimensional image data; a shape input step of inputting three-dimensional shape data representing the approximate region occupied by an object in the three-dimensional image; a child box setting step of using the three-dimensional shape data to set, along the boundary surface between the object and the background other than the object, a plurality of child boxes, each of which is a region serving as a unit of correction processing of the three-dimensional shape data; an image parent box search step of searching for an image parent box, which is a region in the three-dimensional image that is similar in pixel information to the region at the same position as the child box in the three-dimensional image and that has a larger volume than the child box; a replacement step of replacing the shape data of the region indicated by the child box in the three-dimensional shape data with the shape data of the region at the same position as the image parent box in the three-dimensional shape data, reduced to the size of the child box; and an output step of outputting the three-dimensional shape data.
[0012]
Further, an object shape calculation program according to the present invention causes a computer to function as: image input means for inputting a three-dimensional image; shape input means for inputting three-dimensional shape data representing the approximate region occupied by an object in the three-dimensional image; child box setting means for using the three-dimensional shape data to set, along the boundary surface between the object and the background other than the object, a plurality of child boxes, each of which is a three-dimensional region serving as a unit of correction processing of the shape data; image parent box search means for searching for an image parent box, which is a region in the three-dimensional image that is similar in pixel information to the region at the same position as the child box in the three-dimensional image and that has a larger volume than the child box; shape data correction means for correcting the shape data of the region indicated by the child box in the three-dimensional shape data with the shape data of the region at the same position as the image parent box in the three-dimensional shape data, reduced to the size of the child box; and output means for outputting the corrected three-dimensional shape data.
[0013]
[Embodiments of the invention]
(First Embodiment) Hereinafter, a first embodiment of the present invention will be described with reference to the drawings.
[0014]
(Summary) The object shape calculation device of the present embodiment is a device suitable for obtaining the shape of a desired object in a moving image.
[0015]
The apparatus receives as input a moving image showing the object (subject) whose shape is to be obtained, together with approximate shape data of the subject for every frame of the moving image. It then corrects the approximate shape data using the moving image, and calculates and outputs highly accurate shape data.
[0016]
In this apparatus, the approximate shape data is a bitmap: for example, an image in which pixels belonging to the subject have the value 1 and pixels not belonging to the subject (for example, the background) have the value 0. Each frame of the approximate shape data is assumed to have the same number of pixels as each frame of the moving image.
[0017]
Note that the number of pixels of the approximate shape data does not have to match that of the image. For example, if the approximate shape data has more pixels vertically and horizontally than the image (for example, twice as many in each direction), the shape calculation can be expected to be more accurate. Conversely, if it has fewer pixels vertically and horizontally than the image (for example, half as many in each direction), the shape calculation can be expected to be faster.
[0018]
In this apparatus, a moving image is handled as a three-dimensional spatio-temporal image in which the frames are stacked in the time direction. The subject then becomes a three-dimensional object that extends in the time direction as well as in the vertical and horizontal directions of the frame (hereinafter, a 3D object). The approximate shape data of the frames is likewise stacked in time order and handled as approximate shape data of the three-dimensional object (hereinafter, 3D approximate shape data).
[0019]
The apparatus corrects the 3D approximate shape data by exploiting the self-similarity of the surface of the 3D object in the three-dimensional image, and thereby obtains a shape that is smooth in the time direction as well. A three-dimensional image is an image that extends in three directions, for example (vertical, horizontal, depth). A three-dimensional spatio-temporal image, which extends in the three directions (vertical, horizontal, time), is also a kind of three-dimensional image.
[0020]
Specifically, the following processing is performed. First, in the 3D approximate shape data, a plurality of rectangular parallelepiped regions are set on the boundary surface between the object and the non-object (background). Each of these regions is called a child box. Each child box is set so that it contains the boundary surface.
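One way to place such boxes is sketched below. The text only requires that each box contain the boundary surface; the greedy strategy, the 6-neighbour boundary test, the clamping of boxes to the volume, and the function names are assumptions introduced for this sketch. The default box size corresponds to the 8 x 8 x 4 child box used in the first embodiment, written here as (time, vertical, horizontal).

```python
def boundary_voxels(mask):
    """Voxels of mask[t][y][x] with at least one differing 6-neighbour."""
    T, H, W = len(mask), len(mask[0]), len(mask[0][0])
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
             (0, -1, 0), (0, 0, 1), (0, 0, -1))
    found = []
    for t in range(T):
        for y in range(H):
            for x in range(W):
                for dt, dy, dx in steps:
                    nt, ny, nx = t + dt, y + dy, x + dx
                    if (0 <= nt < T and 0 <= ny < H and 0 <= nx < W
                            and mask[nt][ny][nx] != mask[t][y][x]):
                        found.append((t, y, x))
                        break
    return found


def place_child_boxes(mask, size=(4, 8, 8)):
    """Greedily centre a box on each boundary voxel not yet covered.

    Returns boxes as (t0, y0, x0, dt, dy, dx). The volume must be at
    least as large as one box along every axis.
    """
    T, H, W = len(mask), len(mask[0]), len(mask[0][0])
    dt, dy, dx = size
    covered = set()
    boxes = []
    for t, y, x in boundary_voxels(mask):
        if (t, y, x) in covered:
            continue
        # Centre the box on the voxel, then clamp it inside the volume.
        t0 = min(max(t - dt // 2, 0), T - dt)
        y0 = min(max(y - dy // 2, 0), H - dy)
        x0 = min(max(x - dx // 2, 0), W - dx)
        boxes.append((t0, y0, x0, dt, dy, dx))
        covered.update((tt, yy, xx)
                       for tt in range(t0, t0 + dt)
                       for yy in range(y0, y0 + dy)
                       for xx in range(x0, x0 + dx))
    return boxes


# Toy mask: 4 frames of 16 x 16 pixels, subject occupies x >= 8.
mask = [[[1 if x >= 8 else 0 for x in range(16)] for _ in range(16)]
        for _ in range(4)]
boxes = place_child_boxes(mask)
```

Clamping keeps every box inside the volume at the cost of moving the boundary away from the box centre near the edges.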
[0021]
Next, in the three-dimensional image, attention is paid to the regions at the same positions as the child boxes set in the 3D approximate shape data. These regions are called "image child boxes".
[0022]
Then, in the three-dimensional image, an image parent box is obtained for each image child box: a rectangular parallelepiped region that has a larger volume than the image child box and whose pixel information is most similar to that of the image child box.
[0023]
"Similar" between two regions having different sizes means that when the sizes of the two regions are made uniform, the correlation between pixel information contained therein becomes high.
[0024]
Therefore, the most similar image parent box may be obtained, for example, by reducing each of a plurality of image parent box candidate regions to the size of the image child box, computing its correlation with the image child box, and selecting the candidate region with the highest correlation as the image parent box. The correlation is determined from the similarity of the pixel values contained in the two regions.
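The comparison of two boxes of different sizes can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the arrays are indexed (time, vertical, horizontal), the reduction uses simple block averaging, the candidate's sides are assumed to be integer multiples of the child's sides, and the sum of absolute differences (SAD) mentioned later in the description is used in place of a correlation score, so a lower value means higher similarity.

```python
import numpy as np

def shrink(block, factors):
    """Reduce a (time, y, x) block by averaging non-overlapping cells.

    Assumes every side of `block` is an integer multiple of its factor.
    """
    ft, fy, fx = factors
    t, y, x = block.shape
    return block.reshape(t // ft, ft, y // fy, fy, x // fx, fx).mean(axis=(1, 3, 5))

def sad_after_reduction(child, candidate):
    """SAD between a child box and a larger candidate reduced to the child's size."""
    factors = tuple(p // c for p, c in zip(candidate.shape, child.shape))
    reduced = shrink(candidate.astype(np.float64), factors)
    return float(np.abs(reduced - child).sum())
```

A candidate that is an exact enlargement of the child box yields a SAD of zero, which corresponds to the "most similar" case described above.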
[0025]
Finally, in the 3D schematic shape data, attention is paid to the region at the same position as each image parent box found in the three-dimensional image. These regions are called "parent boxes". Then, in the 3D schematic shape data, each child box is replaced with the corresponding parent box reduced to the size of the child box. Through this replacement, the boundary surface of the 3D schematic shape data approaches the surface of the 3D object. The corrected 3D schematic shape data is referred to as 3D shape data.
[0026]
As described above, the 3D schematic shape data is obtained by accumulating the schematic shape data of each frame in time series. Therefore, the shape data of the subject in each frame can be obtained by slicing the 3D shape data along the time axis.
[0027]
(Configuration) Hereinafter, the configuration of the object shape calculation device of the present embodiment will be described with reference to the drawings.
[0028]
This device has the following configuration.
・ An input unit 101 that receives the images of all frames of a moving image in which a subject appears, and mask data for all frames representing the schematic shape of the subject.
・ A three-dimensional conversion unit 102 that connects all frames of the moving image, and the mask data of all frames, in the time direction to generate a three-dimensional spatio-temporal image and three-dimensional mask data.
[0029]
・ A mask child box setting unit 103 that sets rectangular parallelepiped regions called child boxes in the three-dimensional mask data, along the boundary surface between the subject and everything else (hereinafter, the background).
・ An image parent box search unit 104 that, in the three-dimensional spatio-temporal image, takes each image child box (the region at the same position as a child box set in the three-dimensional mask data) and searches for its image parent box, i.e. the rectangular parallelepiped region that has a larger volume than the image child box and whose internal pixel information is most similar to it.
[0030]
・ A mask correction unit 105 that, in the three-dimensional mask data, takes each parent box (the region at the same position as an image parent box found in the three-dimensional spatio-temporal image) and corrects the three-dimensional mask data by replacing each child box with its corresponding parent box reduced to the size of the child box.
・ A two-dimensional conversion unit 106 that converts the corrected three-dimensional mask data back into two-dimensional mask data arranged in chronological order.
・ A mask output unit 107 that outputs the two-dimensional mask data.
[0031]
Hereinafter, each unit will be described.
[0032]
(Input Unit 101) The input unit 101 inputs all frames of a moving image in which a subject is captured and all frames of mask data representing a schematic shape of the subject. The mask data may be given in any manner as long as the region where the subject exists and the region where the subject does not exist can be identified. In this embodiment, a schematic shape manually drawn by a human is used.
[0033]
(Three-Dimensional Conversion Unit 102) The three-dimensional conversion unit 102 connects all frames of the moving image in the time direction to generate a three-dimensional spatio-temporal image (3D image), and likewise connects the mask data of all frames to generate three-dimensional spatio-temporal mask data (3D mask data). “Connecting (in the time direction)” means regarding the two-dimensional image of each frame as a three-dimensional spatio-temporal image with a thickness Δt in the time direction, and joining these in chronological order along the time axis.
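As a minimal sketch of this connection, assuming each frame (or each frame's mask) is available as an equally sized two-dimensional NumPy array, the frames can be stacked along a new leading time axis. The function names are hypothetical. The inverse function corresponds to the slicing performed by the two-dimensional conversion unit 106.

```python
import numpy as np

def to_volume(frames):
    """Stack equally sized 2D frames into a (time, y, x) volume."""
    return np.stack(frames, axis=0)

def to_frames(volume):
    """Inverse operation: slice the volume back into one 2D array per frame."""
    return [volume[k] for k in range(volume.shape[0])]
```

The same two functions apply unchanged to image frames and to mask frames, since both are connected in the same way.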
[0034]
FIG. 2 is a diagram for explaining the connection. The moving subject 203 is shown in the frames 201-1 to 201-5. Each frame has a schematic shape 202. In FIG. 2, the schematic shape 202 and the subject 203 are represented as being on the same image for the sake of simplicity, but actually exist as separate data.
[0035]
The subject 204 shows only the subject 203 extracted from the frames 201-1 to 201-5. Since the subject 204 moves with time, connecting the frames 201-1 to 201-5 in the time direction to form a three-dimensional spatio-temporal image yields a curved cylindrical shape such as the three-dimensional object image 205 (if the subject were not moving, the cylinder would be straight). In FIG. 2, the subject 204 moves only in the vertical direction of the screen and not in the horizontal direction, so the three-dimensional object image 205 also bends only in the vertical direction.
[0036]
The same applies to the schematic shape 202. When the connection is performed, the region where the object exists is expressed as a cylindrical three-dimensional image.
[0037]
(Mask Child Box Setting Unit 103) The mask child box setting unit 103 sets rectangular parallelepiped regions of interest called child boxes along the boundary surface between the subject and the background in the 3D mask data. Each child box is set so that the boundary surface lies inside it. Desirably, the boundary surface passes through or near the center of gravity of the child box.
[0038]
FIGS. 3A and 3B are diagrams for explaining how a child box is set. In these figures, for simplicity of description, a part of the 3D mask data is shown cut along a plane perpendicular to the time direction. As shown in FIG. 3A, a child box 302 is set along the boundary surface of the 3D mask data 301. At this time, it is preferable that the boundary surface passes through the center of gravity of the child box. Then, as shown in FIG. 3B, a plurality of child boxes 302 are set so as to cover the boundary of the 3D mask data 301.
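One way to place child boxes so that they cover the boundary can be sketched as follows. The description does not specify a placement algorithm, so the greedy strategy here is an assumption: boundary voxels are detected by comparing each voxel with its six neighbours, and a box is centred on every boundary voxel that is not yet covered. Function names, the (time, vertical, horizontal) axis order, and the default box size of 4 × 8 × 8 are illustrative choices.

```python
import numpy as np

def boundary_voxels(mask):
    """Mark voxels that have a 6-neighbour carrying a different label."""
    m = mask.astype(np.int8)
    edge = np.zeros(m.shape, dtype=bool)
    for axis in range(3):
        diff = np.diff(m, axis=axis) != 0
        lo = [slice(None)] * 3
        hi = [slice(None)] * 3
        lo[axis] = slice(0, -1)
        hi[axis] = slice(1, None)
        edge[tuple(lo)] |= diff  # voxel on the lower side of a label change
        edge[tuple(hi)] |= diff  # voxel on the upper side of a label change
    return edge

def set_child_boxes(mask, size=(4, 8, 8)):
    """Greedy placement (an assumption): return the corner of each child box."""
    edge = boundary_voxels(mask)
    covered = np.zeros_like(edge)
    half = [s // 2 for s in size]
    corners = []
    for voxel in np.argwhere(edge):
        if covered[tuple(voxel)]:
            continue
        # Centre the box on the boundary voxel, clipped to stay inside the volume.
        corner = tuple(
            int(np.clip(c - h, 0, dim - s))
            for c, h, dim, s in zip(voxel, half, mask.shape, size)
        )
        covered[tuple(slice(c, c + s) for c, s in zip(corner, size))] = True
        corners.append(corner)
    return corners
```

Because every box is centred on a boundary voxel, each box contains both subject and background voxels, which matches the requirement that the boundary surface lie inside the child box.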
[0039]
(Image Parent Box Search Unit 104) The image parent box search unit 104 focuses, in the 3D image, on the region at the same position as each child box set in the 3D mask data. These regions are called image child boxes. Then, for each image child box, it searches for the image parent box, which is the rectangular parallelepiped region that has a larger volume than the image child box and is most similar to it.
[0040]
As described above, the search selects, from a plurality of image parent box candidate regions, the candidate most similar to the image child box as the image parent box. More specifically, the similarity is calculated between the pixels contained in each image parent box candidate region and those in the image child box. The similarity may be calculated, for example, by enlarging the image child box to the size of the image parent box and then computing the sum of absolute differences (SAD) or the sum of squared differences (SSD) of the pixel values. Alternatively, each candidate region may be reduced to the size of the image child box before the SAD or SSD of the pixel values is computed.
[0041]
The image parent box candidates used are those whose centers of gravity lie within a predetermined range (for example, 10 pixels × 10 pixels × 10 pixels) around the center of gravity of the image child box.
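The search can be sketched as an exhaustive scan over candidate positions. This is an illustrative sketch, not the patented implementation: it uses the "enlarge the child, then SAD" variant with nearest-neighbour enlargement, a fixed integer scale of 2 on every side, and ten offsets per axis around the child's centre. The names `box`, `enlarge`, `find_parent`, `scale`, and `reach` are hypothetical, and candidates that would extend beyond the volume are simply skipped.

```python
import itertools
import numpy as np

def box(corner, size):
    """Slices selecting the box with the given corner and size."""
    return tuple(slice(c, c + s) for c, s in zip(corner, size))

def enlarge(block, scale):
    """Nearest-neighbour enlargement by an integer factor along every axis."""
    for axis in range(block.ndim):
        block = np.repeat(block, scale, axis=axis)
    return block

def find_parent(image, child_corner, child_size=(4, 8, 8), scale=2, reach=5):
    """Return (corner, SAD) of the candidate parent box with the lowest SAD."""
    child_big = enlarge(image[box(child_corner, child_size)].astype(np.float64), scale)
    parent_size = tuple(s * scale for s in child_size)
    centre = [c + s // 2 for c, s in zip(child_corner, child_size)]
    best_corner, best_sad = None, np.inf
    # Candidate centres are offset by -reach .. reach-1 voxels on each axis.
    for offset in itertools.product(range(-reach, reach), repeat=3):
        corner = tuple(c + d - p // 2 for c, d, p in zip(centre, offset, parent_size))
        if any(k < 0 or k + p > n for k, p, n in zip(corner, parent_size, image.shape)):
            continue  # candidate would leave the volume
        sad = np.abs(image[box(corner, parent_size)] - child_big).sum()
        if sad < best_sad:
            best_corner, best_sad = corner, float(sad)
    return best_corner, best_sad
```

For a volume containing a straight vertical edge that passes through the centre of the child box, the best candidate is the one whose centre also lies on that edge, and its SAD is zero.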
[0042]
(Mask Correction Unit 105) The mask correction unit 105 focuses, in the 3D mask data, on the region at the same position as each image parent box found in the 3D image. These regions are called parent boxes. The 3D mask data is then corrected using each parent box and its child box.
[0043]
FIGS. 4A to 4D are diagrams for explaining correction of mask data. Note that the drawings are expressed in two dimensions for the sake of simplicity.
[0044]
FIG. 4A shows a state where the setting of the parent box 401 and the child box 402 on the 3D mask data 400 is completed. From now on, the boundary surface 403 of the 3D mask data 400 will be corrected. Here, a true boundary surface 404 between the subject and the background is also virtually displayed.
[0045]
The parent box 401 is set at the position where the similarity with the child box 402 is highest in the 3D image. High similarity in the 3D image means high similarity with respect to the true boundary surface 404. Accordingly, the true boundary surface 404 inside the parent box 401, once reduced to the size of the child box 402, almost coincides with the true boundary surface 404 inside the child box 402.
[0046]
The modification of the 3D mask data 400 is performed as follows. First, an area specified by the parent box 401 is extracted from the 3D mask data 400. FIG. 4B is a diagram showing only an area designated by the parent box 401.
[0047]
Next, a replacement box 405 is generated by reducing the extracted area to the same size as the child box 402. FIG. 4C is a diagram illustrating the replacement box 405.
[0048]
Then, in the 3D mask data 400, the pixels in the region indicated by the child box 402 are replaced with the pixels of the replacement box 405. FIG. 4D shows the 3D mask data 400 after the replacement. Through the replacement, the boundary surface 403 approaches the true boundary surface 404.
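The extract, reduce, and replace sequence can be sketched for a single child box as follows. The sketch assumes a binary mask volume indexed (time, vertical, horizontal), an integer scale of 2 on every side, block averaging for the reduction, and re-binarisation at 0.5. The source does not state how fractional values are handled, so that threshold and the function names are assumptions.

```python
import numpy as np

def box(corner, size):
    """Slices selecting the box with the given corner and size."""
    return tuple(slice(c, c + s) for c, s in zip(corner, size))

def replace_child(mask, child_corner, parent_corner, child_size=(4, 8, 8), scale=2):
    """Overwrite one child box with its parent box reduced to the child's size."""
    parent_size = tuple(s * scale for s in child_size)
    # astype() copies, so an overlapping child region is not modified while reading.
    parent = mask[box(parent_corner, parent_size)].astype(np.float64)
    t, y, x = child_size
    reduced = parent.reshape(t, scale, y, scale, x, scale).mean(axis=(1, 3, 5))
    # Threshold of 0.5 is an assumption, see the lead-in.
    mask[box(child_corner, child_size)] = (reduced >= 0.5).astype(mask.dtype)
    return mask
```

As a usage example, take a mask whose boundary lies at x = 14, a child box spanning x = 12 to 19, and a parent box spanning x = 8 to 23, both centred on x = 16. In the parent box the boundary is 2 voxels from the centre. After reduction it is 1 voxel from the centre, so inside the child box the boundary moves to x = 15.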
[0049]
The above-mentioned correction is made for all child boxes set in the 3D mask data.
[0050]
(Two-Dimensional Conversion Unit 106) The two-dimensional conversion unit 106 slices the corrected 3D mask data at intervals of Δt in the time direction to obtain two-dimensional mask data.
[0051]
(Mask Output Unit 107) The mask output unit 107 outputs the two-dimensional mask data.
[0052]
(Implementation Means) The object shape calculation device of the present embodiment is implemented as a program operated by a computer. That is, it is an object shape calculation program that causes a computer to realize the functions of the above-described units. A part or all of the present device may be realized as hardware such as a semiconductor integrated circuit.
[0053]
FIG. 5 is a diagram illustrating an example of a computer that operates the object shape calculation program according to the present embodiment.
[0054]
The computer includes a central processing unit 501, a memory 502 for temporarily storing the object shape calculation program and data being processed, a magnetic disk drive 503 for storing the object shape calculation program and the data before and after processing, and an optical disk drive 504.
[0055]
The computer also includes a display device 508 such as an LCD or a CRT, an image output unit 505 that is an interface for outputting an image signal to the display device 508, an input device 509 such as a keyboard or a mouse, and an input receiving unit 506 that is an interface for receiving input signals from the input device 509.
[0056]
In addition, the computer includes an input/output unit 507 that is a connection interface (for example, USB or a network interface) with external devices.
[0057]
The object shape calculation program according to the present embodiment is stored in the magnetic disk drive 503 in advance. When the object shape is to be calculated, the program is read from the magnetic disk drive 503, stored in the memory 502, and executed by the central processing unit 501.
[0058]
The mask data generated as an execution result is stored in the magnetic disk drive 503. Further, in the process of calculating the object shape, a GUI for presenting information such as the processing status to the user or prompting some input is displayed on the display device 508 via the image output unit 505 as appropriate.
[0059]
The mask data generated as a result of the object shape calculation may not only be stored in the magnetic disk drive 503 but also be output to other programs that require it, using an inter-process communication function (shared memory, pipes, etc.).
[0060]
In the present embodiment, the object shape calculation program operates in cooperation with an OS (Operating System) running on a computer.
[0061]
(Operation) Hereinafter, the operation of the present embodiment will be described with reference to the drawings. FIG. 6 is a diagram illustrating the operation of the object shape calculation device according to the present embodiment.
[0062]
(Step 601) A moving image I = {I_1, I_2, ..., I_n} showing the subject and mask data R = {R_1, R_2, ..., R_n} representing the schematic shape of the subject are input. The input moving image and schematic shape mask data are stored sequentially, frame by frame. Here, I_k and R_k each denote the data for one frame.
[0063]
(Step 602) The frames I_k of the moving image and the schematic shape mask data R_k of the frames are each connected in the time direction to generate a three-dimensional spatio-temporal image I (3D image I) and three-dimensional spatio-temporal mask data R (3D mask data R).
[0064]
(Step 603) In the 3D mask data R, child boxes C = {c(1), c(2), ..., c(N)} are set along the boundary surface between the subject and the background. In the 3D image I, attention is also paid to the regions located at the same positions as the child boxes c(i); these are called child boxes C′ = {c′(1), c′(2), ..., c′(N)}.
[0065]
In the present embodiment, each of the child boxes c(i) and c′(i) is a rectangular parallelepiped measuring 8 pixels vertically (one spatial direction of the 3D mask data), 8 pixels horizontally (the other spatial direction of the 3D mask data), and 4 pixels in depth (the time direction of the 3D mask data). However, other shapes and sizes may be used. For example, a cube with 8 pixels per side may be used.
[0066]
Each of the child boxes c(i) and c′(i) is set so that the boundary surface passes through its center of gravity. A large number of child boxes C are set so as to cover the boundary surface.
[0067]
(Step 604) In the 3D image I, parent boxes P′ = {p′(1), p′(2), ..., p′(N)} are searched for and set. In the 3D mask data R, attention is also paid to the region at the same position as each parent box p′(i); these are called parent boxes P = {p(1), p(2), ..., p(N)}.
[0068]
In the 3D image I, the parent box p '(i) of each child box c' (i) is searched. The search is performed within a predetermined range (for example, a range of 10 pixels × 10 pixels × 10 pixels) around the center of gravity of each child box c ′ (i). The similarity between the candidate of the parent box p '(i) having the center of gravity within this range and the child box c' (i) is obtained, and the most similar candidate is set as the parent box p '(i).
[0069]
The similarity is calculated between the pixels contained in the candidate for the parent box p′(i) and those in the child box c′(i). It may be calculated, for example, by enlarging the child box c′(i) to the size of the parent box p′(i) and then computing the sum of absolute differences (SAD) of the pixel values.
[0070]
Besides the sum of absolute differences, the sum of squared differences (SSD) of the pixel values may also be used as the similarity. Since the search amounts to block matching performed in three dimensions, any method of obtaining a similarity or correlation used in block matching can be applied.
[0071]
The size of the parent box P′ relative to the child box C′ is unrestricted, except that the parent box P′ must have a larger volume than the child box C′. The three sides of the child box C′ need not be enlarged by the same factor. In the present embodiment, all parent boxes P′ measure (16 pixels × 16 pixels × 8 pixels). The same relationship holds between the parent box P and the child box C.
[0072]
For example, when the child box C′ is a cube of (8 pixels × 8 pixels × 8 pixels), the parent box P′ may be a rectangular parallelepiped of (12 pixels × 16 pixels × 24 pixels). Alternatively, when the child box C is a rectangular parallelepiped of (8 pixels × 8 pixels × 4 pixels), the parent box P may be a rectangular parallelepiped of (12 pixels × 12 pixels × 4 pixels).
[0073]
(Step 605) In the 3D mask data R, the 3D mask data R is corrected using each child box c (i) and the corresponding parent box p (i).
[0074]
First, pixel information of an area indicated by a parent box p (i) corresponding to each child box c (i) is extracted from the 3D mask data R. Next, a replacement box R ′ is generated by reducing the extracted pixel information to the same size as the child box. Then, in the 3D mask data, the pixel information of the area indicated by the child box is replaced with the pixel information of the replacement box R ′ and corrected.
[0075]
It should be noted that the replacement process may be performed once, but the effect of correcting the mask data becomes higher when the replacement process is performed a plurality of times.
[0076]
Each time the replacement process is performed, the error between the true boundary surface and the boundary surface in the 3D mask data R is multiplied by the reciprocal of the size ratio of the parent box p(i) to the corresponding child box c(i).
[0077]
The number of replacements required for the error to converge, that is, the condition under which the error is one pixel or less, depends on the size ratio between each child box c (i) and the corresponding parent box p (i).
[0078]
Let L be the maximum side length of each child box c(i). The error remaining after the replacement process is L/2 multiplied by the reciprocal of the size ratio of the parent box p(i) to the child box c(i), raised to the power of the number of times the replacement process has been performed. That is, after n replacements with a size ratio r, the remaining error is (L/2) × (1/r)^n. When the remaining error falls below 1, the error can be regarded as having converged.
[0079]
As described above, in the present embodiment, each child box c(i) is a rectangular parallelepiped measuring 8, 8, and 4 pixels in the vertical, horizontal, and time directions, respectively, and each parent box p(i) measures 16, 16, and 8 pixels in those directions. Three replacement processes are therefore required to reduce the remaining error to less than 1.
[0080]
In the present embodiment, the process of this step is repeated three times from the viewpoint of converging the above-mentioned remaining error.
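The count of three can be checked with a short calculation. The sketch below assumes the reading given above, namely that the remaining error starts at half the maximum child side length and is divided by the parent-to-child size ratio at every replacement. The function names are hypothetical.

```python
def remaining_error(max_side, ratio, n):
    """Error left after n replacements: (L / 2) * (1 / ratio) ** n."""
    return (max_side / 2) * (1 / ratio) ** n

def replacements_needed(max_side, ratio):
    """Smallest n for which the remaining error falls below one pixel."""
    n = 0
    while remaining_error(max_side, ratio, n) >= 1:
        n += 1
    return n

# Present embodiment: maximum child side 8 pixels, parent sides twice the child sides.
print(replacements_needed(8, 2))  # 3
```

With a maximum side of 8 and a ratio of 2, the remaining error is 4, 2, 1, and 0.5 after zero, one, two, and three replacements, so the third replacement is the first to bring it below 1.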
[0081]
(Step 606) The mask data R_k of each frame is generated from the corrected 3D mask data R and output.
[0082]
A process opposite to the process of linking the mask data in step 602 is performed. In step 602, two-dimensional mask data is connected in the time direction. In this step, two-dimensional mask data is separated in the time direction.
[0083]
(Effects of the present embodiment) As described above, the object shape calculation device of the present embodiment calculates the shape with the time direction taken into account, so that shape information can be obtained in which flicker during moving image playback is suppressed compared with the related art.
[0084]
(First Modification) As the size of each child box c (i) increases, the range in which the 3D mask data R can be corrected at one time increases, but there is a problem that the correction accuracy deteriorates.
[0085]
Therefore, by first performing the processes of Steps 603 to 605 with larger c(i) and c′(i), then with smaller c(i) and c′(i), and then with still smaller c(i) and c′(i), the 3D mask data R can be corrected efficiently and accurately over a wider range.
[0086]
For example, the size of c(i) and c′(i) may be set to (8 pixels × 8 pixels × 8 pixels) for the first pass and to (4 pixels × 4 pixels × 4 pixels) for the second pass. Alternatively, the size may be changed in the order (16 pixels × 16 pixels × 4 pixels) → (8 pixels × 8 pixels × 4 pixels) → (4 pixels × 4 pixels × 4 pixels).
[0087]
(Second Modification) In the above description, the processing in step 605 is performed a plurality of times to enhance the correction effect. However, the processing from step 603 to step 605 may instead be repeated a plurality of times (for example, two to three times). That is, the 3D mask data R is corrected while the positions at which the child boxes C are set are changed.
[0088]
This is particularly effective when some (or all) of the child boxes C that were set lie off the true boundary surface. Resetting the child boxes C at each repetition increases the likelihood that a child box C is set so that the true boundary surface passes through it.
[0089]
(Third Modification) So far, a three-dimensional spatio-temporal image in which the frames of a moving image are connected in the time direction has been handled, but a more general three-dimensional image can also be handled. That is, the object shape can also be calculated in a three-dimensional space (X direction, Y direction, Z direction).
[0090]
As the three-dimensional image, for example, a set of cross-sectional images of a three-dimensional object taken along a certain direction by CT or MRI may be used. That is, a set of cross-sectional images and mask data are input by the input unit 101, and the three-dimensional conversion unit 102 connects the cross-sectional images in the direction perpendicular to the cross-section to generate a three-dimensional image.
[0091]
(Fourth Modification) To speed up the search for the parent box P′ in step 604, an image obtained by enlarging the 3D image I by a factor of (parent box size) / (child box size) may be prepared in advance.
[0092]
In step 604, the region indicated by each child box c′(i) is enlarged and compared with the region indicated by a parent box candidate. Conversely, the region indicated by the parent box candidate may be reduced to the size of the child box c′(i) for the comparison. In this case, an image obtained by reducing the 3D image I by a factor of (child box size) / (parent box size) may be prepared in advance.
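The benefit of a pre-reduced image can be illustrated as follows. The sketch assumes block averaging with an integer factor f on every axis, and it only covers parent candidates whose corner coordinates are multiples of f. How other positions would be handled is not stated in the source. For such aligned candidates, the reduced parent box can be read directly from the pre-reduced volume instead of being reduced again for every comparison. Function names are hypothetical.

```python
import numpy as np

def block_mean(volume, f):
    """Reduce a (time, y, x) volume by averaging f x f x f cells."""
    t, y, x = volume.shape
    return volume.reshape(t // f, f, y // f, f, x // f, f).mean(axis=(1, 3, 5))

def reduced_parent(reduced, parent_corner, child_size, f=2):
    """Read a parent candidate, already at child size, from the pre-reduced volume.

    Valid only when every coordinate of parent_corner is a multiple of f.
    """
    corner = tuple(c // f for c in parent_corner)
    return reduced[tuple(slice(c, c + s) for c, s in zip(corner, child_size))]
```

The whole-volume reduction is computed once, after which each aligned candidate costs only a slice.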
[0093]
(Fifth Modification) The schematic shape data may be vector information. That is, the schematic shape data may be a set of line segments representing the shape of the subject in each frame.
[0094]
In this case, the elements constituting the set of line segments are updated each time the replacement processing in step 605 is performed.
[0095]
(Second Embodiment) Hereinafter, a second embodiment of the present invention will be described. The basic configuration is the same as in the first embodiment.
[0096]
When searching for the parent box P′ in the above-described step 604, it may be necessary to refer to regions outside the 3D image I near its edges. Such a case is likely to occur, for example, when the subject appears near the edge of each frame of the moving image.
[0097]
Although the search range can be limited so that no reference outside the 3D image I occurs, such a restriction lowers the similarity between the parent box P′ and the child box C′, and the accuracy of the boundary surface may deteriorate.
[0098]
In particular, in the case of a three-dimensional spatio-temporal image, the shapes in the first few frames and the last few frames become unnatural, so the influence is large.
[0099]
Therefore, the 3D image I is extended outward in each direction. The pixels of the extended portion are filled in by padding, that is, by copying the pixel at the nearest edge. The three-dimensional mask data R is padded in the same way. The parent box P′ is then searched for after the padding has been performed.
[0100]
“Copying the pixel at the nearest edge” means, for example, that in a spatio-temporal image starting at time t = 0 and ending at time t = 100, the image at time t = 0 is copied to the position of time t = −1.
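With NumPy, this kind of padding corresponds to `np.pad` with `mode="edge"`, which repeats the outermost values along each axis. The sketch below assumes a (time, vertical, horizontal) volume. The function name and the per-axis `margin` parameter are illustrative, and the source does not state how far the volume should be extended.

```python
import numpy as np

def pad_volume(volume, margin):
    """Extend a (time, y, x) volume by copying the nearest edge voxels.

    `margin` gives the number of voxels added on both sides of each axis.
    """
    return np.pad(volume, [(m, m) for m in margin], mode="edge")
```

The same function would be applied to both the 3D image I and the 3D mask data R so that their coordinates stay aligned.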
[0101]
Note that the present embodiment is not limited to a three-dimensional spatiotemporal image, and can be applied to a general three-dimensional image.
[0102]
As described above, by using the padded three-dimensional image and three-dimensional mask data, the three-dimensional mask data can be corrected with higher accuracy, and a more accurate object shape can be obtained.
[0103]
(Third Embodiment) Hereinafter, a third embodiment of the present invention will be described.
[0104]
(Summary) Up to now, the shape of the subject has been obtained by treating the entire moving image at once. In the present embodiment, the shape of the subject is sequentially obtained in units of several frames of a moving image.
[0105]
As a result, the amount of data handled at one time can be reduced. This is useful, for example, when a very long moving image is processed on a computer with a small memory.
[0106]
FIG. 7 is a diagram illustrating an outline of a process according to the present embodiment. FIG. 7 is a diagram in which only the subject 701 in the moving image is arranged in chronological order. Since the subject 701 is moving, the position of the subject 701 is shifted with the passage of time.
[0107]
In the present embodiment, shapes are first obtained for several frames of a moving image (first processing). Then, shapes are obtained for the next several frames (second processing). By repeating this operation (third processing, fourth processing, and so on), the shape of the subject is finally obtained over the entire moving image.
[0108]
(Operation) FIG. 8 is a diagram for explaining the operation of the object shape calculation device of the present embodiment. In the following description, n frames are handled at a time.
[0109]
(Step 801) The initial value of the head position of the n frames to be subjected to the shape calculation processing in the moving image is set. In the present embodiment, processing is performed from the first frame.
[0110]
(Step 802) Among the moving images, images and mask data for n frames to be subjected to shape calculation processing are read into the memory.
[0111]
Images of n frames are handled as 3D images I. Also, mask data for n frames is treated as 3D mask data R.
[0112]
(Step 803) Child boxes C and C ′ are set for the 3D mask data R and the 3D image I in the same manner as in the first embodiment.
[0113]
(Step 804) As in the first embodiment, a parent box P 'corresponding to each child box C' is searched in the 3D image I. An area on the 3D mask data R located at the same position as the searched parent box P ′ is defined as a parent box P.
[0114]
(Step 805) As in the first embodiment, in the 3D mask data R, the pixel information of the child box C is replaced with a reduced version of the parent box P, and the 3D mask data R is corrected.
[0115]
(Step 806) The 3D mask data R is divided and output in units of frames.
[0116]
(Step 807) The head position of the n frames to be subjected to the shape calculation processing is shifted.
[0117]
(Step 808) If the shape calculation processing has been completed up to the last frame, the processing ends. If not completed, the processing is performed from step 802 for the next n frames.
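The loop over steps 801 to 808 can be sketched as a generator of frame windows. The amount by which the head position is shifted in step 807 is not given in the source, so the `step` parameter is an assumption, as is the function name. A `step` smaller than `n` produces overlapping windows, and the last window is clipped at the final frame.

```python
def frame_windows(total_frames, n, step):
    """Yield (start, end) frame ranges, end exclusive, covering all frames."""
    start = 0                                # step 801: begin at the first frame
    while True:
        end = min(start + n, total_frames)   # step 802: frames to read in
        yield start, end                     # steps 803-806 would run on this window
        if end >= total_frames:              # step 808: last frame reached
            break
        start += step                        # step 807: shift the head position

for window in frame_windows(10, 4, 2):
    print(window)
```

With 10 frames, `n = 4` and `step = 2`, the windows are (0, 4), (2, 6), (4, 8) and (6, 10).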
[0118]
(Modification) Here, an example in which processing is performed in the forward direction of the time axis from the first frame has been described.
[0119]
The input moving image and mask data may be padded.
[0120]
Further, the processing from step 803 to step 805 may be repeated a plurality of times while changing the size of the child box C. For example, it is good to start with a large child box and repeat the process while reducing the size of the child box.
[0121]
In the above description, in the processing from step 802 to step 805, the shape calculation processing target frames overlap, so that the same frame is repeatedly processed. Therefore, the mask data once corrected in step 805 may be further corrected. That is, the mask data read in step 802 may be the data that has been previously corrected in step 805. The effect of improving the accuracy of mask data can be expected.
[0122]
In addition, since the shape calculation processing target frames overlap, the mask data of the same frame is corrected many times. Therefore, the child boxes C and parent boxes P set previously may be reused as they are. This is advantageous in terms of processing speed and also in terms of the smoothness of the 3D mask data in the time direction.
[0123]
(Effects of the present embodiment) According to the present embodiment, it is not necessary to handle the entire moving image at one time, so that the shape calculation processing can be performed with a small memory.
[0124]
In addition, because the method processes frames sequentially, it is applicable to real-time processing in which an initial shape of the subject is given to an image input in real time, for example by a method based on inter-frame differences, and the shape of the subject is extracted and output.
[0125]
(Fourth Embodiment) Hereinafter, a fourth embodiment of the present invention will be described.
[0126]
Here, an embodiment is described that combines the method of sequentially correcting the mask data described in the third embodiment with the method of padding the mask data and image before processing described in the second embodiment.
[0127]
(Operation) FIG. 9 is a view for explaining the flow of processing in this embodiment.
[0128]
(Step 901) First, the queue (first-in first-out) buffers for storing the moving image, the mask data, and the box setting completed flag are initialized. The buffer for storing the coordinates of the child boxes and the coordinates of the parent boxes is also initialized.
[0129]
In the present embodiment, the size of the queue buffer for storing moving images, mask data, and the box setting completed flag is set to a size that can store data for 10 frames. The size of the queue buffer may be appropriately changed according to the size of the available memory.
[0130]
The box setting completed flag is used to check in which parts of the mask data child boxes have been set. It is held in a queue buffer with the same amount of data as the mask data; when a child box is set in the mask data, the flags in the region corresponding to that child box are set.
[0131]
In the initialization of the moving image queue buffer and the mask data queue buffer, the buffers are filled entirely with the image data and mask data of the first frame. This is equivalent to performing padding in the time direction before the first frame.
[0132]
The queue buffer of the box setting completed flag is cleared by initialization. Also, the buffer for storing the coordinates of the child box and the coordinates of the parent box is cleared by initialization.
[0133]
(Step 902) The number k of the frame to be read into the queue buffer is initialized. Here, 1 is set to the number k since the data is read from the first frame.
[0134]
(Step 903) The image and mask data of the k-th frame are read into the moving image queue buffer and the mask data queue buffer.
[0135]
If the value of k exceeds the number of the last frame, the data of the last frame is copied. This is equivalent to performing padding in the time direction after the last frame.
[0136]
(Step 904) It is checked whether a child box is set near the boundary between the subject and the background of the mask data corresponding to the frame k, and if not set, a child box is set.
[0137]
Check the flag of the area corresponding to the vicinity of the boundary surface in the queue buffer of the box set flag. If the flag is cleared, the child box has not been set.
[0138]
If no child box has been set, a child box is set in the mask data queue buffer and the moving image queue buffer so as to satisfy the following conditions.
・ The center of gravity of the child box lies near the center of frame k in the time direction
・ The center of gravity of the child box lies near the boundary surface
After the child box is set, the flags of the corresponding area in the queue buffer of the box setting completed flag are set.
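One possible way to satisfy these two conditions is sketched below. It is a simplified illustration rather than the procedure of the embodiment: the flags are kept per pixel of a single frame, a boundary pixel is assumed to be one whose right or lower neighbour has a different mask value, and the names `boundary_pixels` and `set_child_boxes` as well as the box size of 4 are assumptions.

```python
def boundary_pixels(mask):
    """Yield (y, x) of pixels whose right or lower neighbour has a different mask value."""
    height, width = len(mask), len(mask[0])
    for y in range(height):
        for x in range(width):
            right = x + 1 < width and mask[y][x] != mask[y][x + 1]
            below = y + 1 < height and mask[y][x] != mask[y + 1][x]
            if right or below:
                yield y, x


def set_child_boxes(mask, flags, k, size=4):
    """Set child boxes (t0, y0, x0, size) over boundary pixels whose flag is cleared.

    Each box starts at frame k - size // 2, so its centre lies near frame k,
    and it is placed around a boundary pixel, so its centre lies near the boundary.
    """
    height, width = len(mask), len(mask[0])
    new_boxes = []
    for y, x in boundary_pixels(mask):
        if flags[y][x]:
            continue  # a child box already covers this position
        y0 = max(0, min(y - size // 2, height - size))
        x0 = max(0, min(x - size // 2, width - size))
        new_boxes.append((k - size // 2, y0, x0, size))
        # Set the box setting completed flags of the covered area.
        for yy in range(y0, min(y0 + size, height)):
            for xx in range(x0, min(x0 + size, width)):
                flags[yy][xx] = True
    return new_boxes
```

Calling the function a second time with the same flags returns no boxes, which mirrors the behaviour that only positions with cleared flags receive new child boxes.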
[0139]
(Step 905) For each of the child boxes newly added in Step 904, a search for a corresponding parent box is performed.
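The parent box search can be illustrated with a brute-force sketch. The assumptions, none of which are fixed by this paragraph, are: the spatio-temporal image is a nested list `image[t][y][x]`, a parent box has twice the side length of the child box, reduction is done by averaging 2x2x2 blocks, similarity is measured by the sum of squared differences, and candidates are searched within `search` pixels of the child box. All function names are hypothetical.

```python
def block(volume, t0, y0, x0, size):
    """Cut a size x size x size cube out of volume[t][y][x]."""
    return [[[volume[t][y][x] for x in range(x0, x0 + size)]
             for y in range(y0, y0 + size)]
            for t in range(t0, t0 + size)]


def shrink_by_two(cube):
    """Halve each side of a cube by averaging 2x2x2 blocks."""
    n = len(cube) // 2
    return [[[sum(cube[2 * t + dt][2 * y + dy][2 * x + dx]
                  for dt in (0, 1) for dy in (0, 1) for dx in (0, 1)) / 8.0
              for x in range(n)] for y in range(n)] for t in range(n)]


def squared_error(a, b):
    """Sum of squared differences between two cubes of equal size."""
    return sum((pa - pb) ** 2
               for sa, sb in zip(a, b)
               for ra, rb in zip(sa, sb)
               for pa, pb in zip(ra, rb))


def find_parent_box(image, child, search=2):
    """Return the parent box (t, y, x, size) whose reduced image best matches the child box."""
    t0, y0, x0, size = child
    target = block(image, t0, y0, x0, size)
    frames, height, width = len(image), len(image[0]), len(image[0][0])
    big = 2 * size
    best_error, best_box = None, None
    for pt in range(max(0, t0 - search), min(frames - big, t0 + search) + 1):
        for py in range(max(0, y0 - search), min(height - big, y0 + search) + 1):
            for px in range(max(0, x0 - search), min(width - big, x0 + search) + 1):
                candidate = shrink_by_two(block(image, pt, py, px, big))
                error = squared_error(candidate, target)
                if best_error is None or error < best_error:
                    best_error, best_box = error, (pt, py, px, big)
    return best_box
```

Since only the newly added child boxes are searched in this step, the cost per frame stays proportional to the number of new boxes rather than to all boxes in the queue.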
[0140]
(Step 906) Mask data is corrected using all child boxes and their corresponding parent boxes.
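The correction of one child box can be sketched as follows, assuming a binary mask held as `mask[t][y][x]`, a parent box with twice the side length of the child box, and reduction by majority vote over 2x2x2 blocks. These choices and the name `correct_child_box` are assumptions made for illustration only.

```python
def correct_child_box(mask, child, parent):
    """Overwrite the child-box region of the mask with the reduced parent-box region."""
    ct, cy, cx, size = child
    pt, py, px, parent_size = parent
    if parent_size != 2 * size:
        raise ValueError("this sketch assumes a parent box twice the size of the child box")
    offsets = [(dt, dy, dx) for dt in (0, 1) for dy in (0, 1) for dx in (0, 1)]
    # Reduce first, write afterwards, so that an overlapping parent box
    # is read entirely from the mask before correction.
    reduced = [[[0] * size for _ in range(size)] for _ in range(size)]
    for t in range(size):
        for y in range(size):
            for x in range(size):
                votes = sum(mask[pt + 2 * t + dt][py + 2 * y + dy][px + 2 * x + dx]
                            for dt, dy, dx in offsets)
                reduced[t][y][x] = 1 if votes >= 4 else 0
    for t in range(size):
        for y in range(size):
            for x in range(size):
                mask[ct + t][cy + y][cx + x] = reduced[t][y][x]
```

Repeating this replacement for all child boxes, possibly several times, is what gradually pulls the rough mask toward the boundary seen in the image.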
[0141]
(Step 907) The mask data for one frame is extracted from the queue. This is the corrected mask data, which accurately represents the shape of the subject.
[0142]
(Step 908) To move on to the next frame, k is incremented by 1. One frame of data is deleted from the moving image queue. Further, the coordinates of any child box or parent box that refers to the deleted frame are deleted from the buffer.
[0143]
(Step 909) The processing termination condition is checked. The condition for terminating the process is, for example, “the mask data of the last frame has been output in Step 907”. Whether this condition holds may be determined by checking whether the value of k is larger than the sum of the last frame number and the number of frames that can be stored in the queue buffer.
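The relation between the termination test on k and the output of the last frame can be checked with a small simulation. The sketch assumes that the queue momentarily holds one extra frame between Step 903 and Step 907 and represents each frame only by its number; the function name `run_schedule` is hypothetical.

```python
from collections import deque


def run_schedule(last_frame, queue_frames=10):
    """Simulate Steps 903 to 909 and return the frame numbers output as corrected masks."""
    queue = deque([1] * queue_frames)  # Step 901: padding with the first frame
    emitted = []
    k = 1                              # Step 902
    while True:
        queue.append(min(k, last_frame))    # Step 903, with padding after the last frame
        emitted.append(queue.popleft())     # Step 907: the oldest mask leaves the queue
        k += 1                              # Step 908
        if k > last_frame + queue_frames:   # Step 909
            break
    # The first queue_frames outputs are only the leading padding.
    return emitted[queue_frames:]
```

Under these assumptions, the mask of the last frame leaves the queue in exactly the iteration after which k exceeds the last frame number plus the queue length.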
[0144]
(Fifth Embodiment) Hereinafter, a fifth embodiment of the present invention will be described.
[0145]
(Summary) In the third embodiment and the fourth embodiment, mask data of an approximate shape is given in advance for all frames. In the present embodiment, by contrast, mask data for all frames can be obtained by giving mask data of the approximate shape for the first frame only.
[0146]
(Operation) FIG. 10 is a diagram for explaining the flow of processing of the object shape calculation device of the present embodiment.
[0147]
(Step 1001) First, the queue (first-in, first-out) buffers that store the moving image, the mask data, and the box setting completed flags are initialized. The buffer that stores the coordinates of the child boxes and the coordinates of the parent boxes is also initialized.
[0148]
In the present embodiment, the size of the queue buffer for storing moving images, mask data, and the box setting completed flag is set to a size that can store data for 10 frames. The size of the queue buffer may be appropriately changed according to the size of the available memory.
[0149]
The moving image queue buffer and the mask data queue buffer are filled entirely with the image data and the mask data of the first frame. The queue buffer of the box setting completed flag is cleared. The buffer for storing the coordinates of the child boxes and the coordinates of the parent boxes is also cleared.
[0150]
(Step 1002) The number k of the frame to be read into the queue buffer is initialized. Because reading starts from the first frame, k is set to 1.
[0151]
(Step 1003) The image of the k-th frame is read into the moving image queue buffer.
[0152]
If the value of k exceeds the number of the last frame, the data of the last frame is copied.
[0153]
(Step 1004) The mask data of the k-th frame is set in the queue buffer of the mask data.
[0154]
The mask data for the k-th frame is predicted and set in the queue buffer. The prediction is performed by copying the mask data of the temporally latest frame before the k-th frame (in most cases, the (k-1)-th frame).
[0155]
When mask data for the k-th frame already exists, for example when it has been given as an approximate shape, that mask data may be used instead of the prediction.
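Step 1004 can be sketched as follows, assuming that any mask data given in advance is held in a dictionary keyed by frame number and that the masks already in the queue are held in a list ordered from oldest to newest. The names `mask_for_frame`, `given_masks`, and `queue_masks` are hypothetical.

```python
from copy import deepcopy


def mask_for_frame(k, given_masks, queue_masks):
    """Return the mask to set for frame k.

    A mask given in advance for frame k is used when it exists; otherwise the
    mask of the temporally latest earlier frame is copied as the prediction.
    """
    if k in given_masks:
        return deepcopy(given_masks[k])
    return deepcopy(queue_masks[-1])
```

The copied mask serves only as an approximate shape for frame k; the correction in the later steps is what adapts it to the actual boundary in that frame.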
[0156]
(Step 1005) It is checked whether a child box is set near the boundary between the subject and the background of the mask data corresponding to the frame k, and if not set, a child box is set.
[0157]
(Step 1006) For each of the child boxes newly added in Step 1005, a search is made for a corresponding parent box.
[0158]
(Step 1007) Mask data is corrected using all child boxes and their corresponding parent boxes.
[0159]
(Step 1008) One frame of mask data is extracted from the queue. This is the corrected mask data, which accurately represents the shape of the subject.
[0160]
(Step 1009) To move on to the next frame, k is incremented by 1. One frame of data is deleted from the moving image queue. Further, the coordinates of any child box or parent box that refers to the deleted frame are deleted from the buffer.
[0161]
(Step 1010) The processing termination condition is checked. If the condition is not satisfied, the process returns to Step 1003.
[0162]
[Effect of the Invention]
As described above, according to the present invention, shape information that is smooth in the time direction and highly accurate can be obtained from approximate shape information of a target subject in a moving image.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a configuration of an object shape calculation device according to a first embodiment.
FIG. 2 shows a situation where a moving image is treated as a three-dimensional spatiotemporal image.
FIG. 3A is a diagram illustrating a state in which a child box c(i) is set on the boundary surface between a subject and a background. FIG. 3B is a diagram illustrating a state in which child boxes c(i) are set so as to cover the boundary surface.
FIG. 4A is a diagram illustrating mask data before correction. FIG. 4B is a diagram illustrating the area indicated by a parent box in the mask data. FIG. 4C is a diagram illustrating the pixels of the area indicated by the parent box, reduced to the size of a child box. FIG. 4D is a diagram illustrating mask data after correction.
FIG. 5 is a diagram illustrating the configuration of a computer that runs a program realizing the object shape calculation device according to the first embodiment.
FIG. 6 is a flowchart illustrating the operation of the object shape calculation device according to the first embodiment.
FIG. 7 is a diagram illustrating an outline of the processing according to the third embodiment.
FIG. 8 is a view for explaining the flow of processing in the third embodiment.
FIG. 9 is a view for explaining the flow of processing in the fourth embodiment.
FIG. 10 is a view for explaining the flow of processing in the fifth embodiment.
[Explanation of symbols]
101 Input unit
102 Three-dimensional conversion unit
103 Child box setting unit
104 Image parent box search unit
105 Mask correction unit
106 Two-dimensional conversion unit
107 Mask output unit

Claims

An object shape calculation device comprising:
an image input unit that inputs a three-dimensional image;
a shape input unit that inputs three-dimensional shape data representing an approximate region occupied by an object in the three-dimensional image;
a child box setting unit that uses the three-dimensional shape data to set, along the boundary between the object and the background other than the object, a plurality of child boxes, each of which is a three-dimensional region serving as a unit of correction processing of the shape data;
an image parent box search unit that searches the three-dimensional image for an image parent box, which is a region whose pixel information is similar to that of the region at the same position as the child box in the three-dimensional image and whose volume is larger than that of the child box;
a shape data correction unit that corrects the shape data of the region indicated by the child box in the three-dimensional shape data by using the shape data of the region at the same position as the image parent box in the three-dimensional shape data, reduced to the size of the child box; and
an output unit that outputs the corrected three-dimensional shape data.

An object shape calculation device comprising:
an image input unit that inputs a moving image;
a shape input unit that inputs two-dimensional shape data representing an approximate region occupied by an object in each frame of the moving image;
a three-dimensional conversion unit that connects the frames of the moving image in the time direction to form a three-dimensional spatio-temporal image, and connects the two-dimensional shape data in the time direction to form three-dimensional spatio-temporal shape data;
a child box setting unit that uses the three-dimensional spatio-temporal shape data to set, along the boundary surface between the object and the background other than the object, a plurality of child boxes, each of which is a region serving as a unit of correction processing of the three-dimensional spatio-temporal shape data;
an image parent box search unit that searches the three-dimensional spatio-temporal image for an image parent box, which is a region whose pixel information is similar to that of the region at the same position as the child box in the three-dimensional spatio-temporal image and whose volume is larger than that of the child box;
a shape data correction unit that corrects the shape data of the region indicated by the child box in the three-dimensional spatio-temporal shape data by using the shape data of the region at the same position as the image parent box in the three-dimensional spatio-temporal shape data, reduced to the size of the child box; and
an output unit that outputs the corrected three-dimensional spatio-temporal shape data.

An object shape calculation method comprising:
an image input step of inputting three-dimensional image data;
a shape input step of inputting three-dimensional shape data representing an approximate region occupied by an object in the three-dimensional image;
a child box setting step of using the three-dimensional shape data to set, along the boundary between the object and the background other than the object, a plurality of child boxes, each of which is a region serving as a unit of correction processing of the three-dimensional shape data;
an image parent box search step of searching for an image parent box, which is a region whose pixel information is similar to that of the region at the same position as the child box in the three-dimensional image and whose volume is larger than that of the child box;
a replacement step of replacing the shape data of the region indicated by the child box in the three-dimensional shape data with the shape data of the region at the same position as the image parent box in the three-dimensional shape data, reduced to the size of the child box; and
an output step of outputting the three-dimensional shape data.

An object shape calculation method comprising:
an image input step of inputting a moving image;
a shape input step of inputting time-series two-dimensional shape data representing an approximate region occupied by an object in each frame of the moving image;
a three-dimensional conversion step of generating a three-dimensional spatio-temporal image by connecting the frames of the moving image in the time direction, and generating three-dimensional spatio-temporal shape data by connecting the time-series two-dimensional shape data in the time direction;
a child box setting step of using the three-dimensional spatio-temporal shape data to set, along the boundary surface between the object and the background other than the object, a plurality of child boxes, each of which is a region serving as a unit of correction processing of the three-dimensional spatio-temporal shape data;
an image parent box search step of searching for an image parent box, which is a region in the three-dimensional spatio-temporal image whose pixel information is similar to that of the region at the same position as the child box in the three-dimensional spatio-temporal image and whose volume is larger than that of the child box;
a replacement step of replacing the shape data of the region indicated by the child box in the three-dimensional spatio-temporal shape data with the shape data of the region at the same position as the image parent box in the three-dimensional spatio-temporal shape data, reduced to the size of the child box;
a two-dimensional conversion step of converting the three-dimensional spatio-temporal shape data into time-series two-dimensional shape data; and
an output step of outputting the time-series two-dimensional shape data.

The object shape calculation method according to claim 4, wherein the replacement step is repeated a plurality of times.

The object shape calculation method according to claim 4, wherein the child box setting step, the image parent box search step, and the replacement step are repeatedly executed a plurality of times while the size of the child box is reduced.

The object shape calculation method according to claim 4, wherein, in the image input step,
the data of the pixels at the edge of each frame of the moving image is copied over a predetermined range outside the edge,
a predetermined number of copies of the first frame of the moving image are added to the beginning of the moving image, and
a predetermined number of copies of the last frame of the moving image are added to the end of the moving image.

The object shape calculation method according to claim 4, wherein, in the shape input step,
the data of the pixels at the edge of the time-series two-dimensional shape data is copied over a predetermined range outside the edge,
a predetermined number of copies of the time-series two-dimensional shape data corresponding to the first frame of the moving image are added to the beginning of the time-series two-dimensional shape data, and
a predetermined number of copies of the time-series two-dimensional shape data corresponding to the last frame of the moving image are added to the end of the time-series two-dimensional shape data.

The object shape calculation method according to claim 4, wherein, in the three-dimensional conversion step, attention is paid to a partial section of the moving image and, while the range of the section of interest is changed, a three-dimensional spatio-temporal image is generated from the moving image of the section and three-dimensional spatio-temporal shape data is generated from the time-series two-dimensional shape data corresponding to the section, and
the child box setting step, the image parent box search step, the replacement step, the two-dimensional conversion step, and the output step are performed for each generated set of the three-dimensional spatio-temporal image and the three-dimensional spatio-temporal shape data.

The object shape calculation method according to claim 9, wherein, when the range of the section is changed, the new range is set so that at least a part of it overlaps the range set immediately before.

The object shape calculation method according to claim 10, wherein, in the child box setting step, the settings of child boxes that have already been set are used as they are.

The object shape calculation method according to claim 9, wherein, in the three-dimensional conversion step, when there is no time-series two-dimensional shape data corresponding to a certain frame, the three-dimensional spatio-temporal shape data is generated using two-dimensional shape data of that frame predicted from known two-dimensional shape data.

An object shape calculation program for causing a computer to execute:
an image input step of inputting three-dimensional image data;
a shape input step of inputting three-dimensional shape data representing an approximate region occupied by an object in the three-dimensional image;
a child box setting step of using the three-dimensional shape data to set, along the boundary between the object and the background other than the object, a plurality of child boxes, each of which is a region serving as a unit of correction processing of the three-dimensional shape data;
an image parent box search step of searching for an image parent box, which is a region whose pixel information is similar to that of the region at the same position as the child box in the three-dimensional image and whose volume is larger than that of the child box;
a replacement step of replacing the shape data of the region indicated by the child box in the three-dimensional shape data with the shape data of the region at the same position as the image parent box in the three-dimensional shape data, reduced to the size of the child box; and
an output step of outputting the three-dimensional shape data.

An object shape calculation program for causing a computer to execute:
an image input step of inputting a moving image;
a shape input step of inputting time-series two-dimensional shape data representing an approximate region occupied by an object in each frame of the moving image;
a three-dimensional conversion step of generating a three-dimensional spatio-temporal image by connecting the frames of the moving image in the time direction, and generating three-dimensional spatio-temporal shape data by connecting the time-series two-dimensional shape data in the time direction;
a child box setting step of using the three-dimensional spatio-temporal shape data to set, along the boundary surface between the object and the background other than the object, a plurality of child boxes, each of which is a region serving as a unit of correction processing of the three-dimensional spatio-temporal shape data;
an image parent box search step of searching for an image parent box, which is a region in the three-dimensional spatio-temporal image whose pixel information is similar to that of the region at the same position as the child box in the three-dimensional spatio-temporal image and whose volume is larger than that of the child box;
a replacement step of replacing the shape data of the region indicated by the child box in the three-dimensional spatio-temporal shape data with the shape data of the region at the same position as the image parent box in the three-dimensional spatio-temporal shape data, reduced to the size of the child box;
a two-dimensional conversion step of converting the three-dimensional spatio-temporal shape data into time-series two-dimensional shape data; and
an output step of outputting the time-series two-dimensional shape data.

An object shape calculation program for causing a computer to function as:
image input means for inputting a three-dimensional image;
shape input means for inputting three-dimensional shape data representing an approximate region occupied by an object in the three-dimensional image;
child box setting means for using the three-dimensional shape data to set, along the boundary between the object and the background other than the object, a plurality of child boxes, each of which is a three-dimensional region serving as a unit of correction processing of the shape data;
image parent box search means for searching the three-dimensional image for an image parent box, which is a region whose pixel information is similar to that of the region at the same position as the child box in the three-dimensional image and whose volume is larger than that of the child box;
shape data correction means for correcting the shape data of the region indicated by the child box in the three-dimensional shape data by using the shape data of the region at the same position as the image parent box in the three-dimensional shape data, reduced to the size of the child box; and
output means for outputting the corrected three-dimensional shape data.

An object shape calculation program for causing a computer to function as:
image input means for inputting a moving image;
shape input means for inputting two-dimensional shape data representing an approximate region occupied by an object in each frame of the moving image;
three-dimensional conversion means for connecting the frames of the moving image in the time direction to form a three-dimensional spatio-temporal image, and connecting the two-dimensional shape data in the time direction to form three-dimensional spatio-temporal shape data;
child box setting means for using the three-dimensional spatio-temporal shape data to set, along the boundary surface between the object and the background other than the object, a plurality of child boxes, each of which is a region serving as a unit of correction processing of the three-dimensional spatio-temporal shape data;
image parent box search means for searching the three-dimensional spatio-temporal image for an image parent box, which is a region whose pixel information is similar to that of the region at the same position as the child box in the three-dimensional spatio-temporal image and whose volume is larger than that of the child box;
shape data correction means for correcting the shape data of the region indicated by the child box in the three-dimensional spatio-temporal shape data by using the shape data of the region at the same position as the image parent box in the three-dimensional spatio-temporal shape data, reduced to the size of the child box; and
output means for outputting the corrected three-dimensional spatio-temporal shape data.