JP5561786B2

JP5561786B2 - Method and program for improving the accuracy of a three-dimensional shape model

Info

Publication number: JP5561786B2
Application number: JP2011074340A
Authority: JP
Inventors: 浩嗣三功; 整内藤
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2011-03-30
Filing date: 2011-03-30
Publication date: 2014-07-30
Anticipated expiration: 2031-03-30
Also published as: JP2012208759A

Description

The present invention relates to a method and a program for reconstructing a three-dimensional model of a subject with high accuracy from images of the subject and images of the background alone.

With a model-based synthesis scheme, the image quality of free viewpoint video depends heavily on the accuracy of the three-dimensional shape model generated from the multi-view video. The visual volume intersection method is a typical technique for building a three-dimensional shape model (three-dimensional voxel data) of a subject from multi-view video (Non-Patent Document 1). However, this method has the fundamental limitation that it cannot reconstruct concave regions of an object.

Various methods have been proposed to address this fundamental limitation, beginning with the space carving proposed in Non-Patent Document 2. In addition, Japanese Patent Application No. 2009-195334 proposes a technique that improves accuracy on the basis of stereo matching between the camera images.

In research on improving the accuracy of the Visual Hull, methods have been proposed that define an energy function accounting not only for photo-consistency but also for the stability of the Visual Hull surface shape, and that correct the shape within an optimization framework. Non-Patent Document 3 proposes a shaping method based on stereo matching. That method defines an evaluation function as a linear combination of an external force, which uses photo-consistency to push the camera-to-subject distance values toward the inside of the Visual Hull, and an internal force, which preserves the local shape features of the Visual Hull surface. It determines the surface shape of the subject by finding the distance values that minimize this function. Compared with methods that consider single voxels in isolation, such as space carving, the reported results show fewer unnatural bumps on the surface and a smoother reconstructed shape.

Separately, a method has been proposed that shapes the Visual Hull by a graph cut that takes adjacency in the voxel space into account. Non-Patent Document 4 defines, for voxels near the Visual Hull surface, an energy based on the photo-consistency of each individual voxel and an energy function based on the adjacency between voxels, and shows that applying a graph cut yields a smooth reconstructed shape. It further introduces a Core region inside the Visual Hull, in which the subject is assumed to be present with certainty, and restricts the graph cut to the space between the Visual Hull surface and the Core region, which reduces the chance that parts of the subject are carved away by mistake.

Non-Patent Document 1: Masahiro Toyoura (豊浦正広), Masaaki Iiyama (飯山将晃), Takuya Funatomi (舩冨卓哉), Koh Kakusho (角所考), Michihiko Minoh (美濃導彦), "Acquisition of three-dimensional shape from time series silhouette including defect and over-extraction", IEICE Technical Report, PRMU2007-168, Vol. 107, No. 427, pp. 69-74, 2008-1.
Non-Patent Document 2: Kutulakos et al., "A Theory of Shape by Space Carving", International Journal of Computer Vision, 2000.
Non-Patent Document 3: Kimihiro Tomiyama (冨山仁博) et al., "Reconstruction method of 3D shape constrained by local shape features and its real-time video display", Journal of the Institute of Image Information and Television Engineers, 61, 4, pp. 471-481 (2007).
Non-Patent Document 4: Hisatomi et al., "Method of 3D reconstruction using graph cuts, and its application to preserving intangible cultural heritage", IEEE conference on ICCV, pp. 923-930 (2009).

However, most methods, including the one proposed in Non-Patent Document 2, decide whether to remove a voxel by looking only at the photo-consistency of that single voxel, so the reconstruction accuracy of the resulting Visual Hull depends heavily on the texture of the subject. In Japanese Patent Application No. 2009-195334, the loss of reconstruction accuracy caused by matching errors was a pronounced problem.

In the method of Non-Patent Document 3, the final reconstruction depends heavily on the search accuracy of stereo matching, so sufficient accuracy still cannot be obtained in regions where the subject's texture varies little.

In the method of Non-Patent Document 4, the energy function based on adjacency between voxels is defined simply as the average of the energy values of the individual voxels, so adjacency in the voxel space cannot be said to be taken into account sufficiently. Moreover, the graph cut is applied only once and there is no process for evaluating its result, so the finally reconstructed three-dimensional shape model is likely to contain background regions.

In view of these problems, an object of the present invention is to provide a method and a program for improving the accuracy of the Visual Hull that take the texture state of the three-dimensional shape model into account. This is done by giving full consideration to continuity in the voxel space and by introducing a process that evaluates, on the basis of the three-dimensional shape model at each stage of shaping, the image quality of the free viewpoint images generated at the real camera viewpoints.

To achieve the above object, the method for improving the accuracy of a three-dimensional shape model according to the present invention is a method for reconstructing a three-dimensional shape model of a subject from multi-viewpoint camera images, and includes: a calculation step of taking as input a Visual Hull reconstructed by the visual volume intersection method from the subject silhouette images of the imaging cameras, or a shaped Visual Hull, and calculating a likelihood of object-likeness for the voxels on the surface of that Visual Hull; a first shaping step of shaping the Visual Hull on the basis of the likelihood of object-likeness; a first convergence step of determining whether the shaping of the Visual Hull has converged, on the basis of a comparison between the Visual Hull shaped in the first shaping step and the immediately preceding Visual Hull that was input to the calculation step, and of repeatedly applying the calculation step and the first shaping step, with the Visual Hull shaped in the first shaping step as the input to the calculation step, until the shaping is determined to have converged; an evaluation step of taking as input a three-dimensional shape model obtained from the Visual Hull whose shaping was determined to have converged in the first convergence step, or a shaped three-dimensional shape model, and evaluating the texture state of that three-dimensional shape model; a second shaping step of shaping the three-dimensional shape model on the basis of the evaluation of its texture state; and a second convergence step of determining whether the shaping of the three-dimensional shape model has converged, on the basis of a comparison between the three-dimensional shape model shaped in the second shaping step and the immediately preceding three-dimensional shape model that was input to the evaluation step, and of repeatedly applying the evaluation step and the second shaping step, with the three-dimensional shape model shaped in the second shaping step as the input to the evaluation step, until the shaping is determined to have converged.

It is also preferable that the calculation step projects each voxel on the surface of the Visual Hull onto each camera viewpoint, calculates the variance of the projected pixel values across the cameras, and normalizes that variance to obtain the likelihood of object-likeness.

It is also preferable that the first shaping step defines an energy function that takes into account the adjacency between voxels in the three-dimensional voxel space, determines the subject region by assigning each voxel to either the subject region or the background region within a framework that minimizes the energy function, and performs the shaping by removing the background region as an unnecessary part.

It is also preferable that the first convergence step compares the Visual Hull shaped in the first shaping step with the immediately preceding Visual Hull that was input to the calculation step, and determines that the shaping has converged when no voxel lying inside the voxels designated as the surface has been removed.

It is also preferable that the evaluation step evaluates the texture state of the three-dimensional shape model by extracting, at each camera viewpoint, a difference image between the free viewpoint image generated from the three-dimensional shape model and the camera image.

It is also preferable that the second shaping step, at each camera viewpoint, defines an energy function that takes into account the adjacency between pixels of the difference image, decides within a framework that minimizes the energy function whether each pixel is assigned to the subject region or the background region, traces the rays of the pixels lying in the shaping candidates judged to be background, and shapes the three-dimensional shape model by removing the intersections of those rays with the model as unnecessary parts.

It is also preferable that the second convergence step compares the three-dimensional shape model shaped in the second shaping step with the immediately preceding three-dimensional shape model that was input to the evaluation step, and determines that the shaping has converged when the number of removed voxels is at or below a fixed number.

To achieve the above object, the program according to the present invention causes a computer for reconstructing a three-dimensional shape model of a subject from multi-viewpoint camera images to function as: calculation means for taking as input a Visual Hull reconstructed by the visual volume intersection method from the subject silhouette images of the imaging cameras, or a shaped Visual Hull, and calculating a likelihood of object-likeness for the voxels on the surface of that Visual Hull; first shaping means for shaping the Visual Hull on the basis of the likelihood of object-likeness; first convergence means for determining whether the shaping of the Visual Hull has converged, on the basis of a comparison between the Visual Hull shaped by the first shaping means and the immediately preceding Visual Hull that was input to the calculation means, and for repeatedly applying the calculation means and the first shaping means, with the Visual Hull shaped by the first shaping means as the input to the calculation means, until the shaping is determined to have converged; evaluation means for taking as input a three-dimensional shape model obtained from the Visual Hull whose shaping was determined to have converged by the first convergence means, or a shaped three-dimensional shape model, and evaluating the texture state of that three-dimensional shape model; second shaping means for shaping the three-dimensional shape model on the basis of the evaluation of its texture state; and second convergence means for determining whether the shaping of the three-dimensional shape model has converged, on the basis of a comparison between the three-dimensional shape model shaped by the second shaping means and the immediately preceding three-dimensional shape model that was input to the evaluation means, and for repeatedly applying the evaluation means and the second shaping means, with the three-dimensional shape model shaped by the second shaping means as the input to the evaluation means, until the shaping is determined to have converged; the program thereby reconstructing the three-dimensional shape model.

The present invention makes it possible to improve the accuracy of the Visual Hull and thereby to improve the image quality of free viewpoint video generated from the finally reconstructed three-dimensional shape.

FIG. 1 shows a flowchart according to the present invention.
FIG. 2 shows the projection of the vicinity of the Visual Hull surface in the voxel space onto each viewpoint.
FIG. 3 shows an example of multi-viewpoint camera images.
FIG. 4 shows the original three-dimensional shape model.
FIG. 5 shows the input Visual Hull.
FIG. 6 shows the Visual Hull as shaped at the end of Step 4.
FIG. 7 shows the finally shaped three-dimensional shape model.

The best mode for carrying out the present invention is described in detail below with reference to the drawings. The proposed method is characterized by shaping that takes continuity in the voxel space into account and by shaping that takes the image quality of the free viewpoint image at each viewpoint into account. FIG. 1 shows a flowchart according to the present invention. The inputs are the multi-viewpoint camera images for one frame at a given time, the camera parameters of each viewpoint, and the Visual Hull reconstructed by the visual volume intersection method from the subject silhouette images of the cameras. The process outputs the finally shaped three-dimensional shape model and then ends. The description below follows this flowchart.

Step 1: Calculate the object-likeness of the voxels near the Visual Hull surface. As shown in FIG. 2, the voxels near the Visual Hull surface in the voxel space are projected onto each camera viewpoint. Let v_i denote the coordinates in the camera image of the point projected onto viewpoint i (i = 1, ..., N). The projected pixel value at this point is expressed as a multidimensional vector x(v_i) in a particular color space, for example the RGB space. To calculate the likelihood of object-likeness, the mean and variance of the projected pixel values over the camera viewpoints are calculated.
The mean vector u(v_i) is calculated by
\[ u(v_i) = \frac{1}{\#n_{vis}(v_i)} \sum_{i} x(v_i) \]
Here, #n_vis(v_i) is the number of viewpoints without occlusion (viewpoints from which the near-surface voxel is visible), and the sum Σ is taken over those unoccluded viewpoints. In the RGB space, the mean vector u(v_i) has three components: u_r(v_i), u_g(v_i), and u_b(v_i).
The variance vector σ²(v_i) is calculated by
\[ \sigma^2(v_i) = \frac{1}{\#n_{vis}(v_i)} \sum_{i} \bigl( x(v_i) - u(v_i) \bigr)^2 \]
where the square is taken component by component. Likewise, in the RGB space, the variance vector σ²(v_i) has three components: σ²_r(v_i), σ²_g(v_i), and σ²_b(v_i).

The likelihood of object-likeness is obtained by normalizing this variance. Normalization here means dividing σ²(v_i) by the maximum of σ²(v_i) (i = 1, ..., N), so that the maximum value becomes 1.0. Every σ² appearing in the equations below denotes the variance after normalization.
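Step 1 and the normalization can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the variance uses the 1/#n_vis (population) form, and the normalization divides by a single global maximum over all voxels and channels, which the text does not spell out.

```python
from typing import Dict, Sequence, Tuple

Color = Tuple[float, float, float]  # (r, g, b)


def mean_and_variance(samples: Sequence[Color]) -> Tuple[Color, Color]:
    """Per-channel mean and variance of the colors that one surface voxel
    projects to, using only the viewpoints where it is not occluded."""
    n = len(samples)  # plays the role of #n_vis
    if n == 0:
        raise ValueError("voxel is not visible from any viewpoint")
    mean = tuple(sum(s[c] for s in samples) / n for c in range(3))
    var = tuple(sum((s[c] - mean[c]) ** 2 for s in samples) / n for c in range(3))
    return mean, var


def normalize(variances: Dict[str, Color]) -> Dict[str, Color]:
    """Scale all variances so that the largest value becomes 1.0
    (assumption: one global maximum over voxels and channels)."""
    peak = max(max(v) for v in variances.values())
    if peak == 0.0:
        return dict(variances)
    return {key: tuple(x / peak for x in v) for key, v in variances.items()}


# A voxel seen with the same color everywhere versus one seen with clashing colors.
_, var_consistent = mean_and_variance([(200, 100, 50)] * 3)
_, var_clashing = mean_and_variance([(0, 0, 0), (255, 255, 255), (0, 255, 0)])
scaled = normalize({"consistent": var_consistent, "clashing": var_clashing})
```

A voxel on the true surface projects to similar colors in every view, so its normalized variance stays near 0, while a voxel that floats in front of the surface tends toward 1.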

Step 2: Define an energy function in the voxel space.
The energy function is defined by
\[ E(v; a_v) = U(v; a_v) + \lambda_v V(v; a_v) \]
Here, v = (v_1, ..., v_i, ..., v_N) denotes the coordinates v_i in the camera images, λ_v is a positive constant, and a_v indicates, with 0 or 1 respectively, whether each voxel in the voxel space is assigned to the background region or the subject region.

U(v; a_v) is a data term that depends only on the likelihood value of each individual voxel. It gives the energy value incurred when the voxel is assigned to the region specified by a_v (the equation itself is not reproduced in this text).
In that equation, the sum Σ is taken over the viewpoints i (i = 1, ..., N), and th_v(c) is a fixed threshold. The equation is written for the RGB space. For example, with th_v(c) = 1.2: when σ_c² = 0.5, th_v(c) − σ_c² = 0.7 and σ_c² = 0.5, so assigning a = 1, the subject region, gives the lower energy. When σ_c² = 0.7, th_v(c) − σ_c² = 0.5 and σ_c² = 0.7, so assigning a = 0, the background region, gives the lower energy. In other words, when the variance is small, assignment to the subject region gives the lower energy.
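The worked example suggests a simple shape for the data term: the subject label costs σ_c² and the background label costs th_v(c) − σ_c², summed over the color channels. The sketch below encodes that reading. It is an assumption inferred from the example, not the patent's equation, the function names are hypothetical, and the sum over viewpoints is dropped by treating each voxel as having one normalized variance per channel.

```python
from typing import Sequence


def data_energy(label: int, variances: Sequence[float], thresholds: Sequence[float]) -> float:
    """Assumed data term for one voxel.

    label 1 (subject) costs sigma_c^2 per channel;
    label 0 (background) costs th_v(c) - sigma_c^2 per channel.
    """
    if label not in (0, 1):
        raise ValueError("label must be 0 (background) or 1 (subject)")
    return sum(s if label == 1 else th - s for s, th in zip(variances, thresholds))


def cheaper_label(variances: Sequence[float], thresholds: Sequence[float]) -> int:
    """Label with the lower data energy when the smoothing term is ignored."""
    subject = data_energy(1, variances, thresholds)
    background = data_energy(0, variances, thresholds)
    return 1 if subject <= background else 0


# The two cases from the text, with th_v(c) = 1.2 on a single channel.
print(cheaper_label([0.5], [1.2]))  # 1: subject is cheaper (0.5 vs 0.7)
print(cheaper_label([0.7], [1.2]))  # 0: background is cheaper (0.7 vs 0.5)
```

Under this reading the two labels cost the same at σ_c² = th_v(c) / 2, so the threshold sets how much color variance a voxel may show before the data term alone votes to carve it.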

V(v; a_v), on the other hand, is a smoothing term computed from the difference in likelihood values between adjacent voxels (the equation itself is not reproduced in this text).
In that equation, dis(i, j) denotes the Euclidean distance between voxels and κ_v is a positive constant. The constant κ_v is computed using the expectation operator <·> taken over all combinations N_v of adjacent voxels (its defining equation is likewise not reproduced here).
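The equations for V and κ_v are missing from this text. The symbols that are defined (binary labels, a distance weight dis(i, j), and a positive constant obtained from an expectation over adjacent pairs) match the contrast-sensitive smoothness term commonly used in graph-cut segmentation. One plausible form, offered purely as an assumption and not as the patent's actual equation, is given below. The per-voxel labels a_i, a_j and the shorthand σ²_i for the normalized variance of voxel i are notation introduced here for illustration.

```latex
V(v; a_v) = \sum_{(i,j) \in N_v} \frac{[\,a_i \neq a_j\,]}{\mathrm{dis}(i,j)}
            \exp\!\left( -\kappa_v \,\bigl\lVert \sigma^2_i - \sigma^2_j \bigr\rVert^2 \right),
\qquad
\kappa_v = \frac{1}{2\,\bigl\langle \lVert \sigma^2_i - \sigma^2_j \rVert^2 \bigr\rangle}
```

In a term of this kind, a label change between two neighbors is cheap where their likelihoods differ strongly and expensive where they are alike, which pushes the cut toward genuine surface boundaries.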

Step 3: Shape the Visual Hull by energy minimization. Each voxel is assigned 0 or 1 so that the energy value E(v; a_v) is minimized, and the Visual Hull is shaped accordingly. The energy can be minimized with, for example, a graph-cut algorithm. Voxels with a_v = 0, that is, voxels assigned to the background region, are removed as unnecessary parts.
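To make the minimization in Step 3 concrete, the following sketch labels a handful of voxels with an s-t minimum cut computed by a plain Edmonds-Karp search. It is a toy stand-in for a production graph-cut solver. The function name, the argument layout, and the use of a fixed pairwise weight per neighbor pair are assumptions for illustration, and all costs must be non-negative.

```python
from collections import deque
from typing import List, Sequence, Tuple


def min_cut_labels(cost_bg: Sequence[float], cost_subj: Sequence[float],
                   pairs: Sequence[Tuple[int, int, float]], lam: float = 1.0) -> List[int]:
    """Minimize sum_i cost(a_i) + lam * sum_(i,j) w_ij * [a_i != a_j].

    Returns 1 (subject) or 0 (background) per voxel.
    """
    n = len(cost_bg)
    source, sink = n, n + 1
    cap = [dict() for _ in range(n + 2)]  # residual capacities

    def add_edge(u: int, v: int, c: float) -> None:
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # reverse edge for the residual graph

    for i in range(n):
        add_edge(source, i, cost_bg[i])   # cut when voxel i ends up background
        add_edge(i, sink, cost_subj[i])   # cut when voxel i ends up subject
    for i, j, w in pairs:
        add_edge(i, j, lam * w)
        add_edge(j, i, lam * w)

    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: source}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no path left: parent holds the source side of a minimum cut
        flow = float("inf")
        v = sink
        while v != source:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while v != source:
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u
    return [1 if i in parent else 0 for i in range(n)]


# Three voxels in a row; the middle one is slightly cheaper as background.
bg, subj = [1.1, 0.55, 1.1], [0.1, 0.65, 0.1]
chain = [(0, 1, 1.0), (1, 2, 1.0)]
labels_no_smoothing = min_cut_labels(bg, subj, chain, lam=0.0)
labels_smoothed = min_cut_labels(bg, subj, chain, lam=0.2)
```

Without smoothing the middle voxel would be carved on its own evidence; with a modest λ its neighbors keep it in the subject region, which is the behavior the adjacency term is meant to provide.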

Step 4: Check for convergence. The Visual Hull obtained in Step 3 is compared with the Visual Hull that was input to Step 1. The shaping is judged to have converged if no voxel lying inside the near-surface voxels has been removed, or if the number of removed voxels is within a fixed threshold. If convergence is not sufficient, Steps 1 through 3 are repeated with the Visual Hull obtained in Step 3 as the input to Step 1.
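The convergence test of Step 4 reduces to set arithmetic when a Visual Hull is stored as a set of voxel coordinates. The sketch below is one way to express it. The function name, the set-based representation, and the default threshold of 0 are assumptions made for illustration.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def first_stage_converged(previous: Set[Voxel], shaped: Set[Voxel],
                          surface: Set[Voxel], max_removed: int = 0) -> bool:
    """Return True when shaping is considered converged.

    previous: Visual Hull that was input to Step 1
    shaped:   Visual Hull produced by Step 3
    surface:  voxels of `previous` that were designated as near-surface
    """
    removed = previous - shaped
    inner_removed = removed - surface  # carving that went deeper than the surface layer
    return not inner_removed or len(removed) <= max_removed


hull = {(x, 0, 0) for x in range(5)}
surface_layer = {(0, 0, 0), (4, 0, 0)}
only_surface_carved = hull - {(4, 0, 0)}
inner_carved = hull - {(4, 0, 0), (3, 0, 0)}
```

Here `first_stage_converged(hull, only_surface_carved, surface_layer)` is true because nothing beneath the surface layer was touched, while the second case is false and would trigger another pass through Steps 1 to 3.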

Step 5: At each camera viewpoint, a free viewpoint image is generated from the three-dimensional shape model obtained from the Visual Hull of Step 3, and the difference image between the free viewpoint image and the captured image is calculated. In the steps that follow, the three-dimensional shape model is shaped by evaluating its texture state.
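The difference image of Step 5 can be illustrated with plain nested lists. The text does not say how the difference is taken, so the per-channel absolute difference used here is an assumption, as are the function name and the image layout (rows of RGB tuples).

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def difference_image(rendered: Image, captured: Image) -> Image:
    """Per-pixel, per-channel absolute difference between the free viewpoint
    image rendered from the model and the image captured by the camera."""
    if len(rendered) != len(captured) or any(
        len(row_r) != len(row_c) for row_r, row_c in zip(rendered, captured)
    ):
        raise ValueError("images must have the same size")
    return [
        [tuple(abs(r - c) for r, c in zip(pix_r, pix_c))
         for pix_r, pix_c in zip(row_r, row_c)]
        for row_r, row_c in zip(rendered, captured)
    ]


rendered = [[(10, 10, 10), (200, 0, 0)]]
captured = [[(10, 10, 10), (0, 0, 250)]]
diff = difference_image(rendered, captured)
```

Pixels where the model renders the wrong texture, typically because surplus voxels pick up color from the wrong part of the scene, show up as large values in `diff`.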

Step 6: At each camera viewpoint, define an energy function on the difference image. At each camera viewpoint, an energy function that takes into account the adjacency between pixels of the difference image is defined by
\[ E(p; a_p) = U(p; a_p) + \lambda_p V(p; a_p) \]
Here, p denotes each pixel of the difference image, λ_p is a positive constant, and a_p indicates, with 0 or 1 respectively, whether each pixel of the difference image is assigned to the background region or the subject region.

U(p; a_p) is a data term that depends only on the likelihood value of each pixel of the difference image. It gives the energy value incurred when the pixel is assigned to the region specified by a_p (the equation itself is not reproduced in this text).
In that equation, the sum Σ is taken over the pixels of the difference image, x_c(p_i) is the pixel value (x_r, x_g, or x_b) at pixel p_i, and th_p(c) is a fixed threshold. The equation is written for the RGB space.

V(p; a_p), on the other hand, is a smoothing term computed from the difference in pixel values between adjacent pixels (the equation itself is not reproduced in this text).
In that equation, dis_p(i, j) denotes the Euclidean distance between pixels, κ_p is a positive constant, and the vector x(p_i) is the pixel value vector (x_r, x_g, x_b) at pixel p_i. The constant κ_p is computed using the expectation operator <·> of the distance taken over all combinations N_v of adjacent pixels (its defining equation is likewise not reproduced here).

Step 7: Identify the shaping candidates of the Visual Hull by energy minimization. Each pixel is assigned 0 or 1 so that the energy value E(p; a_p) is minimized, which identifies the shaping candidates of the Visual Hull. The energy can be minimized with, for example, a graph-cut algorithm.

Step 8: Trace the ray of each pixel contained in the Visual Hull shaping candidates and remove its intersection with the three-dimensional shape model. Steps 5 through 7 above are performed on the camera image of each viewpoint i (i = 1, ..., N). The rays of the pixels with a_p = 0, that is, the pixels assigned to the background region, are traced, and their intersections with the three-dimensional shape model are removed as unnecessary parts. This is done for all camera images to carve away the unnecessary voxels.
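The ray search of Step 8 can be sketched as fixed-step ray marching through a voxel set. Everything here is an illustrative assumption: the function names, rays given directly as an origin and a direction in voxel units (a real system would derive them from the camera parameters), and the fixed step, which is approximate and can miss voxels that a ray only grazes.

```python
import math
from typing import Iterable, Optional, Set, Tuple

Voxel = Tuple[int, int, int]
Vec3 = Tuple[float, float, float]


def first_hit(occupied: Set[Voxel], origin: Vec3, direction: Vec3,
              max_dist: float, step: float = 0.25) -> Optional[Voxel]:
    """Return the first occupied voxel met along the ray, or None."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = tuple(d / norm for d in direction)
    t = 0.0
    while t <= max_dist:
        cell = tuple(math.floor(o + t * u) for o, u in zip(origin, unit))
        if cell in occupied:
            return cell
        t += step
    return None


def carve(occupied: Set[Voxel], rays: Iterable[Tuple[Vec3, Vec3]],
          max_dist: float) -> Set[Voxel]:
    """Remove the first voxel hit by each ray of a background-labeled pixel."""
    hits = {first_hit(occupied, origin, direction, max_dist) for origin, direction in rays}
    hits.discard(None)
    return occupied - hits


model = {(2, 0, 0), (3, 0, 0)}
rays = [((0.5, 0.5, 0.5), (1.0, 0.0, 0.0)),   # hits voxel (2, 0, 0)
        ((0.5, 5.5, 0.5), (1.0, 0.0, 0.0))]   # misses the model
carved = carve(model, rays, max_dist=10.0)
```

Each pass removes only the outermost voxel along a ray, so deeper surplus material is peeled away over repeated iterations of Steps 5 to 8.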

Step 9: Check for convergence. The three-dimensional shape model obtained in Step 8 is compared with the three-dimensional shape model that was input to Step 5. The shaping is judged to have converged if no voxel inside the near-surface layer has been removed, or if the number of removed voxels is within a fixed threshold. If convergence is not sufficient, Steps 5 through 8 are repeated with the three-dimensional shape model obtained in Step 8 as the input to Step 5.
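The repeat-until-converged structure of Steps 5 to 9 can be written as a small driver loop. This sketch uses only the voxel-count criterion. The function name, the `refine` callback standing in for Steps 5 to 8, and the `max_iters` safety cap are assumptions added for illustration.

```python
from typing import Callable, Set, Tuple

Voxel = Tuple[int, int, int]


def refine_until_stable(model: Set[Voxel],
                        refine: Callable[[Set[Voxel]], Set[Voxel]],
                        max_removed: int = 0,
                        max_iters: int = 100) -> Set[Voxel]:
    """Apply `refine` repeatedly until a pass removes at most `max_removed` voxels."""
    for _ in range(max_iters):
        refined = refine(model)
        removed = len(model - refined)
        model = refined
        if removed <= max_removed:
            break
    return model


def peel(model: Set[Voxel]) -> Set[Voxel]:
    """Toy refinement: drop one voxel per pass while more than two remain."""
    return model - {max(model)} if len(model) > 2 else set(model)


start = {(x, 0, 0) for x in range(5)}
final = refine_until_stable(start, peel)
```

The same driver fits the first stage (Steps 1 to 4) if `refine` wraps the likelihood calculation and the voxel-space graph cut instead.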

Next, experimental results of the processing of the present invention are presented. The experiment evaluates the accuracy of the three-dimensional shape obtained by applying the method of the present invention to the Visual Hull reconstructed by the visual volume intersection method, using multi-viewpoint images of a subject that has concave regions. The experimental data consisted of images obtained by projecting a CG model onto 23 viewpoints, together with the camera parameters (central projection matrices) of each viewpoint.

FIG. 3 shows an example of the multi-viewpoint camera images. FIG. 3a shows the camera image from viewpoint 02, and FIG. 3b shows the camera image from viewpoint 06. FIG. 4 shows the original three-dimensional shape model. FIG. 5 shows the input Visual Hull. FIG. 6 shows the Visual Hull as shaped at the end of Step 4. FIG. 7 shows the finally shaped three-dimensional shape model.

Comparing FIG. 4 with FIG. 5 shows that the Visual Hull created by the visual volume intersection method alone fails to reconstruct the concave regions of the object. Comparing FIG. 4 with FIG. 6 shows that Steps 1 through 4 of the present invention restore the concave regions. Comparing FIG. 6 with FIG. 7 shows that Steps 5 through 9 reproduce the concave regions more accurately.

The experimental results above are now presented quantitatively. Table 1 shows the Precision, Recall, and F values measured against the original three-dimensional shape model. Here, Recall reflects the extent to which voxels were removed in error, and Precision reflects the extent to which voxels that should have been removed remain. The F value is an index of the accuracy of the three-dimensional shape model, calculated from Recall and Precision.
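For voxel models stored as sets, the three scores can be computed as in the sketch below. The text does not give the formulas, so the standard definitions are assumed: Precision is the share of reconstructed voxels that belong to the reference model, Recall is the share of reference voxels that were kept, and F is their harmonic mean. The function name is hypothetical.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def precision_recall_f(reconstructed: Set[Voxel],
                       reference: Set[Voxel]) -> Tuple[float, float, float]:
    """Standard precision, recall, and F-measure of a voxel model against a reference."""
    overlap = len(reconstructed & reference)
    precision = overlap / len(reconstructed) if reconstructed else 0.0
    recall = overlap / len(reference) if reference else 0.0
    total = precision + recall
    f_value = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f_value


reference = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)}
# One true voxel carved by mistake, one surplus voxel left in place.
reconstructed = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (9, 0, 0)}
precision, recall, f_value = precision_recall_f(reconstructed, reference)
```

With these definitions, carving true object voxels lowers Recall and leaving surplus voxels lowers Precision, which is consistent with how the two scores are described above.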

According to Table 1, the finally shaped three-dimensional shape model attains the best F value, which shows that executing Steps 1 through 9 of the present invention restores the concave regions of the object.

As described above, the present invention can reconstruct the concave regions of an object and improve the accuracy of the Visual Hull, which in turn makes it possible to improve the image quality of free viewpoint video generated from the finally reconstructed three-dimensional shape.

The embodiments described above are all illustrative of the present invention and not limiting, and the present invention can be carried out in various other modified and altered forms. The scope of the present invention is therefore defined only by the claims and their equivalents.

Claims

A method for restoring a three-dimensional shape model of a subject from multi-viewpoint camera images, the method comprising:
a calculation step of taking as input a Visual Hull restored by the visual volume intersection method from the subject silhouette images of the respective imaging cameras, or a shaped Visual Hull, and calculating a likelihood of object-likeness for the voxels on the surface of the Visual Hull;
a first shaping step of shaping the Visual Hull based on the likelihood of object-likeness;
a first convergence step of determining whether the shaping of the Visual Hull has converged, based on a comparison between the Visual Hull shaped in the first shaping step and the immediately preceding Visual Hull input to the calculation step, and of repeatedly applying the calculation step and the first shaping step, with the Visual Hull shaped in the first shaping step as the input to the calculation step, until the shaping of the Visual Hull is determined to have converged;
an evaluation step of taking as input a three-dimensional shape model acquired from the Visual Hull whose shaping was determined to have converged in the first convergence step, or a shaped three-dimensional shape model, and evaluating a texture state of the three-dimensional shape model;
a second shaping step of shaping the three-dimensional shape model based on the evaluation of the texture state of the three-dimensional shape model; and
a second convergence step of determining whether the shaping of the three-dimensional shape model has converged, based on a comparison between the three-dimensional shape model shaped in the second shaping step and the immediately preceding three-dimensional shape model input to the evaluation step, and of repeatedly applying the evaluation step and the second shaping step, with the three-dimensional shape model shaped in the second shaping step as the input to the evaluation step, until the shaping of the three-dimensional shape model is determined to have converged.
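The control flow of the two convergence steps can be sketched as two consecutive refine-until-converged loops. This is only an illustrative outline under stated assumptions: the shaping and convergence functions are passed in as placeholders, the names `refine_until_converged` and `restore_model` are hypothetical, and the iteration cap `max_iters` is a safety limit added for the sketch that does not appear in the claim.

```python
def refine_until_converged(model, shape_once, has_converged, max_iters=50):
    """Apply one calculate/evaluate-and-shape pass repeatedly.

    shape_once(model) returns the shaped model; has_converged(previous,
    shaped) compares it with the model that was input to that pass.
    max_iters is an assumed safety cap, not part of the claimed method.
    """
    for _ in range(max_iters):
        shaped = shape_once(model)
        if has_converged(model, shaped):
            return shaped
        model = shaped  # the shaped result becomes the next input
    return model

def restore_model(visual_hull, likelihood_shaping, texture_shaping,
                  hull_converged, model_converged):
    """Stage 1 refines the Visual Hull, stage 2 refines the resulting model."""
    hull = refine_until_converged(visual_hull, likelihood_shaping, hull_converged)
    return refine_until_converged(hull, texture_shaping, model_converged)

# Toy stand-ins: a model is a set of voxel ids, and each pass removes the
# largest id above a limit until nothing is left to remove.
def drop_largest_above(limit):
    def shape(model):
        extra = [v for v in model if v > limit]
        return model - {max(extra)} if extra else model
    return shape

def unchanged(previous, shaped):
    return previous == shaped

result = restore_model(frozenset(range(8)), drop_largest_above(5),
                       drop_largest_above(3), unchanged, unchanged)
```

With these stand-ins the first loop stops at voxels 0 to 5 and the second at voxels 0 to 3. In an actual implementation the placeholders would be replaced by the likelihood-based and texture-based shaping described in the dependent claims.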

The method according to claim 1, wherein the calculation step includes:
projecting each voxel on the surface of the Visual Hull onto each imaging camera viewpoint;
calculating the variance of the projected pixel values across the imaging cameras; and
calculating the likelihood of object-likeness by normalizing the variance.
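The projection and variance computation can be sketched as follows. The sketch makes several assumptions that the claim does not state: cameras are given as 3x4 projection matrices, images are grayscale arrays, visibility and occlusion are ignored, the variance is normalized by its maximum over the surface voxels, and a low variance is mapped to a high likelihood. The function names are hypothetical.

```python
import numpy as np

def project(P, points):
    """Project Nx3 world points with a 3x4 camera matrix to Nx2 pixel (x, y)."""
    homog = np.hstack([points, np.ones((len(points), 1))])
    uvw = homog @ P.T
    return uvw[:, :2] / uvw[:, 2:3]

def objectness_likelihood(surface_voxels, cameras, images):
    """Likelihood per surface voxel from the colour variance across cameras.

    Assumptions: grayscale images, no visibility test, max-normalization,
    and likelihood = 1 - normalized variance.
    """
    samples = []
    for P, img in zip(cameras, images):
        xy = np.rint(project(P, surface_voxels)).astype(int)
        x = np.clip(xy[:, 0], 0, img.shape[1] - 1)
        y = np.clip(xy[:, 1], 0, img.shape[0] - 1)
        samples.append(img[y, x].astype(float))
    variance = np.var(np.stack(samples), axis=0)  # variance across cameras
    peak = variance.max()
    normalized = variance / peak if peak > 0 else np.zeros_like(variance)
    return 1.0 - normalized

# Toy setup: two cameras sharing one simple projection, two surface voxels.
P = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 0, 1.0]])
img_a = np.zeros((4, 4)); img_b = np.zeros((4, 4))
img_a[1, 1] = img_b[1, 1] = 100.0      # consistent colour in both views
img_a[2, 2], img_b[2, 2] = 50.0, 150.0  # inconsistent colour
voxels = np.array([[1.0, 1.0, 0.0], [2.0, 2.0, 0.0]])
likelihood = objectness_likelihood(voxels, [P, P], [img_a, img_b])
```

The voxel whose projections agree receives likelihood 1.0 and the voxel whose projections disagree receives 0.0 in this toy setup.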

The method according to claim 1 or 2, wherein the first shaping step includes:
defining an energy function that takes into account the adjacency relationships between the voxels in the three-dimensional voxel space; and
within a framework of minimizing the energy function, determining the subject region by assigning each voxel to either the subject region or the background region, and performing shaping by removing the background region as an unnecessary part.
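A labeling of this kind can be illustrated with a small energy of the form "data cost per voxel plus a penalty for each pair of adjacent voxels with different labels". The claim does not name an optimizer. The sketch below uses iterated conditional modes (ICM), a simple local optimizer chosen here only for brevity, and it is not presented as the optimizer of the invention. The data cost derived from the likelihood, the 6-neighbourhood, and the `smoothness` weight are all assumptions, and `label_voxels` is a hypothetical name.

```python
import numpy as np

NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def label_voxels(likelihood, smoothness=0.3, sweeps=10):
    """Label each voxel subject (True) or background (False) with ICM.

    Assumed energy: data cost (1 - likelihood for subject, likelihood for
    background) plus `smoothness` per adjacent voxel with a different label.
    """
    labels = likelihood >= 0.5
    shape = likelihood.shape
    for _ in range(sweeps):
        changed = False
        for idx in np.ndindex(shape):
            cost = {}
            for candidate in (False, True):
                data = 1.0 - likelihood[idx] if candidate else likelihood[idx]
                disagree = 0
                for d in NEIGHBORS:
                    n = tuple(i + j for i, j in zip(idx, d))
                    inside = all(0 <= c < s for c, s in zip(n, shape))
                    if inside and labels[n] != candidate:
                        disagree += 1
                cost[candidate] = data + smoothness * disagree
            best = cost[True] <= cost[False]
            if best != labels[idx]:
                labels[idx] = best
                changed = True
        if not changed:
            break
    return labels

likelihood = np.full((3, 3, 3), 0.9)
likelihood[1, 1, 1] = 0.4  # weak evidence, but surrounded by subject voxels
likelihood[0, 0, 0] = 0.0  # strong background evidence at a corner
labels = label_voxels(likelihood)
shaped = labels            # background voxels (False) are removed
```

In this example the adjacency term keeps the weakly supported centre voxel in the subject region, while the corner voxel with strong background evidence is removed.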

The method according to claim 3, wherein the first convergence step includes:
comparing the Visual Hull shaped in the first shaping step with the immediately preceding Visual Hull input to the calculation step; and
determining that the shaping has converged when no voxel located inside the voxels designated as the surface has been removed.

The method according to any one of claims 1 to 4, wherein the evaluation step includes
evaluating the texture state of the three-dimensional shape model by extracting, at each imaging camera viewpoint, a difference image between a free viewpoint image generated from the three-dimensional shape model and the camera image.
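A difference image of this kind can be sketched as a per-pixel absolute difference. The sketch assumes that the free viewpoint image has already been rendered at the camera viewpoint and has the same size as the camera image; averaging over colour channels is an assumption, and `difference_image` is a hypothetical name.

```python
import numpy as np

def difference_image(rendered, captured):
    """Per-pixel absolute difference between a rendered and a captured image.

    Inputs are cast to float first so that uint8 subtraction cannot wrap
    around. Colour images are averaged over channels (an assumption).
    """
    diff = np.abs(rendered.astype(np.float64) - captured.astype(np.float64))
    return diff.mean(axis=-1) if diff.ndim == 3 else diff

rendered = np.array([[[10, 10, 10], [200, 200, 200]]], dtype=np.uint8)
captured = np.array([[[12, 8, 10], [50, 50, 50]]], dtype=np.uint8)
diff = difference_image(rendered, captured)
```

Large values in `diff` mark pixels where the texture rendered from the model disagrees with the camera image, which is the information the subsequent shaping uses.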

The method according to claim 5, wherein the second shaping step includes:
defining, at each imaging camera viewpoint, an energy function that takes into account the adjacency relationships between the pixels of the difference image;
within a framework of minimizing the energy function, determining whether each pixel is assigned to the subject region or the background region; and
shaping the three-dimensional shape model by searching along the rays of the pixels in the shaping candidates determined to be the background region and removing their intersections with the three-dimensional shape model as unnecessary parts.
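The ray search can be illustrated in a deliberately simplified geometry. The sketch assumes an orthographic camera looking along the first axis of a `[z, y, x]` occupancy grid, so that the ray of pixel `(y, x)` is the voxel column `[:, y, x]`, and it assumes that "the intersection" is the first occupied voxel met along the ray. Both are simplifications for illustration. A perspective camera would require tracing each ray through the grid. `carve_along_rays` is a hypothetical name.

```python
import numpy as np

def carve_along_rays(occupancy, background_pixels):
    """Remove the first occupied voxel on the ray of each background pixel.

    occupancy: boolean grid indexed [z, y, x]; background_pixels: boolean
    image indexed [y, x]. Orthographic viewing along z is assumed.
    """
    carved = occupancy.copy()
    for y, x in zip(*np.nonzero(background_pixels)):
        hits = np.nonzero(carved[:, y, x])[0]  # occupied depths on this ray
        if hits.size:
            carved[hits[0], y, x] = False      # nearest intersection
    return carved

occupancy = np.zeros((4, 3, 3), dtype=bool)
occupancy[1:3, 1, 1] = True  # two voxels behind pixel (1, 1)
occupancy[2, 0, 0] = True    # one voxel behind pixel (0, 0)
background = np.zeros((3, 3), dtype=bool)
background[1, 1] = True      # pixel judged to be background
carved = carve_along_rays(occupancy, background)
```

Only the nearest voxel behind the background pixel is removed per pass, and deeper voxels are left for later iterations of the evaluation and shaping loop.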

The method according to claim 6, wherein the second convergence step includes:
comparing the three-dimensional shape model shaped in the second shaping step with the immediately preceding three-dimensional shape model input to the evaluation step; and
determining that the shaping has converged when the number of removed voxels is equal to or less than a fixed number.
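This convergence test amounts to counting the voxels removed between two consecutive models. A minimal sketch, assuming boolean occupancy grids of equal shape, follows. The default threshold `max_removed=0` is a placeholder, since the claim only refers to a fixed number, and `shaping_converged` is a hypothetical name.

```python
import numpy as np

def shaping_converged(previous, shaped, max_removed=0):
    """True when at most `max_removed` voxels were removed by the last pass.

    max_removed is an assumed placeholder for the fixed number in the claim.
    """
    removed = np.count_nonzero(previous & ~shaped)
    return removed <= max_removed

previous = np.ones((2, 2, 2), dtype=bool)
shaped = previous.copy()
shaped[0, 0, 0] = False  # one voxel removed in this pass
strict = shaping_converged(previous, shaped)                # threshold 0
relaxed = shaping_converged(previous, shaped, max_removed=1)
```

With one removed voxel the test fails for a threshold of zero and passes for a threshold of one.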

A program for causing a computer that restores a three-dimensional shape model of a subject from multi-viewpoint camera images to function as:
calculation means for taking as input a Visual Hull restored by the visual volume intersection method from the subject silhouette images of the respective imaging cameras, or a shaped Visual Hull, and calculating a likelihood of object-likeness for the voxels on the surface of the Visual Hull;
first shaping means for shaping the Visual Hull based on the likelihood of object-likeness;
first convergence means for determining whether the shaping of the Visual Hull has converged, based on a comparison between the Visual Hull shaped by the first shaping means and the immediately preceding Visual Hull input to the calculation means, and for repeatedly applying the calculation means and the first shaping means, with the Visual Hull shaped by the first shaping means as the input to the calculation means, until the shaping of the Visual Hull is determined to have converged;
evaluation means for taking as input a three-dimensional shape model acquired from the Visual Hull whose shaping was determined to have converged by the first convergence means, or a shaped three-dimensional shape model, and evaluating a texture state of the three-dimensional shape model;
second shaping means for shaping the three-dimensional shape model based on the evaluation of the texture state of the three-dimensional shape model; and
second convergence means for determining whether the shaping of the three-dimensional shape model has converged, based on a comparison between the three-dimensional shape model shaped by the second shaping means and the immediately preceding three-dimensional shape model input to the evaluation means, and for repeatedly applying the evaluation means and the second shaping means, with the three-dimensional shape model shaped by the second shaping means as the input to the evaluation means, until the shaping of the three-dimensional shape model is determined to have converged.