JP2005317000A

JP2005317000A - Method for determining a set of optimal viewpoints for constructing a 3D shape of a face from 2D images acquired from the set of optimal viewpoints

Info

Publication number: JP2005317000A
Application number: JP2005120041A
Authority: JP
Inventors: Baback Moghaddam; Hanspeter Pfister; Jinho Lee
Original assignee: Mitsubishi Electric Research Laboratories Inc
Current assignee: Mitsubishi Electric Research Laboratories Inc
Priority date: 2004-04-30
Filing date: 2005-04-18
Publication date: 2005-11-10

Abstract

PROBLEM TO BE SOLVED: To provide a method for determining a set of optimal viewpoints to acquire a 3D shape of a face.
SOLUTION: A view-sphere is tessellated with a plurality of viewpoint cells. The face is at an approximate center of the view-sphere. Selected viewpoint cells are discarded. The remaining viewpoint cells are clustered to a predetermined number of viewpoint cells according to a silhouette difference metric. The predetermined number of viewpoint cells are searched for a set of optimal viewpoint cells to construct a 3D model of the face.
COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates generally to image processing, and more particularly to modeling and recognizing faces from 3D models and 2D images.

In computer graphics, synthetically constructing a realistic human head, and the face in particular, remains a fundamental challenge. Hereinafter, references to the "head" or "face" mean primarily the portion of the head extending from chin to brow and from ear to ear. Most prior-art methods require one of the following: considerable manual labor by skilled artists; expensive active 3D scanners (Lee et al., "Realistic Modeling for Facial Animations," Proceedings of SIGGRAPH 95, pp. 55-62, August 1995); or the availability of high-quality texture images as a substitute for exact face geometry (see Guenter et al., "Making Faces," Proceedings of SIGGRAPH 98, pp. 55-66, July 1998; Lee et al., "Fast Head Modeling for Animation," Image and Vision Computing, Vol. 18, No. 4, pp. 355-364, March 2000; Tarini et al., "Texturing Faces," Proceedings Graphics Interface 2002, pp. 89-98, May 2002).

Acquiring a 3D model of a human face by active sensing requires expensive scanning equipment. For this reason, several techniques have been developed to recover the 3D shape of a face from 2D images, or "projections". Some of these methods take a direct approach, using dense 2D correspondences across images to obtain the 3D positions of facial fiducial points (P. Fua, "Regularized bundle-adjustment to model heads from image sequences without calibration data," International Journal of Computer Vision, 38(2), pp. 153-171, 2000; F. Pighin, J. Hecker, D. Lischinski, R. Szeliski, and D. Salesin, "Synthesizing realistic facial expressions from photographs," Proceedings of SIGGRAPH 98, 1998; and Y. Shan, Z. Liu, and Z. Zhang, "Model-based bundle adjustment with application to face modeling," Proceedings of ICCV 01, pp. 644-651, July 2001).

Other methods parameterize a 3D face model and search for the optimal parameters that best describe the 2D input images (V. Blanz and T. Vetter, "Face recognition based on fitting a 3D morphable model," PAMI, 25(9), 2003; J. Lee, B. Moghaddam, H. Pfister, and R. Machiraju, "Silhouette-based 3D face shape recovery," Proc. of Graphics Interface, pp. 21-30, 2003; and B. Moghaddam, J. Lee, H. Pfister, and R. Machiraju, "Model-based 3D face capture using shape-from-silhouettes," Proc. of Advanced Modeling of Faces & Gestures, 2003).

In either case, the number of viewpoints, and hence of 2D input images, is an important parameter for obtaining a high-quality 3D model. Intuitively, the more input images taken from different viewpoints, the higher the quality of the 3D model and of the subsequent reconstruction. However, more images also increase processing time and equipment cost.

If an optimal set of viewpoints can be determined, however, fewer cameras can be used, and the resulting 2D images can provide better 3D modeling accuracy.

To date, no systematic method is available for determining the optimal number of viewpoints, and hence of input images, for the purpose of constructing a 3D model of a face. It would also be advantageous to automatically select, from a sequence of images in a video, the particular images that correspond to optimal viewpoints in order to improve face recognition.

It is known that different objects have different prototypical or aspect viewpoints (C. M. Cyr and B. B. Kimia, "Object recognition using shape similarity-based aspect graph," Proc. of ICCV, pp. 254-261, 2001).

It is desirable to determine a canonical set of optimal viewpoints for a particular class of objects with notably high intra-class similarity, specifically the human face.

When dealing only with illumination, it is possible to determine empirically an optimal configuration of nine point light sources that spans the generic subspace of faces under variable illumination (K. Lee, J. Ho, and D. Kriegman, "Nine points of light: Acquiring subspaces for face recognition under variable lighting," Proc. of CVPR, pp. 519-526, 2001).

It is desirable to solve the related problem for the pose of the subject or, equivalently, the camera viewpoint. That is, it is desirable to determine an optimal set of K viewpoints, corresponding to the spatial configuration of K cameras, that best describes a 3D human face by its projections from those viewpoints, i.e., by the silhouettes formed in the 2D images.

An important problem in multi-view 3D modeling is determining the optimal set of viewpoints, or "poses", required for accurate 3D shape estimation of a "generic" face. To date, there is no analytical solution to this problem. Instead, partial solutions require an exhaustive combinatorial search.

Building on a 3D modeling method, the present invention uses a contour-based silhouette matching method and extends it by aggressively pruning the view-sphere using viewpoint clustering and various other imaging constraints. The multi-view optimization search is performed using both a model-based (eigenhead) method and a data-driven (visual hull) method, and the two yield comparable sets of optimal viewpoints.

The optimal set of viewpoints can be used to acquire the 3D shape of a face and can provide useful empirical guidelines for 3D face recognition systems.

Because no analytical formulation is possible, the present invention takes an empirical approach. An object-centered view-sphere is sampled (tessellated) to generate a finite set of viewpoint configurations. Each viewpoint is evaluated according to the resulting ensemble error over a representative data set of individual faces. The ensemble error is expressed in terms of the average reconstruction error.

Because the number of potential viewpoints is large, the view-sphere is aggressively pruned by discarding a predetermined set of irrelevant or impractical viewpoints, which may depend on the application. The present invention can use aspect viewpoints of the kind used for general 3D object recognition. An aspect viewpoint is the projection of the silhouette of an object from a viewpoint that represents a range of similar nearby viewpoints in the space of the uniformly sampled view-sphere. However, the invention is not limited to the use of such aspect views, because the method works with any predetermined set of views that are deemed salient.

The size of the viewpoint space is reduced for a class of objects, for example faces. After the view-sphere is uniformly sampled and high-level, model-specific constraints such as facial symmetry and imaging geometry are applied, the method generates viewpoint clusters by merging neighboring viewpoint cells according to a silhouette difference metric or, conversely, a similarity metric, and selects the prototypical "centroid" of each cluster as an aspect viewpoint. Searching the reduced number of combinatorial subsets of these aspect viewpoints, for a given number of distinct views (cameras), yields the set of viewpoints that is optimal for modeling the shape of the object.

Multi-view 3D Face Modeling
The present invention provides a method for determining the optimal set of viewpoints needed to construct an accurate 3D model of a human face from 2D images taken from that set of viewpoints. The general method for constructing the 3D model is described in U.S. patent application Ser. No. 10/636,355, "Reconstructing Heads from 3D Models and 2D Silhouettes," filed Aug. 7, 2003 by Lee et al., which is incorporated herein by reference.

As shown in FIG. 6, the method uses a number of cameras 600, for example eleven, arranged in a configuration on a "view-sphere" 200 around a person's head 210. The placement of the cameras determines the size of the portion of the head that is modeled. In a practical application, the view-sphere is constructed as a geodesic dome with the cameras mounted on the structural members of the dome. The person sits on a chair inside the dome while images of the person's face are acquired. The face 210 is at an approximate center of the view-sphere 200.

As described herein, the 3D modeling method of the present invention deals with recovering shape, i.e., geometry, independently of texture, i.e., appearance. The method is therefore robust to variations in illumination and texture. It can be distinguished from the "morphable" model (V. Blanz and T. Vetter, "Face recognition based on fitting a 3D morphable model," PAMI, 25(9), 2003) as follows.

Shape is recovered directly, rather than jointly with an estimate of texture. Shape is obtained from occluding contours, or silhouettes, rather than from texture. No estimate of the object texture is required; however, texture can easily be obtained from the object using standard techniques after the shape has been recovered. The model fitting uses binary silhouettes rather than image intensity errors. The method does not inherently require actual images. It can use any other means of obtaining a foreground/background segmentation, for instance from depth-layer information. For example, silhouettes can be obtained using range sensors.

Moreover, the silhouette-matching optimization of the present invention is simpler, has fewer free parameters, and is considerably faster, by roughly a factor of ten.

In the inventors' earlier modeling method, the optimal placement of the cameras 600 was initially found by trial and error, using "intuition" about which viewpoints are more informative for shape acquisition.

The present invention continues from that earlier work.

The goal now is to eliminate the guesswork from the viewpoint selection process and to determine the optimal geometry, or view configuration, for a given number K of cameras. For model building, the method used scans of the faces of adult women and men of various ethnicities and ages. The scans can be used to generate meshes. The number of points in each face mesh varies from approximately 50,000 to 100,000.

First, all scanned faces in the database are resampled to obtain point-to-point correspondence. Second, the resampled faces are aligned to a reference face to remove any variation in pose and any misalignment introduced during scanning. Third, principal component analysis (PCA) is performed on the database of aligned 3D faces to obtain the eigenvectors of the shape model and their associated eigenvalues, i.e., the variances of the respective latent Gaussian distributions. This decomposition can be used to reconstruct a new or existing face through a linear combination of "eigenhead" basis functions (see generally J. J. Atick, P. A. Griffin, and N. Redlich, "Statistical approach to shape from shading face surfaces from single 2D images," Neural Computation, 8(6), pp. 1321-1340, 1996).

Inspection of the PCA eigenvalue spectrum and the resulting shape reconstructions showed that the first 60 eigenheads are sufficient to capture most of the salient facial features of the faces in the database. The corresponding shape parameters a_i are therefore the optimization parameters of the present invention.
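The linear combination of eigenheads described above can be sketched as follows. This is a minimal illustration, assuming each shape is stored as a flat list of vertex coordinates and that a face is the mean shape plus a weighted sum of basis vectors; the function name `reconstruct_face`, the use of a mean shape, and the toy numbers are assumptions for illustration and do not come from the specification.

```python
def reconstruct_face(mean_shape, eigenheads, coeffs):
    """Return mean_shape + sum_i coeffs[i] * eigenheads[i].

    All shapes are flat lists of vertex coordinates (x0, y0, z0, x1, ...).
    Using a mean shape as the origin of the expansion is an assumption.
    """
    if len(coeffs) > len(eigenheads):
        raise ValueError("more coefficients than eigenheads")
    shape = list(mean_shape)
    for a_i, head in zip(coeffs, eigenheads):
        for j, component in enumerate(head):
            shape[j] += a_i * component
    return shape


# Toy data: 2 vertices (6 coordinates) and 2 basis vectors, not real eigenheads.
mean_shape = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
eigenheads = [
    [1.0, 0.0, 0.0, -1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
]
face = reconstruct_face(mean_shape, eigenheads, [0.5, -2.0])
print(face)
```

In the method described here the coefficient vector would have 60 entries, one per retained eigenhead, and those entries are the quantities the optimizer varies.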

Given a parameter vector $a = \{a_1, a_2, \ldots, a_n\}$, an arbitrary face model $M(a)$ produces a polygon mesh. The input silhouette images are $S^k_{\text{input}}$, for $k = 1, \ldots, K$. A similarity transformation $T$ aligns the reference model face with the real 3D face. The silhouette image $S^k_{\text{model}}(a)$ is rendered by projecting $T(M(a))$ onto the image plane using the pose information of the $k$-th silhouette image. The parameter vector $a$ is estimated by minimizing the total penalty

$$E(a) = \sum_{k=1}^{K} f\left(S^k_{\text{input}},\, S^k_{\text{model}}(a)\right) \qquad (1)$$

where the cost function $f$ measures the difference between two binary silhouettes. For the cost function $f$ in Equation (1), a simple difference measure between two binary silhouettes is the number of "on" pixels after a pixel-wise exclusive-OR (XOR) operation is applied.

To prioritize matching the correct pixels on the occluding contour, and to promote uniqueness so that the cost function f has a global minimum, a higher penalty is imposed on any mismatch near the boundary pixels of the input silhouette. Specifically, in the weighted cost, D(S) is the Euclidean distance transform of the binary image S, and ~S (S with a tilde above it) is the inverse image of S. Note that the variable d represents a distance map from the silhouette contour, which can be computed in a preprocessing step. This cost function is called the boundary-weighted XOR. It provides a simple and effective alternative to exact contour matching.
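The difference between the plain XOR cost and a boundary-weighted variant can be illustrated with a small sketch. The weighted formula itself is not reproduced in this text, so the code below makes two explicit assumptions: the distance map is d = D(S) + D(~S), i.e., each pixel's distance to the nearest pixel of the opposite class, and each mismatched pixel contributes 1/d^2. The function names and the brute-force distance computation are illustrative only.

```python
import math


def xor_cost(s, t):
    """Plain XOR cost: number of pixels where two binary silhouettes differ."""
    return sum(a != b for row_s, row_t in zip(s, t) for a, b in zip(row_s, row_t))


def contour_distance_map(s):
    """Assumed d = D(S) + D(~S): distance from each pixel to the nearest pixel
    of the opposite class. Brute force, so only suitable for tiny images, and
    it requires that both foreground and background pixels are present."""
    h, w = len(s), len(s[0])
    pixels = [(i, j) for i in range(h) for j in range(w)]
    return [
        [min(math.hypot(i - p, j - q) for p, q in pixels if s[p][q] != s[i][j])
         for j in range(w)]
        for i in range(h)
    ]


def boundary_weighted_xor(s_input, s_model, d=None):
    """Sum of 1/d^2 over mismatched pixels (the 1/d^2 weight is an assumption).
    The map d depends only on the input silhouette, so it can be precomputed."""
    if d is None:
        d = contour_distance_map(s_input)
    return sum(
        1.0 / d[i][j] ** 2
        for i, row in enumerate(s_input)
        for j, value in enumerate(row)
        if value != s_model[i][j]
    )


s_input = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
interior_miss = [row[:] for row in s_input]
interior_miss[2][2] = 0   # one wrong pixel deep inside the silhouette
boundary_miss = [row[:] for row in s_input]
boundary_miss[1][1] = 0   # one wrong pixel on the silhouette boundary

print(xor_cost(s_input, interior_miss), xor_cost(s_input, boundary_miss))
print(boundary_weighted_xor(s_input, interior_miss))
print(boundary_weighted_xor(s_input, boundary_miss))
```

Both candidate silhouettes differ from the input in exactly one pixel, so the plain XOR cost cannot tell them apart, whereas the weighted cost penalizes the mismatch on the boundary more heavily than the one in the interior.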

Consequently, the time-consuming processing associated with edge linking, curve fitting, and computing distances between contours is not required. Moreover, the boundary-weighted XOR operation can be performed in hardware. Because the cost function is inherently complex and nonlinear and has no analytic gradient, a probabilistic downhill simplex method is used to minimize Equation (1).

Determining the Optimal Viewpoints for 3D Face Modeling
The description now continues by explaining how to determine an optimal set of viewpoints for 3D face modeling using an arbitrary number K of cameras or "viewpoints", for example five or fewer. It describes how to "prune" the space of all possible viewpoints obtained by a uniform tessellation of the view-sphere, based on clustering adjacent viewpoints using a silhouette difference metric, or silhouette similarity metric, obtained from shape projections. The selected set of aspect viewpoints is then examined using both the model-based method and the data-driven visual hull method.

Silhouette Generation
The silhouettes of the resampled faces in the database differ considerably from the silhouettes obtained from images of actual subjects, because portions of the head and the upper torso are missing. To simulate silhouette images of actual subjects using the database, a fully scanned 3D head is used as the prototype head/torso.

FIG. 1A is an image of an original face in the database, FIG. 1B is an image of the resampled face, FIG. 1C is an image of the complete laser-scanned "prototype" head, and FIG. 1D is an image of the rendered face obtained by merging the resampled face with the complete scanned head.

The merging is performed by aligning the facial region of the prototype head to the resampled face with a smooth deformation and stitching the head and face together to synthesize a "virtual" test subject with a complete head and shoulders. In this way, complete silhouette images can be generated with exactly the same face shape as a face in the database while maintaining the proper geometry of the subject.

This preprocessing step is used only in lieu of complete head scans and can be omitted when complete subject scans are available. The process of "stitching" the 3D face models onto one common head shape highlights only the critical facial area as the "region of interest" that matters in the subsequent analysis and optimization. Doing so effectively indicates that less important areas, such as the back of the head, are not important or salient for accurate reconstruction of the facial area. The search method, however, can remain the same regardless of which areas are highlighted, i.e., marked as salient.

View-Sphere Tessellation
As shown in FIGS. 2A and 5, the view-sphere 200 around the subject 210 is uniformly tessellated (510) with triangles using a subdivision of a dodecahedron. This procedure produces 120 triangles 201, which are called viewpoint cells. The vertices 202 of each triangle 201 lie on the surface of the view-sphere 200.

As shown in FIGS. 2B and 5, selected viewpoint cells are discarded (520). The discarded viewpoint cells include the cells in the rear hemisphere of the view-sphere, because the face is occluded from viewpoints in the rear hemisphere. In addition, viewpoint cells above and below predetermined elevations are discarded, because the corresponding viewpoints are unlikely or impractical physical locations for a camera. In the method of the present invention, the elevation of the viewpoints is limited to ±45 degrees from the central horizontal plane.

Furthermore, it is often very difficult to obtain an accurate facial contour from oblique viewpoints because of occlusion and the resulting confusion with hair and shoulders. Finally, because the face is approximately bilaterally symmetric, an entire half of the remaining viewpoints is discarded. This leaves the 44 viewpoints shown in FIG. 2B.
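The pruning rules above (rear hemisphere, elevation limit, bilateral symmetry) can be sketched as a simple filter. The sketch assumes each viewpoint is given as an (azimuth, elevation) pair in degrees with azimuth 0 facing the subject, and it arbitrarily keeps the non-negative-azimuth half; these conventions, the function name `prune_viewpoints`, and the regular test grid are assumptions. The grid is not the dodecahedron-based tessellation, so the count it produces is not the 44 viewpoints of FIG. 2B.

```python
def prune_viewpoints(viewpoints, max_elevation_deg=45.0):
    """Keep frontal-hemisphere viewpoints within the elevation limit, on one
    side of the (assumed) symmetry plane at azimuth 0.

    Each viewpoint is (azimuth_deg, elevation_deg); azimuth 0 is frontal and
    |azimuth| > 90 lies in the rear hemisphere (assumed conventions).
    """
    kept = []
    for azimuth, elevation in viewpoints:
        if abs(azimuth) > 90.0:                # rear hemisphere: face occluded
            continue
        if abs(elevation) > max_elevation_deg:  # impractical camera height
            continue
        if azimuth < 0.0:                      # symmetry: keep only one half
            continue
        kept.append((azimuth, elevation))
    return kept


# Hypothetical regular grid, used only to exercise the filter.
grid = [(az, el) for az in range(-180, 180, 30) for el in range(-60, 61, 30)]
kept = prune_viewpoints(grid)
print(len(grid), "->", len(kept))
```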

Even the remaining viewpoints yield too many combinations, or subsets, of viewpoints. For example, finding eleven optimal viewpoints by exhaustive search would require evaluating approximately 7 × 10^9 combinations of viewpoints, which is intractable. The search space therefore needs to be reduced even further.
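The size of that search space follows from a binomial coefficient: choosing 11 viewpoints out of the 44 that remain. A quick check, using only the Python standard library (the variable names are illustrative):

```python
import math

n_viewpoints = 44   # viewpoints left after pruning
n_cameras = 11      # size of the subset being searched for

n_subsets = math.comb(n_viewpoints, n_cameras)
print(f"{n_subsets:,}")   # 7,669,339,132
```

This is about 7.7 × 10^9 subsets, each of which would need its own set of reconstructions, which is why the viewpoints are clustered before the search.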

Viewpoint Clustering
We observe that the 2D silhouette images of two adjacent viewpoints are often substantially similar. Therefore, the silhouette difference between two adjacent viewpoints is measured, and the two corresponding viewpoint cells are clustered (530) when the silhouette difference is less than a predetermined threshold.

The location of a group (cluster) of viewpoint cells can then be represented by the centroid of the cluster. More importantly, only the silhouette differences around the critical facial areas, e.g., the nose, eyes, ears, chin, and mouth, are considered here, because face shape recovery is not affected by silhouette differences in irrelevant areas such as the shoulders.

For the clustering, a lookup table (D) is first built that stores the partial, i.e., face-restricted, XOR silhouette distance between every pair of viewpoints in the search space. Initially, every viewpoint is considered to be a cluster, and the aspect viewpoint of each cluster is the viewpoint itself.

The silhouette difference between two clusters is defined as the silhouette distance between their respective aspect viewpoints. That information is precomputed and stored in the lookup table D. The two adjacent clusters that have the smallest silhouette difference among all adjacent clusters are found and merged. After the two clusters are merged, a new aspect viewpoint is determined for the merged cluster. The new aspect viewpoint is the viewpoint whose maximum silhouette difference to all other viewpoints in the same cluster is smallest. This process is repeated until a predetermined number of clusters remains.
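The merging procedure described above can be sketched as a small agglomerative loop. The data layout is an assumption: `D` is a dictionary of pairwise silhouette distances keyed by viewpoint pairs, `neighbors` maps each viewpoint to the set of viewpoints adjacent to it on the view-sphere, and two clusters are treated as adjacent when any of their members are adjacent. The function names and the toy distances are illustrative and are not data from the specification.

```python
def dist(D, u, v):
    """Symmetric lookup in the precomputed silhouette distance table."""
    return D[(u, v)] if (u, v) in D else D[(v, u)]


def pick_aspect(members, D):
    """Member whose maximum silhouette difference to the others is smallest."""
    return min(
        members,
        key=lambda v: max((dist(D, v, u) for u in members if u != v), default=0.0),
    )


def cluster_viewpoints(viewpoints, neighbors, D, n_clusters):
    """Merge adjacent clusters until n_clusters remain.

    Returns {aspect_viewpoint: set_of_member_viewpoints}.
    """
    clusters = {v: {v} for v in viewpoints}   # each viewpoint starts as a cluster
    while len(clusters) > n_clusters:
        best = None
        aspects = sorted(clusters)
        for i, a in enumerate(aspects):
            for b in aspects[i + 1:]:
                adjacent = any(q in neighbors[p]
                               for p in clusters[a] for q in clusters[b])
                if adjacent and (best is None or dist(D, a, b) < best[0]):
                    best = (dist(D, a, b), a, b)
        if best is None:
            break                              # nothing adjacent left to merge
        _, a, b = best
        merged = clusters.pop(a) | clusters.pop(b)
        clusters[pick_aspect(sorted(merged), D)] = merged
    return clusters


# Toy example: four viewpoints on a chain 0-1-2-3 with made-up distances.
neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
D = {(0, 1): 1.0, (1, 2): 5.0, (2, 3): 2.0, (0, 2): 6.0, (0, 3): 9.0, (1, 3): 7.0}
two = cluster_viewpoints([0, 1, 2, 3], neighbors, D, n_clusters=2)
one = cluster_viewpoints([0, 1, 2, 3], neighbors, D, n_clusters=1)
print(two)
print(one)
```

In the single-cluster result the aspect viewpoint is viewpoint 2, because its largest distance to any other member (6.0) is smaller than that of viewpoints 0, 1, and 3 (9.0, 7.0, and 9.0). This illustrates why the aspect viewpoint need not be the geometric center of its cluster.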

FIG. 3 shows the ten clusters 1-10 obtained using the clustering step 530, together with their corresponding aspect viewpoints 300. Note that a resulting aspect viewpoint is not necessarily the geometric centroid of its cluster; it is the viewpoint with the minimum silhouette difference relative to the other members of the cluster.

To avoid any subject dependency and to generalize the clustering, every entry in the lookup table D is generated by averaging the pairwise silhouette difference distances over 50 different synthesized heads in the database.

Table A gives the coordinates of the aspect viewpoints 1-10, where azimuths of {−90°, 0°, +90°} correspond to {left, frontal, right} in a head-centered reference frame.

FIG. 4 shows the corresponding silhouettes 401-410 obtained from the ten aspect viewpoints, along with the model silhouette and the critical facial area used for error evaluation. All reconstruction errors are confined to the critical facial area; irrelevant input from the hair and shoulders is ignored. View 1 in FIG. 3 is discarded: because of its downward angle, the corresponding face silhouette 401 is partially hidden and merges with the torso. View 2 is also discarded: although a frontal viewpoint is preferred for acquiring facial texture, it provides very little occluding contour as a constraint for shape recovery.

Determining the Optimal Viewpoints
The remaining eight aspect viewpoints 3-10 are searched exhaustively (540) for the optimal subset of K ≤ 8 viewpoints, i.e., the subset that, for each case K, yields the 3D shape reconstruction closest to the original face when the K silhouettes are used in the shape recovery process. The default reconstruction method is the model-based (eigenhead) 3D face shape recovery method described in the related U.S. patent application Ser. No. 10/636,355.

For comparison, a purely data-driven method was also tested, using visual hull construction. Note that the visual hull by itself is not suitable for accurate reconstruction, even when hundreds of viewpoints are used. The goal here is to show that a greedy search based on a data-driven method selects a similar set of optimal viewpoints.

For the optimal set of viewpoints to be valid for general-purpose face modeling and recognition, the viewpoints should suit generic faces of all kinds, e.g., across gender, ethnicity, and age. Optimality should therefore be independent of the subject. For this reason, a representative subset of 25 individuals from the database was used, and the optimal viewpoint selection was based on the configuration that minimizes the total, or average, error over all subjects.

Recovering a 3D shape from silhouette images requires a measure of the error between the ground truth and the reconstructed 3D geometry. Because the invention focuses on the facial area of the recovered shape, the measure must capture the difference between the recovered shape and the original face within the critical facial area. The basic approach to this error measurement is as follows.

The first step is to find a dense set of points on the facial area of the recovered face geometry. For the eigenhead shape model, the facial points of the model are found through the mesh parameterization.

Finding the same facial points on a visual hull, however, is not trivial, so a ray casting scheme is used. Because the original 3D heads used to generate the input silhouette images are available, rays are cast from the facial points on the original head toward the visual hull to obtain the corresponding samples on the visual hull surface.
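One way to obtain such surface samples is to intersect each ray with the triangles of the hull mesh. The sketch below uses the Möller-Trumbore ray-triangle test, which is a standard technique and not one named in the source. The triangle-mesh representation, the choice of ray direction, and the rule of keeping the nearest hit are all assumptions made for illustration.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: distance t along the ray, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv         # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None  # ignore hits behind the ray origin


def cast_to_mesh(origin, direction, triangles):
    """Return the nearest hit point on the mesh, or None if the ray misses."""
    nearest = None
    for v0, v1, v2 in triangles:
        t = ray_triangle(origin, direction, v0, v1, v2)
        if t is not None and (nearest is None or t < nearest):
            nearest = t
    if nearest is None:
        return None
    return tuple(o + nearest * d for o, d in zip(origin, direction))


# Hypothetical one-triangle "mesh" in the plane z = 1.
mesh = [((0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0))]
hit = cast_to_mesh((0.25, 0.25, 0.0), (0.0, 0.0, 1.0), mesh)
print(hit)
```

In practice the ray origin would be a facial point on the original head and the direction might be, for example, the surface normal at that point; the source does not specify the direction.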

After the facial points are obtained, the same ray casting scheme is used to obtain the corresponding samples on the ground-truth mesh surface. The L2 distance between the facial points on the recovered face and the corresponding points on the ground truth is measured and used as the 3D error measure for the facial area.
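As a rough illustration of this error measure, the sketch below computes an L2 distance between two lists of corresponding 3D points. The source does not say how per-point distances are aggregated, so treating the error as the Euclidean norm of all stacked coordinate differences is an assumption, and `l2_error` is a hypothetical name.

```python
import math


def l2_error(recovered_points, ground_truth_points):
    """L2 distance between corresponding 3D samples.

    Assumption: the error is the Euclidean norm of all coordinate
    differences stacked into one vector; the source only says "L2 distance".
    """
    if len(recovered_points) != len(ground_truth_points):
        raise ValueError("point sets must be in one-to-one correspondence")
    squared = sum((a - b) ** 2
                  for p, q in zip(recovered_points, ground_truth_points)
                  for a, b in zip(p, q))
    return math.sqrt(squared)


# Two hypothetical sample pairs: offsets of 3 and 4 units respectively.
recovered = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
error = l2_error(recovered, ground_truth)
print(error)
```

Dividing the squared sum by the number of points before taking the root would give a root-mean-square variant that is independent of the sampling density.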

Model-Based Reconstruction
An exhaustive search (540) was performed over the remaining eight aspect viewpoints in FIG. 4 to find the optimal subset of viewpoints for K = {1, 2, 3, 4, 5} cameras. The total number of possible reconstructions is therefore 5450. To remove the data dependency inherent in the reconstruction error of a single individual, the average reconstruction error of 25 subjects selected at random from the database is used.
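The figure of 5450 is consistent with counting every viewpoint subset of size 1 to 5 drawn from the eight viewpoints and evaluating each subset once for each of the 25 subjects. That decomposition is inferred from the numbers in the text rather than stated in the source; the short check below reproduces it.

```python
from math import comb

NUM_VIEWPOINTS = 8   # aspect viewpoints 3-10
NUM_SUBJECTS = 25    # subjects whose errors are averaged

# Number of K-element subsets of the eight viewpoints, for K = 1..5.
subsets_per_k = {k: comb(NUM_VIEWPOINTS, k) for k in range(1, 6)}
total_subsets = sum(subsets_per_k.values())

# One reconstruction per (subset, subject) pair.
total_reconstructions = total_subsets * NUM_SUBJECTS

print(subsets_per_k)
print(total_subsets, total_reconstructions)
```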

The results are presented in Table B, which lists the optimal set of viewpoints and the corresponding minimum average reconstruction error for K = {1, 2, 3, 4, 5}. See Table A for the exact coordinates of the aspect viewpoints.

The standard deviation of the individual errors of all 25 subjects under the best configuration is also presented. The mean and the standard deviation of the average error are based on the average reconstruction error across all viewpoints. As expected, both tend to decrease as K increases, because more viewpoints provide more constraints.

Visual Hull Reconstruction
Using the same search strategy as described above for the 3D model-based method, the visual hull constructed from a given subset of silhouette images is now evaluated and the result is compared with the ground truth.

Table C shows the optimal viewpoints and the corresponding error values for K = {2, 3, 4, 5}. The visual hull from a single silhouette (K = 1) is omitted because it has no finite volume.
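Why a single silhouette gives no finite volume can be seen in a small voxel-carving sketch. The source does not say how the visual hull is built, so everything here is an assumption for illustration: the voxel grid, the axis-aligned orthographic views, and the names `carve`, `one_view`, and `three_views`. With one view the surviving voxels form an extrusion limited only by the grid boundary, whereas additional views bound the volume.

```python
from itertools import product


def carve(grid_size, silhouettes):
    """Voxel carving with axis-aligned orthographic views (an assumption).

    silhouettes maps a viewing axis (0, 1 or 2) to the set of 2D pixels
    inside that view's silhouette. A voxel survives only if its projection
    along every given axis falls inside the matching silhouette.
    """
    hull = set()
    for voxel in product(range(grid_size), repeat=3):
        inside_all = True
        for axis, mask in silhouettes.items():
            # Orthographic projection: drop the coordinate along the view axis.
            pixel = tuple(c for i, c in enumerate(voxel) if i != axis)
            if pixel not in mask:
                inside_all = False
                break
        if inside_all:
            hull.add(voxel)
    return hull


n = 8
square = {(a, b) for a in range(2, 6) for b in range(2, 6)}  # 4x4 silhouette

one_view = carve(n, {2: square})                        # looking along z only
three_views = carve(n, {0: square, 1: square, 2: square})

depth_one = {z for (_, _, z) in one_view}
depth_three = {z for (_, _, z) in three_views}
print(len(one_view), sorted(depth_one))
print(len(three_views), sorted(depth_three))
```

With perspective cameras the single-view hull is an unbounded cone rather than a prism, but the conclusion is the same: at least two views are needed before the volume is bounded.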

Note that a visual hull reconstruction, especially one from a small number of images, is not a very accurate representation. In contrast to the model-based results, the reconstruction quality here is far more view dependent than subject dependent. The view dependence, however, decreases significantly as the number of viewpoints K increases; see the error standard deviations. For both methods, viewpoints 3 and 10 appear to be the most informative.

Effects of the Invention
The method according to the invention determines a set of viewpoints that is optimal for 3D face modeling, in particular for methods that recover shape from silhouettes. The invention provides useful guidelines for designing 3D face recognition systems and agrees with existing practice and intuition. For example, the most salient viewpoint 3 corresponds very closely to the established biometric standard "3/4 view" (showing three quarters of the face from the right) used for many ID photographs, and viewpoint 10 corresponds to the profile view used in "mugshot" (police booking) photographs. The results herein show that reconstruction does not improve substantially beyond four to five viewpoints; see the best errors listed in Table B and Table C.

Additional physical and computational constraints can be incorporated into the method of the invention. For example, a directly frontal viewpoint is not very salient for shape, but it is the preferred view for capturing texture and is therefore used in almost all 2D face recognition systems. Such a viewpoint can be preselected before the search.

When a face is captured on video, the motion of the subject and changes in pose provide multiple virtual viewpoints even when the camera is fixed. The method of the invention can therefore be applied to a sequence of images in a surveillance video to automatically select the optimal poses, that is, the best video frames for face recognition.

Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.

An image of an original face in the database.
An image obtained by resampling the face of FIG. 1A.
An image of a 3D model of the head obtained by scanning.
An image obtained by combining the resampled image of FIG. 1B with the model of FIG. 1C.
A tessellated observation sphere.
A tessellated observation sphere with viewpoints discarded.
An observation sphere with the viewpoints clustered.
Silhouettes obtained from the ten aspect viewpoints.
A block diagram of the method according to the invention.
A diagram of the observation sphere according to the invention.

Claims

A method for determining a set of optimal viewpoints for constructing a 3D shape of a face from 2D images acquired from the set of optimal viewpoints, comprising:
tessellating an observation sphere with a plurality of viewpoint cells, wherein the face is located at an approximate center of the observation sphere;
discarding selected viewpoint cells;
clustering the remaining viewpoint cells into a predetermined number of viewpoint cells according to a silhouette difference measure determined from the 2D images; and
searching the predetermined number of viewpoint cells for the set of optimal viewpoint cells.

The method of claim 1, wherein triangles are used for tessellation.

The method of claim 1, wherein the tessellation is an equal subdivision of a dodecahedron.

The method according to claim 1, wherein the selected viewpoint cells include viewpoint cells in a rear hemisphere of the observation sphere.

The method according to claim 1, wherein the selected viewpoint cells include viewpoint cells that are above and below a predetermined height.

The method according to claim 5, wherein the predetermined height is ±45 degrees from a central horizontal plane of the observation sphere.

The method according to claim 1, wherein the selected viewpoint cells include viewpoint cells in one half of the face since both halves of the face are approximately symmetrical.

The method of claim 1, wherein the silhouette difference measure is measured for each viewpoint cell pair.

The method of claim 1, wherein the predetermined number of viewpoint cells is ten.

The method according to claim 1, wherein a position of each viewpoint cell is given by the centroid of a cluster of viewpoint cells.

The method according to claim 1, wherein only silhouette differences near critical facial areas are considered.

The method of claim 1, wherein the critical facial areas include the nose, eyes, ears, chin, and mouth.

The method of claim 1, wherein the silhouette difference is stored in a pre-calculated look-up table.

The method of claim 1, wherein the search is exhaustive.

The method of claim 1, wherein the optimal set of viewpoint cells is applied to a series of images of a video to automatically select an optimal pose for facial modeling.

The method of claim 1, wherein the optimal set of viewpoint cells is applied to a series of images of a video to automatically select an optimal pose for face recognition.

The method of claim 1, wherein the optimal set of viewpoint cells is used to construct a 3D model of the face from 2D images obtained with the optimal set of viewpoints.

The method of claim 1, wherein the search is a combinatorial search of all possible subsets of the optimal set of viewpoint cells.

The method of claim 1, wherein optimality is determined by a minimum reconstruction error between the input face and the 3D model of the face.

The method of claim 19, wherein the minimum reconstruction error is based on a combination of shape and texture.

The method of claim 19, wherein the minimum reconstruction error is based on an average of a plurality of faces.