JP4850768B2

JP4850768B2 - Apparatus and program for reconstructing 3D human face surface data

Info

Publication number: JP4850768B2
Application number: JP2007082585A
Authority: JP
Inventors: サブリ・グルブズ; 直己井ノ上
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2007-03-27
Filing date: 2007-03-27
Publication date: 2012-01-11
Anticipated expiration: 2027-03-27
Also published as: JP2008242833A

Abstract

PROBLEM TO BE SOLVED: To provide a device for accurately reconstructing a 3D human face.
SOLUTION: The device 86 comprises a module 150 for estimating a horopter range and the size of the human face based on a rectified image pair and the calibration parameters of a stereo camera; a module 152 for estimating, for each pixel, a disparity value within the horopter range; a module 154 for reconstructing 3D surface data of the human face; a module 156 for eliminating noise in the 3D surface data and interpolating invalid data points from neighboring data points; and a module 158 for fitting a face plane to the 3D surface data and extracting those 3D surface data that lie within a predetermined distance of the face plane.
COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an apparatus and a computer program for real-time 3D reconstruction of human face surface data, and more particularly to an apparatus and a method for real-time 3D reconstruction of human face surface data from stereo image pairs for, among other uses, 2D/3D face detection, extraction, and recognition, 3D game development, animation, and broadcasting.

Recently, 3D display and multimedia technologies have advanced worldwide, and real-time 3D reproduction of human face data has attracted great interest for surveillance, recognition, human-computer interaction, and multimedia applications (Non-Patent Documents 1 and 2). Various commercial products are also available for digitally reconstructing 3D object shapes. However, they require special sensors such as laser range scanners, and are hardly suitable for 3D reconstruction of human faces in general applications such as face recognition in public settings or 3D television broadcasting.

Face reconstruction approaches in the known literature fall into two categories according to the sensor data they use: reconstruction from mono (single) camera image data (Non-Patent Document 3) and reconstruction from stereo image data (Non-Patent Document 1). As for reconstruction techniques, various methods are used, such as shape from shading (Non-Patent Document 4), 3D face models (Non-Patent Documents 5 and 3), and the positions of the face and facial features used as markers, or as markers for model fitting (Non-Patent Documents 2 and 6).

Non-Patent Document 1: M. Chan, P. Delmas, G. L. Gimel'farb, and P. Leclercq, "Comparative study of 3d face acquisition techniques," in Computer Analysis of Images and Patterns (CAIP), 2005, pp. 740-747.
Non-Patent Document 2: T. A. Erdem, "A new method for generating 3d face models for personalized user interactions," in Proceedings of European Signal Processing Conference EUSIPCO, 2005.
Non-Patent Document 3: V. Blanz and T. Vetter, "A morphable model for the synthesis of 3d faces," in International Conference on Computer Graphics and Interactive Techniques, 1999.
Non-Patent Document 4: J. J. Atick, P. A. Griffin, and A. N. Redlich, "Statistical approach to shape from shading: reconstruction of three-dimensional face surfaces from single two-dimensional images," Neural Computation, vol. 8, pp. 1321-1340, 1996.
Non-Patent Document 5: A. R. Chowdhury, R. Chellappa, S. Krishnamurthy, and T. Vo, "3d face reconstruction from video using a generic model," in Proceedings of IEEE International Conference on Multimedia and Expo, 2002.
Non-Patent Document 6: B.-W. Hwang, V. Blanz, T. Vetter, and S.-W. Lee, "Face reconstruction from a small number of feature points," in Proceedings of 15th International Conference on Pattern Recognition, 2000.

Conventional stereo algorithms (see Non-Patent Document 1) often fail to reconstruct a 3D human face accurately. This is because, when the search domain of the disparity search algorithm is unknown and therefore wide, some pixel locations in the image pair do not provide unambiguous information for stereo correspondence matching.

Accordingly, one object of the present invention is to provide an apparatus that can accurately reconstruct a 3D human face.

Another object of the present invention is to provide an apparatus that can accurately and robustly reconstruct a 3D human face even when some pixel locations in the image pair do not provide unambiguous information for stereo correspondence matching.

According to a first aspect of the present invention, an apparatus for reconstructing three-dimensional human face surface data from a rectified image pair of a stereo camera includes: horopter range estimation means for estimating the size of the human face and a horopter range based on the calibration parameters of the stereo camera and the rectified image pair; disparity estimation means for estimating, for each pixel of the human face in one image of the image pair, a disparity value within the horopter range in the other image of the image pair; three-dimensional surface reconstruction means for reconstructing three-dimensional surface data of the human face based on the rectified image pair, the disparity values, and the calibration parameters of the stereo camera; noise elimination means for eliminating noise in the three-dimensional surface data by finding invalid data points having invalid disparity values and interpolating the invalid data points from adjacent data points; fitting means for fitting a face plane to the three-dimensional surface data using a predetermined fitting algorithm; and extraction means for extracting, as the three-dimensional human face, those of the three-dimensional surface data that lie within a predetermined distance of the face plane.

A rectified stereo image pair is supplied to the apparatus. The horopter range estimation means estimates the horopter range and the size of the human face based on the calibration parameters of the stereo camera and the rectified image pair. The disparity estimation means estimates the disparity value of each pixel of the human face in one image within the horopter range in the other image. Because the search region for disparity values is limited to the horopter range, the search is robust and fast. Once the disparity values are estimated, the three-dimensional surface data of the human face are reconstructed. After noise is eliminated from the surface data, invalid data are interpolated from adjacent data points. The fitting means fits a face plane to the three-dimensional surface data. Finally, those of the three-dimensional surface data that lie within a predetermined distance of the face plane are extracted, and the three-dimensional human face is reconstructed.

Thus, an apparatus is provided that can accurately reconstruct a 3D human face without requiring any special sensor.

Preferably, the horopter range estimation means includes: first identifying means for identifying a between-the-eyes pattern in one image of the image pair; second identifying means for identifying, in the other image, a corresponding pattern along the epipolar line corresponding to the between-the-eyes pattern; verification means for verifying whether a face candidate near the between-the-eyes pattern in the one image and the corresponding pattern satisfy a verification condition; means for repeatedly operating the first and second identifying means and the verification means until a face candidate satisfies the verification condition; and means for defining a face region by the face candidate that satisfies the verification condition.

More preferably, the disparity estimation means includes: means for defining a correlation window size with respect to the horopter range and the size of the face region; and means for computing the disparity value of each pixel in the human face by searching the other image for the pixel that yields the highest similarity according to a preselected similarity measure.

The noise elimination means may include: means for defining, with respect to the size of the face, the size of a deviation window used to compute the standard deviation of pixel depth values; means for computing, for each pixel in the face region of the one image, the local mean and standard deviation of the depths of the pixels within a deviation window of that size around the pixel, using the image pair and the disparity computed for each pixel of the face region in the one image; means for classifying each pixel in the face region as valid or invalid depending on whether the deviation of its depth value from the mean of the deviation window around it is smaller than the standard deviation of that deviation window; and means for interpolating the depth value of each pixel classified as invalid from the depth values of its adjacent pixels.

Because noise is eliminated from the three-dimensional surface data as invalid data, and the depth values of the missing invalid data points are interpolated from adjacent data points, the three-dimensional human face is reconstructed robustly and accurately even when some pixel locations in the image pair do not provide unambiguous information for stereo correspondence matching.
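The valid/invalid classification and the interpolation described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `denoise_depth`, the square window with half-width `half`, the use of `<=` rather than a strict `<` (so that perfectly flat neighborhoods with zero standard deviation stay valid), and interpolation by averaging the valid neighbors are all assumptions made for this example.

```python
from statistics import mean, pstdev


def denoise_depth(depth, half=1):
    """Classify depth samples as valid/invalid and fill in the invalid ones.

    depth: 2D list of floats, one depth value per pixel of the face region.
    half:  half-width of the square deviation window (hypothetical parameter).
    Returns (filled_depth, valid_mask); the input is left unchanged.
    """
    rows, cols = len(depth), len(depth[0])

    def window(r, c):
        # Coordinates of the deviation window around (r, c), clipped to the grid.
        return [(i, j)
                for i in range(max(0, r - half), min(rows, r + half + 1))
                for j in range(max(0, c - half), min(cols, c + half + 1))]

    # Pass 1: a pixel is valid when its deviation from the local mean does
    # not exceed the local standard deviation (assumption: "<=", see lead-in).
    valid = [[True] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [depth[i][j] for i, j in window(r, c)]
            valid[r][c] = abs(depth[r][c] - mean(values)) <= pstdev(values)

    # Pass 2: replace each invalid depth by the mean of its valid neighbors.
    filled = [row[:] for row in depth]
    for r in range(rows):
        for c in range(cols):
            if not valid[r][c]:
                good = [depth[i][j] for i, j in window(r, c) if valid[i][j]]
                if good:  # leave the value untouched if no valid neighbor exists
                    filled[r][c] = mean(good)
    return filled, valid
```

Calling `denoise_depth(grid)` on a flat depth map with one spike marks only the spike as invalid and replaces it with the surrounding depth.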

Preferably, the fitting means includes: means for finding, in the three-dimensional surface data, an eye line connecting the left eye and the right eye; means for finding a three-dimensional face symmetry line orthogonal to the eye line; and means for selecting the data points to which the face plane is to be fitted. The area of the selected data points extends from the eye line of the face to the chin, excluding a strip-shaped region along the face symmetry line. The fitting means further includes means for fitting the face plane to the selected data points using the predetermined fitting algorithm.

More preferably, the fitting means includes: means for fitting the face plane to the selected data points using least-squares-error fitting; determining means for determining whether all errors between the face plane and the three-dimensional surface data are within a predetermined threshold; eliminating means for eliminating high-error points from the three-dimensional surface data in response to a determination that not all of the errors are smaller than the predetermined threshold; and means for repeatedly operating the fitting means, the determining means, and the eliminating means until the determining means determines that all of the errors are smaller than the predetermined threshold.
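The fit, check, and eliminate loop described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method claimed: the plane is parameterized as z = a*x + b*y + c (reasonable only when the face roughly faces the camera), and the names `fit_plane`, `fit_face_plane`, and the `drop_fraction` parameter controlling how many high-error points are removed per iteration are hypothetical.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to (x, y, z) points."""
    n = len(points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    # Normal equations M @ [a, b, c] = v, solved with Cramer's rule.
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    v = [sxz, syz, sz]
    d = _det3(m)  # zero for degenerate (e.g. collinear) input
    coeffs = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = v[i]
        coeffs.append(_det3(mk) / d)
    return tuple(coeffs)


def fit_face_plane(points, threshold, drop_fraction=0.1):
    """Refit until every remaining point lies within `threshold` of the plane.

    Returns ((a, b, c), surviving_points). `drop_fraction` is an assumed
    policy: the worst fraction of points is discarded after each failed check.
    """
    pts = list(points)
    while True:
        a, b, c = fit_plane(pts)
        errs = [abs(a * x + b * y + c - z) for x, y, z in pts]
        if max(errs) < threshold or len(pts) <= 3:
            return (a, b, c), pts
        order = sorted(range(len(pts)), key=errs.__getitem__)
        keep = max(3, len(pts) - max(1, int(len(pts) * drop_fraction)))
        pts = [pts[i] for i in order[:keep]]
```

Removing only the worst points per pass, rather than everything above the threshold at once, keeps early, biased fits from discarding good data; other elimination policies would fit the description equally well.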

According to a second aspect of the present invention, a computer program, when executed on a computer, causes the computer to function as: horopter range estimation means for estimating the size of a human face and a horopter range based on the calibration parameters of a stereo camera and a rectified image pair; disparity estimation means for estimating, for each pixel of the human face in one image of the image pair, a disparity value within the horopter range in the other image of the image pair; three-dimensional surface reconstruction means for reconstructing three-dimensional surface data of the human face based on the rectified image pair, the disparity values, and the calibration parameters of the stereo camera; noise elimination means for eliminating noise in the three-dimensional surface data by finding invalid data points having invalid disparity values and interpolating the invalid data points from adjacent data points; fitting means for fitting a face plane to the three-dimensional surface data using a predetermined fitting algorithm; and extraction means for extracting, as the three-dimensional human face, those of the three-dimensional surface data that lie within a predetermined distance of the face plane.

[Structure]
This embodiment uses a stereo-vision-based 3D face surface reconstruction algorithm to acquire 3D face surface data in real time. In this embodiment, horopter information about the observed individual's face is used to restrict the search domain of the disparity algorithm and to select the disparity search range.

In the field of stereo vision, a "horopter" is defined as the 3D volume covered by the search interval of a stereo correlation algorithm. The "horopter range" as used in this specification is the range of that 3D volume in front of the stereo camera.

This embodiment further restricts the size of the correlation window used for the search. In this embodiment, the horopter information is estimated by a fast stereo eye tracking system. The proposed algorithm further estimates a face plane and uses it for fast and accurate extraction of the 3D face surface data. The 3D face surface reconstructed by this embodiment retains most of the visual information of natural facial expressions, and thus preserves the visual dynamics of the face.

This embodiment relates to real-time 3D reconstruction and extraction of the face data of an observed individual.

FIG. 1 shows the overall structure of a 3D face pose estimation system 50 according to this embodiment. Referring to FIG. 1, the 3D face pose estimation system 50 includes a stereo camera 60, a monitor 64, and a real-time 3D face pose estimation device 62 for estimating a 3D human face pose and extracting a 3D human face from a stream of stereo image pairs from the stereo camera 60.

The real-time 3D face pose estimation device 62 is implemented by computer hardware and software, as described later, and includes: calibration software 80 for calibrating the stereo camera 60 and outputting its calibration parameters; a calibration parameter memory 82 for storing the calibration parameters output by the calibration software 80; rectification software 84 for rectifying the stereo image pairs from the stereo camera 60 using the calibration parameters stored in the calibration parameter memory 82; a 3D face reconstruction module 86 for reconstructing the extracted 3D face by analyzing the rectified stereo image pairs from the rectification software 84; and a 3D head pose estimation module 88 for estimating the head pose, that is, its position and orientation in 3D space, using the reconstructed 3D face image.

The real-time 3D face pose estimation device 62 further includes a bus 92 that is accessible from the calibration software 80 and the rectification software 84 and is connected to the 3D face reconstruction module 86 and the 3D head pose estimation module 88, and a graphics processing unit (GPU) 90 connected to the bus 92 and the monitor 64.

As described later, the instructions of the calibration software 80 and the rectification software 84 are executed by the central processing unit (CPU) of the computer to achieve the desired functions. Because the CPU is connected to the bus 92, the calibration software 80 and the rectification software 84 can access the 3D face reconstruction module 86, the 3D head pose estimation module 88, and the GPU 90 via the bus 92.

FIG. 2 is a hardware block diagram of the 3D face pose estimation system 50 of this embodiment as implemented by a computer. Referring to FIG. 2, the 3D face pose estimation system 50 includes, as hardware components, a computer 100 and, connected to the computer 100, a mouse 102, a keyboard 104, and the monitor 64.

In addition to the bus 92 and the GPU 90, the computer 100 further includes a CPU 120, a read-only memory (ROM) 122, a random access memory (RAM) 124, a hard disk 126, a digital versatile disk (DVD) drive 128 for driving DVD media 106, a video capture board 130 for receiving the stream of stereo image pairs from the stereo camera 60, and a semiconductor memory drive 134 for driving a semiconductor memory 108. All components in the computer 100 are connected to the bus 92 and can access one another.

The stereo camera 60 includes a left camera 60L and a right camera 60R. The images ~L_t and ~R_t from the stereo camera 60 (t denotes time; the "~" symbol properly belongs above the letter) are not rectified.

In this embodiment, the calibration software 80 is used offline to compute the calibration parameters A_1 and A_2 of the stereo camera 60. Calibration is performed to correct for the radial distortion, lens decentering, focal length, pixel aspect ratio, baseline, and orientation of each of the cameras 60L and 60R. The calibration parameters are stored in the calibration parameter memory 82. In the calibration process of this embodiment, the user presents a predefined pattern to the stereo camera 60, and the calibration software 80 computes the parameters from the outputs ~L_t and ~R_t of the stereo camera 60. Calibration software is commercially available; for example, the Small Vision System (SVS) distributed by SRI International can be used.

The rectification software 84 is used to rectify the output stereo images ~L_t and ~R_t of the stereo camera 60. Here, rectification means aligning the corresponding epipolar lines of the left and right images at the same level. This process is shown in FIG. 4.

Referring to FIG. 4, assume that the left and right images 170L and 170R of the stereo camera 60 contain a line 172L and a corresponding line 172R, respectively. Because of lens distortion and differences in lens orientation, the images of the same line occupy different positions and, parallax aside, have different shapes in the left and right images.

By rectifying these images, the corresponding lines 172L and 172R are aligned with the image rows as epipolar lines 182L and 182R in the rectified left and right images 180L and 180R. Without rectification, real-time 3D face reconstruction from 3D stereo images is nearly impossible. Rectification can be carried out by a predetermined computation that uses the calibration parameters stored in the calibration parameter memory 82. Rectification software is also commercially available.

The rectified images L_t and R_t are supplied from the rectification software 84 to the 3D face reconstruction module 86 together with the calibration parameters A_1 and A_2.

FIG. 3 shows the overall structure of the 3D face reconstruction module 86 shown in FIG. 1. The 3D face reconstruction module 86 is implemented as software executed on the computer 100.

Referring to FIG. 3, the 3D face reconstruction module 86 includes: a horopter estimation module 150 for estimating the horopter range h_t = [d_1, d_2] of the input face image based on the rectified image pair L_t and R_t and the calibration parameters A_1 and A_2; a disparity image module 152 for computing the disparity between corresponding pixels in the left and right images and outputting a disparity image D_t together with the calibration parameters A_1 and A_2 and the left camera image L_t; and a 3D surface reconstruction module 154 for reconstructing coarse 3D surface data ~S_t of the face from the disparity image D_t computed by the disparity image module 152.

The 3D face reconstruction module 86 further includes: a noise elimination and interpolation module 156 for eliminating outliers or noise from the coarse 3D face data ~S_t and smoothing the 3D surface data to output 3D face data S_t; and a 3D surface extraction module 158 for estimating a face plane based on the interpolated and smoothed 3D face data S_t and extracting the 3D face as a segment ~F_t consisting of the voxels within a predetermined distance of the face plane.

As used herein, the "face plane" means a plane, defined in relation to the 3D face image, that approximates the surface of the face by a flat surface.

In this embodiment, the 3D face reconstruction module 86 further includes a texture mapping module 160 for mapping the skin texture onto the 3D face segment ~F_t output from the 3D surface extraction module 158. The texture-mapped 3D face image F_t is supplied to the monitor 64 and displayed.

<Eye tracking>
The horopter estimation module 150 uses an eye tracking algorithm to estimate the horopter. Any eye tracking algorithm would be suitable for this task. In this implementation, the between-the-eyes pattern (the eye locations have lower light intensity than the cheeks and the bridge of the nose) is detected and tracked by pattern matching with an updated pattern size. To accommodate the scale of the face, patterns at various scales are considered during detection, and a scale suitable for tracking is selected accordingly.

The algorithm computes an intermediate representation of the input image called the "integral image". The integral image is obtained by, for each pixel of the image, summing the pixel values within a rectangle of arbitrary size that contains the pixel and adding the resulting sum to that pixel. A six-segmented rectangular (SSR) filter is then used to filter the bright-dark relations of the eye region of the image at high speed. A support vector machine (SVM) algorithm is used for the initial verification of candidates.
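As a point of reference, the integral image in its standard form (a summed-area table) can be sketched as below. This uses the common textbook definition, in which each entry holds the sum of all pixels above and to the left, which may differ in detail from the wording in this description; the function names `integral_image` and `rect_sum` are hypothetical. The value of the representation is that the sum over any rectangle, such as one segment of a rectangular filter, costs four lookups regardless of the rectangle's size.

```python
def integral_image(img):
    """Build a summed-area table with one extra row and column of zeros.

    ii[r][c] holds the sum of img[i][j] for all i < r and j < c.
    """
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0  # running sum of the current row up to column c
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of img over rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]
```

A filter that compares the brightness of several adjacent rectangles can then call `rect_sum` once per segment and compare the results.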

Using the epipolar constraint, the corresponding between-the-eyes pattern is searched for along the epipolar line in the right camera image with a correlation-based template matching algorithm, in order to estimate the stereo eye positions and the horopter of the user's head.

As used herein, "horopter" means the 3D volume covered by the search interval of the stereo correlation algorithm, and the "horopter range" is the range of that 3D volume in front of the stereo camera. These terms are often used in the field of stereo vision and are useful for finding the 3D coordinates of a pixel pair in a stereo image pair.

For example, for an arbitrary unknown point <X, Y, Z> in the 3D world, let (x_L, y_L) be its coordinates in the left camera image. To compute the values of <X, Y, Z> from the stereo camera system, the corresponding point (x_R, y_R) in the "right" camera image must be found using a stereo correlation algorithm.

If nothing at all is known about the horopter of the unknown target point <X, Y, Z>, the search interval in the right camera image must be x_R = [0, x_L] (with y_R = y_L if the images are corrected, i.e. rectified). This has two drawbacks.

１−探索範囲が広いため、計算費用が高い。 1- Since the search range is wide, the calculation cost is high.

２−一致の誤検出を生じる可能性が高い。 2- There is a high possibility of false detection of coincidence.

If the horopter of the unknown target point <X, Y, Z> is roughly known, the search interval in the right camera image can be set to x_R = [x_1, x_2] (with y_R = y_L if the images are corrected).

In this case, the interval [x_1, x_2] is a subset of the interval [0, x_L]. The computational cost is therefore lower and the probability of a false match is greatly reduced, which in turn makes it possible to correct errors, if any.

ホロプタは以下のように計算することができる。補正されたステレオカメラシステムの光軸に沿って、対象点＜Ｘ，Ｙ，７０ｃｍ＞があると仮定する。Ｌ＜１２５，２００＞をこの点の「左」カメラ画像の座標とし、Ｒ＜７０，２００＞をこの点の「右」カメラ画像の座標とする。対象点＜Ｘ，Ｙ，７０ｃｍ＞のステレオビジョンに基づく３Ｄの再構築は以下の通りである。 The horopter can be calculated as follows. Assume that there is an object point <X, Y, 70 cm> along the optical axis of the corrected stereo camera system. Let L <125,200> be the coordinates of the “left” camera image at this point, and R <70,200> be the coordinates of the “right” camera image at this point. The 3D reconstruction based on the stereo vision of the target point <X, Y, 70 cm> is as follows.

もし、Ｌ＜１２５，２００＞、Ｒ＜７０，２００＞及びカメラのキャリブレーションマトリクスが既知であれば、対象点の座標＜Ｘ，Ｙ，７０ｃｍ＞を３Ｄで計算することができる。 If L <125,200>, R <70,200> and the camera calibration matrix are known, the coordinates <X, Y, 70 cm> of the target point can be calculated in 3D.

If the horopter information is unknown, all that is available is L<125, 200>, the "left" camera image coordinates of this point. The corresponding point R<70, 200> must first be found in the right camera image. However, all that is known about its x coordinate is that it lies between 0 and 125. Here, the target point is assumed to be visible in both camera images.

If the horopter information is "known", the 3D volume of the 3D space in which the object is located is known. This volume is, for example, [<X, Y, 90 cm>, <X, Y, 50 cm>]. By projection, we can obtain
L<x_1L, y_1L>, R<x_1R, y_1R> for <X, Y, 90 cm>,
where y_1L = y_1R because the images are corrected. Therefore, the disparity d_1 is d_1 = x_1L − x_1R.

Similarly, we can obtain
L<x_2L, y_2L>, R<x_2R, y_2R> for <X, Y, 50 cm>,
where y_2L = y_2R. Therefore, the disparity d_2 is d_2 = x_2L − x_2R. Here d_2 is greater than d_1.

Therefore, all points between <X, Y, 50 cm> and <X, Y, 90 cm> have disparities in the interval [d_1, d_2]. Consequently, for any point L<x_L, y_L> in the left camera image that lies in the horopter of the above example, the search interval in the right camera image is restricted to x_R = [x_L − d_2, x_L − d_1] instead of the x coordinate range [0, x_L].
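The restriction of the search interval can be sketched as follows, assuming an ideal rectified pinhole stereo pair in which disparity and depth are related by d = f·B / Z. This relation, the focal length `focal_px` and the baseline `baseline_cm` are assumptions made for illustration; the patent only states that the calibration parameters are used. The values f = 770 px and B = 5 cm are hypothetical, chosen so that a point at 70 cm has the disparity 125 − 70 = 55 of the example above.

```python
def disparity_from_depth(z_cm, focal_px, baseline_cm):
    """Disparity in pixels of a point at depth z_cm (ideal rectified pinhole model)."""
    return focal_px * baseline_cm / z_cm


def search_interval(x_left, z_near_cm, z_far_cm, focal_px, baseline_cm):
    """Return (x_min, x_max) for x_R given the horopter depth range [z_near, z_far]."""
    d1 = disparity_from_depth(z_far_cm, focal_px, baseline_cm)   # smallest disparity
    d2 = disparity_from_depth(z_near_cm, focal_px, baseline_cm)  # largest disparity
    x_min = max(0.0, x_left - d2)
    x_max = max(0.0, x_left - d1)
    return x_min, x_max


if __name__ == "__main__":
    # Hypothetical calibration; the horopter spans 50 cm to 90 cm as in the text.
    lo, hi = search_interval(125, 50.0, 90.0, focal_px=770.0, baseline_cm=5.0)
    print(lo, hi)  # the true match x_R = 70 lies inside this interval
```

Under these assumed values the search shrinks from the 126 candidate columns of [0, 125] to roughly 35 columns.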

The horopter size can be estimated from the face size. In this embodiment, the eyes are found in both the left and right images, and the 3D coordinates of the nose bridge point between the eyes are estimated first. Let this point be <X_nb, Y_nb, Z_nb>. The approximate size of the face is then known, so a horopter can be defined for this face, because the horopter is related to the face size. In this case, for example, the horopter range is <X, Y, [Z_nb − 10 cm, Z_nb + 10 cm]> for any point on the face.

Note that the 3D X and Y coordinate values have no effect on the disparity search range. Only the Z coordinate affects it.

Accordingly, as in the previous example, the disparity search range can be defined for the right camera image. That is, for any point L<x_L, y_L> in the left camera image, the search interval in the right camera image is x_R = [x_L − d_2, x_L − d_1], where y_L = y_R.

図５は図３に示すホロプタ推定モジュール１５０を実現するコンピュータプログラムのフロー図である。図５を参照して、キャリブレーションパラメータと補正された左右のステレオ画像対が与えられると、プログラムはステップ２００で開始し、ここで左カメラ画像の両目間パターンが特定される。 FIG. 5 is a flowchart of a computer program for realizing the horopter estimation module 150 shown in FIG. Referring to FIG. 5, given a calibration parameter and a corrected left and right stereo image pair, the program starts at step 200 where the inter-eye pattern of the left camera image is identified.

Following step 200, the program further includes: step 202 of finding the corresponding pattern along the epipolar line of the right camera image; step 204, following step 202, of verifying the face using loosely defined heuristic constraints and an SVM (support vector machine); step 206, following step 204, of determining whether the face has been verified and branching the execution flow accordingly; and step 208, executed when the face is verified in step 206, of generating a face ROI (Region of Interest) and a horopter range. If the face is not verified in step 206, control returns to step 200. Steps 200 to 206 are thus repeated until the face is verified.

FIG. 6 shows the restriction of the verification range. Referring to FIG. 6, assume that pixel 224L of left image 220L has a corresponding pixel 224R in right image 220R. If image 220R were placed on top of image 220L, an image 220 would be obtained in which pixels 224L and 224R are offset from each other by the amount of the horopter. The maximum of this distance is limited by the horopter size and the face size. Accordingly, pixel 224R corresponding to pixel 224L should be found in the right image within a distance Ht of pixel 224L. The distance between pixels 224L and 224R in image 220 is called the "disparity".

<Reconstruction of 3D face data>
Stereo computer vision algorithms use disparity, together with the camera calibration parameters, to compute the 3D (X, Y, Z) coordinates of a target point. Disparity algorithms often rely on a search mechanism that compares a template window in the left camera image with windows along the epipolar line of the right camera image, using various techniques such as squared differences or the normalized correlation coefficient method.

すなわち、マッチングアルゴリズムは、左画像の２ＤテンプレートウィンドウＡの、右画像の２ＤウィンドウＢに対する類似性の尺度を計算する必要がある。一般に、ＡとＢとは以下のように表せる。 That is, the matching algorithm needs to calculate a measure of the similarity of the 2D template window A of the left image to the 2D window B of the right image. In general, A and B can be expressed as follows.

Here, γ and Γ are arbitrary gain/scaling values of the left and right cameras, respectively. _X (where _ denotes an overbar placed above the letter in the formula) represents the mean value, and ~X (where ~ is likewise placed above the letter in the formula) represents the true shape or texture characteristics of the window data. That is, if ~A and ~B belong to the same surface portion of an object, they are identical under ideal conditions.

Under real-world conditions, however, the disparity search algorithm may fail because of video noise, or because ~A and ~B carry no information, that is, lack inherent true shape characteristics. Video noise can arise from differences between the left and right cameras in lens focus, viewing angle, and lighting effects.
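As an illustration of a similarity measure that is insensitive to camera gain and mean level, here is a minimal sketch of zero-mean normalized cross-correlation between two windows. Choosing this particular measure is an assumption; the text above only mentions squared differences and normalized correlation coefficients in general terms. The function name `zncc` and the flat-list window layout are illustrative.

```python
import math


def zncc(window_a, window_b):
    """Zero-mean normalized cross-correlation of two equal-size windows (flat lists).

    Returns a value in [-1, 1], or 0.0 when either window is constant and
    therefore carries no shape or texture information.
    """
    n = len(window_a)
    mean_a = sum(window_a) / n
    mean_b = sum(window_b) / n
    dev_a = [v - mean_a for v in window_a]
    dev_b = [v - mean_b for v in window_b]
    numerator = sum(x * y for x, y in zip(dev_a, dev_b))
    denominator = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    return numerator / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    a = [1.0, 2.0, 4.0, 3.0]
    b = [2.0 * v + 10.0 for v in a]  # same texture, different gain and offset
    print(zncc(a, b))
```

Subtracting the mean and dividing by the energy of each window removes the gain and offset, so two windows showing the same surface patch score close to 1 even when the cameras differ in exposure. A textureless window yields a zero denominator, which corresponds to the failure case described above.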

To reduce the number of failed or false matches, the horopter of the face is estimated on the fly from stereo eye tracking. That is, the coordinates of the left and right eyes in 3D space are computed first, and the face size is estimated from the distance between the eyes. Since the horopter size depends on the face size, it is estimated as well. The search domain of the disparity algorithm is then restricted for this face by the horopter size. The resulting disparities are further verified by a consistency check within the pixel neighborhood. The flow of the disparity search algorithm is shown in FIG. 7.

FIG. 7 is a flowchart of the software that implements the disparity image module 152. Referring to FIG. 7, given the calibration parameters, the corrected left and right image pair, the face region, and the horopter range, the program starts at step 240, which defines a correlation window size for the given face size and horopter range. It further includes step 242, following step 240, of finding, for each pixel in the face region, a disparity value within the given horopter range of the right image using various similarity measures; and step 244, following step 242, of determining whether disparities have been computed for all pixels of the face region and branching the control flow accordingly. If all disparities have been computed, program execution ends. Otherwise, control returns to step 242.

図８は左画像２６０Ｌの画素２６２Ｌについてディスパリティがどのように計算されるかを示す図である。図８を参照して、左画像２６０Ｌ内でウィンドウ２６４Ｌが規定される。ウィンドウ２６４Ｌのサイズは、顔のサイズとホロプタとによって決定される。右画像２６０Ｒでは、画素２６２Ｌに対応して右画像２６０Ｒ中のエピポーラ線上に、２６４Ｌと同じサイズのウィンドウ２６４Ｒが規定される。両方向矢印２６６によって示されるようにウィンドウ２６４Ｒを移動させることにより、かつウィンドウ２６４Ｌ及び２６４Ｒ間に様々な類似性尺度を用いることにより、画素２６２Ｌに対応する画素２６２Ｒが探索される。探索空間は、画素２６２Ｌからホロプタ範囲内に制限される。 FIG. 8 is a diagram showing how disparity is calculated for the pixel 262L of the left image 260L. Referring to FIG. 8, a window 264L is defined in left image 260L. The size of the window 264L is determined by the face size and the horopter. In the right image 260R, a window 264R having the same size as that of 264L is defined on the epipolar line in the right image 260R corresponding to the pixel 262L. Pixel 262R corresponding to pixel 262L is searched by moving window 264R as indicated by double-headed arrow 266 and using various similarity measures between windows 264L and 264R. The search space is limited within the horopter range from pixel 262L.
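The window search of FIG. 8 can be sketched as below, under several assumptions: the images are corrected so that the epipolar line is the same image row, the similarity measure is the sum of squared differences, and disparities are whole pixels. The names `ssd` and `best_disparity`, and the window half-size `half`, are illustrative and not taken from the patent.

```python
def ssd(left, right, y, x_left, x_right, half):
    """Sum of squared differences between two (2*half+1)-square windows."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            diff = left[y + dy][x_left + dx] - right[y + dy][x_right + dx]
            total += diff * diff
    return total


def best_disparity(left, right, y, x_left, d_min, d_max, half=1):
    """Search only disparities in [d_min, d_max] (the horopter range).

    Assumes 0 <= d_min <= d_max and that the window around (x_left, y)
    lies inside the left image. Returns None if no candidate fits.
    """
    best_d, best_cost = None, None
    for d in range(d_min, d_max + 1):
        x_right = x_left - d
        if x_right - half < 0:
            break  # larger disparities would leave the right image
        cost = ssd(left, right, y, x_left, x_right, half)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


if __name__ == "__main__":
    width, height = 12, 5
    right = [[x * x + y for x in range(width)] for y in range(height)]
    # Synthetic left image: the right image shifted by a disparity of 3 pixels.
    left = [[(x - 3) ** 2 + y if x >= 3 else 0 for x in range(width)]
            for y in range(height)]
    print(best_disparity(left, right, y=2, x_left=8, d_min=1, d_max=5))
```

Limiting `d_min` and `d_max` to the horopter range is what reduces both the cost and the chance of a false match; the inner loop itself is unchanged.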

<3D surface reconstruction>
Given a pixel pair from the left and right images and its disparity, the depth of the voxel corresponding to the pixel pair can be determined based on the calibration parameters of the stereo camera 60. Here, the depth of a voxel means the distance from the image plane of the stereo camera 60 to the voxel in 3D space. This operation is performed by the 3D surface reconstruction module 154.
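A minimal sketch of recovering a 3D point from a pixel and its disparity follows, assuming an ideal rectified pinhole model with focal length `focal_px`, baseline `baseline_cm` and principal point (`cx`, `cy`). These parameters and the formulas Z = f·B / d, X = (x − cx)·Z / f, Y = (y − cy)·Z / f are standard textbook assumptions, not the calibration model stated in the patent.

```python
def reconstruct_point(x_left, y_left, disparity, focal_px, baseline_cm, cx, cy):
    """Return (X, Y, Z) in cm in the left-camera frame, or None if disparity <= 0."""
    if disparity <= 0:
        return None  # no valid depth for zero or negative disparity
    z = focal_px * baseline_cm / disparity
    x = (x_left - cx) * z / focal_px
    y = (y_left - cy) * z / focal_px
    return (x, y, z)


if __name__ == "__main__":
    # Hypothetical calibration values reproducing the 70 cm example:
    # left pixel (125, 200), right pixel (70, 200), so the disparity is 55.
    print(reconstruct_point(125, 200, 125 - 70, 770.0, 5.0, cx=125.0, cy=200.0))
```

Pixels for which no disparity was found can be passed through as `None` and handled later by the noise elimination and interpolation stage.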

<Noise elimination and interpolation>
The noise elimination and interpolation algorithm removes spurious noise from the 3D data and recovers missing voxels. In this implementation, missing voxels are estimated by linear interpolation over the pixel neighborhood.

Referring to FIG. 9, the noise elimination and interpolation module 156 includes: a noise elimination module 280 for eliminating noise from the 3D face data output by the 3D surface reconstruction module 154, by comparing the depth of each voxel with the mean depth and the local standard deviation of depth within a predetermined window that contains the voxel and classifying the voxels into valid and invalid voxels; a smoothing module 282 for smoothing the 3D face data by applying a smoothing algorithm to the valid 3D face data; and an interpolation module 284 for estimating missing (invalid) voxels by linear interpolation of neighboring voxels.

FIG. 10 is a flowchart of the program that implements the noise elimination module 280. Referring to FIG. 10, given the reconstructed initial dense surface data associated with the face region in the left image, the program starts at step 300, which defines, for the given face size, a window size for local standard deviation estimation. It further includes step 302, following step 300, of computing, for each pixel in the face region of the left image, the z-axis (depth) mean and the local standard deviation of the pixels in the window; and step 304, following step 302, of determining whether the deviation of the z-axis value from the mean is smaller than the standard deviation.

The program further includes step 306, executed when the answer in step 304 is YES, of marking the voxel measurement as valid, and step 308, executed when the answer in step 304 is NO, of marking the voxel measurement as invalid. After steps 306 and 308, control returns to step 302. Although not shown explicitly in FIG. 10, control returns from this program once all voxels have been examined in step 304.

したがって、制御がこのプログラムから復帰するときには、全てのボクセルに有効か無効かのラベルが付されていることになる。 Therefore, when control returns from this program, all voxels are labeled as valid or invalid.

この処理を図１１に示す。図１１を参照して、左画像３２０Ｌと右画像３２０Ｒとから、画素３２２Ｌと３２２Ｒとのディスパリティと、キャリブレーションパラメータ８２とを用いて、顔領域の各ボクセルの深さが計算される（３３０）。結果として得られる、深さ値を有する稠密な表面データ３３２が、ノイズ消去プログラムに与えられる。稠密な表面データ３３２の各ボクセルについて、このボクセルを包含するウィンドウの平均深さからのその深さの偏差が計算され、ウィンドウの標準深さ偏差と比較され、これに従って、その深さ偏差が標準偏差より大きいか否かによって、有効／無効ボクセルマップ３３４に示されるように、ボクセルに有効又は無効のラベルが付される。 This process is shown in FIG. Referring to FIG. 11, the depth of each voxel in the face area is calculated from the left image 320L and the right image 320R using the disparity of the pixels 322L and 322R and the calibration parameter 82 (330). ). The resulting dense surface data 332 with depth values is provided to the noise cancellation program. For each voxel in the dense surface data 332, the deviation of that depth from the average depth of the window containing this voxel is calculated and compared to the standard depth deviation of the window, and accordingly the depth deviation is standard. Depending on whether it is larger than the deviation, as shown in the valid / invalid voxel map 334, the voxel is labeled as valid or invalid.
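The validity labelling of FIGS. 10 and 11 can be sketched as follows. The depth map is assumed to be a 2D list of depth values, the window is clipped at the image border, and the population standard deviation is used. The function name `label_valid`, the half-size `half`, and the use of "less than or equal" (so that perfectly flat windows are not rejected) are assumptions for illustration; the text itself states the test as "smaller than the standard deviation".

```python
import math


def label_valid(depth, half=1):
    """Return a 2D list of booleans: True where the depth is consistent
    with the mean and standard deviation of its local window."""
    rows, cols = len(depth), len(depth[0])
    valid = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [depth[rr][cc]
                      for rr in range(max(0, r - half), min(rows, r + half + 1))
                      for cc in range(max(0, c - half), min(cols, c + half + 1))]
            mean = sum(window) / len(window)
            std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
            # Assumption: "<=" instead of "<" keeps constant regions valid.
            valid[r][c] = abs(depth[r][c] - mean) <= std
    return valid


if __name__ == "__main__":
    depth_map = [[70.0] * 5 for _ in range(5)]
    depth_map[2][2] = 120.0  # a single spurious depth value
    for row in label_valid(depth_map):
        print(row)
```

In this toy depth map only the spike at the centre is labelled invalid; its neighbours stay valid because their deviation from the local mean is small compared with the local standard deviation.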

FIG. 12 is a flowchart of the program that implements the smoothing module 282 shown in FIG. 9. Referring to FIG. 12, given the noise-eliminated surface data associated with the face region of the left image, the program starts at step 350, which defines a smoothing window size for the estimated face size. It further includes step 352, following step 350, of computing, for each pixel in the face region that has a valid 3D measurement, the mean value within the smoothing window and assigning it to that pixel as its new 3D value; and step 354, following step 352, of determining whether all pixels in the face region have been processed. If the answer in step 354 is YES, control exits the routine. Otherwise, control returns to step 352.
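A minimal sketch of the smoothing step is given below, assuming that only valid neighbours contribute to the window mean and that the result is written to a copy so that already smoothed values do not influence later pixels. Both choices, and the names `smooth_valid` and `half`, are assumptions; the text only says that the mean within the smoothing window is assigned to each valid pixel.

```python
def smooth_valid(depth, valid, half=1):
    """Replace each valid depth by the mean of the valid depths in its window."""
    rows, cols = len(depth), len(depth[0])
    out = [row[:] for row in depth]  # invalid entries are left unchanged
    for r in range(rows):
        for c in range(cols):
            if not valid[r][c]:
                continue
            values = [depth[rr][cc]
                      for rr in range(max(0, r - half), min(rows, r + half + 1))
                      for cc in range(max(0, c - half), min(cols, c + half + 1))
                      if valid[rr][cc]]
            out[r][c] = sum(values) / len(values)
    return out


if __name__ == "__main__":
    print(smooth_valid([[1.0, 2.0, 3.0]], [[True, True, False]]))
```

Excluding invalid neighbours prevents a rejected outlier from leaking back into the surface through the averaging window.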

FIG. 13 is a flowchart of the program that implements the interpolation module 284 shown in FIG. 9. Referring to FIG. 13, given the smoothed 3D surface data associated with the face region of the left image, the program starts at step 370, which estimates, for each pixel in the face region that has an invalid 3D measurement, a 3D measurement by linear interpolation of the data of its neighboring pixels. It further includes step 372, following step 370, of determining whether all pixels in the face region have been processed.

プログラムはさらに、ステップ３７２での答えがＹＥＳであった場合に実行される、顔領域について図１２に示された平滑化アルゴリズムを実行するステップ３７４を含む。ステップ３７４が完了すると、制御はこのルーチンから復帰する。 The program further includes a step 374 of executing the smoothing algorithm shown in FIG. 12 for the face region, which is executed if the answer in step 372 is YES. When step 374 is complete, control returns from this routine.

ステップ３７２での答えがＮＯであれば、制御はステップ３７０に戻る。 If the answer at step 372 is no, control returns to step 370.
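The interpolation of step 370 can be sketched in one dimension as follows, assuming that each invalid depth in an image row is interpolated linearly between the nearest valid depths to its left and right, and copied from the single available neighbour at the ends of the row. Restricting the neighbourhood to one row and the name `interpolate_row` are simplifying assumptions; the text does not specify the shape of the neighbourhood.

```python
def interpolate_row(values, valid):
    """Fill invalid entries of one row by linear interpolation of valid neighbours."""
    n = len(values)
    out = values[:]
    for i in range(n):
        if valid[i]:
            continue
        left = next((j for j in range(i - 1, -1, -1) if valid[j]), None)
        right = next((j for j in range(i + 1, n) if valid[j]), None)
        if left is not None and right is not None:
            t = (i - left) / (right - left)
            out[i] = values[left] + t * (values[right] - values[left])
        elif left is not None:
            out[i] = values[left]   # no valid neighbour on the right
        elif right is not None:
            out[i] = values[right]  # no valid neighbour on the left
    return out


if __name__ == "__main__":
    print(interpolate_row([70.0, 0.0, 0.0, 76.0], [True, False, False, True]))
```

After this pass every pixel of the row has a depth value, so the smoothing of FIG. 12 can be run over the whole face region as in step 374.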

<3D face plane extraction>
FIG. 14 is a flowchart of the program that implements the 3D face data extraction used in this embodiment. Referring to FIG. 14, given the dense 3D surface data (voxels) associated with the face region of the left image, the program starts at step 390, where the face plane is estimated using the voxel data between the eyes and the chin, excluding the strip-shaped region that contains the nose and mouth areas. In this embodiment, a novel 3D face plane estimation algorithm is used in step 390. Details of this step are described later with reference to FIGS. 15 to 21.

プログラムはさらに、ステップ３９０に続いて、顔平面への距離が±δ（δは予め定められたしきい値）以内である全てのボクセルが顔ボクセルとして抽出されるステップ３９２を含む。 Following the step 390, the program further includes a step 392 in which all voxels whose distance to the face plane is within ± δ (δ is a predetermined threshold) are extracted as face voxels.

プログラムはさらに、顔領域内の全てのボクセルが処理されたか否かを判断するステップ３９４を含む。全てのボクセルが処理されていれば、制御はこのルーチンを出る。そうでなければ、制御はステップ３９２に戻る。 The program further includes a step 394 that determines whether all voxels in the face region have been processed. If all voxels have been processed, control exits this routine. Otherwise, control returns to step 392.

FIG. 15 is a detailed flowchart of the program that implements the novel 3D face plane estimation algorithm used in step 390 of FIG. 14. The face plane is estimated from the 3D face data of the part between the eyes and the chin, excluding the strip-shaped part that contains the mouth and nose. However, if the head is in an extreme orientation, the 3D face data may contain outliers. For this reason, an iterative least-squares method is used, and points with high error are eliminated using the error histogram between the face data and the estimated plane. Because the iteration stops after two or three rounds, it does not impose a heavy computational burden. The face plane parameters a, b, c and d in (aX + bY + cZ + d = 0) are estimated in step 390 from the remaining 3D face data using the least-squares solution.

Referring to FIG. 15, given the dense 3D surface data (voxels) associated with the face region in the left image, this routine starts at step 410, where symmetric voxel data between the eyes and the chin, excluding the strip-shaped part that contains the nose and mouth areas, is extracted from both sides of the symmetry line of the face, which is orthogonal to the 3D eye line.

図１６から図１９にこの処理を示す。図１６を参照して、左カメラ画像４３０において、左目の位置４４０と右目の位置４４２とが与えられると、目の位置４４０及び４４２を結ぶ線４４４上の中心点４４６が規定される。その後線４４４から特定の距離にあるいくつかの点４６０、４６２、４６４、…が３Ｄ空間内に規定される。これらの点の各々について、中心点４４６から３Ｄベクトルが規定される。このベクトルと線４４４に平行なベクトルとの内積を計算することで、どの点が両目と顎との間の対称データを規定するかが決定される。 This process is shown in FIGS. Referring to FIG. 16, when a left eye position 440 and a right eye position 442 are given in the left camera image 430, a center point 446 on a line 444 connecting the eye positions 440 and 442 is defined. A number of points 460, 462, 464,... At a specific distance from line 444 are then defined in 3D space. For each of these points, a 3D vector is defined from the center point 446. By calculating the inner product of this vector and a vector parallel to line 444, it is determined which point defines the symmetry data between both eyes and jaw.

図１６において点４６２が対称データであると仮定して、点４６２の下に別の点の組を仮に規定する。別の対称データ点がこの点の組から規定されることになる。この動作を繰返すことにより、図１７に示すように、中心点４４６と顎のと間の対称データ点の組４８０が規定される。この線は顔の対称線であり、この実施の形態ではこれを「Ｓｌｉｎｅ」と呼ぶ。 In FIG. 16, assuming that the point 462 is symmetric data, another set of points is provisionally defined under the point 462. Another symmetric data point will be defined from this set of points. By repeating this operation, a set of symmetrical data points 480 between the center point 446 and the jaw is defined as shown in FIG. This line is a symmetric line of the face, and in this embodiment this is called “Sline”.

図１８を参照して、対称データ点４８０の各々、例えば対称データ点５００に対して、特定の長さの、線４４４に平行な線５０２が選択される。これらの線が、両目と顎との間の正面顔区域を形成する。選択された区域の幅は、この実施の形態では積分された目の距離の１．３倍であり、高さは顎が境界となる。 Referring to FIG. 18, for each symmetric data point 480, eg, symmetric data point 500, a particular length of line 502 parallel to line 444 is selected. These lines form the frontal face area between both eyes and chin. The width of the selected area is 1.3 times the integrated eye distance in this embodiment, and the height is bounded by the jaw.

Referring to FIG. 18, the frontal face area defined in this way is divided into three strip-shaped regions. The central part of each line is excluded, and the remaining parts of the lines form the symmetric voxel areas 504 (left face: LF) and 506 (right face: RF) between the eyes and the chin, excluding the strip-shaped region that contains the nose and mouth. The LF and RF strip-shaped regions are each equal to 1/4 of the entire frontal face area, and the central strip-shaped region is equal to 1/2.

これらの区域５０４と５０６とを、図１９で顔画像上に示す。 These areas 504 and 506 are shown on the face image in FIG.

再び図１５を参照して、プログラムはさらに、区域５０４及び５０６中の有効なボクセルデータに平面を当てはめることによって顔の平面を推定するステップ４１２を含む。計算効率のため、この当てはめには、全てのボクセルではなく、区域５０４及び５０６において（図１９参照）疎にサンプリングされた線上のボクセルのみを用いる。 Referring again to FIG. 15, the program further includes a step 412 of estimating the face plane by fitting the plane to valid voxel data in areas 504 and 506. For computational efficiency, this fit uses only voxels on sparsely sampled lines in areas 504 and 506 (see FIG. 19), rather than all voxels.

Following step 412, the program further includes step 414 of evaluating all the given voxel data against the estimated face plane and invalidating the worst-fitting ε% of the data, and step 416, following step 414, of determining whether the fitting error is smaller than 1 cm (10 mm). If all fitting errors are smaller than 1 cm, control exits this program. Otherwise, control returns to step 412.

The details of the fitting algorithm are described below. This process is an iterative least-squares fitting and evaluation algorithm for face plane estimation. In general, a 3D face plane obtained by fitting to stereo data is prone to error, because the amount of visible face data changes as the head moves. Symmetric data from both sides of the face is therefore selected, and this data set excludes the areas that carry individual facial features such as the nose and mouth. For this purpose, the frontal 3D face data is divided into three strip-shaped regions aligned with the eyes, as shown in FIG. 19.

実際的な理由から、データの疎なサンプリングが行われる（例えば、両目と顎の線の間で２０行）。最後に、部分的なオクルージョン（部分的に隠れること）又は頭部の動きからくると思われるアーチファクトを繰返し最小二乗当てはめと除去−加算アルゴリズムとによって消去する。このアルゴリズムで用いられるパラメータは、４０人を上回る人の対応する顔画像を検討することで、経験的に決定された。 For practical reasons, sparse sampling of the data is performed (eg, 20 lines between the eyes and jaw lines). Finally, artifacts that appear to come from partial occlusion (partial hiding) or head movement are eliminated by iterative least squares fitting and removal-addition algorithms. The parameters used in this algorithm were determined empirically by examining the corresponding face images of over 40 people.

抽出された３Ｄ顔データの組が与えられた場合、アルゴリズムのフローは以下のように記すことができる。 Given a set of extracted 3D face data, the algorithm flow can be written as:

２）両目間の３Ｄ顔表面データから積分ユークリッド距離を計算する。 2) Calculate the integrated Euclidean distance from the 3D face surface data between the eyes.

3) Define a frontal face data area whose center is aligned, for each row, with the symmetry point of the face. The width of the selected area is ξ (ξ = 1.3) times the integrated eye distance, and the height is bounded by the chin.

４）正面顔区域を３つのたんざく形領域に分割する。左顔（ＬＦ）たんざく形領域と右顔（ＲＦ）たんざく形領域とは全正面顔区域の１／４であり、中央たんざく形領域は全正面顔区域の１／２である。 4) Divide the front face area into three tangential areas. The left face (LF) flank area and the right face (RF) flank area are 1/4 of the total front face area, and the central face area is 1/2 of the total front face area.

5) Define sparse data rows parallel to the 3D eye line in the 3D data area, and collect data of the same length from the inside outward in both the LF rows and the RF rows (repeat this for all rows down to the chin).

6) Iterate: estimate the parameters of the best-fitting 3D plane (a, b, c and d, given the face data coordinates X_i, Y_i, Z_i, where i = 1, 2, ..., N_k and N_k is the number of samples in the k-th iteration) with a least-squares fitting algorithm.

８）評価：もし誤差がτ（τ＝１０）ミリメートルより小さいか、又は繰返し数がκ（この実施の形態では、κ＝４）より大きければ、ステップ１１（ＥＮＤ）に進む。 8) Evaluation: If the error is less than τ (τ = 10) millimeters or the number of repetitions is greater than κ (in this embodiment, κ = 4), proceed to step 11 (END).

９）高い誤差を有する顔データのε（ε＝５）％を組から除去。 9) Remove ε (ε = 5)% of face data having high error from the set.

１０）もし以前に破棄されたデータの誤差距離が新たに推定された面からτミリメートルより小さい場合、このデータを組に加え、「繰返し」に進む。 10) If the error distance of previously discarded data is less than τ millimeters from the newly estimated surface, add this data to the set and go to “Repetition”.

１１）ＥＮＤ。 11) END.
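The iterative part of the list above (steps 6 and 8 to 10) can be sketched as follows. Several points are assumptions made for illustration: the plane is fitted as Z = pX + qY + r by ordinary least squares (reasonable only when the face is not seen edge-on), the "error" tested in step 8 is taken to be the largest point-to-plane distance, and at least one point is removed per round. The function names are hypothetical. Steps 1 and 7 do not appear in the list above and are not reconstructed here.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares fit of Z = p*X + q*Y + r, returned as (a, b, c, d) with unit normal."""
    n = len(points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]  # normal equations
    v = [sxz, syz, sz]
    det = det3(m)
    coeffs = []
    for k in range(3):  # Cramer's rule: replace column k by v
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = v[i]
        coeffs.append(det3(mk) / det)
    p, q, r = coeffs
    norm = math.sqrt(p * p + q * q + 1.0)
    return (p / norm, q / norm, -1.0 / norm, r / norm)


def point_plane_distance(plane, point):
    """Distance of a point from a plane whose normal (a, b, c) has unit length."""
    a, b, c, d = plane
    return abs(a * point[0] + b * point[1] + c * point[2] + d)


def iterative_plane_fit(points, tau=10.0, eps=5.0, kappa=4):
    """Fit, drop the worst eps percent, re-admit discarded points within tau, repeat."""
    active, discarded = list(points), []
    iteration = 0
    while True:
        iteration += 1
        plane = fit_plane(active)                                    # step 6
        errors = [point_plane_distance(plane, p) for p in active]
        if max(errors) < tau or iteration > kappa or len(active) <= 3:
            return plane, active                                     # step 8
        k = max(1, int(len(active) * eps / 100.0))
        order = sorted(range(len(active)), key=lambda i: errors[i], reverse=True)
        worst = set(order[:k])
        removed = [active[i] for i in worst]                         # step 9
        active = [p for i, p in enumerate(active) if i not in worst]
        readmit = [p for p in discarded if point_plane_distance(plane, p) < tau]
        discarded = [p for p in discarded
                     if point_plane_distance(plane, p) >= tau] + removed
        active += readmit                                            # step 10
```

A possible use, with coordinates in millimetres, is `plane, kept = iterative_plane_fit(points)`, where `points` is a list of (X, Y, Z) tuples sampled from the LF and RF regions. The defaults mirror the values τ = 10, ε = 5 and κ = 4 given in the list.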

このアルゴリズムはパラメータをどのように選択するかによって大きな影響を受けない。幅パラメータξは０．９より大きくてもよいが、１．５より小さくなければならない。誤差しきい値パラメータτは５ミリメートルより大きくてもよいが、１５ミリメートルより小さくなければならない。データ除去率は１％から１０％の間であればよく、最大繰返し数は２から１０の間である。 This algorithm is not greatly affected by how the parameters are selected. The width parameter ξ may be greater than 0.9 but must be less than 1.5. The error threshold parameter τ may be greater than 5 millimeters but must be less than 15 millimeters. The data removal rate may be between 1% and 10%, and the maximum number of repetitions is between 2 and 10.

図２０はステップ４１２から４１６までの繰返しの例を示す。図２０を参照して、この例では、第１回の繰返しで最も上の誤差ヒストグラムが得られた。これには１０ｍｍをこえるロングテールがあった。最も悪い当てはめデータのε％を無効にし、２回目の繰返しを行なった。２回目と３回目の繰返しで、図２０の２番目と３番目のヒストグラムが得られた。「テール」が短くなっているのがわかる。しかしながら依然として、３回目の繰返しでも誤差のいくつかは１０ｍｍを超えている。 FIG. 20 shows an example of repetition from steps 412 to 416. Referring to FIG. 20, in this example, the top error histogram was obtained in the first iteration. This had a long tail exceeding 10 mm. The second iteration was performed with ε% of the worst fitting data invalidated. In the second and third iterations, the second and third histograms of FIG. 20 were obtained. You can see that the “tail” is getting shorter. However, some errors still exceed 10 mm even in the third iteration.

As can be seen from the bottom histogram in FIG. 20, after the fourth iteration all errors are within 10 mm of the fitted face plane, and the iteration ends.

図２１は、図１４のステップ３９２での顔抽出を示す。図２１を参照して、顔の平面５２０がステップ３９０で推定され、顔の平面５２０から距離が±δ以内の全てのボクセルが顔ボクセルとして抽出される。 FIG. 21 shows face extraction in step 392 of FIG. Referring to FIG. 21, a face plane 520 is estimated in step 390, and all voxels having a distance within ± δ from the face plane 520 are extracted as face voxels.
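The extraction of step 392 amounts to a point-to-plane distance test, sketched below. The plane is given by its parameters (a, b, c, d) of aX + bY + cZ + d = 0 as in the text; the function name `extract_face_voxels` and the numeric value of δ in the example are assumptions, since the text only calls δ a predetermined threshold.

```python
import math


def extract_face_voxels(voxels, plane, delta):
    """Keep the voxels whose distance to the face plane is at most delta."""
    a, b, c, d = plane
    norm = math.sqrt(a * a + b * b + c * c)
    return [v for v in voxels
            if abs(a * v[0] + b * v[1] + c * v[2] + d) / norm <= delta]


if __name__ == "__main__":
    face_plane = (0.0, 0.0, 1.0, -70.0)  # the plane Z = 70 (cm), for illustration
    cloud = [(0.0, 0.0, 68.0), (0.0, 0.0, 75.0), (1.0, 2.0, 90.0)]
    print(extract_face_voxels(cloud, face_plane, delta=6.0))
```

Dividing by the norm of (a, b, c) makes the test independent of how the plane parameters are scaled.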

図３に示すテクスチャマッピングモジュール１６０が、３Ｄ顔抽出モジュール１５８によって抽出された顔データにテクスチャをマッピングする。 The texture mapping module 160 shown in FIG. 3 maps the texture onto the face data extracted by the 3D face extraction module 158.

図２２は（Ａ）及び（Ｂ）でステレオ画像対を示し、（Ｃ）でそのディスパリティを示す。検出された両目間の位置が、左カメラ画像及び右カメラ画像の両方でマークされ、これらが顔のホロプタ推定に利用される。 In FIG. 22, (A) and (B) show a stereo image pair, and (C) shows the disparity. The detected position between both eyes is marked in both the left camera image and the right camera image, and these are used for face horopter estimation.

図２３は（Ａ）及び（Ｂ）でステレオ対を示し、（Ｃ）でそれらの抽出され再構築された３Ｄ顔データを示す。顔ホロプタの外側では、３Ｄの再構築の信頼性が低いことが認められる。 FIG. 23 shows a stereo pair in (A) and (B), and the 3D face data extracted and reconstructed from it in (C). Outside the face horopter, the 3D reconstruction is visibly less reliable.

図２４は異なる視野角から保存された３Ｄ顔画像を示す。 FIG. 24 shows 3D face images stored from different viewing angles.

＜３Ｄ顔座標系の形成＞ <Formation of 3D face coordinate system>
ある面のベクトルとその面の法線とで、任意の３Ｄ座標系を説明することができる。したがって、３Ｄの目の位置と顔平面を得た後のこの実施の形態の目標とするところは、フレームごとに頭部の姿勢を推定するための３Ｄ顔座標系を形成することである。これは、図１に示す３Ｄ頭部姿勢推定モジュール８８によって行われ、その詳細を図２５に示す。 An arbitrary 3D coordinate system can be described by a vector of a surface and the normal of the surface. Therefore, the goal of this embodiment after obtaining the 3D eye position and face plane is to form a 3D face coordinate system for estimating the head pose for each frame. This is performed by the 3D head posture estimation module 88 shown in FIG. 1, and details thereof are shown in FIG. 25.

図２５を参照して、３Ｄ頭部姿勢推定モジュール８８は、左右の３Ｄ目位置ｅ_Ｌｔ及びｅ_Ｒｔと３Ｄ顔平面パラメータＰｔとを受けるように結合され、図３の３Ｄ表面抽出モジュール１５８から３Ｄ顔座標系を形成しかつ顔座標系を表すマトリックスＦＣＳ_ｔを出力するための３Ｄ顔座標系形成モジュール５４０と、３Ｄ顔座標系形成モジュール５４０から出力されたマトリックスＦＣＳｔを受けるように結合され、後述する３Ｄ頭部姿勢マトリックスＭを推定するための３Ｄ頭部姿勢推定モジュール５４２と、を含む。 Referring to FIG. 25, the 3D head posture estimation module 88 includes: a 3D face coordinate system formation module 540, coupled to receive the left and right 3D eye positions e_Lt and e_Rt and the 3D face plane parameters P_t from the 3D surface extraction module 158 of FIG. 3, for forming a 3D face coordinate system and outputting a matrix FCS_t representing that coordinate system; and a 3D head posture estimation module 542, coupled to receive the matrix FCS_t output from the 3D face coordinate system formation module 540, for estimating a 3D head posture matrix M, which is described later.

この実施の形態では、３Ｄ顔座標系は目線、顔の平面及び顔の平面の法線で規定される。 In this embodiment, the 3D face coordinate system is defined by the eye line (the line connecting the two eyes), the face plane, and the normal of the face plane.

顔の平面のパラメータａ，ｂ，ｃ及びｄは図１４に示すステップ３９０で計算される。顔平面の法線が利用可能であり、以下のベクトルで表される。 Face plane parameters a, b, c and d are calculated in step 390 shown in FIG. The normal of the face plane is available and is represented by the following vector:

Ｅ１とＥ２とを、以下のような３Ｄ空間における顔平面の左右の目の位置とする。 Let E1 and E2 be the positions of the left and right eyes on the face plane in 3D space, as follows.

ここで＾Ｚｉ（「＾」は式中文字の上に付されたものである）はそのＸｉ及びＹｉの値が与えられたときの顔平面上の再計算されたＺ値である。３Ｄ空間における目線（Ｅｌｉｎｅ）の式は、点Ｅ１及びＥ２を利用して、以下のように規定することができる。 Here, Ẑ_i (the "^" is placed over the character in the formula) is the Z value recomputed on the face plane for the given values of X_i and Y_i. The equation of the eye line (Eline) in 3D space can be defined from the points E1 and E2 as follows.

ここでｔはＥｌｉｎｅ上の点のスカラー値（ｔ∈Ｒ）であり、→Ｖ_x＝→Ｅ₁Ｅ₂（「→」は式中文字の上に付されたものである）が成り立ち、これは顔平面の法線に垂直なベクトルを規定する。 Here, t is a scalar parameter (t ∈ R) for a point on Eline, and →V_x = →E₁E₂ holds (the "→" is placed over the characters in the formula); this defines a vector perpendicular to the normal of the face plane.

顔平面の法線であるベクトル→Ｖｚと、式３の３Ｄ目線から得られるベクトル→Ｖｘとのクロス乗積はベクトル→Ｖ_ｙとなり、これは→Ｖ_ｘと→Ｖ_ｚとの両方に垂直である。これら３つのベクトルが顔座標系を形成する。したがって、顔座標系のｘ軸とｚ軸とは３Ｄの目の位置と３Ｄの顔平面とにそれぞれロックされる。したがって、３Ｄ座標系の形成は、フレームごとに繰返して可能である。 The cross product of the vector →V_z, which is the normal of the face plane, and the vector →V_x, obtained from the 3D eye line of Equation 3, is the vector →V_y, which is perpendicular to both →V_x and →V_z. These three vectors form the face coordinate system. The x-axis and z-axis of the face coordinate system are thus locked to the 3D eye positions and the 3D face plane, respectively, so the 3D face coordinate system can be formed anew for every frame.

３Ｄ顔座標系形成モジュール５４０及び３Ｄ頭部姿勢推定モジュール５４２は共に、図２６及び図３０でそれぞれ示されるコンピュータプログラムルーチンで実現される。 Both the 3D face coordinate system formation module 540 and the 3D head posture estimation module 542 are implemented by computer program routines shown in FIGS. 26 and 30, respectively.

図２６及び図２７を参照して、３Ｄ顔座標系形成モジュール５４０を実現するコンピュータプログラムルーチンは、ある時点での３Ｄの目の位置５８０Ｌ及び５８０Ｒと３Ｄ顔平面パラメータとが与えられると、ステップ５６０で開始し、ここで左の目の位置５８０Ｌから右の目の位置５８０Ｒまでの３Ｄ目線５８２と、３Ｄの目線５８２に平行な単位ベクトル５８６と、が計算される。この実施の形態では、単位ベクトル５８６の始点は両目間の点５８４である。 Referring to FIGS. 26 and 27, the computer program routine that implements the 3D face coordinate system formation module 540, given the 3D eye positions 580L and 580R and the 3D face plane parameters at a certain time, starts at step 560, where the 3D eye line 582 from the left eye position 580L to the right eye position 580R and a unit vector 586 parallel to the 3D eye line 582 are calculated. In this embodiment, the starting point of the unit vector 586 is the point 584 between the eyes.

図２６及び図２８を参照して、プログラムはさらに、ステップ５６０に続いて、顔座標系のｚ軸へ顔平面５２０の単位法線ベクトル５８８を割当て、さらに、顔座標系のｘ軸へ３Ｄ目線５８２に平行な単位ベクトル５８６を割当るステップ５６２を含む。 Referring to FIGS. 26 and 28, the program further includes, following step 560, a step 562 of assigning the unit normal vector 588 of the face plane 520 to the z-axis of the face coordinate system and assigning the unit vector 586 parallel to the 3D eye line 582 to the x-axis of the face coordinate system.

図２６及び図２９を参照して、プログラムはさらに、ステップ５６２に続いて、顔座標系のｚ軸ベクトル５８８とｘ軸ベクトル５８６とのクロス乗積５９０を計算して、顔座標系のｙ軸ベクトル５９０を得るステップ５６４を含む。 Referring to FIGS. 26 and 29, the program further includes, following step 562, a step 564 of calculating the cross product 590 of the z-axis vector 588 and the x-axis vector 586 of the face coordinate system to obtain the y-axis vector 590 of the face coordinate system.

プログラムルーチンは３個のベクトル５８６、５９０及び５８８を用いて３Ｄ座標系を形成する最後のステップ５６６を含む。 The program routine includes a final step 566 that uses the three vectors 586, 590 and 588 to form a 3D coordinate system.
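
Steps 560 to 566 can be sketched as below. This is an illustration only, under the following assumptions: the face plane has the form a·X + b·Y + c·Z + d = 0 with c ≠ 0, the recomputed Ẑ is obtained by solving that equation for Z, and the sign of the normal (a, b, c) is taken as given. All function names are hypothetical.

```python
import math

def _unit(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)

def _cross(u, v):
    """Cross product u x v of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def z_on_plane(x, y, plane):
    """Recompute Z on the plane a*X + b*Y + c*Z + d = 0 for given X, Y (assumed form)."""
    a, b, c, d = plane
    return -(a * x + b * y + d) / c

def form_face_coordinate_system(eye_left, eye_right, plane):
    """Return the (x_axis, y_axis, z_axis) unit vectors of the face coordinate system."""
    e1 = (eye_left[0], eye_left[1], z_on_plane(eye_left[0], eye_left[1], plane))
    e2 = (eye_right[0], eye_right[1], z_on_plane(eye_right[0], eye_right[1], plane))
    # Step 560: unit vector parallel to the eye line, from left eye to right eye.
    x_axis = _unit(tuple(q - p for p, q in zip(e1, e2)))
    # Step 562: unit normal of the face plane becomes the z-axis.
    z_axis = _unit(plane[:3])
    # Step 564: y-axis is the cross product of the z-axis and the x-axis.
    y_axis = _cross(z_axis, x_axis)
    return x_axis, y_axis, z_axis
```

Because both eye points are first moved onto the face plane, the x-axis is perpendicular to the normal by construction, so the cross product already has unit length.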

＜３Ｄ頭部姿勢推定＞ <3D head posture estimation>
顔座標系とグローバル（基準）座標系との間の、この３Ｄ頭部姿勢を表す変換は以下のようになる。 The transformation representing this 3D head posture between the face coordinate system and the global (reference) coordinate system is as follows.

２つの座標系の間の回転のみの変換は次のように表すことができる。 A rotation-only transformation between two coordinate systems can be expressed as:

図３１を参照して、一般に、顔座標系６３２は世界座標系６３０を回転させ平行移動させた形であると考えることができる。すなわち、顔座標系６３２と世界座標系６３０とは図３１に破線矢印６３４で示すように、３×３の回転マトリックスＲ（Φ）と３×１平行移動ベクトルＴとによって関連付けられている。 With reference to FIG. 31, generally, the face coordinate system 632 can be considered to be a shape obtained by rotating and translating the world coordinate system 630. That is, the face coordinate system 632 and the world coordinate system 630 are associated with each other by a 3 × 3 rotation matrix R (Φ) and a 3 × 1 translation vector T as indicated by a dashed arrow 634 in FIG.

したがって、式４における３Ｄ頭部姿勢マトリックスＲ（Φ）の解は自明となり、以下で表すことができる。 Therefore, the solution of the 3D head posture matrix R (Φ) in Equation 4 is self-evident and can be expressed as:

式８を満たすそれぞれの軸に対する回転角は以下で与えられる。 The rotation angle for each axis that satisfies Equation 8 is given by:

この処理は、コンピュータプログラムルーチンで実現される。このルーチンのフロー図を図３０に示す。図３０を参照して、ある時点での顔座標系が与えられると、ルーチンは式（９）から（１１）により、各フレーム時点について、基準（共通）座標系６３０に対する３Ｄ頭部姿勢を推定するステップ６１０で始まる。ステップ６１０の後、制御はこのルーチンを出る。 This process is implemented by a computer program routine, whose flow diagram is shown in FIG. 30. Referring to FIG. 30, given the face coordinate system at a certain time, the routine begins at step 610, which estimates the 3D head posture relative to the reference (common) coordinate system 630 for each frame time using equations (9) to (11). After step 610, control exits the routine.
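
Equations (9) to (11) are not reproduced in this text, so the following is only a hedged illustration of how per-axis rotation angles can be read from a rotation matrix, not the formulas of the embodiment. It assumes that the reference coordinate system has the identity as its axes, that the columns of R are the face axes expressed in reference coordinates, and that R is decomposed as Rz·Ry·Rx. Function names are hypothetical.

```python
import math

def rotation_from_axes(x_axis, y_axis, z_axis):
    """Build R whose columns are the face axes expressed in the reference frame."""
    return [[x_axis[i], y_axis[i], z_axis[i]] for i in range(3)]

def rotation_angles_deg(r):
    """Angles (about x, y, z) in degrees, assuming R = Rz * Ry * Rx."""
    # Clamp before asin so rounding noise cannot leave the valid domain.
    angle_y = -math.asin(max(-1.0, min(1.0, r[2][0])))
    angle_x = math.atan2(r[2][1], r[2][2])
    angle_z = math.atan2(r[1][0], r[0][0])
    return tuple(math.degrees(a) for a in (angle_x, angle_y, angle_z))
```

A different Euler convention would yield different formulas, so the decomposition order should be matched to whatever convention the reference measurements use.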

［動作］ [Operation]
図１から図３０を参照して、上述の３Ｄ顔姿勢推定システム５０は以下のように動作する。特に図１を参照して、始めにキャリブレーションが行われる。このキャリブレーションプロセスにおいて、予め定められたパターンプレートがステレオカメラ６０に提示され、キャリブレーションソフトウェア８０がキャリブレーションパラメータを計算する。パラメータはキャリブレーションパラメータメモリ８２に記憶される。 With reference to FIGS. 1 to 30, the 3D face posture estimation system 50 described above operates as follows. With particular reference to FIG. 1, calibration is first performed. In this calibration process, a predetermined pattern plate is presented to the stereo camera 60, and the calibration software 80 calculates calibration parameters. The parameters are stored in the calibration parameter memory 82.

動作において、システム５０はまず、ステレオカメラ６０からステレオ画像を取得する。キャプチャされた画像は、キャリブレーションパラメータメモリ８２に記憶されたキャリブレーションパラメータを用いて補正ソフトウェア８４によって補正され、エピポーラ線が画像の行と対応するようにされる。補正された画像は、３Ｄ顔再構築モジュール８６に与えられる。 In operation, the system 50 first acquires a stereo image from the stereo camera 60. The captured image is corrected by the correction software 84 using the calibration parameters stored in the calibration parameter memory 82 so that the epipolar lines correspond to the rows of the image. The corrected image is provided to the 3D face reconstruction module 86.

特に図３を参照して、ホロプタ推定モジュール１５０は、ステレオ画像を利用して顔のホロプタを推定する。ホロプタ推定モジュール１５０はこの実現例ではステレオアイトラッキングアルゴリズムに依拠する。ステレオアイトラッキングにより、ホロプタ推定モジュール１５０は顔ホロプタ情報を生成することができる。顔ホロプタ情報ｈｔは左右の画像において対応する画素を見出すための探索区域ｈｔ＝［ｄ１，ｄ２］を計算する助けとなる。ディスパリティ探索アルゴリズムは、より良いディスパリティの結果を得るために、ホロプタ情報と相関テンプレートサイズとを利用する。 With particular reference to FIG. 3, the horopter estimation module 150 estimates a face horopter using a stereo image. The horopter estimation module 150 relies on a stereo eye tracking algorithm in this implementation. Stereo eye tracking allows the horopter estimation module 150 to generate face horopter information. The face horopter information ht helps to calculate the search area ht = [d1, d2] for finding the corresponding pixels in the left and right images. The disparity search algorithm uses the horopter information and the correlation template size to obtain a better disparity result.
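
A disparity search restricted to the horopter range [d1, d2] can be sketched as follows. The sketch assumes corrected images in which a left-image pixel at column x matches the right-image pixel at column x − d on the same row, and it uses the sum of absolute differences (lower cost meaning higher similarity) as the similarity measure; the text does not name the measure used, and the function name is hypothetical.

```python
def best_disparity(left, right, row, col, d_min, d_max, half=1):
    """Return the disparity in [d_min, d_max] with the lowest SAD cost, or None.

    left and right are 2D lists of intensities; half is the half-size of the
    square correlation window centred on (row, col).
    """
    height, width = len(left), len(left[0])
    best_d, best_cost = None, float("inf")
    for d in range(d_min, d_max + 1):
        cost = 0
        inside = True
        for dr in range(-half, half + 1):
            for dc in range(-half, half + 1):
                r, c_left, c_right = row + dr, col + dc, col + dc - d
                if not (0 <= r < height and 0 <= c_left < width and 0 <= c_right < width):
                    inside = False  # window leaves the image for this disparity
                    break
                cost += abs(left[r][c_left] - right[r][c_right])
            if not inside:
                break
        if inside and cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Limiting the loop to the horopter range both shortens the search and removes candidate matches at depths where the face cannot be.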

ホロプタ情報と左右の画像Ｌｉ及びＲｉと、計算パラメータＡｉとがディスパリティ画像モジュール１５２に与えられる。ディスパリティ画像モジュール１５２では、画像中の各画素についてディスパリティＤｔが計算される。ある点の３Ｄの位置は、＜ｘ,ｙ＞座標と、そのディスパリティとが与えられると、先行技術のステレオ再構築アルゴリズムで再構築することができる。３Ｄ表面は計算されたデータに基づいて再構築される。 The horopter information, the left and right images Li and Ri, and the calculation parameter Ai are given to the disparity image module 152. In the disparity image module 152, a disparity Dt is calculated for each pixel in the image. The 3D position of a point can be reconstructed with prior art stereo reconstruction algorithms given the <x, y> coordinates and their disparity. The 3D surface is reconstructed based on the calculated data.
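
For reference, the standard triangulation for a corrected stereo pair can be sketched as below. The pinhole parameters (focal length in pixels, baseline in millimetres, principal point cx, cy) are assumed to come from the calibration parameters; their names, and the function name, are hypothetical and not taken from the text.

```python
def reconstruct_point(x, y, disparity, focal_px, baseline_mm, cx, cy):
    """Triangulate one pixel of a corrected stereo pair into (X, Y, Z) in millimetres.

    Uses the usual pinhole relations Z = f * B / d, X = (x - cx) * Z / f,
    Y = (y - cy) * Z / f, with the left camera as the origin.
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = focal_px * baseline_mm / disparity
    return ((x - cx) * z / focal_px, (y - cy) * z / focal_px, z)
```

Depth is inversely proportional to disparity, which is why a narrow disparity range corresponds to a bounded depth slab around the face.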

再構築された３Ｄ顔表面はいくつかのアウトライアを含むので、ノイズ消去及び補間モジュール１５６が無効ボクセルを消去し、近接するボクセル間の補間により、欠落したボクセルを補間する。３Ｄ表面データはさらに、ノイズ消去及び補間モジュール１５６内で平滑化され、これによって稠密な３Ｄ顔表面データが得られ、これは３Ｄ表面抽出モジュール１５８に与えられる。 Since the reconstructed 3D face surface includes several outliers, the noise cancellation and interpolation module 156 eliminates invalid voxels and interpolates the missing voxels by interpolation between adjacent voxels. The 3D surface data is further smoothed in the noise cancellation and interpolation module 156, thereby obtaining dense 3D face surface data, which is provided to the 3D surface extraction module 158.
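
The validity test and interpolation can be sketched on a small depth map as follows. This is a minimal sketch, assuming a square deviation window and a plain average of the valid neighbours for interpolation; a non-strict comparison with a small tolerance is used so that perfectly flat regions are not rejected. The function name and the window size are hypothetical.

```python
import math

def classify_and_interpolate(depth, half_window=1):
    """Mark depth outliers by local mean and standard deviation, then fill them.

    Returns (result, valid): result is a copy of depth with invalid entries
    replaced by the mean of their valid neighbours; valid is a boolean mask.
    """
    h, w = len(depth), len(depth[0])

    def window(r, c):
        return [(i, j)
                for i in range(max(0, r - half_window), min(h, r + half_window + 1))
                for j in range(max(0, c - half_window), min(w, c + half_window + 1))]

    valid = [[True] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [depth[i][j] for i, j in window(r, c)]
            mean = sum(vals) / len(vals)
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            valid[r][c] = abs(depth[r][c] - mean) <= std + 1e-9

    result = [row[:] for row in depth]
    for r in range(h):
        for c in range(w):
            if not valid[r][c]:
                nbrs = [depth[i][j] for i, j in window(r, c)
                        if valid[i][j] and (i, j) != (r, c)]
                if nbrs:  # leave the value untouched if no valid neighbour exists
                    result[r][c] = sum(nbrs) / len(nbrs)
    return result, valid
```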

３Ｄ表面抽出モジュール１５８は両目間の点を求め、目と顎との間の対称顔ボクセルデータを抽出する。図１５のステップ４１０及び図１９に示すように、鼻と口の区域を含むたんざく形領域は顔の対称線の両側から除外される。抽出された対称顔ボクセルデータにおけるボクセルを用いて、平面を当てはめ、最も悪い当てはめデータを除外しながら残りの対称顔ボクセルすべてが顔平面から予め定められた、例えば１０ｍｍの距離以内になるまでこれを繰返すことによって、顔平面が推定される。 The 3D surface extraction module 158 finds the point between the eyes and extracts the symmetric face voxel data between the eyes and the chin. As shown in step 410 of FIG. 15 and in FIG. 19, a strip-shaped region containing the nose and mouth areas is excluded on both sides of the face symmetry line. The face plane is estimated by fitting a plane to the voxels in the extracted symmetric face voxel data and repeating the fit, each time excluding the worst-fitting data, until all remaining symmetric face voxels are within a predetermined distance, for example 10 mm, of the face plane.

顔平面パラメータを用いて、顔平面から±δの距離にある顔ボクセルが顔ボクセルとして抽出される。 Using the face plane parameters, the voxels within a distance of ±δ from the face plane are extracted as face voxels.

テクスチャマッピングモジュール１６０は抽出された顔ボクセルの肌テクスチャをマッピングし、これによって抽出された顔画像が図２３（Ｃ）及び図２４に示すように生成される。 The texture mapping module 160 maps the skin texture onto the extracted face voxels, thereby generating the extracted face images shown in FIG. 23(C) and FIG. 24.

顔平面パラメータが図１に示す３Ｄ頭部姿勢推定モジュール８８にさらに与えられる。図２５を参照して、３Ｄ顔座標系形成モジュール５４０は、まず図２７に示すように眼の線に平行な単位ベクトル５８６を求め、図２８に示すように顔平面５２０に垂直な単位ベクトル５８８を求め、その後、図２９に示すようにベクトル５８６及び５８８のクロス乗積である単位ベクトル５９０を計算する。これら３個のベクトルが３Ｄ顔座標系を形成する。 The face plane parameters are further provided to the 3D head posture estimation module 88 shown in FIG. 1. Referring to FIG. 25, the 3D face coordinate system formation module 540 first obtains a unit vector 586 parallel to the eye line as shown in FIG. 27, obtains a unit vector 588 perpendicular to the face plane 520 as shown in FIG. 28, and then calculates a unit vector 590 that is the cross product of the vectors 586 and 588 as shown in FIG. 29. These three vectors form the 3D face coordinate system.

＜実験的セットアップ＞ <Experimental setup>
ＶｉｄｅｒｅステレオビジョンハードウェアとＳＶＳソフトウェアがこの実現に利用される。カメラのキャリブレーション及び補正はＳＶＳライブラリを用いて自動的に行われる。ＳＶＳソフトウェアはステレオビデオシーケンスをキャプチャすることができ、ステレオ対の３Ｄデータを３２０×２４０の全画像解像度で３０Ｈｚで再構築する。しかしながら、３Ｄの再構築に関係ある領域はこの実験においてユーザの顔区域であるので、ディスパリティ探索区域を、上述の説明の通り、顔のホロプタ周辺に限定した。したがって、顔の外側で深さの異なる再構築された３Ｄデータは、図２３（Ｃ）に示すように３Ｄ推定が不正確である。 Videre stereo vision hardware and SVS software are used for this implementation. Camera calibration and correction are performed automatically using the SVS library. The SVS software can capture a stereo video sequence, and it reconstructs the 3D data of stereo pairs at 30 Hz at the full image resolution of 320 × 240. However, since the region of interest for 3D reconstruction in this experiment is the user's face area, the disparity search range was limited to the neighborhood of the face horopter, as described above. Consequently, 3D data reconstructed outside the face, at other depths, are estimated inaccurately, as shown in FIG. 23(C).

３Ｄ座標値は、世界座標系で計算される。この実現例では、世界座標系（原点）は左カメラの焦点と規定され、右手座標系である。 3D coordinate values are calculated in the world coordinate system. In this implementation, the origin of the world coordinate system is defined as the focal point of the left camera, and the coordinate system is right-handed.

頭部姿勢推定スキームの正確さを評価するために、マーカベースの頭部姿勢推定から得られた測定値との比較を行なった。図３２の参照符号６５０で示すように、ユーザの額に半径５ミリの３個の黒のマーカを、２５ミリずつ離して時計回りに９０度回転させたＬ字型を形成するように位置づけ、ステレオ処理アルゴリズムを利用してマーカ位置の３Ｄ座標を推定した。マーカベースの頭部姿勢推定アルゴリズムの正確さは、マーカ位置検出におけるジッタのために、±３度であった。 To evaluate the accuracy of the head posture estimation scheme, its results were compared with measurements obtained from marker-based head posture estimation. As indicated by reference numeral 650 in FIG. 32, three black markers of 5 mm radius were placed on the user's forehead, 25 mm apart, so as to form an L shape rotated 90 degrees clockwise, and the 3D coordinates of the marker positions were estimated using a stereo processing algorithm. The accuracy of the marker-based head posture estimation algorithm was ±3 degrees, owing to jitter in marker position detection.

図３３で、提案されたアルゴリズムで推定された角度を、マーカベースのアルゴリズムによって得られた値と比較している。この実施の形態の結果を細い実線６６０、６７２及び６８２で示し、マーカベースのアルゴリズムで測定されたものを太い破線６６２、６７０及び６８０で示す。 In FIG. 33, the angle estimated by the proposed algorithm is compared with the value obtained by the marker-based algorithm. The results of this embodiment are shown by thin solid lines 660, 672, and 682, and those measured with a marker-based algorithm are shown by thick dashed lines 662, 670, and 680.

当然、マーカが位置づけられた額は、そのそれぞれのオフセット値により、プロットに反映された顔のそれとは異なる姿勢になる。図３３に示されたｘ、ｙ及びｚ軸の角度データでは、相関係数はそれぞれ０．８７、０．９２及び０．９８であった。ｘ軸周りの回転について相関係数がｙ軸及びｚ軸についての結果に比較して低いのは、アイトラッキングアルゴリズムによって検出される目の位置のジッタによるものである。しかし、アイトラッキングにおけるジッタは、３Ｄ顔構造情報又はより良いアイトラッカを用いることで修正可能である。 Naturally, the forehead where the marker is positioned has a different posture from that of the face reflected in the plot due to its respective offset value. In the x, y, and z-axis angle data shown in FIG. 33, the correlation coefficients were 0.87, 0.92, and 0.98, respectively. The lower correlation coefficient for rotation about the x axis compared to the results for the y and z axes is due to the eye position jitter detected by the eye tracking algorithm. However, jitter in eye tracking can be corrected by using 3D face structure information or a better eye tracker.

結論 Conclusion
上述の実施の形態は、実世界の状況下で人とコンピュータとのインターフェイス応用に好適な、頑健な３Ｄ顔抽出及び３Ｄ頭部姿勢推定スキームを与える。提案されたスキームはモデルを必要とせず、初期化も必要とせず、さらに単一の画像対から３Ｄの顔を抽出し３Ｄ頭部姿勢情報を推定することができる。これはまた、顔の表情及び鼻の形状等の人によって異なる顔の特徴に対して頑健である。 The above-described embodiment provides a robust 3D face extraction and 3D head posture estimation scheme suitable for human-computer interface applications under real-world conditions. The proposed scheme requires neither a model nor initialization, and it can extract a 3D face and estimate 3D head posture information from a single image pair. It is also robust to facial features that differ from person to person, such as facial expression and nose shape.

今回開示された実施の形態は単に例示であって、本発明が上記した実施の形態のみに制限されるわけではない。本発明の範囲は、発明の詳細な説明の記載を参酌した上で、特許請求の範囲の各請求項によって示され、そこに記載された文言と均等の意味および範囲内でのすべての変更を含む。 The embodiment disclosed herein is merely an example, and the present invention is not limited to the above-described embodiment. The scope of the present invention is indicated by each of the claims, read in light of the detailed description of the invention, and includes all modifications within the meaning and scope equivalent to the wording of the claims.

【図１】本発明の実施の形態に従った３Ｄ顔姿勢推定システムの全体構成を示す図である。 FIG. 1 is a diagram showing the overall structure of a 3D face posture estimation system according to an embodiment of the present invention.
【図２】コンピュータによって実現されたこの発明の３Ｄ顔姿勢推定システム５０のハードウェアブロック図である。 FIG. 2 is a hardware block diagram of the 3D face posture estimation system 50 of this invention as realized by a computer.
【図３】図１に示す３Ｄ顔再構築モジュール８６の全体構成を示す図である。 FIG. 3 is a diagram showing the overall structure of the 3D face reconstruction module 86 shown in FIG. 1.
【図４】補正プロセスを示す図である。 FIG. 4 is a diagram showing the correction process.
【図５】ホロプタ推定アルゴリズムの詳細なフローを示す図である。 FIG. 5 is a diagram showing the detailed flow of the horopter estimation algorithm.
【図６】ステレオ画像中で両目間のパターンを見出すための探索範囲の限定を示す図である。 FIG. 6 is a diagram showing the limitation of the search range for finding the pattern between both eyes in a stereo image.
【図７】提案されたディスパリティ探索アルゴリズムの全体の処理ステップを示すフローチャートである。 FIG. 7 is a flowchart showing the overall processing steps of the proposed disparity search algorithm.
【図８】左画像２６０Ｌ内の画素２６２Ｌのディスパリティをどのように計算するかを示す図である。 FIG. 8 is a diagram showing how the disparity of the pixel 262L in the left image 260L is calculated.
【図９】図３に示すノイズ消去及び補間モジュールの詳細構造を示す図である。 FIG. 9 is a diagram showing the detailed structure of the noise cancellation and interpolation module shown in FIG. 3.
【図１０】ノイズ消去アルゴリズムの全体の処理ステップを示すフローチャートである。 FIG. 10 is a flowchart showing the overall processing steps of the noise elimination algorithm.
【図１１】顔データのボクセルを検証する処理を示す図である。 FIG. 11 is a diagram showing the process of verifying the voxels of face data.
【図１２】図９に示す平滑化モジュール２８２を実現するプログラムのフロー図である。 FIG. 12 is a flowchart of a program that implements the smoothing module 282 shown in FIG. 9.
【図１３】図９に示す補間モジュール２８４を実現するフロー図である。 FIG. 13 is a flowchart for realizing the interpolation module 284 shown in FIG. 9.
【図１４】この発明の一実施の形態で用いられる３Ｄ顔データ抽出を実現するプログラムのフロー図である。 FIG. 14 is a flowchart of a program that implements the 3D face data extraction used in one embodiment of this invention.
【図１５】図１４のステップ３９０で用いられる新規な３Ｄ顔平面推定アルゴリズムを実現するプログラムの詳細なフロー図である。 FIG. 15 is a detailed flowchart of a program that implements the novel 3D face plane estimation algorithm used in step 390 of FIG. 14.
【図１６】両目と顎との間の対称ボクセルデータの抽出処理を示す図である。 FIG. 16 is a diagram showing the extraction of symmetric voxel data between the eyes and the chin.
【図１７】両目と顎との間の対称ボクセルデータの抽出処理を示す図である。 FIG. 17 is a diagram showing the extraction of symmetric voxel data between the eyes and the chin.
【図１８】両目と顎との間の対称ボクセルデータの抽出処理を示す図である。 FIG. 18 is a diagram showing the extraction of symmetric voxel data between the eyes and the chin.
【図１９】顔画像上の抽出された対称ボクセルデータを示す図である。 FIG. 19 is a diagram showing the extracted symmetric voxel data on a face image.
【図２０】アウトライアデータを除外するための繰返しの例を示す図である。 FIG. 20 is a diagram showing an example of the iteration for excluding outlier data.
【図２１】図１４のステップ３９２での顔抽出を示す図である。 FIG. 21 is a diagram showing the face extraction in step 392 of FIG. 14.
【図２２】（Ａ）及び（Ｂ）はステレオ画像対を示す図であり、（Ｃ）はそのディスパリティマップを示す図である。 FIG. 22: (A) and (B) are diagrams showing a stereo image pair, and (C) is a diagram showing its disparity map.
【図２３】（Ａ）及び（Ｂ）はステレオ画像対を示す図であり、（Ｃ）はその抽出されかつ再構築された３Ｄ顔データを示す図である。 FIG. 23: (A) and (B) are diagrams showing a stereo image pair, and (C) is a diagram showing the 3D face data extracted and reconstructed from it.
【図２４】異なる視野角から保存された３Ｄ顔画像を示す図である。 FIG. 24 is a diagram showing 3D face images saved from different viewing angles.
【図２５】３Ｄ頭部姿勢推定モジュール８８の詳細なブロック図である。 FIG. 25 is a detailed block diagram of the 3D head posture estimation module 88.
【図２６】３Ｄ顔座標系形成モジュール５４０を実現するためのフロー図である。 FIG. 26 is a flowchart for implementing the 3D face coordinate system formation module 540.
【図２７】３Ｄ顔座標系を形成する処理を示す図である。 FIG. 27 is a diagram showing the process of forming the 3D face coordinate system.
【図２８】３Ｄ顔座標系を形成する処理を示す図である。 FIG. 28 is a diagram showing the process of forming the 3D face coordinate system.
【図２９】３Ｄ顔座標系を形成する処理を示す図である。 FIG. 29 is a diagram showing the process of forming the 3D face coordinate system.
【図３０】基準座標系に対し３Ｄ頭部姿勢を推定するためのコンピュータプログラムルーチンを示すフロー図である。 FIG. 30 is a flowchart showing the computer program routine for estimating the 3D head posture with respect to the reference coordinate system.
【図３１】顔座標系６３２と世界座標系６３０とが３×３回転マトリクスＲ（Φ）及び３×１平行移動ベクトルＴによってどのように関連づけられるかを示す図である。 FIG. 31 is a diagram showing how the face coordinate system 632 and the world coordinate system 630 are related by the 3 × 3 rotation matrix R(Φ) and the 3 × 1 translation vector T.
【図３２】マーカベースの頭部姿勢推定から得られた測定値のための設定を示す図である。 FIG. 32 is a diagram showing the setup for the measurements obtained from marker-based head posture estimation.
【図３３】この発明の実施の形態から推定された角度をマーカベースのアルゴリズムから得られた値と比較して示す図である。 FIG. 33 is a diagram comparing the angles estimated by the embodiment of this invention with the values obtained from the marker-based algorithm.

Explanation of symbols

５０：３Ｄ顔姿勢推定システム 50: 3D face posture estimation system
６０：ステレオカメラ 60: stereo camera
６２：リアルタイム３Ｄ顔姿勢推定装置 62: real-time 3D face posture estimation device
８０：キャリブレーションソフトウェア 80: calibration software
８２：キャリブレーションパラメータメモリ 82: calibration parameter memory
８４：補正ソフトウェア 84: correction software
８６：３Ｄ顔再構築モジュール 86: 3D face reconstruction module
８８：３Ｄ頭部姿勢推定モジュール 88: 3D head posture estimation module
１３０：ビデオキャプチャボード 130: video capture board
１５０：ホロプタ推定モジュール 150: horopter estimation module
１５２：ディスパリティ画像モジュール 152: disparity image module
１５４：３Ｄ表面再構築モジュール 154: 3D surface reconstruction module
１５６：ノイズ消去及び補間モジュール 156: noise cancellation and interpolation module
１５８：３Ｄ表面抽出モジュール 158: 3D surface extraction module
１６０：テクスチャマッピングモジュール 160: texture mapping module
２８０：ノイズ消去モジュール 280: noise cancellation module
２８２：平滑化モジュール 282: smoothing module
２８４：補間モジュール 284: interpolation module
５２０：顔平面 520: face plane
５４０：３Ｄ顔座標系形成モジュール 540: 3D face coordinate system formation module
５４２：３Ｄ頭部姿勢推定モジュール 542: 3D head posture estimation module
６３０：世界座標系 630: world coordinate system
６３２：顔座標系 632: face coordinate system

Claims

An apparatus for reconstructing 3D human face surface data from a corrected image pair of a stereo camera, comprising:
Horopter range estimation means for estimating the size and horopter range of the human face based on the calibration parameters of the stereo camera and the corrected image pair;
Disparity estimation means for estimating a disparity value in the horopter range in the other image of the image pair for each pixel of the human face in one image of the image pair;
3D surface reconstruction means for reconstructing 3D surface data of the human face based on the corrected image pair, the disparity value, and the calibration parameters of the stereo camera;
Noise erasing means for erasing noise in the three-dimensional surface data by finding invalid data points having invalid disparity values and interpolating the invalid data points with adjacent data points;
A fitting means for fitting a face plane to the three-dimensional surface data using a predetermined fitting algorithm;
An extraction means for extracting, as a three-dimensional human face, the three-dimensional surface data within a predetermined distance from the face plane.

The apparatus according to claim 1, wherein the horopter range estimation means includes:
First specifying means for identifying an interocular pattern in the one image of the image pair;
Second specifying means for identifying a corresponding pattern along an epipolar line corresponding to the interocular pattern in the other image;
Verification means for verifying whether a face candidate near the interocular pattern in the one image and the corresponding pattern satisfy a verification condition;
Means for repeatedly operating the first and second specifying means and the verification means until a face candidate satisfies the verification condition; and
Means for defining a face area by the face candidate satisfying the verification condition.

The apparatus according to claim 2, wherein the disparity estimation means includes:
Means for defining a correlation window size for the horopter range and the size of the face region; and
Means for calculating a disparity value for each pixel in the human face by searching the other image for the pixel that yields the highest similarity according to a pre-selected similarity measure.

The noise erasing means includes:
Means for defining, for the size of the face, the size of a deviation window in which a standard deviation of pixel depth is determined;
Means for calculating, for each pixel in the face area in the one image, using the image pair and the disparity calculated for each of the pixels in the face area in the one image, the local standard deviation and mean of the pixel depth within a deviation window of said size near that pixel;
Means for classifying each pixel in the face region as valid or invalid depending on whether the deviation of the depth value of that pixel from the mean of the deviation window near that pixel is less than the standard deviation of the deviation window near that pixel; and
Means for interpolating the depth value of each pixel classified as invalid with the depth values of its adjacent pixels.

The apparatus according to any of claims 1 to 4, wherein the fitting means includes:
Means for finding an eye line connecting the left eye and the right eye in the three-dimensional surface data;
Means for finding a three-dimensional face symmetry line orthogonal to the eye line;
Means for selecting data points to be fitted to the face plane; and
Means for fitting the face plane to the data points to be fitted using the predetermined fitting algorithm.

The apparatus according to claim 5, wherein the fitting means includes:
Means for fitting the face plane to the data points to be fitted using a least squares error fit;
Means for determining whether all errors between the face plane and the three-dimensional surface data are less than a predetermined threshold;
Means for eliminating certain high-error points from the three-dimensional surface data in response to a determination that not all of the errors are less than the predetermined threshold; and
Means for repeatedly operating the fitting means, the determining means, and the eliminating means until the determining means determines that all of the errors are less than the predetermined threshold.

A computer program which, when run on a computer, causes the computer to function as:
Horopter range estimation means for estimating the size and horopter range of a human face based on the calibration parameters of a stereo camera and a corrected image pair;
Disparity estimation means for estimating a disparity value in the horopter range in the other image of the image pair for each pixel of the human face in one image of the image pair;
3D surface reconstruction means for reconstructing 3D surface data of the human face based on the corrected image pair, the disparity value, and the calibration parameters of the stereo camera;
Noise erasing means for erasing noise in the three-dimensional surface data by finding invalid data points having invalid disparity values and interpolating the invalid data points with adjacent data points;
Fitting means for fitting a face plane to the three-dimensional surface data using a predetermined fitting algorithm; and
Extraction means for extracting, as a three-dimensional human face, the three-dimensional surface data within a predetermined distance from the face plane.