JP5460499B2

JP5460499B2 - Image processing apparatus and computer program

Info

Publication number: JP5460499B2
Application number: JP2010158178A
Authority: JP
Inventors: Simon Clippingdale (クリピングデルサイモン)
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2010-07-12
Filing date: 2010-07-12
Publication date: 2014-04-02
Anticipated expiration: 2030-07-12
Also published as: JP2012022403A

Description

The present invention relates to an image processing apparatus and a computer program therefor. In particular, it relates to an image processing apparatus that associates the positions of corresponding points across a plurality of images, and to a computer program therefor.

When a subject having a three-dimensional shape is imaged several times, it is necessary to determine, for a given feature point on the subject, the position to which that point corresponds in each of the resulting images.
For example, Non-Patent Document 1 describes a method for estimating, between two images, the positional shift of the subject, the center and angle of its rotation, and the center and ratio of its enlargement (or reduction). In that technique, the Fourier transform of a patch in each of the two images is computed, and the affine transformation between the two patches is estimated from the two largest Fourier components.
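Non-Patent Document 1 recovers a single transformation for an entire patch. As a minimal sketch of that global, frequency-domain style of registration, the Python code below estimates one integer translation between two patches by phase correlation. This is a simplified stand-in (translation only, circular shifts assumed), not the affine method of Non-Patent Document 1, and the function name `phase_correlation_shift` is hypothetical.

```python
import numpy as np

def phase_correlation_shift(a, b, eps=1e-12):
    """Estimate integer (dy, dx) such that b ~= np.roll(a, (dy, dx), axis=(0, 1))."""
    A = np.fft.fft2(a)
    B = np.fft.fft2(b)
    cross = B * np.conj(A)
    cross /= np.abs(cross) + eps          # keep only the phase difference
    corr = np.real(np.fft.ifft2(cross))   # ideally a single peak at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = a.shape
    # Indices past the midpoint correspond to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

rng = np.random.default_rng(0)
patch = rng.random((64, 64))
moved = np.roll(patch, (3, -5), axis=(0, 1))
print(phase_correlation_shift(patch, moved))  # (3, -5)
```

Every pixel of the patch receives the same shift, which is the global limitation this document raises against Non-Patent Document 1.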

Non-Patent Document 2 describes a technique for estimating the positional displacement between feature points by comparing Gabor wavelet coefficients measured at feature points in two image patches. The system described in Non-Patent Document 2 matches a single input face image against face templates and recognizes the input face based on the result.

Non-Patent Document 1: Stefan Kruger, Andrew Calway, “Image Registration using Multiresolution Frequency Domain Correlation”, Proceedings of the British Machine Vision Conference (BMVC98), 1998, pp. 316-325.
Non-Patent Document 2: Laurenz Wiskott, Jean-Marc Fellous, Norbert Kruger, Christoph von der Malsburg, “Face Recognition by Elastic Bunch Graph Matching”, Technical Report TR96-08, Institut fur Neuroinformatik, Ruhr-Universitat Bochum, 1996.

However, the technique described in Non-Patent Document 1 only estimates a global affine transformation between entire patches (that is, over all pixels of both patches). In practice, images of a subject differ not only by such a global affine transformation but also by local, non-linear distortion (warping) between patches, and the technique of Non-Patent Document 1 cannot associate points between images under such non-linear distortion.

In the techniques described in Non-Patent Documents 1 and 2, two images of the same subject are given; one is used as the reference and the other as the test image, and the test image is warped to match the reference image. In general, however, when many (three or more) images (image patches) are given and the positions of points in them are to be associated, it is unclear which image should be used as the reference.

The present invention has been made in view of the above circumstances. It provides an image processing apparatus that, when associating pixel positions in images of a subject having a three-dimensional shape with positions on the subject surface, can perform an alignment that also accommodates non-linear distortion.

[1] To solve the above problem, an image processing apparatus according to one aspect of the present invention includes: a pasted-image acquisition unit that acquires pasted images, each obtained by estimating, for the pixels of one of a plurality of frame images of a three-dimensional object, their positions on the surface of the three-dimensional object; a reference image generation unit that generates a reference image at a predetermined resolution by applying a low-pass filter of that resolution to the sum of the plurality of pasted images acquired by the pasted-image acquisition unit; and a warping processing unit that calculates, based on a pasted image and the reference image, the positional displacement of the pixels in the pasted image, and aligns the positions of those pixels using the calculated displacement.

Each pasted image acquired by the pasted-image acquisition unit is derived from an image of the three-dimensional object, and the position on the object corresponding to each of its pixels has been estimated. This estimate has a certain accuracy but may still contain positional errors. The reference image generation unit generates a reference image at a predetermined resolution from the plurality of pasted images. That is, the reference image is constructed from the pasted images themselves, without supplying a separate reference image or specifying which image should serve as the reference. Here, the sum of the images may be a simple sum, or a weighted sum (sum of products) of each image's pixel values and weight values. As one example, but not a limitation, the weight value may be determined by the angle between the imaging direction and the surface normal of the three-dimensional object. The warping processing unit calculates a positional displacement for each pixel of a pasted image based on the reference image and that pasted image. In other words, the pixel displacements within a single pasted image are not limited to those of an affine transformation and can represent non-linear distortion.

[2] According to another aspect of the present invention, the above image processing apparatus further includes a control unit that causes the pasted-image acquisition unit to acquire the pasted images whose pixel positions have been aligned by the warping processing unit at the predetermined resolution, causes the reference image generation unit to generate the reference image at the next resolution, and thereby repeats the alignment of the pixel positions of the pasted images successively from lower to higher resolution.

With this configuration, misalignment is first corrected at a low resolution, then at the next higher resolution, and so on, refining the alignment step by step as the resolution increases, up to the highest resolution.
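The low-to-high ordering can be illustrated with a deliberately simplified one-dimensional toy: a single global integer shift is searched at several blur levels, with a wide search range at the most blurred (coarsest) level and a narrow one at finer levels, each level starting from the shift accumulated so far. The blur schedule `sigmas`, the search-radius rule, and the function names are hypothetical, and unlike the apparatus described here the toy estimates one shift rather than a per-pixel displacement field.

```python
import numpy as np

def blur_1d(x, sigma):
    """Circular Gaussian blur of a 1-D signal; sigma == 0 returns a copy."""
    if sigma == 0:
        return x.copy()
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    return np.convolve(np.pad(x, radius, mode="wrap"), kernel, mode="valid")

def coarse_to_fine_shift(signal, reference, sigmas=(8.0, 4.0, 2.0, 1.0, 0.0)):
    """Integer s such that np.roll(signal, -s) ~= reference, found coarse to fine."""
    total = 0
    for sigma in sigmas:                               # most blurred (lowest resolution) first
        radius = max(1, int(np.ceil(2 * sigma)))       # search range shrinks at finer levels
        ref = blur_1d(reference, sigma)
        cur = blur_1d(np.roll(signal, -total), sigma)  # apply the shift found so far
        costs = {s: np.sum((np.roll(cur, -s) - ref) ** 2)
                 for s in range(-radius, radius + 1)}
        total += min(costs, key=costs.get)             # refine by the best residual shift
    return total

rng = np.random.default_rng(1)
reference = rng.standard_normal(200)
print(coarse_to_fine_shift(np.roll(reference, 10), reference))  # 10
```

The wide search is only attempted on the heavily blurred signal, where fine detail that could create spurious minima has been removed; the finer levels then search just a few samples around the accumulated estimate.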

[3] According to another aspect of the present invention, the above image processing apparatus further includes a texture image storage unit that stores a texture image representing the texture of the surface of the three-dimensional object, and a texture image writing unit that writes into the texture image storage unit the texture image obtained by combining the plurality of pasted images aligned by the warping processing unit.

With this configuration, a texture image can be obtained by combining a plurality of pasted images. The pasted images may be combined (added) with weighting by the weight values.

[4] According to another aspect of the present invention, where the three-dimensional object is a human face (head), the above image processing apparatus includes: a face feature database unit that stores two-dimensional coordinate data of facial feature points in association with person identification information and angle data representing head pose; a face region detection and matching unit that estimates the two-dimensional coordinate data of the facial feature points in a read image frame, based on the two-dimensional coordinate data of the feature points read from the face feature database unit; a face model storage unit that stores three-dimensional coordinate data of the mesh vertices of a predetermined generic model; a position and pose estimation unit that estimates the three-dimensional coordinate data of the feature points in the image frame and the angle data representing the head pose in that image frame, based on the two-dimensional coordinate data of the feature points estimated by the face region detection and matching unit and the three-dimensional coordinate data of the mesh vertices read from the face model storage unit; a mesh warping unit that generates a corrected face model by warping the mesh vertices of the generic model based on the three-dimensional coordinate data of the feature points and the angle data estimated by the position and pose estimation unit, and calculates the three-dimensional coordinate data of the corresponding mesh vertices of the corrected face model; a rendering unit that, based on the corrected face model generated by the mesh warping unit, performs rendering while varying the angle data representing the head pose to generate a plurality of synthesized face image models, and calculates the two-dimensional coordinate data of the feature points; and a database registration unit that registers the two-dimensional coordinate data of the feature points calculated by the rendering unit in the face feature database unit in association with the corresponding angle data. The face feature database unit stores, in association with the person identification information and the head-pose angle data, image feature information of at least the neighborhood of each feature point. The rendering unit reads from the texture image storage unit the texture image written by the texture image writing unit and performs rendering based on that texture image. The database registration unit registers the image feature information obtained from the result of the rendering by the rendering unit in the face feature database unit in association with the corresponding angle data.

[4A] According to another aspect of the present invention, in the image processing apparatus of [4], the texture mapping unit adjusts a direction weight, used when mapping the shading and color information in the image frame onto the two-dimensional face texture image, based on the angle between the optical axis of the line of sight toward the corrected face model and the surface normal of the corrected face model.
[4B] According to another aspect of the present invention, in the image processing apparatus of [4], the texture mapping unit adjusts a distance weight, used when mapping the shading and color information in the image frame onto the two-dimensional face texture image, based on the distance in the two-dimensional face texture image from the positions corresponding to the feature points of the corrected face model. How strongly the distance weight changes with distance is adjustable by a first parameter, and how strongly the direction weight changes with the angle between the optical axis of the line of sight and the surface normal is adjustable by a second parameter.

[5] Another aspect of the present invention is a computer program that causes a computer to function as an image processing apparatus including: a pasted-image acquisition unit that acquires pasted images, each obtained by estimating, for the pixels of one of a plurality of frame images of a three-dimensional object, their positions on the surface of the three-dimensional object; a reference image generation unit that generates a reference image at a predetermined resolution by applying a low-pass filter of that resolution to the sum of the plurality of pasted images acquired by the pasted-image acquisition unit; and a warping processing unit that calculates, based on a pasted image and the reference image, the positional displacement of the pixels in the pasted image, and aligns the positions of those pixels using the calculated displacement.

According to the present invention, when pixel positions in images of a subject having a three-dimensional shape are associated with positions on the subject surface, positional displacements can be calculated and alignment performed even in the presence of non-linear distortion.
Furthermore, according to the present invention, even when no image can be designated as the reference, a reference image can be constructed from the captured images, and the pixel displacements can be calculated relative to that reference image.

FIG. 1 is a block diagram showing the functional configuration of an image processing apparatus according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing a more detailed functional configuration of the texture mapping unit according to the same embodiment.
FIG. 3 is a schematic diagram showing the relationship between the texture image stored in the texture image storage unit and a three-dimensional model of the subject according to the same embodiment.
FIG. 4 is a flowchart showing the processing procedure according to the same embodiment.
FIG. 5 is a block diagram showing the functional configuration of a face image processing apparatus according to the second embodiment of the present invention.
FIG. 6 schematically shows the feature point positions estimated from a face image region by the face region detection and matching unit in the same embodiment.
FIG. 7 shows the data structure of the correspondence information stored in the mesh vertex assignment information storage unit in the same embodiment.
FIG. 8 shows the data structure of the correspondence information stored in the pixel assignment information storage unit in the same embodiment.
FIG. 9 shows the data structure of the variable template structure generated by the database registration unit in the same embodiment.
FIG. 10 is a flowchart showing the procedure by which, in the same embodiment, the face region detection and matching unit detects a face image region in each image frame of the video data, matches it against the face feature data, and outputs a tracking or recognition result.
FIG. 11 is a flowchart showing the procedure by which, in the same embodiment, three-dimensional feature point positions and the head pose of each image frame are estimated from the two-dimensional feature point positions in each image frame of the video data.
FIG. 12 is a flowchart showing the mesh warping procedure executed by the mesh warping unit in the same embodiment.
FIG. 13 shows the mesh of the three-dimensional CG face model in the same embodiment.
FIG. 14 illustrates the warping of the mesh vertices of the three-dimensional CG face model in the same embodiment.
FIG. 15 is a flowchart showing the UV texture image correction procedure in the same embodiment.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[First Embodiment]
FIG. 1 is a block diagram showing the functional configuration of the image processing apparatus according to the first embodiment. As shown, the image processing apparatus 1 includes an image data storage unit 11, an inverse pose conversion processing unit 12, a texture mapping unit 14, and a texture image storage unit 15.
The image data storage unit 11 stores image data obtained by photographing a subject with a camera or the like. The image data contains a plurality of image frames of the subject, and the data of each image frame contains the color value (for example, an RGB value) of each pixel. Each image frame may be a still image taken at some time interval or one frame of a video (moving image). The image frames may differ in the position, direction, distance, angle of view, and so on with which the subject was captured.
The texture image storage unit 15 stores a texture image representing the texture of the entire surface, or at least part of the surface, of the subject. The texture image represents the color, shading, pattern, and surface quality of the subject. The texture image stored in the texture image storage unit 15 is an image on a plane with a two-dimensional orthogonal coordinate system given by u and v coordinates, and this plane corresponds to the surface of the subject. In other words, each pixel of the planar image (specified by coordinates (u, v)) corresponds to a point on the subject surface. The texture image data consists of the color value (for example, an RGB value) of each pixel.

The inverse pose conversion processing unit 12 estimates, from a frame image read from the image data storage unit 11, the pose of the subject captured in that frame image. Here, the pose is the position, orientation, and degree of enlargement/reduction of the subject relative to the camera that captured the frame image. It is represented by the combination of the translation of the subject in space, its rotation (for example, the rotation angle about each of three mutually orthogonal axes), and the amount of enlargement/reduction. The enlargement/reduction is determined by the distance between the camera and the subject and by the angle of view of the lens used for shooting. The inverse pose conversion processing unit 12 recognizes a plurality of feature points in the frame image and estimates the pose from the relationships among those feature points. By applying the inverse of the estimated pose to each frame image, it can normalize the position, orientation, and enlargement/reduction of the subject.
The inverse pose conversion processing unit 12 also reads information on the three-dimensional shape of the subject from a three-dimensional shape model storage unit (not shown). This information is, for example, the three-dimensional coordinates of each mesh vertex when the subject is represented as a mesh. From the pose-normalized frame image and this three-dimensional shape information, the inverse pose conversion processing unit 12 estimates the correspondence (mapping) between pixels of the frame image and pixels of the texture image, and unfolds the frame image using this mapping to obtain a pasted image. That is, from the k-th frame image (0 ≤ k < N_frames), the inverse pose conversion processing unit 12 obtains the k-th pasted image s_k(u, v), where N_frames is the number of image frames.
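The pose described above (a translation, rotations about three orthogonal axes, and a scale factor) can be sketched as a 3D similarity transform, with pose normalization as its inverse. This is a minimal sketch under stated assumptions: the rotation order (about x, then y, then z) and the function names are illustrative choices, and estimating the pose from feature points is not shown.

```python
import numpy as np

def rotation_xyz(rx, ry, rz):
    """Rotation matrix for angles (radians) about the x, y and z axes, applied in that order."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    r_x = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    r_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    r_z = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return r_z @ r_y @ r_x

def apply_pose(points, rot, scale, trans):
    """Map model points (N, 3) to camera space: x' = scale * R x + t."""
    return scale * points @ rot.T + trans

def inverse_pose(points, rot, scale, trans):
    """Undo apply_pose: x = R^T (x' - t) / scale, using the orthonormality of R."""
    return (points - trans) @ rot / scale

# Round trip on a few mesh-vertex-like points.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [-0.5, 0.3, 0.8]])
rot = rotation_xyz(0.1, -0.4, 0.25)
trans = np.array([5.0, -2.0, 30.0])
posed = apply_pose(verts, rot, 1.7, trans)
print(np.allclose(inverse_pose(posed, rot, 1.7, trans), verts))  # True
```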

The inverse pose conversion processing unit 12 also calculates a direction weight α^k(u, v) for each pixel of the pasted image. The direction weight is, for example, given by α^k(u, v) = cos^m(θ^k(u, v)), where m is an appropriately chosen integer of 1 or more, and θ^k(u, v), with −π/2 ≤ θ^k(u, v) ≤ π/2, is the angle between the surface normal of the subject at the point corresponding to the pixel at coordinates (u, v) of the k-th pasted image and the optical axis of the line of sight (the camera's optical axis) when the frame image was captured. Because the pose of the subject in the k-th frame image has already been calculated, and the direction of the surface normal at the point on the subject corresponding to coordinates (u, v) of the pasted image can be obtained from the three-dimensional shape information, the inverse pose conversion processing unit 12 can calculate θ^k(u, v). It writes the direction weight α^k(u, v) calculated for each pixel into a memory (not shown) so that it can be referred to later.
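A minimal sketch of the direction weight α = cos^m(θ), computed from a surface normal and a line-of-sight vector. Assumptions: `to_camera` points from the surface point toward the camera (so a surface facing the camera gives cos θ = 1), back-facing points are clamped to weight 0, and the default m = 2 is an arbitrary choice within "an integer of 1 or more". The function name is hypothetical.

```python
import numpy as np

def direction_weight(normals, to_camera, m=2):
    """alpha = cos^m(theta), theta = angle between surface normal and line of sight.

    normals, to_camera: arrays of shape (..., 3); to_camera points from the
    surface point toward the camera. Back-facing points (cos < 0) get weight 0.
    """
    n = normals / np.linalg.norm(normals, axis=-1, keepdims=True)
    d = to_camera / np.linalg.norm(to_camera, axis=-1, keepdims=True)
    cos_theta = np.clip(np.sum(n * d, axis=-1), 0.0, 1.0)
    return cos_theta ** m

# Facing the camera, tilted by 60 degrees, and facing away.
normals = np.array([[0.0, 0.0, 1.0], [np.sqrt(3) / 2, 0.0, 0.5], [0.0, 0.0, -1.0]])
print(direction_weight(normals, np.array([0.0, 0.0, 1.0])))  # approx [1, 0.25, 0]
```

Larger m concentrates the weight on surface points seen nearly head-on, which suppresses contributions from obliquely viewed, foreshortened regions.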

The texture mapping unit 14 maps the frame images stored in the image data storage unit 11 onto the texture image stored in the texture image storage unit 15. In other words, for each pixel of a frame image that shows the subject surface, the texture mapping unit 14 identifies the corresponding pixel of the texture image. Using this mapping, the texture mapping unit 14 can also create a texture image from one or more frame images and write it into the texture image storage unit 15.

That is, through texture mapping, the texture mapping unit 14 pastes onto the texture image the sum of the contributions of the individual frame images. Omitting the normalization term for the number of frame images, the sum at coordinates (u, v) is f(u, v) in the following equation (1).

$$ f(u, v) = \sum_{k=0}^{N_{\mathrm{frames}}-1} \alpha^k(u, v)\, p_k\bigl(\hat{r}_k(u, v)\bigr) \tag{1} $$

Here, α^k(u, v) is the direction weight at coordinates (u, v), r̂_k(u, v) is the estimated position at which the texture at coordinates (u, v) appears in the k-th image frame, and s_k(u, v) = p_k(r̂_k(u, v)) is the RGB value at that estimated position. The direction weights have already been obtained by the inverse pose conversion processing unit 12.
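Reading equation (1) from the definitions above, f(u, v) is the direction-weighted sum Σ_k α^k(u, v) s_k(u, v). The sketch below computes it with NumPy. The optional division by Σ_k α^k(u, v) is only an assumed form of the omitted normalization term, and the function name is hypothetical.

```python
import numpy as np

def texture_sum(pasted, alpha, normalize=False, eps=1e-12):
    """f(u, v) = sum_k alpha^k(u, v) * s_k(u, v), as in equation (1).

    pasted: (N_frames, H, W, C) pasted images s_k(u, v), e.g. RGB
    alpha:  (N_frames, H, W) direction weights alpha^k(u, v)
    normalize: if True, divide by sum_k alpha^k(u, v) (an assumed normalization)
    """
    f = np.einsum("khw,khwc->hwc", alpha, pasted)
    if normalize:
        f = f / (alpha.sum(axis=0)[..., None] + eps)
    return f

pasted = np.array([np.full((1, 1, 3), 1.0), np.full((1, 1, 3), 3.0)])  # two 1x1 RGB frames
alpha = np.array([np.full((1, 1), 1.0), np.full((1, 1), 3.0)])
print(texture_sum(pasted, alpha)[0, 0])                  # [10. 10. 10.]
print(texture_sum(pasted, alpha, normalize=True)[0, 0])  # approx [2.5 2.5 2.5]
```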

The texture mapping unit 14 aligns the N_frames pasted images s_k(u, v), 0 ≤ k < N_frames. That is, s_k(u, v) in the k-th pasted image is replaced with s_k(u + m^k(u, v), v + n^k(u, v)), where (m^k(u, v), n^k(u, v)) is the optimal displacement for aligning the k-th pasted image with the other pasted images.
The method by which the texture mapping unit 14 calculates and corrects this displacement is described later.
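Replacing s_k(u, v) with s_k(u + m^k(u, v), v + n^k(u, v)) amounts to resampling the pasted image at displaced, generally non-integer, coordinates. The sketch below does this with bilinear interpolation and border clamping; both choices are assumptions, since the interpolation method is not specified here, and the function name is hypothetical.

```python
import numpy as np

def warp_by_displacement(s, m, n):
    """Return s'(u, v) = s(u + m(u, v), v + n(u, v)) by bilinear interpolation.

    s, m, n: arrays of shape (H, W) indexed [v, u]. Because m and n may differ
    from pixel to pixel, the warp is not restricted to an affine transform.
    Sample positions outside the image are clamped to the border.
    """
    h, w = s.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(u + m, 0, w - 1)
    y = np.clip(v + n, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * s[y0, x0] + fx * s[y0, x1]
    bottom = (1 - fx) * s[y1, x0] + fx * s[y1, x1]
    return (1 - fy) * top + fy * bottom

s = np.arange(12, dtype=float).reshape(3, 4)   # s[v, u] = 4v + u
half = warp_by_displacement(s, np.full((3, 4), 0.5), np.zeros((3, 4)))
print(half[0])  # [0.5 1.5 2.5 3. ]
```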

FIG. 2 is a block diagram showing a more detailed functional configuration of the texture mapping unit 14. As shown, the texture mapping unit 14 includes pasted-image acquisition units 51-0, 51-1, 51-2, ..., warping processing units 52-0, 52-1, ..., a reference image generation unit 53, a control unit 54, and a texture image writing unit 55. The reference image generation unit 53 internally includes image addition units 61-0, 61-1, ... and Gaussian window processing units 62-0, 62-1, ....
For brevity, the figure shows only the components for the (k−1)-th, k-th, and (k+1)-th frame images; in practice, the texture mapping unit 14 has components for every frame image from the 0th to the (N_frames − 1)-th. Likewise, the figure shows stages only up to the pasted-image acquisition unit 51-2, which holds the pasted images at resolution level 2, and omits the later stages; in practice, the texture mapping unit 14 also includes the subsequent warping processing units and pasted-image acquisition units for the higher resolutions.

Although not shown in the figure, the texture mapping unit 14 performs the processing described below using data read from the image data storage unit 11 and data acquired from the inverse pose conversion processing unit 12.

The pasted-image acquisition units 51-0, 51-1, 51-2, ... acquire and hold the intermediate image data used to generate the image to be pasted onto the texture image. Specifically, they acquire pasted images obtained by estimating the positions, on the surface of the three-dimensional object, of the pixels in the plurality of frame images of that object. The pasted-image acquisition unit 51-0 acquires pasted images at resolution level 0 (resolution levels are described later), and the pasted-image acquisition unit 51-1 acquires pasted images at resolution level 1. Likewise, up to the highest resolution level N_res − 1, a pasted-image acquisition unit for each resolution acquires and holds the pasted images.

The reference image generation unit 53 generates a reference image for each resolution, $q_0(u,v), q_1(u,v), \ldots$. It builds each one from the pasted images of that resolution, which it reads from the pasted image acquisition units 51-0, 51-1, 51-2, .... Specifically, it produces the reference image for a given resolution by applying a low-pass filter of that resolution to the (weighted) sum of the pasted images. The detailed procedure is described later.
At each resolution, the image adding units 61-0, 61-1, ... compute the weighted sum of the pasted images for the 0th through $(N_{\mathrm{frames}}-1)$-th frame images. The weight $\alpha(u,v)$ used here is the direction weight described above.
The Gaussian window processing units 62-0, 62-1, ... low-pass filter the outputs of the image adding units 61-0, 61-1, .... At each resolution, the filter is the two-dimensional Gaussian window (window function) of the Gabor wavelet for that resolution.
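The reference-image step (a weighted sum over frames, then a Gaussian low-pass) can be sketched as follows. This is a minimal NumPy sketch, not the patent's equations (2) to (4), which are not reproduced here. It makes these assumptions: `weights[k]` is a per-pixel map standing in for the direction weight $\alpha$; images are indexed `[v, u]`; and `sigma` is a caller-supplied Gaussian width. `sigma` is a hypothetical parameter, because the way the window width is tied to each resolution's Gabor wavelet is not given in this text.

```python
import numpy as np


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian low-pass filter; borders are extended by edge replication."""
    radius = max(1, int(np.ceil(3.0 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    g /= g.sum()  # unit DC gain, so a constant image stays constant
    padded = np.pad(img, radius, mode="edge")
    # Filter along u (axis 1), then along v (axis 0); "valid" trims the padding.
    rows = np.apply_along_axis(lambda r: np.convolve(r, g, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode="valid"), 0, rows)


def reference_image(pasted: np.ndarray, weights: np.ndarray, sigma: float) -> np.ndarray:
    """q(u, v): low-pass filtered weighted sum of the pasted images s_k(u, v).

    pasted, weights: arrays of shape (N_frames, H, W).
    """
    weighted_sum = np.sum(weights * pasted, axis=0)  # role of an image adding unit 61
    return gaussian_blur(weighted_sum, sigma)        # role of a Gaussian window unit 62
```

The sketch leaves the weighted sum unnormalized, as the text describes it. If equation (2) divides by the weight total, that division would be added before the blur.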

Each of the warping processing units 52-0, 52-1, ... works at one resolution. It reads that resolution's pasted images from the pasted image acquisition units 51-0, 51-1, ... and takes the reference image $q_0(u,v), q_1(u,v), \ldots$ for the same resolution. From these it estimates the displacement at each pixel of the pasted images, then uses that displacement to generate the pasted images for the next (one step higher) resolution. In other words, each warping processing unit computes the displacement of the pixels in each pasted image relative to the reference image. It then aligns the pasted image's pixel positions by that displacement, which yields a displacement-aligned pasted image. The generated pasted image is passed to the pasted image acquisition unit 51 for the next resolution.

The control unit 54 coordinates the repetition across resolutions. After the warping processing unit aligns the pixel positions at a given resolution, the control unit has the pasted image acquisition unit acquire the resulting pasted images. It then has the reference image generation unit generate the reference image for the next resolution. In this way it repeats the alignment of the pasted-image pixel positions step by step, from the low-resolution side to the high-resolution side. The concrete procedure, including this repetition under the control unit 54, is described later.

The warping processing units 52-0, 52-1, ... align the displacements of the pasted images level by level. The texture image writing unit 55 combines the resulting aligned pasted images into a texture image and writes it to the texture image storage unit 15.

The resolutions are as follows. There are $N_{\mathrm{res}}$ levels, from the 0th resolution (lowest) to the $(N_{\mathrm{res}}-1)$-th resolution (highest). The value of $N_{\mathrm{res}}$ is set according to the resolution of the target image.
As an example, the $i$-th resolution ($0 \le i < N_{\mathrm{res}}$) is set to $b^i$, where $b$ is a suitably chosen integer. For example, with $b = 2$ the $i$-th resolution is $2^i$, so starting from the 0th level the resolutions form the series 1, 2, 4, 8, 16, .... $N_{\mathrm{res}}$ is chosen so that the full resolution of the target image is used. For a target image of 2048 (vertical) × 2048 (horizontal) pixels, $b = 2$ and $N_{\mathrm{res}} = 12$ give the 0th through 11th resolutions 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048.
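The resolution ladder $b^0, b^1, \ldots, b^{N_{\mathrm{res}}-1}$ can be computed as below. This is a small sketch under one assumption: $N_{\mathrm{res}}$ is taken as the smallest level count whose top level reaches the image size. The text only says $N_{\mathrm{res}}$ is chosen to use the image's full resolution, so this rule is hypothetical for sizes that are not powers of $b$.

```python
def resolution_ladder(image_size: int, b: int = 2) -> list[int]:
    """Resolutions b**0, b**1, ..., b**(N_res - 1) for a target image.

    N_res is taken as the smallest number of levels whose top level
    reaches image_size (an assumption for sizes that are not powers of b).
    """
    if b < 2 or image_size < 1:
        raise ValueError("need b >= 2 and image_size >= 1")
    levels = [1]                    # 0th resolution: b**0
    while levels[-1] < image_size:  # add levels until the image size is covered
        levels.append(levels[-1] * b)
    return levels


levels = resolution_ladder(2048, b=2)
print(len(levels), levels)
# 12 [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]
```

For the 2048 × 2048 example this reproduces $N_{\mathrm{res}} = 12$ and the series listed above.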

FIG. 3 shows the relationship between the imaged subject and the texture image held by the texture image storage unit 15. FIG. 3(a) shows a three-dimensional CG (computer graphics) object model of the subject with texture mapping applied. FIG. 3(b) shows the mesh of that model overlaid on the texture image. As illustrated, each point on the surface of the subject is associated with a pixel position on the texture image. Specifically, the $n$-th mesh vertex $\mathbf{x}^n$ (a vector of three-dimensional coordinates) corresponds to the texture coordinates $(u^n, v^n)$, where $0 \le u^n \le 1$, $0 \le v^n \le 1$, and $0 \le n < N_{\mathrm{VT}}$. Here $N_{\mathrm{VT}}$ is the total number of mesh vertices used. The image processing apparatus 1 stores table data (not shown) that associates, for each $n$, the three-dimensional coordinates of $\mathbf{x}^n$ with the pixel coordinates $(u^n, v^n)$ on the UV texture image.
Although a human face (head) is shown here as the three-dimensional subject, the method is not limited to this and can handle any three-dimensional object that can be imaged.
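The per-vertex table described above can be sketched as a pair of aligned arrays. The class name `VertexUVTable` and its `texel` method are hypothetical. The mapping from normalized $(u^n, v^n)$ to pixel coordinates, scaling by `width - 1` and `height - 1`, is also an assumed convention that this text does not state.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class VertexUVTable:
    """Hypothetical table: row n pairs mesh vertex x^n (3D) with texture coords (u^n, v^n)."""

    xyz: np.ndarray  # shape (N_VT, 3)
    uv: np.ndarray   # shape (N_VT, 2), each coordinate in [0, 1]

    def __post_init__(self) -> None:
        self.xyz = np.asarray(self.xyz, dtype=float)
        self.uv = np.asarray(self.uv, dtype=float)
        if self.xyz.ndim != 2 or self.xyz.shape[1] != 3:
            raise ValueError("xyz must have shape (N_VT, 3)")
        if self.uv.shape != (self.xyz.shape[0], 2):
            raise ValueError("uv must have shape (N_VT, 2)")
        if np.any((self.uv < 0.0) | (self.uv > 1.0)):
            raise ValueError("u^n and v^n must lie in [0, 1]")

    @property
    def n_vt(self) -> int:
        """Total number of mesh vertices, N_VT."""
        return self.xyz.shape[0]

    def texel(self, n: int, width: int, height: int) -> tuple[float, float]:
        """Map (u^n, v^n) to continuous pixel coordinates of a width x height texture."""
        u, v = self.uv[n]
        return float(u * (width - 1)), float(v * (height - 1))
```

Keeping the two arrays row-aligned makes the index $n$ the shared key, matching the "for each $n$" association in the text.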

FIG. 4 is a flowchart of the processing procedure of the image processing apparatus 1. The procedure is described below step by step.
First, in step S1, the pasted image acquisition unit 51-0 acquires and stores the initial pasted images $s_0(u,v), s_1(u,v), \ldots, s_k(u,v), \ldots, s_{N_{\mathrm{frames}}-1}(u,v)$. The initial pasted image of the $k$-th frame ($0 \le k < N_{\mathrm{frames}}$) is based on the $k$-th frame image. It has been converted to the $uv$ orthogonal coordinate system by the inverse pose conversion processing unit 12.
In step S2, the texture mapping unit 14 initializes the variable $i$, which indexes the resolution level, to 0.

(a0) Constructing the reference image
Next, in step S3, the reference image generation unit 53 constructs the reference image at the $i$-th resolution from the pasted images obtained so far. Since $i = 0$, it reads the pasted images from the pasted image acquisition unit 51-0 and constructs the reference image $q_0(u,v)$ at the lowest (0th) resolution. This image is the weighted sum of the pasted images from all video frames, low-pass filtered by the two-dimensional Gaussian window of the lowest-resolution Gabor wavelet. The reference image generation unit 53 computes $q_0(u,v)$ by equation (2).

Here, the image adding unit 61-0 computes the weighted sum of the pasted images, and the Gaussian window processing unit 62-0 convolves that sum with the two-dimensional Gaussian window of this resolution.

(b0) Computing the warp relative to the reference image
Next, in step S4, the warping processing unit 52-0 uses the lowest-resolution Gabor wavelets to estimate the displacement relative to the reference image $q_0(u,v)$ at each pixel of each pasted image $s_k(u,v)$ (here $i = 0$).
The specific calculation for estimating the displacement is described later.
This calculation yields the displacement $(m^k_0(u,v),\ n^k_0(u,v))$, estimated only from the lowest-resolution (0th) image components.
Then, in step S5, the warping processing unit 52-0 warps the pasted images using the displacement from the previous step. That is, for each $k$ it obtains the pasted image $s_k(u + m^k_0(u,v),\ v + n^k_0(u,v))$. The warping processing unit 52-0 then supplies the warped pasted images to the pasted image acquisition unit 51-1.
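The resampling in step S5, $s_k(u + m,\ v + n)$, can be sketched with bilinear interpolation. The patent text does not state which interpolation is used. Bilinear sampling, clamping of out-of-range positions to the border, and `[v, u]` array indexing are all assumptions of this sketch.

```python
import numpy as np


def warp(image: np.ndarray, m: np.ndarray, n: np.ndarray) -> np.ndarray:
    """Return w(u, v) = image(u + m(u, v), v + n(u, v)) by bilinear interpolation.

    image, m, n have shape (H, W) and are indexed [v, u]; sample positions that
    fall outside the image are clamped to the border.
    """
    h, w = image.shape
    vv, uu = np.mgrid[0:h, 0:w].astype(float)
    x = np.clip(uu + m, 0.0, w - 1)   # u + m, clamped
    y = np.clip(vv + n, 0.0, h - 1)   # v + n, clamped
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0           # fractional offsets in [0, 1)
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x1]
    bottom = (1 - fx) * image[y1, x0] + fx * image[y1, x1]
    return (1 - fy) * top + fy * bottom
```

At the 0th level, `m` and `n` would be the fields $m^k_0$ and $n^k_0$. At later levels they would be the displacements accumulated over the levels processed so far.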

Next, in step S6, the texture mapping unit 14 determines whether processing at the highest resolution has been completed. If it has (step S6: YES), the whole procedure ends. If not (step S6: NO), processing proceeds to step S7.
In step S7, the texture mapping unit 14 increments $i$ by 1, so that $i$ now indexes the next resolution level (here $i = 1$). Processing then returns to step S3.

(a1) Constructing the reference image at the next resolution
In step S3, the reference image generation unit 53 again constructs the reference image at the $i$-th resolution from the pasted images obtained so far. Since $i = 1$, it constructs the reference image $q_1(u,v)$ at the next higher (1st) resolution. To form $q_1(u,v)$, the displacements obtained at the lowest (0th) resolution are first applied to the pasted images from all video frames. The weighted sum of the results is then low-pass filtered by the two-dimensional Gaussian window of the 1st-resolution Gabor wavelet. The reference image generation unit 53 computes $q_1(u,v)$ by equation (3).

Here, the image adding unit 61-1 computes the weighted sum of the pasted images, and the Gaussian window processing unit 62-1 convolves that sum with the two-dimensional Gaussian window of this resolution.

(b1) Computing the warp relative to the reference image
Next, in step S4, the warping processing unit 52-1 uses the 1st-resolution Gabor wavelets to estimate the displacement relative to the reference image $q_1(u,v)$ at each pixel of each pasted image (here $i = 1$).
The pasted images in question have already been warped at the lowest (0th) resolution, that is, they are $s_k(u + m^k_0(u,v),\ v + n^k_0(u,v))$. At each of their pixels, the warping processing unit 52-1 estimates the displacement relative to $q_1(u,v)$. The displacement is computed in the same way as at the 0th resolution. This yields the displacement $(m^k_1(u,v),\ n^k_1(u,v))$, estimated only from the 1st-resolution image components.

Then, in step S5, the warping processing unit 52-1 warps the pasted images using the displacement from the previous step. That is, for each $k$ it obtains the pasted image $s_k(u + m^k_0(u,v) + m^k_1(u,v),\ v + n^k_0(u,v) + n^k_1(u,v))$. The warping processing unit 52-1 then supplies the warped pasted images to the pasted image acquisition unit 51-2.

In steps S6 and S7, the texture mapping unit 14 again increments $i$ (so $i = 2$) and moves on to the next resolution.
In steps (a0), (b0), (a1), and (b1) above, the displacement was computed at the 0th and 1st resolutions and the pasted images were warped by it. The 2nd and subsequent resolutions are processed in the same way.
The general processing at the $i$-th resolution ($i = 0, 1, 2, \ldots, N_{\mathrm{res}}-1$) is as follows.

(a) Constructing the reference image at the $i$-th resolution
In step S3, the reference image generation unit 53 constructs the $i$-th-resolution reference image $q_i(u,v)$ from the pasted images obtained so far. This image is the weighted sum of the pasted images from all video frames, low-pass filtered by the two-dimensional Gaussian window of the $i$-th-resolution Gabor wavelet. $q_i(u,v)$ is given by equation (4).

Here, $u^k_i(u,v)$ and $v^k_i(u,v)$ are the coordinates after aligning the displacements from the 0th through the $i$-th resolution. They are given by equations (5) and (6).

(b) Computing the warp relative to the reference image
In step S4, the warping processing unit 52-$i$ uses the $i$-th-resolution Gabor wavelets to estimate the displacement relative to the reference image. It does this at each pixel of the pasted images warped at the resolutions below the $i$-th. For $i > 0$, the pasted image warped at the $(i-1)$-th resolution is $s_k(u^k_{i-1}(u,v),\ v^k_{i-1}(u,v))$.
The displacement is computed in the same way as described for the 0th and 1st resolutions. This yields the displacement $(m^k_i(u,v),\ n^k_i(u,v))$, estimated only from the $i$-th-resolution image components.
Aligning by this displacement gives the pasted image warped at the $i$-th resolution, $s_k(u^k_i(u,v),\ v^k_i(u,v))$.

In step S5, the warping processing unit 52-$i$ warps the pasted images using the displacement from step S4. It then supplies the warped pasted images to the pasted image acquisition unit for the next resolution.

Steps (a), constructing the reference image, and (b), computing the warp relative to it, are carried out at the $i$-th resolution as described above. Repeating them from the 0th resolution up to the highest resolution $N_{\mathrm{res}}-1$ yields an accurate nonlinear alignment.
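The coarse-to-fine loop of FIG. 4 (steps S2 to S7) can be sketched as a driver that takes the three per-level operations as callables. The hooks `build_reference`, `estimate_shift`, and `warp` are hypothetical, not names from the patent. The displacements from each level are summed, following the two-level expression given for step S5: $u^k_i = u + \sum_{l=0}^{i} m^k_l(u,v)$ and $v^k_i = v + \sum_{l=0}^{i} n^k_l(u,v)$. Each level then resamples the original $s_k$ at the accumulated coordinates. This reading of equations (5) and (6) is an assumption.

```python
from typing import Callable, List, Tuple

import numpy as np

Image = np.ndarray
BuildReference = Callable[[List[Image], int], Image]
EstimateShift = Callable[[Image, Image, int], Tuple[Image, Image]]
Warp = Callable[[Image, Image, Image], Image]


def coarse_to_fine(pasted: List[Image], n_res: int,
                   build_reference: BuildReference,
                   estimate_shift: EstimateShift,
                   warp: Warp):
    """Align pasted images s_k level by level, from resolution 0 to n_res - 1.

    Returns (aligned images, accumulated m fields, accumulated n fields).
    """
    originals = [np.asarray(s, dtype=float) for s in pasted]
    m_acc = [np.zeros_like(s) for s in originals]  # sum of m^k_l over levels so far
    n_acc = [np.zeros_like(s) for s in originals]  # sum of n^k_l over levels so far
    current = list(originals)                     # s_k warped up to the previous level
    for i in range(n_res):                        # S2 sets i = 0; S6/S7 advance i
        q_i = build_reference(current, i)        # S3: reference image q_i
        for k, s_k in enumerate(originals):
            m_i, n_i = estimate_shift(current[k], q_i, i)  # S4: level-i displacement
            m_acc[k] = m_acc[k] + m_i
            n_acc[k] = n_acc[k] + n_i
            current[k] = warp(s_k, m_acc[k], n_acc[k])      # S5: resample original s_k
    return current, m_acc, n_acc
```

Resampling the original image once per level, rather than re-warping an already warped image, avoids compounding interpolation blur across the $N_{\mathrm{res}}$ levels.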

The Gabor wavelet $g^{r,\phi}(\mathbf{x})$ at the coordinates given by the vector $\mathbf{x}$ is defined by equations (7) and (8).

The Gaussian $g_r(\mathbf{x})$ of this wavelet is given by equation (9) ($r = 1, 2, \ldots$).

The texture mapping unit 14 then pastes the displacement-aligned pasted images onto the texture image. Equation (10) gives an example of the texture image after pasting.

The symbols in equation (10) are as follows:
- $t(u,v)$ is the color value at coordinates $(u,v)$ of the texture image after pasting.
- $\alpha^{k\prime}(u,v)$ is the direction weight for frame image $k$ ($0 \le k < N_{\mathrm{frames}}$) at coordinates $(u,v)$ after displacement alignment.
- $s'_k(u,v)$ is the pasted image for frame image $k$ after displacement alignment.
- $t_0(u,v)$ is the initial value of the texture image.
- $\gamma$ is a parameter that adjusts the weight of that initial value.
If no initial texture is used, equation (10) can be applied with $t_0(u,v) = 0$ everywhere.
In this way, the texture mapping unit 14 weights and adds the displacement-aligned pasted images and pastes (composites) the result onto the texture image. The texture image writing unit 55 in the texture mapping unit 14 then writes the resulting texture image to the texture image storage unit 15.

[Method for estimating the displacement]
This section details the calculation used in step S4 of FIG. 4 to estimate the displacement.
First, the "jet" is explained. A jet is a quantity associated with one pixel of the image in $u$-$v$ coordinates. The jet $J(u,v)$ at coordinates $(u,v)$ is a set of 40 values $J_j(u,v)$, where $j$ is an integer with $0 \le j < 40$.
The index $j$ is written as $j = \mu + 8\nu$ in terms of two further integer indices, $\mu$ ($0 \le \mu < 8$) and $\nu$ ($0 \le \nu < 5$).
The index $\nu$ relates to the frequency of the Gabor wavelet, and $\mu$ to its orientation.

The components $J_j(u,v)$ of the jet are given by equation (11).

Here the vector $\mathbf{x}$ denotes the coordinates $(u,v)$, and $\psi_j(\mathbf{x})$ is the Gabor kernel given by equation (12).

The Gabor function in equation (12) is bounded by a Gaussian envelope. The vector $\mathbf{k}$ is the wave vector, given by equation (13).

$k_\nu$, expressed in terms of the index $\nu$, is given by equation (14).

$\theta_\mu$, expressed in terms of the index $\mu$, is given by equation (15).

$J_j(u,v)$ as defined above can be rewritten as equation (16).
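Equations (11) to (16) are not reproduced in this text. The sketch below therefore assumes the standard EBGM parameterization of Wiskott et al., the algorithm this document cites as the basis of FAVRET's representation. In that parameterization:
- $\psi_j(\mathbf{x}) = \frac{k_j^2}{\sigma^2} e^{-k_j^2 |\mathbf{x}|^2 / (2\sigma^2)} \left[e^{i \mathbf{k}_j \cdot \mathbf{x}} - e^{-\sigma^2/2}\right]$
- $\mathbf{k}_j = (k_\nu\cos\theta_\mu,\ k_\nu\sin\theta_\mu)$
- $k_\nu = 2^{-(\nu+2)/2}\pi$, $\theta_\mu = \mu\pi/8$, and $\sigma = 2\pi$

Whether the patent uses exactly these constants is an assumption. The jet component is the kernel response at one pixel, written as amplitude and phase, $J_j = a_j e^{i\phi_j}$.

```python
import numpy as np

N_ORIENT, N_FREQ = 8, 5  # mu in [0, 8), nu in [0, 5): 40 components
SIGMA = 2.0 * np.pi     # assumed EBGM envelope constant


def jet_index(mu: int, nu: int) -> int:
    """j = mu + 8 * nu."""
    return mu + N_ORIENT * nu


def wave_vector(j: int) -> np.ndarray:
    """k_j = (k_nu cos theta_mu, k_nu sin theta_mu), assumed EBGM constants."""
    nu, mu = divmod(j, N_ORIENT)
    k_nu = 2.0 ** (-(nu + 2) / 2.0) * np.pi
    theta_mu = mu * np.pi / N_ORIENT
    return np.array([k_nu * np.cos(theta_mu), k_nu * np.sin(theta_mu)])


def gabor_kernel(j: int) -> np.ndarray:
    """Sampled psi_j on a square grid covering 4 std devs of its envelope ([y, x] order)."""
    kx, ky = wave_vector(j)
    k2 = kx * kx + ky * ky
    radius = int(np.ceil(4.0 * SIGMA / np.sqrt(k2)))
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    envelope = (k2 / SIGMA**2) * np.exp(-k2 * (x * x + y * y) / (2.0 * SIGMA**2))
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-SIGMA**2 / 2.0)  # DC-free term
    return envelope * carrier


def jet(image: np.ndarray, u: int, v: int) -> np.ndarray:
    """40 complex responses J_j at pixel (u, v); image indexed [v, u], zero outside."""
    out = np.empty(N_ORIENT * N_FREQ, dtype=complex)
    for j in range(out.size):
        kern = gabor_kernel(j)
        r = kern.shape[0] // 2
        padded = np.pad(np.asarray(image, dtype=float), r)  # zero padding
        patch = padded[v:v + 2 * r + 1, u:u + 2 * r + 1]   # centred on (u, v)
        out[j] = np.sum(patch * kern[::-1, ::-1])          # convolution at one point
    return out
```

The amplitudes and phases used for displacement estimation are then `np.abs(J)` and `np.angle(J)`.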

The displacement is then estimated using the jets defined above. The similarity $S_\Phi$ between two jets $J$ and $J'$ is given by the approximation in equation (17).

The goal is the displacement vector $\mathbf{d} = (d_x, d_y)$ that maximizes the similarity $S_\Phi$ between the jets. To find it, the condition in equation (18) is imposed.

Solving this condition for the displacement vector $\mathbf{d}(J, J')$ gives the solution in equation (19), provided that $\Gamma_{xx}\Gamma_{yy} - \Gamma_{xy}\Gamma_{yx} \neq 0$.

The displacement vector obtained in this way is the displacement at that resolution. The quantities appearing in equation (19) are defined by equations (20) to (24).
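Equations (17) to (24) are not reproduced here. The sketch below assumes the standard EBGM phase-based estimator, which is consistent with the solvability condition $\Gamma_{xx}\Gamma_{yy} - \Gamma_{xy}\Gamma_{yx} \neq 0$ stated above. It uses the similarity

$$S_\Phi(J,J') \approx \frac{\sum_j a_j a'_j \cos(\phi_j - \phi'_j - \mathbf{d}\cdot\mathbf{k}_j)}{\sqrt{\sum_j a_j^2 \sum_j a_j'^2}}.$$

A second-order Taylor expansion of the cosine gives $\mathbf{d} = \Gamma^{-1}\boldsymbol{\Phi}$, where $\Phi_x = \sum_j a_j a'_j k_{jx}(\phi_j - \phi'_j)$ and $\Gamma_{xy} = \sum_j a_j a'_j k_{jx} k_{jy}$. The wave-vector constants are the same assumed EBGM values $k_\nu = 2^{-(\nu+2)/2}\pi$ and $\theta_\mu = \mu\pi/8$. Phase differences are wrapped into $(-\pi, \pi]$.

```python
import numpy as np


def wave_vectors(n_orient: int = 8, n_freq: int = 5) -> np.ndarray:
    """k_j for j = mu + 8 nu (assumed EBGM constants); shape (40, 2)."""
    j = np.arange(n_orient * n_freq)
    nu, mu = np.divmod(j, n_orient)
    k_nu = 2.0 ** (-(nu + 2) / 2.0) * np.pi
    theta = mu * np.pi / n_orient
    return np.stack([k_nu * np.cos(theta), k_nu * np.sin(theta)], axis=1)


def wrap(phase: np.ndarray) -> np.ndarray:
    """Map phase differences into (-pi, pi]."""
    return np.pi - np.mod(np.pi - phase, 2.0 * np.pi)


def similarity(J: np.ndarray, Jp: np.ndarray, k: np.ndarray, d: np.ndarray) -> float:
    """Phase-sensitive jet similarity S_Phi(J, J') for a trial displacement d."""
    a, ap = np.abs(J), np.abs(Jp)
    dphi = np.angle(J) - np.angle(Jp) - k @ d
    return float(np.sum(a * ap * np.cos(dphi)) / np.sqrt(np.sum(a**2) * np.sum(ap**2)))


def estimate_displacement(J: np.ndarray, Jp: np.ndarray, k: np.ndarray) -> np.ndarray:
    """d maximizing the second-order approximation of S_Phi; needs det(Gamma) != 0."""
    w = np.abs(J) * np.abs(Jp)
    dphi = wrap(np.angle(J) - np.angle(Jp))
    phi_x, phi_y = np.sum(w * k[:, 0] * dphi), np.sum(w * k[:, 1] * dphi)
    g_xx = np.sum(w * k[:, 0] ** 2)
    g_xy = np.sum(w * k[:, 0] * k[:, 1])
    g_yx = g_xy  # Gamma is symmetric
    g_yy = np.sum(w * k[:, 1] ** 2)
    det = g_xx * g_yy - g_xy * g_yx
    if det == 0:
        raise ValueError("Gamma_xx Gamma_yy - Gamma_xy Gamma_yx = 0")
    return np.array([g_yy * phi_x - g_yx * phi_y, -g_xy * phi_x + g_xx * phi_y]) / det
```

Because the estimate relies on phase differences, it is only reliable while $|\mathbf{k}_j \cdot \mathbf{d}| < \pi$. This is one reason to estimate small displacements level by level from coarse to fine.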

[Second Embodiment]
Next, a second embodiment is described. It applies the image processing apparatus of the first embodiment to a face image processing apparatus that models face images and registers them in a database. Only the matters specific to this embodiment are described below; matters common to the first embodiment are omitted.

FIG. 5 is a block diagram of the functional configuration of the face image processing apparatus according to this embodiment. As shown, the face image processing apparatus 100 (image processing apparatus) includes:
- an image data storage unit 110;
- a face area detection and collation unit 120;
- a three-dimensional estimation unit 130;
- a rendering unit 140;
- a database registration unit 150; and
- a face feature database unit 160.

The face image processing apparatus 100 operates in either a registration mode or a recognition mode. Its input is video image frames or a plurality of still images (still images are also called image frames below) that contain a person's face. It detects the face image region in each image frame and generates face feature data from that region. In registration mode it registers this data; in recognition mode it collates it. The apparatus also approximately synthesizes face feature data for head poses that do not appear in the image frames, and registers or collates that data as well.

Whether the input source is video data or a plurality of still images makes no significant difference to the processing of this embodiment. The following description therefore covers video input: generating face feature data from the face images of a person in the image frames of video data, approximately synthesizing it, and registering or collating it. The processing for still images is essentially the same.

The image data storage unit 110 stores video data containing face images of the person to be registered. The face area detection and collation unit 120 reads the video data from the image data storage unit 110, detects the face image region in each image frame, and estimates the positions of a plurality of feature points. This estimation is performed over temporally consecutive image frames. The unit then collates the estimated feature points with variable template structures (described later), which are stored as face feature data in the face feature database unit 160. It outputs either the feature point tracking result or the recognition result. In other words, the face area detection and collation unit 120 tracks or recognizes feature points with a feature-point-based face image matching algorithm. Feature point tracking and face region detection themselves can be performed with existing techniques.
For example, the FAVRET system described in the following references can be applied:
- [Clippingdale, S., Ito, T., "A unified approach to face detection, tracking and recognition in moving images" (in Japanese), IEICE Technical Report, PRMU98-200, Osaka University, 1999.]
- [Clippingdale, S., Ito, T., "A Unified Approach to Video Face Detection, Tracking and Recognition," Proc. ICIP'99, Kobe, Japan, 1999.]

FAVRET handles video (moving images) as well as head poses that change over time. Its data representation is based on that of the EBGM algorithm, namely the position of each feature point and the Gabor wavelet features measured there. To handle multiple head poses and run on real-time video, FAVRET registers face images of each recognition target in multiple head poses in the face image database. It also starts the feature point search in each image frame from the positions estimated in the previous frame. That is, FAVRET narrows the search range by tracking the feature points.

In either mode, the face area detection and collation unit 120 estimates the positions of a plurality of feature points in the face image region of each image frame. It outputs those positions (two-dimensional coordinates in the image) as the collation result. In recognition mode, it also computes the similarity between the input and the face feature data of each registered person stored in the face feature database unit 160, and outputs the match score for the most similar face feature data. The match score is, for example, information that associates:
- the registered person's name, or another name such as a nickname that identifies the person;
- that person's identification information, such as an ID number; and
- the computed similarity.
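In recognition mode, the unit reports the match score of the most similar registered person. A minimal sketch of that record and the selection step follows. It assumes a similarity has already been computed for each registered person. The class and field names are hypothetical, and the similarity measure itself is left unspecified here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class MatchScore:
    """Hypothetical match-score record: who matched, and how well."""

    name: str          # registered name or nickname
    person_id: int     # identification information, e.g. an ID number
    similarity: float  # similarity to the input face feature data


def best_match(scores: Iterable[MatchScore]) -> Optional[MatchScore]:
    """Return the record with the highest similarity, or None if there are none."""
    return max(scores, key=lambda s: s.similarity, default=None)
```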

FIG. 6 schematically shows the feature point positions that the face area detection and collation unit 120 estimates from the face image region. Some feature points come in pairs (two each): the inner eye corners, the outer eye corners, and the mouth corners. Others are single (one each): the nose root, the nose tip, and the lip tip. The unit estimates only those feature points that are visible given the person's head pose.

Returning to FIG. 5, the three-dimensional estimation unit 130 estimates the three-dimensional position of each feature point. It then warps the mesh shape of a generic (person-independent) three-dimensional computer graphics face mesh model (3D CG face model, or generic model) to those estimated positions. In this warping, the unit moves each mesh vertex of the 3D CG face model that corresponds to a feature point onto that feature point's estimated three-dimensional position. Fitting the mesh shape this way produces a modified 3D CG face model (modified face model). Both the 3D CG face model and the modified model represent a solid made of the polygons formed by the mesh, and these polygons make up the head-shaped surface of the model. Because the solid follows the shape of a human head, it is, roughly speaking, almost monotonically convex over the whole head. Finally, the three-dimensional estimation unit 130 pastes texture from the video data onto the modified 3D CG face model. It outputs a CG face model to which the face in the video data has been applied.

As shown in the figure, the three-dimensional estimation unit 130 includes a position/posture estimation unit 131, a mesh warping unit 132, a texture mapping unit 133, a face model storage unit 134, a mesh vertex assignment information storage unit 135, a texture image storage unit 136, and a pixel allocation information storage unit 137.

From the feature point two-dimensional coordinate values, that is, the feature point positions in each image frame of the video data estimated by the face area detection/collation unit 120, the position/posture estimation unit 131 jointly estimates the feature point three-dimensional coordinate values (the three-dimensional position of each feature point) and the head posture angle data for each image frame (the rotational position (angle) about a vertical axis, the rotational position (angle) about a horizontal axis, or a combination of the two).

The mesh warping unit 132 generates the modified 3D CG face model by warping the mesh vertices of the 3D CG face model stored in the face model storage unit 134 to the feature point three-dimensional coordinate values estimated by the position/posture estimation unit 131.

The mesh vertex assignment information storage unit 135 stores mesh vertex assignment information, which records, for use when the mesh warping unit 132 warps the mesh vertices of the 3D CG face model, the triangle (whose corners are mesh vertices) to which each mesh vertex is assigned.
FIG. 7 is a schematic diagram showing the data structure of the mesh vertex assignment information stored in the mesh vertex assignment information storage unit 135. As shown in the figure, the mesh vertex assignment information is tabular data that associates the three-dimensional coordinate value of each mesh vertex with the identification information of the triangle to which that vertex is assigned. (When the mesh includes polygons other than triangles, those polygons are divided into triangles as appropriate, as described later.) The primary key of this table is the three-dimensional coordinate value of the mesh vertex.

The texture mapping unit 133 generates the CG face model as follows. It takes a texture containing image information such as the shading and color of the video data stored in the image data storage unit 110, composites it, at a weight ratio described later, into the UV texture image (two-dimensional face texture image) stored in the texture image storage unit 136 at the positions corresponding to the facial feature points and at least in their vicinity, and pastes the result onto the modified 3D CG face model. The UV texture image is face image data on a two-dimensional plane spanned by orthogonal U and V axes. It is mapped onto the 3D CG face model by associating pixel positions (u, v) (where 0 ≤ u ≤ 1 and 0 ≤ v ≤ 1) with mesh vertices of the 3D CG face model. As a result of this mapping, the RGB colors and shading are rendered on the textured surface of the modified 3D CG face model.
The texture mapping unit 133 internally has the same configuration and functions as the texture mapping unit 14 described in the first embodiment. As described later, when pasting the pasted image onto the UV texture image, the texture mapping unit 133 calculates the pixel displacement and aligns the pixel positions using the calculated displacement.

The pixel allocation information storage unit 137 stores pixel allocation information, which represents the correspondence between the pixel positions of the UV texture image near the feature points and the triangles to which they are assigned, for use when the texture mapping unit 133 pastes the UV texture image.
FIG. 8 is a schematic diagram showing the data structure of the pixel allocation information stored in the pixel allocation information storage unit 137. As shown in the figure, the pixel allocation information is tabular data that associates the position (u, v) of each pixel on the UV texture image with the identification information of the triangle to which that pixel is assigned. (When the mesh includes polygons other than triangles, those polygons are divided into triangles as appropriate, as described later.) The primary key of this table is the pixel position in the UV texture image.

The rendering unit 140 rotates the CG face model output from the three-dimensional estimation unit 130 to a predetermined head posture and performs rendering to generate a synthesized face image model. The rendering unit 140 generates synthesized face image models for all the head postures required for registration. For example, the required head postures are 19 left-right angles from 0 to 180 degrees in 10-degree steps (with the frontal face at 90 degrees), each combined with 5 up-down angles of 0, ±15, and ±30 degrees relative to the horizontal (0 degrees), giving 19 × 5 = 95 patterns in total.
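The example pose grid can be enumerated directly. This is only an illustration: representing each head posture as a (yaw, pitch) tuple in degrees, and the names `yaw` and `pitch`, are assumptions rather than part of the described apparatus.

```python
from itertools import product


def registration_poses():
    """Enumerate the example grid of head postures as (yaw, pitch) pairs in degrees.

    yaw:   0..180 in 10-degree steps, 90 being the frontal face (19 values)
    pitch: 0, +/-15, +/-30 degrees relative to horizontal (5 values)
    """
    yaws = range(0, 181, 10)
    pitches = (0, 15, -15, 30, -30)
    return [(yaw, pitch) for yaw, pitch in product(yaws, pitches)]


poses = registration_poses()
print(len(poses))  # 95
```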

For each synthesized face image model rendered by the rendering unit 140, the database registration unit 150 performs convolution centered on each visible feature point to obtain Gabor wavelet features (image feature information) at a predetermined number of resolutions and a predetermined number of orientations. The database registration unit 150 then generates a variable template structure, which is the face feature data needed to recognize the registered person, and stores it in the face feature database unit 160.
FIG. 9 is a schematic diagram showing the data structure of the variable template structure generated by the database registration unit 150. For each registered person, the variable template structure shown in the figure holds person identification information, which includes the registered person's name (or a label identifying the person) and identification information. For each person, it also holds one head posture index (angle data) per head posture, and for each head posture index it holds one set of feature point information per feature point. The feature point information is keyed by the feature point number (an integer starting from 0) and holds, in association with one another, a visibility flag indicating whether the corresponding feature point is visible, the feature point two-dimensional coordinate value (given only when the visibility flag indicates visible), and the Gabor wavelet feature (image information) values for the predetermined number of resolutions (wavelet sizes) × the predetermined number of orientations.
The head posture index is, for example, index data representing a combination of the rotational position (angle) about the vertical (y-direction) axis of the head and the rotational position (angle) about the horizontal (x-direction) axis. Alternatively, the head posture index may be the angle data itself.
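A minimal sketch of the variable template structure of FIG. 9 as Python dataclasses. The class and field names are hypothetical, the head posture index is assumed to be a (yaw, pitch) pair in degrees, and the check that invisible feature points carry no coordinate mirrors the rule stated above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FeaturePointInfo:
    visible: bool                                        # visibility flag
    xy: Optional[Tuple[float, float]] = None             # 2D coordinate, only when visible
    gabor: List[complex] = field(default_factory=list)   # resolutions x orientations values

    def __post_init__(self):
        # The 2D coordinate is recorded only for visible feature points.
        if not self.visible and self.xy is not None:
            raise ValueError("invisible feature points carry no 2D coordinate")


@dataclass
class PoseEntry:
    pose_index: Tuple[int, int]                          # assumed (yaw_deg, pitch_deg)
    points: Dict[int, FeaturePointInfo] = field(default_factory=dict)  # key: feature point number


@dataclass
class VariableTemplate:
    person_name: str                                     # name or label of the registered person
    person_id: str                                       # identification information
    poses: List[PoseEntry] = field(default_factory=list)


# Usage: one frontal pose with a visible point 0 and an invisible point 5.
entry = PoseEntry(pose_index=(90, 0))
entry.points[0] = FeaturePointInfo(True, (120.0, 88.5), [0j] * 40)
entry.points[5] = FeaturePointInfo(False)
template = VariableTemplate("registered person A", "0001", [entry])
```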

For example, for each registered person, the database registration unit 150 creates, as members of the variable template structure, one item of person identification information and head posture indices corresponding to the 95 head postures given above as an example. For each head posture, it further creates as members the feature point information for each of the nine feature points shown in FIG. 6 (feature point numbers 0 to 8). For each of these feature points, it creates a visibility flag member and a feature point two-dimensional coordinate value member, together with a member holding the values of 40 Gabor wavelet features: 5 resolutions × 8 orientations (in steps of 22.5 degrees).
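The 40 values per feature point can be illustrated with a standard family of complex Gabor kernels. The document does not specify the kernel, so this sketch assumes the form used in elastic bunch graph matching, $\psi_j(\mathbf{d}) = \frac{k_j^2}{\sigma^2}\exp\left(-\frac{k_j^2 |\mathbf{d}|^2}{2\sigma^2}\right)\left[\exp(i\,\mathbf{k}_j\cdot\mathbf{d}) - \exp(-\sigma^2/2)\right]$, with $\sigma = 2\pi$, $k_\nu = 2^{-(\nu+2)/2}\pi$ for 5 scales, and orientations $\mu\pi/8$ (22.5-degree steps). The window radius of 48 pixels (about 3 standard deviations of the widest envelope) and zero padding at image borders are also assumptions.

```python
import numpy as np

SIGMA = 2 * np.pi  # assumed width parameter of the Gaussian envelope


def wave_vectors(n_scales=5, n_orient=8):
    """Wave vectors k_j for n_scales x n_orient kernels (assumed EBGM-style spacing)."""
    vecs = []
    for nu in range(n_scales):
        k = 2 ** (-(nu + 2) / 2) * np.pi
        for mu in range(n_orient):
            phi = mu * np.pi / n_orient          # 22.5-degree steps when n_orient == 8
            vecs.append((k * np.cos(phi), k * np.sin(phi)))
    return np.array(vecs)                        # shape (n_scales * n_orient, 2)


def gabor_jet(image, x, y, radius=48):
    """Complex Gabor responses at pixel (x, y): one value per wave vector (40 by default).

    Evaluates the convolution sum over a (2*radius+1)^2 window; pixels outside the
    image are treated as zero.
    """
    padded = np.pad(np.asarray(image, dtype=float), radius)
    patch = padded[y:y + 2 * radius + 1, x:x + 2 * radius + 1]  # centred on (x, y)
    offs = np.arange(-radius, radius + 1)
    dx, dy = np.meshgrid(offs, offs)             # dx varies along columns, dy along rows
    jet = []
    for kx, ky in wave_vectors():
        k2 = kx * kx + ky * ky
        envelope = (k2 / SIGMA**2) * np.exp(-k2 * (dx**2 + dy**2) / (2 * SIGMA**2))
        # psi(-d) = envelope * (exp(-i k.d) - exp(-sigma^2/2)) makes the sum a convolution
        carrier = np.exp(-1j * (kx * dx + ky * dy)) - np.exp(-SIGMA**2 / 2)
        jet.append(np.sum(patch * envelope * carrier))
    return np.array(jet)
```

The subtracted constant $\exp(-\sigma^2/2)$ removes the kernel's response to uniform brightness, so the jet mainly reflects local structure around the feature point.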

Next, the operation of each part of the face image processing apparatus 100 will be described.
[Face area detection/collation processing]
FIG. 10 is a flowchart showing the procedure by which the face area detection/collation unit 120 detects a face image area in each image frame of the video data, collates it with the face feature data, and outputs a tracking result or a recognition result. In step S601, the face area detection/collation unit 120 sets the mode to either the registration mode or the recognition mode. This mode is set, for example, by a designation made through an operation unit (not shown) of the face image processing apparatus 100.

Next, in step S602, the face area detection/collation unit 120 reads the image frames of the video data stored in the image data storage unit 110 in chronological order, starting with the oldest time information.
Next, in step S603, if an image frame was read (S603: YES), the face area detection/collation unit 120 proceeds to step S604; if no image frame was read (S603: NO), it ends the processing of this flowchart.

In step S604, the face area detection/collation unit 120 detects a face image area in the read image frame. For this purpose, the face area detection/collation unit 120 uses, for example, a method that scans the image frame to acquire color information and detects the face image area by extracting pixels of a specific color corresponding to human skin color, a method that detects the face image area using a known face recognition algorithm that recognizes facial parts such as the eyes, nose, and mouth by their shapes, or a combination of these methods.
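The color-based option can be sketched as a chroma threshold. This is one possible realization, not the method of the document: it converts RGB to full-range BT.601 (JPEG-style) Cb/Cr and keeps pixels inside Cb/Cr ranges commonly used in the literature for skin (77 to 127 and 133 to 173, assumed here). The bounding box of the mask serves only as a crude face image area.

```python
import numpy as np


def skin_mask(rgb, cb_range=(77, 127), cr_range=(133, 173)):
    """Boolean mask of pixels whose Cb/Cr chroma falls in the assumed skin ranges."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Full-range BT.601 chroma components (0..255 scale)
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))


def skin_bounding_box(mask):
    """(left, top, right, bottom) of the skin pixels, or None if there are none."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return cols.min(), rows.min(), cols.max(), rows.max()
```

In practice such a mask is noisy, which is one reason the text allows combining it with a shape-based face detector.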

Next, in step S605, the face area detection/collation unit 120 estimates the feature point positions from the face image area. The feature points are those shown in FIG. 6. If the part at a feature point position is not visible, the face area detection/collation unit 120 does not estimate that feature point position. The face area detection/collation unit 120 estimates the feature point positions, for example, as follows. Assuming that each feature point in the current image frame lies at or near the position of the same feature point in the previous image frame, it takes that position as the initial position and searches the neighborhood within a predetermined range centered on it. Because the feature point positions estimated in past image frames are registered as feature point two-dimensional coordinate values in the variable template structure of the face feature database unit 160, the face area detection/collation unit 120 can set the initial position by reading the feature point two-dimensional coordinate values of the previous frame from that variable template structure.
The face area detection/collation unit 120 performs the search by measuring Gabor wavelet features from low resolution to high resolution. Gabor wavelet features are complex numbers: the cosine waveform of the real part and the sine waveform of the imaginary part are 90 degrees out of phase. The face area detection/collation unit 120 therefore estimates the positional displacement using the coefficients of both the real part and the imaginary part, and repeats the search while shifting the position. The search ends when the displacement converges. The position at that point is estimated to be the feature point position with the maximum similarity; that is, it is the estimated feature point two-dimensional coordinate value. The number of feature points estimated in the face image area per image frame is denoted $N_{\mathrm{FP}}$.
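The displacement step can be sketched with a phase-based estimator commonly used with Gabor jets (assumed here; the document does not give the formula). For wave vectors $\mathbf{k}_j$, the phase differences between the jet at the current position and the reference jet satisfy approximately $\mathbf{k}_j\cdot\mathbf{d} \approx \phi_j - \phi'_j$, and $\mathbf{d}$ is obtained by weighted least squares with weights $a_j a'_j$ (the product of the magnitudes). The function name and argument layout are hypothetical.

```python
import numpy as np


def estimate_displacement(jet, jet_ref, wave_vecs):
    """Weighted least-squares d such that k_j . d ~= phase(J_j) - phase(J'_j)."""
    jet = np.asarray(jet)
    jet_ref = np.asarray(jet_ref)
    k = np.asarray(wave_vecs, dtype=float)          # shape (n, 2)
    w = np.abs(jet) * np.abs(jet_ref)                # magnitude products as weights
    dphi = np.angle(jet * np.conj(jet_ref))          # phase differences wrapped to (-pi, pi]
    gamma = (k * w[:, None]).T @ k                   # 2x2: sum_j w_j k_j k_j^T
    phi = (k * w[:, None]).T @ dphi                  # 2-vector: sum_j w_j k_j dphi_j
    return np.linalg.solve(gamma, phi)
```

Because phase differences wrap at $\pm\pi$, low-frequency kernels tolerate larger displacements than high-frequency ones, which fits the low-to-high resolution search described above. In an iterative search, the position would be shifted by the returned $\mathbf{d}$, the jet re-measured, and the step repeated until $\mathbf{d}$ becomes small.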

Next, in step S606, the face area detection/collation unit 120 branches according to the mode set in step S601. If the set mode is the registration mode (S606: registration mode), processing proceeds to step S607; if it is the recognition mode (S606: recognition mode), processing proceeds to step S608.

In step S607, the face area detection/collation unit 120 in registration mode outputs the feature point two-dimensional coordinate values of the $N_{\mathrm{FP}}$ feature point positions estimated in step S605 and returns to step S602. In step S608, on the other hand, the face area detection/collation unit 120 in recognition mode outputs the $N_{\mathrm{FP}}$ feature point two-dimensional coordinate values estimated in step S605, generates and outputs a match score for the face feature data with the highest similarity, and returns to step S602.
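The control flow of steps S601 to S608 can be summarized as a loop. This skeleton is illustrative only: `detect_face`, `estimate_points`, and `best_match` are hypothetical stand-ins for the processing described above, and the previous frame's points are passed directly rather than read back from the variable template structure.

```python
from typing import Callable, Iterable, List, Optional


def process_frames(frames: Iterable, mode: str,
                   detect_face: Callable, estimate_points: Callable,
                   best_match: Callable) -> List[tuple]:
    """Skeleton of steps S601-S608 with hypothetical callables."""
    if mode not in ("registration", "recognition"):       # S601: mode fixed up front
        raise ValueError(f"unknown mode: {mode}")
    outputs = []
    previous: Optional[list] = None
    for frame in frames:                                   # S602/S603: oldest first, stop at end
        region = detect_face(frame)                        # S604: face image area
        points = estimate_points(frame, region, previous)  # S605: search from previous positions
        previous = points
        if mode == "registration":                         # S606 -> S607: points only
            outputs.append((points,))
        else:                                              # S606 -> S608: points plus match score
            outputs.append((points, best_match(frame, points)))
    return outputs
```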

[Three-dimensional feature point position and head posture estimation processing]
FIG. 11 is a flowchart showing the procedure for jointly estimating the three-dimensional feature point positions and the head posture of each image frame from the two-dimensional feature point positions (feature point two-dimensional coordinate values) in each image frame of the video data. In step S701, the position/posture estimation unit 131 receives from the face area detection/collation unit 120 the $N_{\mathrm{FP}}$ feature point two-dimensional coordinate values for each of $N_{\mathrm{frames}}$ image frames. The two-dimensional coordinate value of the $j$-th feature point ($0 \le j < N_{\mathrm{FP}}$) in the $k$-th image frame ($0 \le k < N_{\mathrm{frames}}$) is represented by the vector $\hat{\mathbf{y}}^k_j$ with elements $(x^k_j, y^k_j)$, as in Equation (25):
$$\hat{\mathbf{y}}^k_j = \begin{pmatrix} x^k_j \\ y^k_j \end{pmatrix} \in \mathbb{R}^2 \qquad (25)$$
Here, $x^k_j$ and $y^k_j$ are the estimated x and y coordinates of the $j$-th feature point when the $k$-th image frame is represented as a two-dimensional plane with orthogonal x and y axes; that is, $\hat{\mathbf{y}}^k_j$ in Equation (25) is a two-dimensional real vector. Bold symbols denote matrices or vectors, and a hat denotes an estimated value.

Next, in step S702, the position/posture estimation unit 131 acquires the visibility $\hat{v}^k_j \in \{0, 1\}$, an estimate of whether the $j$-th feature point ($0 \le j < N_{\mathrm{FP}}$) in the $k$-th image frame ($0 \le k < N_{\mathrm{frames}}$) is visible: the value is 0 if the point is not visible and 1 if it is visible. In registration mode, the visibility $\hat{v}^k_j$ for each image frame is input externally according to whether the feature point position can be seen.

Next, in step S703, the position/posture estimation unit 131 reads from the face model storage unit 134 the mesh vertex three-dimensional coordinate values $\mathbf{m}_j \in \mathbb{R}^3$ ($0 \le j < N_{\mathrm{FP}}$), which are the positions of the mesh vertices corresponding to the $j$-th feature points in the 3D CG face model.

Next, in step S704, based on the data obtained in steps S701 to S703, the position/posture estimation unit 131 jointly estimates the feature point three-dimensional coordinate values, which are the three-dimensional positions of the feature points in the video data, and the head posture of each image frame. For this estimation, it applies the process for estimating three-dimensional feature data from two-dimensional feature data described in the known document: Simon Clippingdale, Masato Fujii, and Nobuyuki Yagi, "A study of three-dimensional facial feature point estimation from two-dimensional observation data with occlusion and noise," IEICE Technical Report, PRMU2008-42, June 2008, pp. 133-138. Alternatively, it applies the "face three-dimensional model estimation processing" described later.

Next, in step S705, the position/posture estimation unit 131 outputs the $N_{\mathrm{FP}}$ feature point three-dimensional coordinate values $\hat{\mathbf{m}}_j \in \mathbb{R}^3$ ($0 \le j < N_{\mathrm{FP}}$) in the video data, the head posture $\hat{\mathbf{Q}}^k \in \mathbb{R}^{2 \times 3}$ in the $k$-th image frame ($0 \le k < N_{\mathrm{frames}}$), and the centroid $\bar{\mathbf{y}}^k \in \mathbb{R}^2$ of the two-dimensional feature points in the $k$-th image frame ($0 \le k < N_{\mathrm{frames}}$).

The estimation algorithm used in step S704 belongs to the class of nonlinear constrained least-squares estimators: it minimizes the squared error given by Equation (26) subject to the orthogonality condition given by Equation (27).

In Equation (27), $\mathbf{I}_2$ is the 2 × 2 identity matrix and $\lambda$ is a constant.
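Equations (26) and (27) are not reproduced in the text. Assuming a scaled orthographic camera in which frame $k$ projects a point as $\mathbf{Q}^k\mathbf{m}_j + \bar{\mathbf{y}}^k$, one plausible form is $E = \sum_k \sum_j \hat{v}^k_j \,\lVert \hat{\mathbf{y}}^k_j - \mathbf{Q}^k\mathbf{m}_j - \bar{\mathbf{y}}^k \rVert^2$ subject to $\mathbf{Q}^k {\mathbf{Q}^k}^\top = \lambda \mathbf{I}_2$. The sketch below only evaluates these two quantities under that assumption; it is not the estimation algorithm itself, and the array layouts are hypothetical.

```python
import numpy as np


def reprojection_error(y, vis, Q, m, ybar):
    """Visibility-weighted squared error (assumed form of Equation (26)).

    y:    (K, J, 2) observed 2D feature points
    vis:  (K, J) visibility flags in {0, 1}
    Q:    (K, 2, 3) per-frame head posture (projection) matrices
    m:    (J, 3) 3D feature point coordinates
    ybar: (K, 2) per-frame 2D centroids
    """
    pred = np.einsum("kab,jb->kja", Q, m) + ybar[:, None, :]
    resid = y - pred
    return float(np.sum(vis * np.sum(resid**2, axis=-1)))


def orthogonality_residual(Q, lam):
    """Squared deviation of Q^k Q^k^T from lam * I_2 (assumed form of Equation (27))."""
    QQt = np.einsum("kab,kcb->kac", Q, Q)
    return float(np.sum((QQt - lam * np.eye(2)) ** 2))
```

Weighting by $\hat{v}^k_j$ means that an occluded point, whatever value was recorded for it, does not affect the error.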

[Mesh warping processing]
Next, the mesh warping unit 132 generates the modified 3D CG face model by warping the mesh vertices of the 3D CG face model stored in the face model storage unit 134, whose mesh vertex three-dimensional coordinate values are $\mathbf{x}^n \in \mathbb{R}^3$ ($0 \le n < N_{\mathrm{VT}}$), to the positions given by the estimated feature point three-dimensional coordinate values $\hat{\mathbf{m}}_j \in \mathbb{R}^3$ ($0 \le j < N_{\mathrm{FP}}$).
FIG. 12 is a flowchart showing the procedure of the mesh warping processing executed by the mesh warping unit 132, and FIG. 13 is a diagram showing the mesh of the 3D CG face model. The processing is described below following the flowchart of FIG. 12.

In step S801, the mesh warping unit 132 designates the mesh vertices of the 3D CG face model that correspond to the feature points (called feature point vertices) and a predetermined number of mesh vertices located at the boundary between the 3D CG face model and the outside (called fixed vertices). The mesh warping unit 132 then sets $N_{\mathrm{tri}}$ triangles $i$ ($0 \le i < N_{\mathrm{tri}}$) whose vertices are these mesh vertices. The vertex coordinates of these triangles are $\mathbf{x}^{i,l}$ ($0 \le i < N_{\mathrm{tri}}$, $0 \le l < 3$).
FIG. 13 schematically shows the feature point vertices (open circles), the fixed vertices (hatched circles), and the triangles having these as vertices.

Next, in step S802, the mesh warping unit 132 uses Equation (28) to assign the $n$-th mesh vertex three-dimensional coordinate value $\mathbf{x}^n \in \mathbb{R}^3$ ($0 \le n < N_{\mathrm{VT}}$) of the 3D CG face model to the $i_{\min}(n)$-th of the triangles $i$ ($0 \le i < N_{\mathrm{tri}}$) set in step S801. The mesh warping unit 132 then writes the mesh vertex three-dimensional coordinate value and the identification information of the triangle to which it is assigned into the mesh vertex assignment information stored in the mesh vertex assignment information storage unit 135. If the mesh warping unit 132 consults the mesh vertex assignment information storage unit 135 whenever a mesh vertex already calculated in an earlier pass is selected again in the second and subsequent passes, it does not need to repeat the calculation, which makes the processing efficient.

The function $f(\mathbf{x})$ in Equation (28) is a penalty function whose value is 0 when the mesh vertex lies within the triangle.
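Equation (28) is not reproduced in the text, so the following is only a sketch of one way to realize the assignment. It assumes a hypothetical penalty $f$ equal to the sum of the negative parts of the barycentric coordinates of the vertex's projection onto each triangle's plane (0 when the vertex projects inside), breaks ties by distance to the plane, and caches results in a table keyed by the vertex coordinates, mirroring the mesh vertex assignment information.

```python
import numpy as np


def barycentric_and_height(p, tri):
    """Barycentric coordinates of p projected onto the plane of tri (3x3), and signed height."""
    a, b, c = np.asarray(tri, dtype=float)
    e1, e2 = b - a, c - a
    normal = np.cross(e1, e2)
    normal = normal / np.linalg.norm(normal)
    rel = np.asarray(p, dtype=float) - a
    h = float(rel @ normal)                       # signed distance to the triangle's plane
    (s, t), *_ = np.linalg.lstsq(np.column_stack([e1, e2]), rel - h * normal, rcond=None)
    return np.array([1.0 - s - t, s, t]), h


def penalty(bary):
    """Hypothetical f: 0 inside the triangle, sum of negative barycentric parts outside."""
    return float(np.clip(-bary, 0.0, None).sum())


def assign_vertex(p, triangles, table):
    """Return i_min for vertex p, caching results keyed by the vertex coordinates."""
    key = tuple(np.round(np.asarray(p, dtype=float), 9).tolist())
    if key not in table:
        fits = [barycentric_and_height(p, tri) for tri in triangles]
        # Smallest penalty first; among ties, the triangle whose plane is nearest.
        table[key] = min(range(len(triangles)),
                         key=lambda i: (penalty(fits[i][0]), abs(fits[i][1])))
    return table[key]
```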

Next, in step S803, the mesh warping unit 132 replaces the $n$-th mesh vertex three-dimensional coordinate value $\mathbf{x}^n$ ($0 \le n < N_{\mathrm{VT}}$) with the $n$-th modified mesh vertex three-dimensional coordinate value $\hat{\mathbf{x}}^n$ ($0 \le n < N_{\mathrm{VT}}$) given by Equation (29).

In Equation (29), $p = i_{\min}(n)$, and $\hat{\mathbf{x}}^{p,l}$ ($0 \le l < 3$) are the three-dimensional coordinate values of the mesh vertices and fixed vertices corresponding to the feature point three-dimensional coordinate values $\hat{\mathbf{m}}_j$ ($0 \le j < N_{\mathrm{FP}}$) estimated in the three-dimensional feature point position and head posture estimation processing described above.
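Equation (29) is not reproduced in the text either. A natural reading, assumed here, is that vertex $\mathbf{x}^n$ keeps the same position relative to its assigned triangle $p$: its barycentric coordinates within the triangle plus its signed height along the triangle normal are carried over from $\{\mathbf{x}^{p,l}\}$ to the warped triangle $\{\hat{\mathbf{x}}^{p,l}\}$. The sketch below implements that hypothetical rule; translating or rotating the triangle moves the vertex rigidly with it.

```python
import numpy as np


def unit_normal(tri):
    a, b, c = tri
    n = np.cross(b - a, c - a)
    return n / np.linalg.norm(n)


def local_coords(p, tri):
    """Barycentric coordinates of p's projection onto tri, plus its signed height."""
    a, b, c = tri
    normal = unit_normal(tri)
    rel = p - a
    h = float(rel @ normal)
    (s, t), *_ = np.linalg.lstsq(np.column_stack([b - a, c - a]), rel - h * normal, rcond=None)
    return np.array([1.0 - s - t, s, t]), h


def warp_vertex(p, tri, tri_warped):
    """Rebuild p on the warped triangle with the same barycentric coordinates and height."""
    tri = np.asarray(tri, dtype=float)
    tri_warped = np.asarray(tri_warped, dtype=float)
    bary, h = local_coords(np.asarray(p, dtype=float), tri)
    return bary @ tri_warped + h * unit_normal(tri_warped)
```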

FIG. 14 is a diagram for explaining the warping of the mesh vertices of the 3D CG face model. The figure shows three mesh vertex three-dimensional coordinate values $\mathbf{x}^{p,l}$ ($l \in \{0, 1, 2\}$), taken from the feature point vertices and fixed vertices used in the three-dimensional feature point position and head posture estimation processing described above, being replaced with the modified mesh vertex three-dimensional coordinate values $\hat{\mathbf{x}}^{p,l}$ ($l \in \{0, 1, 2\}$) estimated from the video data. In accordance with these replaced feature point vertices and fixed vertex, the mesh vertex three-dimensional coordinate value $\mathbf{x}^n$ of a general mesh vertex is in turn replaced with the modified mesh vertex three-dimensional coordinate value $\hat{\mathbf{x}}^n$. In the figure, $\mathbf{x}^{p,1}$ is a fixed vertex, and $\mathbf{x}^{p,l}$ ($l \in \{0, 2\}$) are mesh vertices corresponding to feature points.

[Texture mapping processing]
When a texture is pasted (mapped) onto the modified 3D CG face model, which was generated by warping the 3D CG face model to fit the face image in the video data, and the mapped model is then rendered at an arbitrary head posture, the face texture of the person to be registered appears at that head posture. The appearance of the texture on the surface of the rendered modified 3D CG face model is determined by the UV texture image as well as by the head posture, lighting conditions, and so on. To fit the model to the face of a specific person, the UV texture image is therefore corrected as follows.

The relationship between the texture-mapped 3D CG face model and the UV texture image is as already shown in FIG. 3. FIG. 3(a) shows the 3D CG face model with texture mapping applied, and FIG. 3(b) shows the UV texture image with the mesh superimposed on it. As the figure shows, superimposing the mesh on the UV texture image associates the $n$-th mesh vertex three-dimensional coordinate value $\mathbf{x}^n$ ($0 \le n < N_{\mathrm{VT}}$) with the coordinate value $(u^n, v^n)$ ($0 \le u^n \le 1$, $0 \le v^n \le 1$, $0 \le n < N_{\mathrm{VT}}$) of a pixel position on the UV texture image. Specifically, the two are associated because the data for the $n$-th mesh vertex three-dimensional coordinate value $\mathbf{x}^n$ holds the pixel coordinate value $(u^n, v^n)$ on the UV texture image as a pointer into the UV texture image.

FIG. 15 is a flowchart showing the procedure for correcting the UV texture image. In step S1201, the texture mapping unit 133 divides each polygon with four or more sides in the mesh of the 3D CG face model into triangles. That is, an $N$-gon whose corners correspond to the mesh vertex three-dimensional coordinate values $\mathbf{x}^j$ ($0 \le j < N$) of the 3D CG face model is divided into $N-2$ triangles with vertices $\{\mathbf{x}^0, \mathbf{x}^1, \mathbf{x}^2\}, \{\mathbf{x}^0, \mathbf{x}^2, \mathbf{x}^3\}, \ldots, \{\mathbf{x}^0, \mathbf{x}^{N-2}, \mathbf{x}^{N-1}\}$. Consequently, when the number of triangles the texture mapping unit 133 sets on the 3D CG face model is $N_{\mathrm{uvtri}}$, the $i$-th triangle $\{\mathbf{x}^{i,0}, \mathbf{x}^{i,1}, \mathbf{x}^{i,2}\}$ ($0 \le i < N_{\mathrm{uvtri}}$) is associated with the $i$-th triangle $\{(u^{i,0}, v^{i,0}), (u^{i,1}, v^{i,1}), (u^{i,2}, v^{i,2})\}$ in the UV texture image.
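The fan split of step S1201 is short enough to show directly. In this sketch, faces are lists of vertex indices (an assumed representation), and the same split is applied to a face and to its UV counterpart so that the $i$-th 3D triangle and the $i$-th UV triangle stay paired. A fan split is valid for convex polygons, which is the usual case for quads and other small mesh faces.

```python
from typing import List, Sequence, Tuple


def fan_triangulate(polygon: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Split an N-gon (vertex indices in order) into N-2 triangles sharing vertex 0."""
    n = len(polygon)
    if n < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    return [(polygon[0], polygon[k], polygon[k + 1]) for k in range(1, n - 1)]


def triangulate_mesh(faces, uv_faces):
    """Apply the same fan split to 3D faces and UV faces so the i-th triangles stay paired."""
    if len(faces) != len(uv_faces):
        raise ValueError("faces and uv_faces must correspond one to one")
    tris, uv_tris = [], []
    for face, uv_face in zip(faces, uv_faces):
        tris.extend(fan_triangulate(face))
        uv_tris.extend(fan_triangulate(uv_face))
    return tris, uv_tris
```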

Next, in step S1202, the texture mapping unit 133 works in the UV texture image. It takes one of the mesh vertices that correspond to feature points and selects the pixel corresponding to that vertex together with the pixels in its neighborhood (within a radius of R pixels). The coordinates of these pixels are denoted (u, v).
Next, in step S1203, if a pixel (u, v) has been selected (step S1203: YES), the process proceeds to step S1204. If no pixel (u, v) has been selected (step S1203: NO), the process proceeds to step S1210.

In step S1204, the texture mapping unit 133 assigns the selected pixel (u, v) to the $i_{min}(u, v)$-th triangle of the UV texture image according to Expression (30).

The texture mapping unit 133 writes each pixel (u, v), together with the index $i_{min}(u, v)$ of the triangle assigned to it, into the pixel allocation information held in the pixel allocation information storage unit 137. In the second and later passes, a pixel (u, v) that has already been computed may be selected again. If the texture mapping unit 133 then refers to the pixel allocation information storage unit 137, it does not need to repeat the computation, which makes the processing efficient.
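Expression (30) is not reproduced in this section, so the sketch below is only a hypothetical reading of step S1204:

1. Compute the barycentric coordinates of the pixel in every UV triangle.
2. Pick the triangle whose smallest coordinate is largest. This is the containing triangle, if there is one.
3. Return that index as i_min, together with the coordinates, which are taken to be b^imin(u, v).

A plain dict stands in for the pixel allocation information storage unit 137, so a pixel that is reached again from another feature vertex is not recomputed.

```python
import numpy as np


def barycentric(p, a, b, c):
    """Barycentric coordinates (b0, b1, b2) of 2-D point p in triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    beta = (d11 * d20 - d01 * d21) / denom
    gamma = (d00 * d21 - d01 * d20) / denom
    return np.array([1.0 - beta - gamma, beta, gamma])


def assign_pixel(uv, uv_triangles, cache):
    """Return (i_min, b) for pixel uv, reusing earlier results from `cache`.

    Hypothetical stand-in for Expression (30): choose the triangle whose
    smallest barycentric coordinate is largest, i.e. the triangle containing
    uv (all coordinates >= 0) or, failing that, the closest one to containing it.
    """
    key = tuple(uv)
    if key in cache:  # plays the role of the pixel allocation information store
        return cache[key]
    p = np.asarray(uv, dtype=float)
    best_i, best_b, best_score = -1, None, -np.inf
    for i, (a, b, c) in enumerate(uv_triangles):
        bary = barycentric(p, a, b, c)
        if bary.min() > best_score:
            best_i, best_b, best_score = i, bary, bary.min()
    cache[key] = (best_i, best_b)
    return cache[key]


# Two UV triangles covering the unit square, split along the diagonal.
tris = [np.array([[0, 0], [1, 0], [1, 1]], dtype=float),
        np.array([[0, 0], [1, 1], [0, 1]], dtype=float)]
cache = {}
i, b = assign_pixel((0.75, 0.25), tris, cache)
```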

Hereafter, $i_{min}(u, v)$ is abbreviated as imin. The following notation is used, with $l = 0, 1, 2$ throughout:

- $(u^{imin,l}, v^{imin,l})$: the vertices of the imin-th triangle, to which pixel (u, v) was assigned in step S1204.
- $\hat{\mathbf{x}}^{imin,l}$: the corrected mesh vertex three-dimensional coordinates associated with those vertices.
- $\mathbf{n}^{imin,l}$: the surface normals at the corresponding vertices of the three-dimensional CG face model.

Next, in step S1205, the texture mapping unit 133 estimates the projected positions in each image frame of the video data. Specifically, it uses Expression (31) to transform the corrected mesh vertex coordinates $\hat{\mathbf{x}}^{imin,l}$ $(l = 0, 1, 2)$ by the head pose $\hat{\mathbf{Q}}^k$ $(0 \le k < N_{frames})$. This head pose was estimated in step S705 of the feature-point three-dimensional position and head pose estimation process described earlier. The unit then adds the centroid $\bar{\mathbf{y}}^k$ of the two-dimensional feature points in the k-th image frame of the video data, which gives the projected position $\hat{\mathbf{r}}^{imin,l}_k$.

The projected position $\hat{\mathbf{r}}^{imin,l}_k$ is the estimated position at which the vertex would project onto the k-th image frame of the video data if it were not occluded (that is, if it were visible).
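Expression (31) is not shown in this section. The sketch below shows one way the projection of step S1205 could look: transform by the head pose, then add the 2-D feature centroid $\bar{\mathbf{y}}^k$. It rests on the following assumptions:

- The camera is scaled-orthographic, with its optical axis along z.
- $\hat{\mathbf{Q}}^k$ is a 3 × 3 pose matrix that already includes any scale.
- The function name and array layout are illustrative only.

```python
import numpy as np


def project_vertices(Q, X, y_bar):
    """Project corrected mesh vertices into image frame k.

    Q     : (3, 3) head-pose matrix for frame k (assumed to include any scale)
    X     : (N, 3) vertices x_hat^{imin,l}, one per row
    y_bar : (2,)   centroid of the 2-D feature points in frame k
    Returns (N, 2) projected positions r_hat^{imin,l}_k.
    """
    rotated = X @ Q.T               # apply the head pose to every vertex
    return rotated[:, :2] + y_bar   # assumed orthographic: drop z, then re-centre


X = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [-1.0, 1.0, 5.0]])
r = project_vertices(np.eye(3), X, np.array([10.0, 20.0]))
```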

Next, in step S1206, the texture mapping unit 133 estimates where each pixel of the UV texture image projects in each image frame of the video data. Specifically, it uses Expression (32) to compute $\hat{\mathbf{r}}_k(u, v)$. This is the combination of the projected positions $\hat{\mathbf{r}}^{imin,l}_k$ obtained in step S1205, weighted by $\mathbf{b}^{imin} = \mathbf{b}^{imin}(u, v)$ obtained in step S1204. From it, the unit obtains the RGB value $\mathbf{p}_k(\hat{\mathbf{r}}_k(u, v))$ $(0 \le k < N_{frames})$ in the k-th image frame of the video data that corresponds to pixel (u, v) of the UV texture image. This RGB value is the shading and color information contained in the original image frame.
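A sketch of step S1206 follows. It reads Expression (32) as the barycentric blend $\hat{\mathbf{r}}_k(u, v) = \sum_l b_l\, \hat{\mathbf{r}}^{imin,l}_k$, which is the weighted combination the text describes. The RGB value is read from the frame with bilinear interpolation. That is our choice; the text does not say how sub-pixel positions are sampled. Positions are (x, y) = (column, row), and the function names are hypothetical.

```python
import numpy as np


def pixel_position_in_frame(b, r_vertices):
    """Barycentric blend of the three projected vertex positions.
    b: (3,) weights b^imin(u, v); r_vertices: (3, 2) -> (2,) position (x, y)."""
    return b @ r_vertices


def sample_rgb(frame, pos):
    """Bilinearly sample an (H, W, 3) frame at pos = (x, y), clamped to the image."""
    h, w = frame.shape[:2]
    x = float(np.clip(pos[0], 0, w - 1))
    y = float(np.clip(pos[1], 0, h - 1))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * frame[y0, x0] + fx * frame[y0, x1]
    bottom = (1 - fx) * frame[y1, x0] + fx * frame[y1, x1]
    return (1 - fy) * top + fy * bottom


# Test frame: red channel = column index, green channel = row index.
frame = np.zeros((4, 5, 3))
frame[..., 0] = np.arange(5)[None, :]
frame[..., 1] = np.arange(4)[:, None]
pos = pixel_position_in_frame(np.array([0.5, 0.25, 0.25]),
                              np.array([[1.0, 1.0], [3.0, 1.0], [1.0, 3.0]]))
rgb = sample_rgb(frame, pos)
```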

Next, in step S1207, the texture mapping unit 133 estimates the surface normal of the three-dimensional CG face model at the position corresponding to pixel (u, v) of the UV texture image. Specifically, it uses Expression (33) to compute $\hat{\mathbf{n}}(u, v)$. This is the combination of the surface normals $\mathbf{n}^{imin,l}$ $(l = 0, 1, 2)$, weighted by $\mathbf{b}^{imin} = \mathbf{b}^{imin}(u, v)$ obtained in step S1204.
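Step S1207 applies the same barycentric weights to the vertex normals. The sketch follows that description of Expression (33). The final normalization to unit length is an added assumption; it makes the angle used in the next step a simple dot product.

```python
import numpy as np


def interpolate_normal(b, normals):
    """Barycentric blend of the three vertex normals n^{imin,l}.
    b: (3,) weights; normals: (3, 3), one normal per row.
    Re-normalising to unit length is our addition, not stated in the text."""
    n = b @ normals
    return n / np.linalg.norm(n)


normals = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
n_hat = interpolate_normal(np.array([0.5, 0.5, 0.0]), normals)
```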

Next, in step S1208, the texture mapping unit 133 computes the direction weight $\alpha^k(u, v)$ by Expression (34). It uses the angle $\hat{\theta}^k(u, v)$ between two vectors:

- $\hat{\mathbf{Q}}^k \hat{\mathbf{n}}(u, v)$, the surface normal rotated by the head pose $\hat{\mathbf{Q}}^k$ of the k-th image frame of the video data;
- $[0\ 0\ 1]^T$, the optical axis of the line of sight (the camera's optical axis).

Here, the optical axis of the line of sight means the line of sight of a virtual observer viewing the head in pose $\hat{\mathbf{Q}}^k$ shown in the image frame. Equivalently, it is the optical axis of the imaging lens of a virtual camera imaging the head in that pose.

The value of m in Expression (34) can be adjusted through a parameter setting (the second parameter). Consider the position where pixel (u, v) of the UV texture image appears in the k-th image frame of the video data. Adjusting m changes how steeply the direction weight $\alpha^k(u, v)$ varies with the angle at that position between the surface normal $\hat{\mathbf{Q}}^k \hat{\mathbf{n}}(u, v)$ and the optical axis of the line of sight (the camera's optical axis).
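The actual form of Expression (34) is not given in this section. The sketch below uses $\alpha = \max(0, \cos\hat{\theta})^m$ purely as a hypothetical example with the properties the text requires:

- It is largest when the rotated normal points along the optical axis $[0\ 0\ 1]^T$.
- It decreases monotonically as the surface turns away from that axis.
- Its steepness is controlled by m (the second parameter).

```python
import numpy as np

OPTICAL_AXIS = np.array([0.0, 0.0, 1.0])  # [0 0 1]^T as in the text


def direction_weight(Q, n_hat, m):
    """Direction weight alpha^k(u, v) for one pixel and one frame.

    Hypothetical stand-in for Expression (34): alpha = max(0, cos theta) ** m,
    where theta is the angle between Q @ n_hat and the optical axis.
    """
    rn = Q @ n_hat
    cos_theta = rn @ OPTICAL_AXIS / np.linalg.norm(rn)
    return max(0.0, float(cos_theta)) ** m


alpha_facing = direction_weight(np.eye(3), np.array([0.0, 0.0, 1.0]), 4)
alpha_oblique = direction_weight(np.eye(3), np.array([0.0, 0.8, 0.6]), 2)
```

A larger m makes the weight fall off faster for the same angle. For example, cos θ = 0.6 gives 0.36 with m = 2 but about 0.13 with m = 4.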

In three-dimensional computer graphics rendering, deciding whether a given part of a 3D model's surface is occluded generally requires an expensive Z-buffer computation. In the texture mapping process of this embodiment, however, a human face is roughly convex. The direction weight $\alpha^k(u, v)$ therefore makes it possible to skip the occlusion test for the position $\hat{\mathbf{r}}_k(u, v)$ $(0 \le k < N_{frames})$ at which pixel (u, v) of the UV texture image appears in each image frame of the video data. The larger the direction weight $\alpha^k(u, v)$, the more directly the position $\hat{\mathbf{r}}_k(u, v)$ faces the camera in that head pose, and hence the more likely it is to be unoccluded. As the surface turns away from that direction, $\alpha^k(u, v)$ decreases monotonically.
Because of this property, the texture mapping unit 133 preferably adjusts the direction weight $\alpha^k(u, v)$ used when mapping the texture contained in an image frame onto the UV texture image. The adjustment is based on the angle between the optical axis of the line of sight toward the three-dimensional CG face model and the model's surface normal. Doing so keeps the resolution of the texture high.

Next, in step S1209, the texture mapping unit 133 computes the distance weight $\delta(u, v)$ by Expression (35). The weight is based on the Euclidean distance in the UV texture image from pixel (u, v) to the pixel positions $(u_j, v_j)$ $(0 \le j < N_{FP})$. These positions are associated with the mesh vertices that correspond to the feature point three-dimensional coordinates $\hat{\mathbf{m}}_j$ $(0 \le j < N_{FP})$, which were estimated in step S704 of the feature-point three-dimensional position and head pose estimation process described earlier.

The value of n in Expression (35) can be adjusted through a parameter setting (the first parameter). The distance weight $\delta(u, v)$ decreases as the distance from a feature point on the UV texture image increases, and adjusting n changes how steeply it varies with that distance.
The three-dimensional estimation unit 130 stores in memory the results computed in steps S1202 through S1209 for each pixel. When step S1209 ends, the process returns to step S1202.
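Expression (35) is not shown either. The hypothetical weight below is $\delta = \max(0, 1 - d/R)^n$, where d is the UV-space distance to the nearest feature-point pixel $(u_j, v_j)$. It is one function that equals 1 at a feature point and decreases with distance, with a falloff tuned by n (the first parameter). Here R is expressed in UV units rather than as the pixel radius of step S1202.

```python
import numpy as np


def distance_weight(uv, feature_uvs, radius, n):
    """Distance weight delta(u, v) from the nearest feature-point pixel (u_j, v_j).

    Hypothetical stand-in for Expression (35):
        delta = max(0, 1 - d / radius) ** n
    with d the Euclidean distance in UV space and `radius` in UV units.
    """
    d = np.min(np.linalg.norm(np.asarray(feature_uvs) - np.asarray(uv), axis=1))
    return max(0.0, 1.0 - d / radius) ** n


features = [(0.5, 0.5), (0.9, 0.9)]
```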

When the process advances from step S1203 to step S1210, the texture mapping unit 133 updates the RGB vector value at each pixel (u, v) of the UV texture image, using the distance weight $\delta(u, v)$ and the direction weight $\alpha^k(u, v)$. Specifically, it computes (synthesizes) the RGB vector $\mathbf{t}(u, v)$ at pixel (u, v) by Expression (36). It then replaces the initial texture value $\mathbf{t}_0(u, v)$ at pixel (u, v) with $\mathbf{t}(u, v)$.

At this point, the texture mapping unit 133 (texture image writing unit) proceeds as in the method already described for the first embodiment, working from low resolution to high resolution:

1. It generates a reference image at the given resolution from the pasted images $\hat{\mathbf{r}}_k(u, v)$ obtained so far.
2. It calculates the pixel misregistration of each pasted image from the reference image and that pasted image.
3. It aligns the pixel positions using the resulting misregistration.

The weights already computed, namely the direction weight $\alpha^k(u, v)$ at each pixel of each frame image, are used as appropriate. Performing this alignment successively from the lowest resolution up to the highest amounts to aligning each pixel (u, v) of the pasted image $\hat{\mathbf{r}}_k(u, v)$ to $\hat{\mathbf{r}}_k(u + m^k(u, v), v + n^k(u, v))$. In the description of pasting into the texture image below, the aligned position is again written as (u, v).
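The coarse-to-fine alignment can be sketched as below for a single channel. This is an illustration only; the first embodiment's own method is not reproduced in this section. At each resolution the sketch does three things:

- It forms the reference as a Gaussian low-pass of the weighted mean of the current pasted images, with weights such as $\alpha^k(u, v)$.
- It estimates the per-pixel displacement with one windowed Lucas-Kanade step. This technique is swapped in here.
- It replaces each pixel value with the value at the displaced position in the original image, accumulating the displacement across levels.

All function names and the sigma schedule are hypothetical.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian low-pass filter with edge replication."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def bilinear(img, ys, xs):
    """Sample img at fractional (row, col) coordinates, clamped to the border."""
    h, w = img.shape
    ys, xs = np.clip(ys, 0, h - 1), np.clip(xs, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    fy, fx = ys - y0, xs - x0
    return ((1 - fy) * ((1 - fx) * img[y0, x0] + fx * img[y0, x1])
            + fy * ((1 - fx) * img[y1, x0] + fx * img[y1, x1]))


def estimate_shift(img, ref, sigma, damping=1e-3):
    """One windowed Lucas-Kanade step: per-pixel (dy, dx) with img(p + d) ~ ref(p)."""
    gy, gx = np.gradient(img)
    diff = ref - img
    sxx = gaussian_blur(gx * gx, sigma)
    syy = gaussian_blur(gy * gy, sigma)
    sxy = gaussian_blur(gx * gy, sigma)
    bx = gaussian_blur(gx * diff, sigma)
    by = gaussian_blur(gy * diff, sigma)
    lam = damping * np.mean(sxx + syy) + 1e-12  # damping for flat regions
    sxx, syy = sxx + lam, syy + lam
    det = sxx * syy - sxy ** 2
    return (sxx * by - sxy * bx) / det, (syy * bx - sxy * by) / det


def align_coarse_to_fine(pasted, weights, sigmas=(8.0, 4.0, 2.0, 1.0)):
    """Align pasted images to a shared low-pass reference, coarse to fine.

    pasted  : list of (H, W) images, one per frame k
    weights : list of (H, W) per-pixel weights, e.g. alpha^k(u, v)
    Returns the aligned images and the accumulated (dy, dx) fields.
    """
    originals = [np.asarray(p, dtype=float) for p in pasted]
    h, w = originals[0].shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    shifts = [(np.zeros((h, w)), np.zeros((h, w))) for _ in originals]
    current = [o.copy() for o in originals]
    wsum = sum(weights) + 1e-12
    for sigma in sigmas:  # large sigma = low resolution
        mean = sum(wk * c for wk, c in zip(weights, current)) / wsum
        ref = gaussian_blur(mean, sigma)
        for k, orig in enumerate(originals):
            dy, dx = estimate_shift(gaussian_blur(current[k], sigma), ref, sigma)
            sy, sx = shifts[k][0] + dy, shifts[k][1] + dx
            shifts[k] = (sy, sx)
            # Replace each pixel value with the value at the displaced position.
            current[k] = bilinear(orig, yy + sy, xx + sx)
    return current, shifts
```

Running from large to small sigma mirrors the low-to-high resolution order in the text. A fuller implementation would also handle RGB channels and image borders more carefully.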

In Expression (36), $\mathbf{t}_0(u, v)$ is the initial RGB vector value (texture value) at pixel (u, v) of the UV texture image on the three-dimensional CG face model. The constant γ balances $\mathbf{t}_0(u, v)$ against the texture pasted onto the model from the video data. $\mathbf{w}(u, v)$ is an adjustment that matches the average luminance of the video data to that of $\mathbf{t}_0(u, v)$. It brings the luminance of the video data in line with the luminance of the model's texture, for example through preprocessing such as histogram equalization.

As Expression (36) shows, the texture mapping unit 133 multiplies the RGB value $\mathbf{p}_k(\hat{\mathbf{r}}_k(u, v))$ of the k-th image frame by the direction weight $\alpha^k(u, v)$. In other words, it applies the direction weight when mapping the shading and color information contained in the original image frames onto the two-dimensional face texture image.
Likewise, the texture mapping unit 133 multiplies the sum of the RGB values over the $N_{frames}$ image frames (each already scaled by its direction weight) by the distance weight $\delta(u, v)$. In other words, it also applies the distance weight when mapping the shading and color information contained in the original image frames onto the two-dimensional face texture image.
As described above, the value of m in Expression (34) can be adjusted through the second parameter. This makes variable the rate at which the direction weight $\alpha^k(u, v)$ changes with the angle between the model's surface normal and the optical axis of the line of sight (the camera's optical axis). The value of n in Expression (35) can be adjusted through the first parameter. This makes variable the rate at which the distance weight $\delta(u, v)$ changes with distance from a feature point on the UV texture image. With both m and n adjustable, the balance between the two weights can be tuned. For example, suppose the direction weight is set to fall off relatively gently with angle. Then, even where the surface normal and the optical axis form a fairly large angle, the feature quantities registered in the database still reflect, to some degree, the RGB values of the actually captured image frames rather than only the values of the UV texture image.
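Expression (36) is not reproduced in this section either. The hypothetical normalized average below only mirrors the structure the text describes:

- Each frame sample $\mathbf{p}_k$, shifted by the luminance adjustment $\mathbf{w}$, is scaled by $\alpha^k$.
- The sum over frames is scaled by $\delta$.
- γ weighs the initial texel $\mathbf{t}_0$.

The division by the total weight is our assumption.

```python
import numpy as np


def synthesize_texel(t0, p, alpha, delta, gamma, w):
    """Hypothetical stand-in for Expression (36) at one pixel (u, v).

    t0    : (3,)   initial RGB value t_0(u, v)
    p     : (K, 3) RGB samples p_k(r_hat_k(u, v)) from the K frames
    alpha : (K,)   direction weights alpha^k(u, v)
    delta : float  distance weight delta(u, v)
    gamma : float  balance constant
    w     : (3,)   luminance adjustment w(u, v)
    """
    video = delta * (alpha @ (p + w))      # delta * sum_k alpha^k (p_k + w)
    norm = gamma + delta * alpha.sum()     # assumed normalisation
    return (gamma * t0 + video) / norm if norm > 0 else t0


t0 = np.array([0.5, 0.5, 0.5])
p = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
w = np.zeros(3)
blended = synthesize_texel(t0, p, np.array([3.0, 1.0]), 1.0, 0.0, w)
```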

[Rendering Process]
The rendering unit 140 takes the corrected three-dimensional CG face model produced by the three-dimensional estimation unit 130, rotates it to a predetermined head pose $(\theta_y, \theta_{x1})$, and renders it to generate a synthesized face image model. Here $\theta_y$ is the rotation angle about the face's vertical axis and $\theta_{x1}$ is the rotation angle about the face's horizontal axis. The rendering unit 140 first rotates the model about the vertical axis and then about the horizontal axis. The vertical rotation axis is therefore the y-axis of the three-dimensional CG face model. The horizontal rotation axis is a horizontal axis parallel to the front of the face, so it is itself carried along by the rotation about the vertical axis.

Specifically, the rendering unit 140 considers each feature point whose three-dimensional coordinates $\hat{\mathbf{m}}_j \in \mathbb{R}^3$ $(0 \le j < N_{FP})$ have been estimated. Let $R^{\theta_y,\theta_{x1}}(\mathbf{z})$, $\mathbf{z} \in \mathbb{R}^2$, be the image rendered at head pose $(\theta_y, \theta_{x1})$. While performing the rendering, the unit computes two quantities for each point:

- a visibility $v^{\theta_y,\theta_{x1}}_j \in \{0, 1\}$ indicating whether the point is visible in that image;
- when $v^{\theta_y,\theta_{x1}}_j = 1$ (visible), the position $\mathbf{z}^{\theta_y,\theta_{x1}}_j$ at which the point appears in that image.

In other words, the rendering unit 140 calculates the two-dimensional coordinate data of the feature points.
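The rotation order and the feature-point outputs of the rendering step can be sketched as follows. The horizontal axis is carried along by the first rotation about the vertical axis. A rotation about that moved axis composes on the right, giving $R = R_y(\theta_y) R_x(\theta_{x1})$. Two hypothetical simplifications are made: the visibility test is a back-face check on a per-point normal rather than real rendering, and the projection is orthographic. The function names are illustrative.

```python
import numpy as np


def head_rotation(theta_y, theta_x1):
    """Rotate about the vertical (y) axis, then about the horizontal axis that
    has been carried along by that rotation: R = R_y @ R_x."""
    cy, sy = np.cos(theta_y), np.sin(theta_y)
    cx, sx = np.cos(theta_x1), np.sin(theta_x1)
    r_y = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    r_x = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    return r_y @ r_x


def render_feature_points(points, normals, theta_y, theta_x1):
    """Visibility flags v_j and 2-D positions z_j for feature points m_hat_j.

    Hypothetical simplifications: orthographic projection onto (x, y) and a
    back-face test (rotated normal has positive z) instead of a Z-buffer.
    """
    R = head_rotation(theta_y, theta_x1)
    rotated = points @ R.T
    visible = ((normals @ R.T)[:, 2] > 1e-9).astype(int)
    return visible, rotated[:, :2]


pts = np.array([[1.0, 2.0, 3.0]])
nrm = np.array([[0.0, 0.0, 1.0]])
vis, z = render_feature_points(pts, nrm, 0.0, 0.0)
```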

[Database Registration Process]
The database registration unit 150 works on the image $R^{\theta_y,\theta_{x1}}(\mathbf{z})$ rendered by the rendering unit 140. For each visible feature point ($v^{\theta_y,\theta_{x1}}_j = 1$) at coordinates $\mathbf{z}^{\theta_y,\theta_{x1}}_j$, it measures the Gabor wavelet features $f^{\theta_y,\theta_{x1}}_j(r, \phi)$ by single-point convolution using Expression (37). The measurement covers $N_{res}$ resolutions $r \in \{r_0, r_1, \ldots, r_{N_{res}-1}\}$ and $N_{orn}$ orientations $\phi \in \{\phi_0, \ldots, \phi_{N_{orn}-1}\}$.
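Expression (37) is not shown here. The sketch below is a generic one-point complex Gabor convolution (a "jet") under assumed conventions:

- Each resolution is a wavelength in pixels.
- The Gaussian envelope has sigma = wavelength / 2.
- The point lies far enough from the image border for the full window to fit.

The function name is hypothetical.

```python
import numpy as np


def gabor_jet(image, point, wavelengths, orientations):
    """One-point convolution of a grey image with complex Gabor kernels.

    Hypothetical stand-in for Expression (37). For each wavelength lam and
    orientation phi, sum image * exp(-|d|^2 / (2 sigma^2)) * exp(i k.d) over a
    window centred on point = (row, col), with sigma = lam / 2.
    Returns an (N_res, N_orn) complex array.
    """
    row, col = point
    jet = np.zeros((len(wavelengths), len(orientations)), dtype=complex)
    for a, lam in enumerate(wavelengths):
        sigma = lam / 2.0
        rad = int(np.ceil(3 * sigma))
        patch = image[row - rad:row + rad + 1, col - rad:col + rad + 1]
        dy, dx = np.mgrid[-rad:rad + 1, -rad:rad + 1]
        envelope = np.exp(-(dx ** 2 + dy ** 2) / (2 * sigma ** 2))
        k = 2 * np.pi / lam
        for b, phi in enumerate(orientations):
            carrier = np.exp(1j * k * (dx * np.cos(phi) + dy * np.sin(phi)))
            jet[a, b] = np.sum(patch * envelope * carrier)
    return jet


# Vertical stripes with an 8-pixel period: the phi = 0 kernel should respond strongly.
img = np.tile(np.cos(2 * np.pi * np.arange(64) / 8.0), (64, 1))
jet = gabor_jet(img, (32, 32), [8.0], [0.0, np.pi / 2])
```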

Next, the database registration unit 150 generates the variable template structure shown in FIG. 9, stores the data in its members, and registers it in the face feature database unit 160. Specifically, for each registered person, the database registration unit 150 stores the following:

- In the person identification information: the person's name (or a name identifying the registered person) and identification information.
- For each of the head poses: a head pose index, which is the head pose $(\theta_y, \theta_{x1})$, and the feature point information for that pose.

For the j-th feature point $(0 \le j < N_{FP})$, the feature point information stores the visibility flag $v^{\theta_y,\theta_{x1}}_j$, the feature point's two-dimensional coordinates $\mathbf{z}^{\theta_y,\theta_{x1}}_j$, and the Gabor wavelet features $f^{\theta_y,\theta_{x1}}_j(r, \phi)$ at $N_{res}$ resolutions (wavelet sizes) × $N_{orn}$ orientations.
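The variable template of FIG. 9 is not reproduced in this section, but the fields listed above map naturally onto a small record type. The sketch below is one possible layout; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class FeaturePointInfo:
    visible: int          # visibility flag v_j (0 or 1)
    position: tuple       # 2-D coordinates z_j in the rendered image
    gabor: np.ndarray     # (N_res, N_orn) Gabor wavelet features f_j(r, phi)


@dataclass
class PersonTemplate:
    person_id: str        # identification information
    name: str             # name identifying the registered person
    # head pose index (theta_y, theta_x1) -> feature point info for j = 0..N_FP-1
    poses: dict = field(default_factory=dict)

    def register_pose(self, theta_y, theta_x1, points):
        self.poses[(theta_y, theta_x1)] = list(points)


template = PersonTemplate("p001", "Example")
template.register_pose(0, 15, [FeaturePointInfo(1, (10.0, 20.0), np.zeros((5, 8), complex))])
database = {template.person_id: template}  # stands in for the face feature database unit
```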

As described above, this embodiment performs alignment that accommodates nonlinear misregistration. At the same time, it reduces the effort needed to register feature-point information for multiple head poses. It also makes it easy to register head poses that do not appear in the image frames of the registration video or of the multiple still images.

Part or all of the functions of the image processing apparatus or face image processing apparatus in the embodiments described above may be implemented by a computer. In that case, a program implementing these control functions may be recorded on a computer-readable recording medium, then loaded into a computer system and executed. The "computer system" here includes an OS and hardware such as peripheral devices. A "computer-readable recording medium" means a portable medium such as a flexible disk, magneto-optical disk, ROM, or CD-ROM, or a storage device such as a hard disk built into a computer system. The term may also cover:

- media that hold a program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a telephone line;
- media that hold a program for a certain period, such as volatile memory inside a computer system acting as the server or client in that case.

The program may implement only some of the functions described above. It may also implement them in combination with a program already recorded in the computer system.

Although several embodiments have been described above, the present invention can also be implemented with the following modifications.
For example, the second embodiment described a face image processing apparatus that processes images of a human face (head). The subject is not limited to faces, however; the image processing apparatus may target any subject with a three-dimensional shape. The positional relationships between feature points are estimated more accurately when the subject is somewhat rigid. The effects particular to the present invention are obtained especially when the subject's surface undergoes some nonlinear deformation, though the invention is not limited to that case.

The embodiments of the present invention have been described in detail above with reference to the drawings. The specific configuration is not limited to these embodiments, however, and designs and the like that do not depart from the gist of the present invention are also included.

The present invention can be widely used for modeling three-dimensional objects from captured images. It can also be used in the field of human-machine interfaces.

DESCRIPTION OF SYMBOLS
1 Image processing apparatus
11 Image data storage unit
12 Inverse pose transformation processing unit
14 Texture mapping unit
15 Texture image storage unit
51-0, 51-1, 51-2, ... Pasted image acquisition units
52-0, 52-1, ... Warping processing units
53 Reference image generation unit
54 Control unit
55 Texture image writing unit
61-0, 61-1, ... Image addition units
62-0, 62-1, ... Gaussian window processing units
100 Face image processing apparatus (image processing apparatus)
110 Image data storage unit
120 Face region detection and collation unit
130 Three-dimensional estimation unit
131 Position/posture estimation unit
132 Mesh warping unit
133 Texture mapping unit
134 Face model storage unit
135 Mesh vertex allocation information storage unit
136 Texture image storage unit
137 Pixel allocation information storage unit
140 Rendering unit
150 Database registration unit
160 Face feature database unit

Claims

A pasted image acquisition unit that acquires pasted images obtained by estimating, on the surface of a three-dimensional object, the positions of the pixels contained in a plurality of frame images capturing the three-dimensional object;
A reference image generation unit that generates a reference image at a predetermined resolution by applying a low-pass filter of that resolution to the sum of the pasted images acquired by the pasted image acquisition unit;
A warping processing unit that calculates, based on the pasted image and the reference image, the misregistration amount of the pixels contained in the pasted image, and replaces the pixel value of each pixel of the pasted image with the pixel value of the pixel displaced from it by that misregistration amount;
An image processing apparatus comprising:

A control unit that performs control such that, at the predetermined resolution, the pasted image acquisition unit acquires the pasted images in which the warping processing unit has replaced each pixel value with the value of the pixel displaced by the misregistration amount, the reference image generation unit generates the reference image at the next resolution, and the replacement of pixel values of the pasted images is repeated sequentially from the low-resolution side to the high-resolution side;
The image processing apparatus according to claim 1, further comprising:

A texture image storage unit for storing a texture image representing the texture of the surface of the three-dimensional object;
A texture image writing unit that writes into the texture image storage unit the texture image obtained by synthesizing a plurality of the pasted images produced by the pixel-value replacement of the warping processing unit,
The image processing apparatus according to claim 2, further comprising:

The three-dimensional object is a face;
A face feature database unit for storing two-dimensional coordinate value data corresponding to facial feature points in association with person identification information and angle data representing a head posture;
A face area detection collation unit that estimates the two-dimensional coordinate value data of the facial feature points included in the read image frame based on the two-dimensional coordinate value data corresponding to the feature points read from the facial feature database unit;
A face model storage unit for storing three-dimensional coordinate value data corresponding to mesh vertices in a predetermined generic model;
A position/posture estimation unit that estimates, based on the two-dimensional coordinate value data of the feature points estimated by the face area detection collation unit and the three-dimensional coordinate value data corresponding to the mesh vertices read from the face model storage unit, the three-dimensional coordinate value data of the feature points included in the image frame and angle data representing the head posture in the image frame;
A mesh warping unit that generates a corrected face model by warping the mesh vertices of the generic model based on the three-dimensional coordinate value data and the angle data of the feature points estimated by the position/posture estimation unit, and calculates the three-dimensional coordinate value data of the corresponding mesh vertices in the corrected face model;
A rendering unit that, based on the corrected face model generated by the mesh warping unit, generates a plurality of synthesized face image models by rendering while varying the angle data representing the head posture, and calculates the two-dimensional coordinate value data of the feature points;
A database registration unit that registers the two-dimensional coordinate value data of the feature points calculated by the rendering unit in the face feature database unit in association with the corresponding angle data;
wherein:
The face feature database unit stores at least image feature information in the vicinity of the feature points in association with the person identification information and the angle data of the head posture;
The rendering unit reads the texture image written by the texture image writing unit from the texture image storage unit, and performs a rendering process based on the texture image.
The database registration unit registers the image feature information based on the result of the rendering process performed by the rendering unit in the face feature database unit in association with the corresponding angle data.
The image processing apparatus according to claim 3.

A pasted image acquisition unit that acquires pasted images obtained by estimating, on the surface of a three-dimensional object, the positions of the pixels contained in a plurality of frame images capturing the three-dimensional object;
A reference image generation unit that generates a reference image at a predetermined resolution by applying a low-pass filter of that resolution to the sum of the pasted images acquired by the pasted image acquisition unit;
A warping processing unit that calculates, based on the pasted image and the reference image, the misregistration amount of the pixels contained in the pasted image, and replaces the pixel value of each pixel of the pasted image with the pixel value of the pixel displaced from it by the obtained misregistration amount;
A computer program that causes a computer to function as an image processing apparatus.