JP5552519B2

JP5552519B2 - Construction of face feature vector

Info

Publication number: JP5552519B2
Application number: JP2012230281A
Authority: JP
Inventors: エリックソレムジャン; ルーソンマイケル
Original assignee: Apple Inc
Current assignee: Apple Inc
Priority date: 2011-12-20
Filing date: 2012-09-28
Publication date: 2014-07-16
Anticipated expiration: 2032-09-28
Also published as: KR20130071341A; TWI484444B; CN103198292A; US8593452B2; JP2013131209A; KR101481225B1; TW201327478A; EP2608108A1; AU2012227166A1; CA2789887C; CA2789887A1; US20130155063A1; AU2012227166B2; WO2013095727A1

Description

The present disclosure relates generally to the field of face recognition. More specifically, it describes a number of techniques for combining multiple types of face recognition descriptors into a single entity, a face feature vector. Face feature vectors can be used in face recognition applications. Examples of such applications include, but are not limited to, managing, sorting, and annotating images (still and video) in iPhoto® and Aperture®. (IPHOTO and APERTURE are registered trademarks of Apple Inc.)

In general terms, a face recognition operation scans a person's face, extracts or detects a specific set of parameters from it, and matches those parameters against a library of known face data to which identities have already been assigned or are otherwise known. The data set against which a new image's parameters are compared is often characterized or described by a model. In practice, these models define groups of parameter sets, and all images that fall within a given group are classified as belonging to the same person.

To be robust (e.g., stable against image noise, a person's pose, and scene illumination) and accurate (e.g., providing a high recognition rate), the specified parameter set must encode information describing a face in a way that is repeatable and invariant to typical intra-person variability, while still discriminating one person from another. This need is a central problem faced by all face recognition systems. It would therefore be beneficial to identify a mechanism (methods, devices, and systems) for defining a set of parameters that provides robust and accurate face recognition.

In various embodiments, the invention provides an apparatus (e.g., a personal computer), a method, and computer program code for generating a novel face feature vector that can be used to identify faces detected in a digital image. The method includes performing (or executing) computer program code to obtain landmark detection information for a first face in a first image (e.g., by a face detection technique). The landmark detection information can be applied to first and second shape models to generate first and second shape feature vectors, and to first and second texture models to generate first and second texture feature vectors. All four of these feature vectors can be combined to form the face feature vector.

In one embodiment, the first shape model is a two-dimensional shape model of the detected face, and the second shape model is a three-dimensional shape model of the detected face. The first and second shape models can each, independently of the other, be linear or non-linear.

In another embodiment, the landmark detection information can be normalized before being used to generate the first and second texture feature vectors. In some embodiments, the first texture feature vector can be based on specified regions within the normalized landmark detection information (the regions include less than all of the normalized landmark detection information).

In still another embodiment, a morphing operation can be applied to the normalized landmark detection information before it is used to generate the second texture feature vector.

In yet another embodiment, a similarity measure can be determined by comparing two such face feature vectors. This similarity measure can be used to determine whether the two face feature vectors are likely to represent the same face. In this and similar embodiments, the similarity measure can be based on the Mahalanobis distance measure.
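As an illustration of a Mahalanobis-based similarity measure, the following is a minimal sketch rather than the disclosed implementation. It assumes a covariance matrix `cov` estimated offline from a collection of face feature vectors, and a decision threshold chosen by the designer; the function names, `cov`, and `threshold` are hypothetical.

```python
import numpy as np

def mahalanobis_distance(f1, f2, cov):
    """Mahalanobis distance between two face feature vectors.

    `cov` is a covariance matrix assumed to have been estimated offline.
    """
    diff = np.asarray(f1, dtype=float) - np.asarray(f2, dtype=float)
    # Solve cov @ z = diff instead of forming an explicit inverse.
    z = np.linalg.solve(cov, diff)
    return float(np.sqrt(diff @ z))

def likely_same_face(f1, f2, cov, threshold):
    """Hypothetical decision rule: a small distance means 'likely the same face'."""
    return mahalanobis_distance(f1, f2, cov) <= threshold
```

With an identity covariance the measure reduces to the ordinary Euclidean distance; a non-trivial covariance down-weights directions in which face feature vectors naturally vary a lot.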

FIG. 1 shows, in block diagram form, face feature vector generation and run-time face identification operations in accordance with one embodiment.
FIG. 2 shows, in block diagram form, the composition of the shape and texture models in accordance with one embodiment.
FIG. 3 shows, in block diagram form, a face feature vector generation operation in accordance with another embodiment.
FIG. 4a illustrates the operation of a local image descriptor in accordance with one embodiment.
FIG. 4b illustrates the operation of a dense image descriptor in accordance with one embodiment.
FIG. 5 illustrates the regions of a dense warped image descriptor in accordance with one embodiment.
FIG. 6 shows the structure of a face feature vector in accordance with one embodiment.
FIG. 7 shows, in flowchart form, a face identification operation in accordance with one embodiment.
FIG. 8 shows an illustrative receiver operating characteristic (ROC) curve illustrating the identification performance of the disclosed face feature vector in accordance with one embodiment.
FIG. 9 shows, in block diagram form, an illustrative electronic device that can be used to implement one or more operations in accordance with this disclosure.

This disclosure pertains to systems, methods, and computer-readable media for determining and applying face recognition parameter sets. In general, techniques are disclosed for identifying a unique combination of face recognition discriminators and using it to construct a "face feature vector" that has been found to be more robust (e.g., stable against image noise, a person's pose, and scene illumination) and more accurate (e.g., providing a higher recognition rate) than prior art recognition approaches. More specifically, a face feature vector can be generated by combining shape descriptors and texture descriptors. In one implementation, the face feature vector includes information describing a face's two-dimensional (2D) shape, its three-dimensional (3D) shape, its overall or global texture, and its detail or local texture (e.g., skin color).

In the following description, numerous specific details are set forth for purposes of explanation in order to provide a thorough understanding of the inventive concept. As part of this description, some of this disclosure's drawings represent structures and devices in block diagram form to avoid obscuring the invention with details that would be known to those of ordinary skill in the art. Moreover, the language used in this disclosure has been selected principally for readability and instructional purposes, and has not been selected to delineate or circumscribe the inventive subject matter; resort to the claims is necessary to determine that subject matter. Reference in this disclosure to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention, and multiple references to "one embodiment" or "an embodiment" should not be understood as necessarily all referring to the same embodiment.

It will be appreciated that in the development of any actual implementation (as in any development project), numerous decisions must be made to achieve the developer's specific goals (e.g., compliance with system-related and business-related constraints), and that these goals will vary from one implementation to another. It will also be appreciated that such development efforts might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the face recognition field having the benefit of this disclosure.

Referring to FIG. 1, face feature vector generation and run-time face identification operation 100 in accordance with one embodiment is shown in block diagram form. To start, input image 105 is processed by face detector 110 to generate landmark image 115. As used herein, the term "landmark image" refers to an image of a face in which landmark points have been detected. Landmark features can include the locations of one or more facial features such as the eyes, eyebrows, nose, mouth, and cheeks. Input image 105 can be, for example, an image obtained from a digital still camera or a video camera. Face detector 110 can use any method suited to the designer's goals and constraints. Illustrative face detection techniques include, but are not limited to, knowledge-based, feature-invariant, template-matching, and appearance-based methods. Because the precise method used to detect faces is not pivotal to the following discussion, this operation is not described further here. While not so limited, in one embodiment landmark image 115 can be a grayscale image in which the detected features are prominent. To simplify the presentation, the following assumes that an input image (e.g., image 105) includes only a single face. It should be understood, however, that no such limitation is inherent in the disclosed techniques.

Landmark image 115 can be applied to one or more shape models 120 and one or more texture models 125. As shown, shape models 120 generate shape descriptors 130 and texture models 125 generate texture descriptors 135. It should be recognized that shape and texture models 120 and 125 are typically generated offline using a library of known images, and that each can be linear or non-linear independently of the other. These models can also include so-called "geometry constrained part-based models," in which landmark points have their own appearance model. Descriptors 130 and 135 can be combined in accordance with block 140 in any fashion that satisfies the developer's goals and/or constraints. By way of example, operation 140 can concatenate each of the supplied shape and texture descriptors. In another embodiment, operation 140 can generate a set of linear combinations of the descriptor elements. In yet another embodiment, shape descriptors 130 can be combined in one manner and texture descriptors 135 in a different manner, with the two combinations then concatenated. In still another embodiment, one or more descriptors can be combined as generated by their respective models, while other descriptors undergo additional processing (e.g., dimensional reduction or smoothing) before being combined. However they are combined, the result of operation 140 is face feature vector 145. Face feature vector 145 can be retained in storage 150 (e.g., a non-transitory magnetic or solid-state disk unit). In practice, face feature vector 145 can be incorporated within input image 105 (e.g., in its metadata) and/or retained in a separate data store that references image 105.

Once generated, face feature vector 145 can be used by application 155 to identify the face in the corresponding image (e.g., within image 105). For example, application 155 can retrieve image 160, whose associated face feature vector <f> is associated with, or identified as, face 'F'. Once retrieved, face feature vector <f> can be compared 165 with face feature vector 145, and if the two are sufficiently similar (e.g., by some convenient measure), image 105 can be said to include face 'F'. In one embodiment, application 155 can be a user-level graphics application (e.g., iPhoto or Aperture). In another embodiment, application 155 can be incorporated within a face recognition framework that can be used by user-level applications. In yet another embodiment, some or all of application 155 can be incorporated within specialized image processing hardware.

Referring to FIG. 2, shape models 120 can be seen to include two-dimensional (2D) model 200 and three-dimensional (3D) model 205 (which generate 2D and 3D shape descriptors 210 and 215, respectively), while texture models 125 include global texture model 220 and local texture model 225 (which generate global and local texture descriptors 230 and 235, respectively).

In one embodiment, 2D model 200, 3D model 205, and global texture model 220 can be linear models of the form:

$$\vec{i} = B\vec{c} + \vec{m} \qquad \text{(Equation 1)}$$

where $\vec{i}$ (hereinafter, vector i) represents an image or image points (depending on whether the model is a shape model or a texture model), $B$ represents a set of basis vectors (generally orthogonal), $\vec{c}$ (hereinafter, vector c) represents a set of model coefficients, and $\vec{m}$ (hereinafter, vector m) represents a mean shape vector or mean texture vector (again depending on whether the model is a shape model or a texture model).

Given a set of (training) images, basis vectors B and mean shape/texture vector m can be determined using any of a number of techniques, for example, principal component analysis (PCA), independent component analysis (ICA), linear discriminant analysis (LDA), elastic bunch graph matching (EBGM), trace transform, active appearance models (AAM), Bayesian frameworks, support vector machines (SVM), hidden Markov models (HMM), and eigenfaces. The number of basis vectors comprising B determines, to a degree, the accuracy of the model. The size of B can therefore be selected by the designer to achieve a desired accuracy. In one implementation, 10 basis vectors may be sufficient, while in another implementation 20, 50, or 75 basis vectors may be needed.
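Of the techniques listed, PCA is perhaps the simplest to illustrate. The following is a minimal sketch, assuming the training images (or landmark point sets) have already been flattened into the rows of an array; the function names and the array layout are assumptions made for illustration, not part of the disclosure.

```python
import numpy as np

def fit_linear_model(training, n_basis):
    """Estimate B and m of the linear model i = B c + m using PCA.

    training: array of shape (num_samples, dim), one flattened sample per row.
    n_basis must not exceed min(num_samples, dim).
    Returns B with shape (dim, n_basis) and m with shape (dim,).
    """
    X = np.asarray(training, dtype=float)
    m = X.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(X - m, full_matrices=False)
    B = vt[:n_basis].T
    return B, m

def reconstruct(B, c, m):
    """Evaluate the linear model for a given coefficient vector c."""
    return B @ c + m
```

Increasing `n_basis` lets the model reproduce the training samples more closely at the cost of a longer coefficient vector, which mirrors the accuracy trade-off described above.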

Referring to FIG. 3, a block diagram for one embodiment of face feature vector construction operation 300 is shown. As described above with respect to FIGS. 1 and 2, input image 105 is provided to face detector 110, which generates landmark image 115. In the illustrated embodiment, landmark image 115 can be supplied directly to 2D and 3D shape models 200 and 205. Assuming these models can be characterized by Equation 1, then for 2D shape model 200, vector i represents landmark image 115, B represents the set of 2D model basis vectors, vector c represents the set of 2D model coefficients (i.e., 2D descriptor 210), and vector m represents the mean 2D shape vector. Similarly, for 3D shape model 205, vector i again represents landmark image 115, B represents the set of 3D model basis vectors, vector c represents the set of 3D model coefficients (i.e., 3D descriptor 215), and vector m represents the mean 3D shape vector.

Landmark image 115 can next undergo normalization operation 305 to generate normalized image 310. Those of ordinary skill in the art will understand that normalization operation 305 refers to a process in which an image's landmark features (e.g., eyebrows, eyes, nose, mouth, and chin) are adjusted so that they appear at specified locations within a frame of a given size.

Once normalized, image 310 can be supplied to global texture model 220 to generate global texture descriptor 230. If Equation 1 characterizes global texture model 220, then vector i represents normalized image 310, B represents the set of texture model basis vectors, vector c represents the set of texture model coefficients (i.e., global texture descriptor 230), and vector m represents the mean texture vector.

Having determined the basis vectors (B) and mean vectors (vector m) for 2D model 200, 3D model 205, and global texture model 220 offline and stored them for run-time use, the model coefficients (representing 2D descriptor 210, 3D descriptor 215, and global texture descriptor 230) can be determined by solving Equation 1 for vector c. Because B is not necessarily a square matrix, a straightforward algebraic solution of Equation 1 for vector c may not be available. Vector c can therefore be determined at run time by any one of a number of optimization procedures. One such procedure is to evaluate the following relationship:

$$\min_{\vec{c}} \left\lVert \vec{i} - \left( B\vec{c} + \vec{m} \right) \right\rVert \qquad \text{(Equation 2)}$$

By way of example, it follows from Equation 1 that if landmark image 115 and normalized image 310 are each represented by a (128 × 128) array of elements, vector i is a (16,384 × 1) vector. Further, if 'n1' represents the number of basis vectors in B, then B is a (16,384 × n1) matrix and vector m is a (16,384 × 1) vector. In this example, 2D descriptor 210, 3D descriptor 215, and global texture descriptor 230 are (n1 × 1) vectors. In one embodiment, the 3D model coefficients can be obtained using the technique described in co-pending U.S. patent application Ser. No. 13/299,211, entitled "3D Object Recognition."
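One way to carry out the minimization over vector c is an ordinary least-squares solve. The sketch below assumes the Euclidean norm and uses NumPy's `np.linalg.lstsq`; the function name is hypothetical, and the disclosure leaves the choice of optimization procedure open.

```python
import numpy as np

def model_coefficients(i_vec, B, m):
    """Least-squares estimate of c in i = B c + m, i.e. argmin_c ||i - (B c + m)||.

    B: (dim, n1) matrix of basis vectors, m: (dim,) mean vector,
    i_vec: (dim,) flattened landmark or normalized image.
    """
    c, *_ = np.linalg.lstsq(B, i_vec - m, rcond=None)
    return c

# Shapes from the worked example: a 128 x 128 image flattens to 16,384 elements.
rng = np.random.default_rng(0)
n1 = 10                                  # assumed number of basis vectors
B = rng.normal(size=(128 * 128, n1))     # stand-in basis, not a trained model
m = rng.normal(size=128 * 128)
c_true = rng.normal(size=n1)
descriptor = model_coefficients(B @ c_true + m, B, m)   # shape (n1,)
```

When the columns of B are orthonormal, as PCA-derived bases are, the same solution can be computed directly as `B.T @ (i_vec - m)`.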

Referring again to FIG. 3, normalized image 310 can also be provided to local texture model 225. As illustrated, local texture model 225 can itself include local image descriptor 315, dense image descriptor 320, and dense warped image descriptor 325.

Referring to FIG. 4a, in one embodiment local image descriptor 315 can be based on the texture of small regions or tiles 400 around one or more of the landmark features (only one illustrative region is labeled in FIG. 4a). While the precise number of tiles depends on the image resolution and the designer's goals and constraints, 10 to 20 tiles have been found to be sufficient for a (128 × 128) normalized image. The size of each tile can be determined based on training data and can vary over a fixed number of scales, in which case each point can have multiple tiles of different sizes. It will be understood that the settings actually used can be those that give the best recognition performance (within established design constraints). By way of example, the local image descriptor can be generated in accordance with gradient vector operators such as histogram of oriented gradients (HoG), Speeded-Up Robust Features (SURF), scale-invariant feature transform (SIFT), Binary Robust Independent Elementary Features (BRIEF), and Oriented BRIEF (ORB), or similar types of descriptors.

Referring to FIG. 4b, in one embodiment dense image descriptor 320 determines an image descriptor based on the entire image. For example, a selected descriptor operation (e.g., HoG or SIFT) can be applied to each of a plurality of regions covering image 310 (e.g., 20 regions 405, such as a 5 × 4 grid). The result of local image descriptor 315 is a j-element descriptor. The result of the dense image descriptor is a k-element descriptor.
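To make the dense, grid-based evaluation concrete, the following sketch computes a simplified gradient-orientation histogram for each cell of a 5 × 4 grid and concatenates the results. It is a deliberately reduced stand-in for HoG (no block normalization or interpolation), and the grid size, bin count, and function names are assumptions chosen for illustration.

```python
import numpy as np

def cell_orientation_histogram(cell, n_bins=8):
    """Magnitude-weighted histogram of unsigned gradient orientations in one cell."""
    gy, gx = np.gradient(cell.astype(float))          # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    orientation = np.mod(np.arctan2(gy, gx), np.pi)   # fold into [0, pi)
    hist, _ = np.histogram(orientation, bins=n_bins,
                           range=(0.0, np.pi), weights=magnitude)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def dense_descriptor(image, grid=(5, 4), n_bins=8):
    """Concatenate per-cell histograms over a grid covering the whole image."""
    row_groups = np.array_split(np.arange(image.shape[0]), grid[0])
    col_groups = np.array_split(np.arange(image.shape[1]), grid[1])
    parts = [
        cell_orientation_histogram(
            image[r[0]:r[-1] + 1, c[0]:c[-1] + 1], n_bins)
        for r in row_groups for c in col_groups
    ]
    return np.concatenate(parts)
```

With these assumed settings the descriptor has k = 5 × 4 × 8 = 160 elements for any input size; a local image descriptor could reuse `cell_orientation_histogram` on tiles centered at landmark points instead of on grid cells.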

While both local image descriptor 315 and dense image descriptor 320 have been described as using gradient vector descriptors, this is not required. Other descriptors, such as intensity-based and image-texture-based descriptors, can also be used. Further, local image descriptor 315 can use one approach (e.g., intensity) while dense image descriptor 320 uses another (e.g., gradient vector).

In addition to using normalized image 310 directly, local texture model 225 can also use a warped version of image 310. Referring again to FIG. 3, normalized image 310 can be applied to warp or morph operator 330 to generate warped or morphed image 335. In one embodiment, warp operator 330 adjusts the face for out-of-plane rotation so that warped image 335 approximates a full frontal view of the subject's face. Referring to FIG. 5, and similar to the operation described for generating dense image descriptor 320, the whole of warped image 335 can be evaluated region by region (e.g., regions 500) as described above for dense image descriptor 320. In one embodiment, the dense warped image descriptor is an l-element descriptor. While the operations are similar, dense warped image descriptor 325 need not use the same technique, or the same number of regions/tiles, as is used to generate dense image descriptor 320.

Referring again to FIG. 3, combine operation 340 can combine any two, three, or any combination of the generated local image, dense image, and dense warped image descriptors to generate intermediate local texture descriptor 345. Combine operation 340 can take each descriptor in its entirety or only a portion of each, or one descriptor in its entirety and only a portion of another. In one embodiment (see paragraph [0022]), combine operation 340 can be a concatenation of each of the local image descriptor (j elements), the dense image descriptor (k elements), and the dense warped image descriptor (l elements). In such an embodiment, the size of intermediate local texture descriptor 345 is (j + k + l). In one implementation,

$$(j + k + l) \approx 3{,}000.$$

To reduce the size of this descriptor to a value more easily manipulated in real time, dimensional reduction operation 350 can be performed to generate local texture descriptor 235. Alternatively, dimensional reduction can be performed on the individual components (315, 320, 325) before they are combined at 340. Dimensional reduction can be viewed as a transformation that can be expressed as follows:

$$\vec{y} = M\vec{x} \qquad \text{(Equation 3)}$$

where $\vec{y}$ (hereinafter, vector y) represents local texture descriptor 235, $M$ represents a set of basis vectors (generally orthogonal) that perform the desired transformation, and $\vec{x}$ (hereinafter, vector x) represents intermediate local texture descriptor 345.

Knowing the distribution of vector y over a large set of faces makes it possible to identify and retain a smaller number of elements (dimensions) that represent substantially the same information. Transformation matrix M can be determined offline using any of a number of known optimization techniques (e.g., metric learning, feature selection, or principal component analysis). Once determined, M can be stored for use at run time. Continuing the numeric example started above, if intermediate local texture descriptor 345 (vector x) has 3,000 elements and M reduces this to n2 dimensions, then vector y is an (n2 × 1) vector, M is an (n2 × 3,000) matrix, and vector x is a (3,000 × 1) vector.
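The concatenation and reduction steps can be sketched as follows. This is one possible realization under stated assumptions: M is derived by PCA from a set of training descriptors, the projection follows the form y = M x (without the mean-centering that PCA pipelines often apply at projection time), and all function names are hypothetical.

```python
import numpy as np

def concatenate_local_descriptors(local, dense, warped):
    """Form intermediate descriptor x with j + k + l elements by concatenation."""
    return np.concatenate(
        [np.ravel(local), np.ravel(dense), np.ravel(warped)]).astype(float)

def fit_reduction_matrix(training_x, n2):
    """Derive M (n2 x dim) offline from training descriptors stored as rows.

    n2 must not exceed min(num_samples, dim).
    """
    X = np.asarray(training_x, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are orthonormal directions of decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n2]

def reduce_descriptor(M, x):
    """Apply the dimensional reduction y = M x."""
    return M @ x
```

At run time only `reduce_descriptor` is needed; the matrix M is computed once offline and stored, which matches the offline/run-time split described above.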

Referring once more to FIG. 3, after each of descriptors 210, 215, 230, and 235 has been determined, they can be combined by operator 140 to create face feature vector 145. As shown in FIG. 6, the face feature vector can include fields for 2D shape descriptor 210 (600), 3D shape descriptor 215 (605), global texture descriptor 230 (610), and local texture descriptor 235 (615).

Referring again to the numerical example started above, if 2D model 200, 3D model 205, and global texture model 220 are linear models of the form given by Equation 1, the model input image consists of (128 × 128) elements, and each of the 2D, 3D, and global texture models has n1 basis vectors, then exemplary model parameters are as shown in Table 1.

Further, if the combination of local image descriptor 315, dense image descriptor 320, and dense distorted image descriptor 325 generates an intermediate local texture descriptor 345 having 3,000 elements, and dimension reduction operation 350 is characterized by Equation 3 and reduces the number of dimensions to n2, then exemplary model parameters for dimension reduction operation 350 are as shown in Table 2.

Finally, if combination operator 140 concatenates 2D descriptor 210, 3D descriptor 215, global texture descriptor 230, and local texture descriptor 235, then face feature vector 145 is a ((3n1 + n2) × 1) vector.
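The concatenation case can be sketched as follows. This is an illustrative Python outline only: the function name `build_face_feature_vector`, the field names, and the toy sizes n1 = 3 and n2 = 2 are assumptions, and the half-open (start, end) index pairs are just one convenient way to represent the fields of FIG. 6.

```python
def build_face_feature_vector(shape_2d, shape_3d, global_texture, local_texture):
    """Concatenate four descriptors and record where each field lies."""
    fields = {}
    vector = []
    for name, descriptor in (
        ("2d_shape", shape_2d),              # field 600
        ("3d_shape", shape_3d),              # field 605
        ("global_texture", global_texture),  # field 610
        ("local_texture", local_texture),    # field 615
    ):
        start = len(vector)
        vector.extend(descriptor)
        fields[name] = (start, len(vector))  # half-open range [start, end)
    return vector, fields


n1, n2 = 3, 2  # toy sizes; real descriptors would be much longer
vector, fields = build_face_feature_vector(
    [0.1] * n1, [0.2] * n1, [0.3] * n1, [0.4] * n2
)
print(len(vector) == 3 * n1 + n2, fields["local_texture"])
```

Keeping the field boundaries alongside the vector makes it possible to recover an individual descriptor later, for example to compare only the shape fields.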

Referring to FIG. 7, a face recognition operation 700 that uses face feature vectors according to one embodiment is shown. To begin, face feature vectors for an unknown face and a known face/identity are obtained (blocks 705 and 710). A similarity metric is then applied to these vectors (block 715), and a check is made to determine whether the metric indicates a match (block 720). If the two face feature vectors are sufficiently similar (the "Yes" branch of block 720), it can be determined that the unknown face feature vector represents the same identity as that associated with the known face feature vector (block 725). If the two face feature vectors are not similar enough to match (the "No" branch of block 720), a further check is made to determine whether another known face feature vector is available (block 730). If no more face feature vectors associated with known identities remain (the "No" branch of block 730), it can be concluded that the unknown face feature vector (i.e., the one obtained during the acts of block 705) corresponds to an unknown face (block 735). If more face feature vectors associated with known identities remain (the "Yes" branch of block 730), the "next" known face feature vector can be obtained, for example from storage 150 (block 740), after which operation 700 resumes at block 715.
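The control flow of FIG. 7 can be outlined in a few lines of Python. This is a sketch under stated assumptions: the function name `recognize`, the (identity, vector) pair layout for known faces, the squared-distance metric, the 0.5 threshold, and the sample identities are all hypothetical and chosen only to make the loop runnable.

```python
def recognize(unknown_vector, known_faces, similarity, is_match):
    """Return the identity of the first matching known face, or None.

    known_faces: iterable of (identity, feature_vector) pairs (assumed layout).
    similarity:  callable applied to two vectors (block 715).
    is_match:    callable deciding whether a similarity value is a match (block 720).
    """
    for identity, known_vector in known_faces:            # blocks 710 and 740
        score = similarity(unknown_vector, known_vector)  # block 715
        if is_match(score):                               # block 720
            return identity                               # block 725
    return None                                           # block 735: unknown face


def squared_distance(a, b):
    """Placeholder metric for illustration; smaller means more similar."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


known = [("alice", [0.0, 1.0]), ("bob", [1.0, 0.0])]  # hypothetical identities
first = recognize([0.9, 0.1], known, squared_distance, lambda s: s < 0.5)
second = recognize([5.0, 5.0], known, squared_distance, lambda s: s < 0.5)
print(first, second)
```

Passing the metric and the match test as callables keeps the loop independent of which similarity measure is eventually used.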

In one embodiment, the similarity metric (see block 715) may be a distance metric along the lines of the Hamming distance. For vectors of large dimension, such as the face feature vectors described herein, the Mahalanobis distance measure described by Equation 4 has been found to be an effective similarity measure:

$$S(\vec{x}, \vec{y}) = (\vec{x} - \vec{y})^{T}\, W\, (\vec{x} - \vec{y}) \qquad (4)$$

Here, vector x represents a first face feature vector (e.g., one associated with an unknown face), vector y represents a second face feature vector (e.g., one associated with a known face), S() represents the similarity or comparison operation, and W represents a weight matrix. In essence, weight matrix W specifies how important or significant each element of the face feature vector is during the comparison operation. Using a large number of face feature vectors associated with known identities, a metric learning technique can be applied to determine W offline. Once known, W can be stored for run-time use in accordance with FIG. 7. As an example, if the face feature vector has 500 elements, i.e., is represented by a (500 × 1) vector, then W is a (500 × 500) element weight matrix.
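Assuming Equation 4 takes the usual Mahalanobis-style quadratic form S(x, y) = (x − y)ᵀ W (x − y), the comparison can be sketched in plain Python as below. The function name `similarity`, the diagonal toy weight matrix, and the three-element vectors are illustrative assumptions; a learned W would generally be dense and far larger.

```python
def similarity(x, y, W):
    """Compute (x - y)^T W (x - y) for vectors x, y and a square matrix W."""
    d = [xi - yi for xi, yi in zip(x, y)]
    # W d: each row of W dotted with the difference vector.
    Wd = [sum(w_ij * d_j for w_ij, d_j in zip(row, d)) for row in W]
    # d^T (W d): a single scalar; smaller values mean more similar vectors.
    return sum(d_i * wd_i for d_i, wd_i in zip(d, Wd))


# Toy (3 x 3) weight matrix: the first element counts most, the last least.
W = [
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.5],
]
s = similarity([1.0, 2.0, 3.0], [0.0, 2.0, 1.0], W)
print(s)
```

With W equal to the identity matrix this reduces to the squared Euclidean distance, which is one way to see how W reweights individual elements of the face feature vector.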

Referring to FIG. 8, receiver operating characteristic (ROC) curve 800 shows the performance of the face feature vector disclosed herein (805) compared with using, alone, each of the individual components that make up the face feature vector: a standard 2M descriptor (810), a dense gradient descriptor (815), a local gradient descriptor (820), and a dense distortion gradient descriptor (825). As can be seen from the figure, using a face feature vector in accordance with this disclosure yields higher performance than any of these other descriptors.

Referring now to FIG. 9, a simplified functional block diagram of an exemplary electronic device 900 according to one embodiment of the present invention is shown. Electronic device 900 can include a processor 905, a display 910, a user interface 915, graphics hardware 920, device sensors 925 (e.g., a proximity sensor/ambient light sensor, an accelerometer, and/or a gyroscope), a microphone 930, one or more audio codecs 935, one or more speakers 940, communication circuitry 945, a digital image capture unit 950, one or more video codecs 955, memory 960, storage 965, and a communication bus 970. Electronic device 900 can be, for example, a personal digital assistant (PDA), a portable music player, a mobile phone, a notebook computer, a laptop computer, or a tablet computer.

Processor 905 can execute the instructions necessary to carry out or control many of the functions performed by device 900 (e.g., face feature vector construction and run-time face recognition operation 100, or face recognition operation 700). Processor 905 can, for example, drive display 910 and receive user input from user interface 915. User interface 915 allows a user to interact with device 900. For example, user interface 915 can take a variety of forms, such as buttons, a keypad, a dial, a click wheel, a keyboard, a display screen, and/or a touch screen. Processor 905 can also be, for example, a system-on-chip such as those found in mobile devices, and can include a dedicated graphics processing unit (GPU). Processor 905 can be based on a reduced instruction set computer (RISC) or complex instruction set computer (CISC) architecture, or on any other suitable architecture, and can include one or more processing cores. Graphics hardware 920 can be special-purpose computational hardware for processing graphics and/or for assisting processor 905 in processing graphics information. In one embodiment, graphics hardware 920 can include a programmable graphics processing unit (GPU).

Sensor and camera circuitry 950 can capture still and video images, which may be processed, at least in part, by one or more video codecs 955 and/or processor 905 and/or graphics hardware 920 and/or a dedicated image processing unit incorporated within circuitry 950. Images so captured can be stored in memory 960 and/or storage 965. Memory 960 can include one or more different types of media used by processor 905 and graphics hardware 920 to perform device functions. For example, memory 960 can include memory cache, read-only memory (ROM), and/or random access memory (RAM). Storage 965 includes media for retaining audio, image, and video files, computer program instructions or software, preference information, device profile information, and any other suitable data. Storage 965 can include one or more persistent storage media, including, for example, magnetic disks (fixed, floppy, and removable) and tape, optical media such as CD-ROMs and digital video disks (DVDs), and semiconductor memory devices such as electrically programmable read-only memory (EPROM) and electrically erasable programmable read-only memory (EEPROM). Memory 960 and storage 965 can be used to tangibly retain computer program instructions or code organized into one or more modules and written in any desired computer programming language. When executed by, for example, processor 905, such computer program code can implement one or more of the methods described herein.

Various changes in the materials, components, and circuit elements, as well as in the details of the illustrated operational methods, are possible without departing from the scope of the claims. For example, although the models described herein were linear in form, no such limitation is inherent in the disclosed techniques. Further, the various models can differ from one another: some can be linear while others are non-linear. In addition, the combination operations (e.g., 140 and 340) are not limited to concatenation, nor need they be the same as one another. Any combination that suits the designer's goals can be used. For example, linear combinations, selection of a subset of descriptor values, and weighted combinations thereof are all feasible. Also, if the dimensionality of the model descriptors does not call for dimension reduction (e.g., operations 315, 320, and 325), that operation need not be performed.
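Two of the alternatives to concatenation mentioned above, a weighted linear combination and the selection of a subset of descriptor values, might look like the following. This is a hedged sketch only: the function names, the requirement that descriptors have equal length for the weighted sum, and the sample weights and indices are assumptions rather than details from the disclosure.

```python
def weighted_combination(descriptors, weights):
    """Element-wise weighted sum of equal-length descriptors."""
    length = len(descriptors[0])
    if any(len(d) != length for d in descriptors):
        raise ValueError("descriptors must have equal length")
    return [
        sum(w * d[i] for w, d in zip(weights, descriptors))
        for i in range(length)
    ]


def select_subset(descriptor, indices):
    """Keep only the descriptor values at the given positions."""
    return [descriptor[i] for i in indices]


a = [1.0, 2.0, 3.0]
b = [3.0, 2.0, 1.0]
blended = weighted_combination([a, b], [0.5, 0.5])
subset = select_subset(a, [0, 2])
print(blended, subset)
```

Unlike concatenation, a weighted sum does not increase the length of the result, which may matter when the combined descriptor is not followed by a dimension reduction step.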

Finally, it is to be understood that the above description is intended to be illustrative and not restrictive. For example, the above-described embodiments can be used in combination with each other. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention should therefore be determined with reference to the claims, along with the full scope of equivalents to which such claims are entitled. In the claims, the terms "including" and "in which" are used as the plain-English equivalents of the respective terms "comprising" and "wherein".

100: Face feature vector generation and run-time face recognition operation
105: Input image
110: Face detector
115: Landmark image
120: Shape model
125: Texture model
130: Shape descriptor
135: Texture descriptor
140: Operation
145: Face feature vector
150: Storage
155: Application
160: Image
165: Comparison
200: 2D shape model
205: 3D shape model
210: 2D shape descriptor
215: 3D shape descriptor
220: Global texture model
225: Local texture model
230: Global texture descriptor
235: Local texture descriptor
300: Face feature vector construction operation
305: Normalization operation
310: Normalized image
315: Local image descriptor
320: Dense image descriptor
325: Dense distorted image descriptor
330: Distortion operator
335: Distorted image
340: Combination operation
345: Intermediate local texture descriptor
350: Dimension reduction operation
400: Tile
405, 500: Region
600: 2D shape field
605: 3D shape field
610: Global texture field
615: Local texture field
800: Receiver operating characteristic curve
805: Face feature vector
810: Standard 2M descriptor
815: Dense gradient descriptor
820: Local gradient descriptor
825: Distorted gradient descriptor
900: Electronic device
905: Processor
910: Display
915: User interface
920: Graphics hardware
925: Device sensor
930: Microphone
935: Audio codec
940: Speaker
945: Communication circuit
950: Digital image capture unit (sensor/camera circuit)
955: Video codec
960: Memory
965: Storage
970: Communication bus

Claims

A persistent computer readable medium comprising:
computer code for obtaining landmark detection information for a first face in a first image;
computer code for generating normalized landmark detection information based at least in part on the landmark detection information;
computer code for generating a first shape model feature vector based at least in part on the landmark detection information, including applying the landmark detection information to a two-dimensional model of the first face;
computer code for generating a second shape model feature vector based at least in part on the landmark detection information, including applying the landmark detection information to a three-dimensional model of the first face;
computer code for generating a first texture model feature vector based at least in part on the landmark detection information, including generating the first texture model feature vector based at least in part on the normalized landmark detection information;
computer code for generating a second texture model feature vector based at least in part on the landmark detection information, including generating distortion landmark detection information based at least in part on the normalized landmark detection information and a specified morphing operation;
computer code for combining the first shape model feature vector, the second shape model feature vector, the first texture model feature vector, and the second texture model feature vector to form a first face feature vector; and
computer code for storing the first face feature vector in a storage device.

The persistent computer readable medium of claim 1, wherein the computer code for generating a first texture model feature vector comprises computer code for:
identifying a plurality of regions in the normalized landmark detection information, the plurality of regions comprising less than all of the normalized landmark detection information; and
generating the first texture model feature vector based on the plurality of regions.

The persistent computer readable medium of claim 1, wherein the computer code for generating a second texture model feature vector comprises computer code for generating the second texture model feature vector based at least in part on the distortion landmark detection information.

The persistent computer readable medium of claim 1, wherein the computer code for generating a second texture model feature vector further comprises computer code for reducing the dimensionality of the second texture model feature vector before the computer code for combining the first texture model feature vector and the second texture model feature vector to form the first face feature vector is executed.

The persistent computer readable medium of claim 1, wherein the computer code for obtaining landmark detection information comprises computer code for obtaining the landmark detection information for the first face in the first image from a face detection operation.

The persistent computer readable medium of claim 1, wherein the computer code for combining the first shape model feature vector, the second shape model feature vector, the first texture model feature vector, and the second texture model feature vector comprises computer code for concatenating the first shape model feature vector, the second shape model feature vector, the first texture model feature vector, and the second texture model feature vector.

The persistent computer readable medium of claim 1, wherein the computer code for storing the first face feature vector in a storage device comprises computer code for:
incorporating the first face feature vector into metadata of the first image; and
storing the first image, with the first face feature vector, in the storage device.

The persistent computer readable medium of claim 1, further comprising computer code for:
retrieving the first face feature vector from the storage device;
retrieving a second face feature vector, corresponding to a known person, from the storage device;
comparing the first face feature vector and the second face feature vector to generate a similarity value; and
determining that the first face corresponds to the known person if the similarity value indicates a match.

An electronic device comprising:
a storage device storing a plurality of images;
a memory communicatively coupled to the storage device and storing the computer code of claim 1; and
a programmable control unit communicatively coupled to the storage device and the memory and configured to retrieve and execute the computer code stored in the memory.

A persistent computer readable medium comprising:
computer code for obtaining, for a first face in a first image, a landmark image that identifies a plurality of aspects of the first face;
computer code for generating a normalized landmark image based at least in part on the landmark image;
computer code for generating a distorted landmark image based at least in part on the normalized landmark image;
computer code for generating a first shape model feature vector based at least in part on the landmark image, including applying the landmark image to a two-dimensional model of the first face;
computer code for generating a second shape model feature vector based at least in part on the landmark image, including applying the landmark image to a three-dimensional model of the first face;
computer code for generating a first texture model feature vector based at least in part on the normalized landmark image, including using a gradient vector operation on a plurality of regions in the normalized landmark image;
computer code for generating a second texture model feature vector based at least in part on the distorted landmark image, including generating first and second descriptors based at least in part on the normalized landmark image and generating a third descriptor based at least in part on the distorted landmark image;
computer code for combining the first shape model feature vector, the second shape model feature vector, the first texture model feature vector, and the second texture model feature vector to form a first face feature vector; and
computer code for storing the first face feature vector in a storage device.

The persistent computer readable medium of claim 10, wherein the plurality of regions comprise less than all of the normalized landmark image.

The persistent computer readable medium of claim 10, wherein the computer code for generating a second texture model feature vector further comprises computer code for combining the first, second, and third descriptors to form the second texture model feature vector.

The persistent computer readable medium of claim 12, wherein the computer code for combining the first, second, and third descriptors further comprises computer code for reducing the dimensionality of the combined first, second, and third descriptors.

A computer system comprising:
a storage device storing a plurality of images;
a memory operatively coupled to the storage device and storing the computer code of claim 10; and
a programmable controller communicatively coupled to the storage device and the memory and configured to execute the computer code stored in the memory.