JP2015185042A

JP2015185042A - Information processing device, authentication device and methods thereof

Info

Publication number: JP2015185042A
Application number: JP2014062728A
Authority: JP
Inventors: 大介中嶋; Daisuke Nakajima
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2014-03-25
Filing date: 2014-03-25
Publication date: 2015-10-22
Anticipated expiration: 2034-03-25
Also published as: JP6312485B2

Abstract

PROBLEM TO BE SOLVED: To obtain a feature amount that is low dimensional and suited to pattern recognition by learning the conversion parameter used to convert a bit string composed of a plurality of elements into a scalar value when generating the feature amount. SOLUTION: An arithmetic processing part 105 performs arithmetic processing using each of a plurality of pieces of data near target data within a processing object area 108 in input data 101. A binarization processing part 106 binarizes the arithmetic processing result 102 corresponding to each of the plurality of pieces of data. A feature data generation part 107 uses a conversion parameter to generate feature data 104 for the target data from the binarization processing result 103 corresponding to each of the plurality of pieces of data. A learning part 109 learns the conversion parameter such that the distance between feature data generated from input data of the same class is small and the distance between feature data generated from input data of different classes is large.

Description

The present invention relates to information processing for extracting feature amounts suitable for pattern identification.

The local binary pattern (LBP) described in Non-Patent Document 1 has been proposed as a feature amount effective for pattern identification such as face authentication. An outline of the LBP extraction process is described with reference to FIG. 1.

The LBP at coordinates (x, y) is calculated by Equation (1), taking the center pixel (x, y) of a 3 × 3 pixel region as the target pixel and using the eight reference pixels (x + x_n, y + y_n) adjacent to the target pixel.

$$t_n(x, y) = i(x + x_n, y + y_n) - i(x, y)$$
$$s(t_n(x, y)) = \begin{cases} 1 & \text{if } t_n(x, y) \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
$$\mathrm{LBP}(x, y) = \sum_{n=0}^{7} s(t_n(x, y)) \cdot 2^n \qquad (1)$$

Here, i(x, y) is the pixel value of the target pixel,
i(x + x_n, y + y_n) is the pixel value of the n-th reference pixel, and
x_n ∈ {-1, 0, 1}, y_n ∈ {-1, 0, 1}, x_n² + y_n² ≠ 0.

That is, the LBP is an 8-bit feature amount: each element of the bit string obtained from the differences between the target pixel value and the reference pixel values is multiplied by 2ⁿ, and the products are summed into a scalar value.

In the example shown in FIG. 1, the pixel to the left of the target pixel is (x₀, y₀), and the remaining reference pixels (x₁, y₁), (x₂, y₂), …, (x₇, y₇) follow counterclockwise around the target pixel. Arranging the bits in the order of the dashed arrows in FIG. 1 yields the bit string '00111010', and the LBP value is 58.
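
Equation (1) and the FIG. 1 example can be sketched in a few lines of Python. This is only an illustration: the pixel values of FIG. 1 are not reproduced in the text, so the 3 × 3 patch below is hypothetical and was chosen so that it yields the same bit string, and the offset table assumes that "counterclockwise" is taken as seen on screen with y increasing downward.

```python
# Reference-pixel offsets (x_n, y_n) for n = 0..7. ASSUMPTION: n = 0 is the left
# neighbour and the order runs counterclockwise as seen on screen (y grows downward).
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def lbp(image, x, y):
    """Return (value, bit_string) of the LBP at (x, y) following Equation (1)."""
    center = image[y][x]
    bits = [1 if image[y + dy][x + dx] - center >= 0 else 0 for dx, dy in OFFSETS]
    value = sum(bit << n for n, bit in enumerate(bits))
    # Bit n has weight 2**n, so the string is written with bit 7 first.
    bit_string = "".join(str(bit) for bit in reversed(bits))
    return value, bit_string


# Hypothetical 3x3 patch, indexed as patch[y][x]; the centre pixel is 5.
patch = [
    [1, 2, 7],
    [3, 5, 9],
    [6, 4, 8],
]
value, bit_string = lbp(patch, 1, 1)
print(value, bit_string)
```

Writing the string with bit 7 first is what makes '00111010' read as the binary form of 58.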

Patent Document 1 proposes a technique that uses the bit string before LBP encoding as the feature amount. Whereas the LBP of Non-Patent Document 1 is an 8-bit, one-dimensional feature amount (58 in the example of FIG. 1), Patent Document 1 generates a 1-bit, eight-dimensional feature amount ('00111010' in the example of FIG. 1).

Patent Document 2 proposes a technique that determines the bit string conversion method by learning. The learning finds a conversion that reduces the Euclidean distance between bit strings obtained from 3 × 3 pixel regions with similar pixel patterns. Specifically, it learns a conversion formula that projects a bit string into a P-dimensional (P < 8) space expressing the similarity of the pixel patterns of 3 × 3 pixel regions. This technique can generate a relatively low-dimensional feature amount that expresses the similarity of the original data.

The LBP is widely used as a feature amount effective for pattern identification. Because the LBP was originally designed to be used as an LBP histogram, an LBP value is merely an index, and the ordering of the values carries no meaning. When detailed texture information is effective for identification, as in face authentication, it is preferable to use the data before histogramming directly as the feature amount. However, because the magnitude of an LBP value has no meaning in itself, the distance between LBP values cannot properly express the similarity between the original data.

With the technique of Patent Document 1, on the other hand, the similarity between the original data can be expressed by calculating a distance for each bit. However, the dimension of the feature amount is eight times that of the original data, which increases the computation time of subsequent processing.
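
A small numeric illustration of the two preceding points, using hypothetical LBP codes: the difference between two LBP values says little about how many neighbour comparisons actually differ, whereas a per-bit distance does, at the cost of eight values per pixel.

```python
def hamming(a, b):
    """Number of differing bits between two 8-bit LBP codes."""
    return bin((a ^ b) & 0xFF).count("1")


# Hypothetical pairs of LBP codes.
pairs = [(0b00000000, 0b10000000), (0b01111111, 0b10000000)]
for a, b in pairs:
    print(f"{a:08b} vs {b:08b}: |a - b| = {abs(a - b)}, per-bit distance = {hamming(a, b)}")
```

In the first pair a single comparison differs but the scalar values are 128 apart; in the second pair all eight comparisons differ but the scalar values are only 1 apart.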

With the technique of Patent Document 2, the similarity between the original data can be expressed by a relatively low-dimensional feature amount. However, this technique learns a conversion method capable of expressing the similarity of the original data; it does not learn a conversion method suited to identifying the target. The conversion method obtained by learning is therefore not necessarily suitable for pattern identification.

Patent Document 1: JP 2009-86926 A
Patent Document 2: JP 2011-08631 A

Non-Patent Document 1: T. Ojala, M. Pietikainen, D. Harwood, "A Comparative Study of Texture Measures with Classification Based on Featured Distributions", Pattern Recognition, Vol. 29, pp. 51-59, 1996
Non-Patent Document 2: S. Chopra, R. Hadsell, Y. LeCun, "Learning a similarity metric discriminatively, with application to face verification", Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 539-546, 2005

An object of the present invention is to obtain a feature amount that is low dimensional and suited to the pattern identification target when a feature amount is generated by converting a bit string composed of a plurality of elements into a scalar value.

As one means for achieving the above object, the present invention has the following configuration.

Information processing according to the present invention performs arithmetic processing using each of a plurality of data in the neighborhood of target data within a processing target area of input data; binarizes the arithmetic processing result corresponding to each of the plurality of data; generates, using a conversion parameter, feature data for the target data from the binarization processing results corresponding to the plurality of data; and learns the conversion parameter so that the distance between feature data generated from input data of the same class becomes small and the distance between feature data generated from input data of different classes becomes large.

According to the present invention, learning the conversion parameter that converts a bit string into a scalar value makes it possible to obtain a feature amount that is low dimensional and suited to the pattern identification target when a feature amount is generated by converting a bit string composed of a plurality of elements into a scalar value.

FIG. 1 is a diagram explaining the outline of the LBP extraction process.
FIG. 2 is a block diagram explaining a configuration example of the signal processing unit in the first embodiment.
FIG. 3 is a diagram explaining the relationship between input image data and a processing target area.
FIG. 4 is a diagram showing an example of the positional relationship between a target pixel and reference pixels.
FIG. 5 is a block diagram explaining a configuration example of the learning unit.
FIG. 6 is a block diagram showing a configuration example of the information processing apparatus of the embodiment.
FIG. 7 is a flowchart explaining the face authentication processing of the embodiment.
FIG. 8 is a diagram showing an example of the memory storage format of the information on the processing target area, the relative positions of the reference pixels, and the weight coefficient group.
FIG. 9 is a flowchart explaining the learning processing.
FIG. 10 is a flowchart explaining the identification processing.
FIG. 11 is a diagram showing an output example of a face authentication result.
FIG. 12 is a flowchart explaining the registration processing.
FIG. 13 is a block diagram explaining the signal processing unit in the second embodiment.
FIG. 14 is a diagram explaining binary tree processing.
FIG. 15 is a diagram explaining a setting example of a plurality of processing target areas.
FIG. 16 is a diagram showing an example in which two reference pixels are compared.

The information processing of embodiments according to the present invention is described in detail below with reference to the drawings.

The following describes the signal processing of the embodiment and its learning method, and then an information processing apparatus and method that apply the signal processing and learning method to pattern identification. The signal processing of the embodiment is used to extract, from an input data group, the feature amount used for pattern identification. The input data group is a set of data consisting of a plurality of elements, for example image data.

The following also gives an example in which the signal processing of the embodiment is used to extract, from a face image, a feature amount suited to face authentication. In face authentication, a feature amount is extracted from an input face image, and the individual is identified by comparing the extracted feature amount with feature amounts created and registered in advance. Although the embodiment applies the present invention to feature extraction for face authentication, the present invention is also applicable to feature extraction for other kinds of pattern identification.

[Signal processing unit]
A configuration example of the signal processing unit in the first embodiment is described with reference to the block diagram of FIG. 2.

A processing target area 108 is set in advance in the input image data 101. FIG. 3 illustrates the relationship between the input image data 101 and the processing target area 108. An area containing parts that characterize a face well, such as the eyes, nose, and mouth, is set as the processing target area 108.

The arithmetic processing units 105a-105c apply local arithmetic processing to the image data of the processing target area 108 and generate local arithmetic processing results 102a-102c, each of which is two-dimensional data. Local arithmetic processing is arithmetic processing that uses a plurality of data located in a predetermined neighborhood. As the local arithmetic processing result, the arithmetic processing units 105a-105c calculate the difference r_n between the pixel value of a reference pixel specified in advance and that of the target pixel, as in the following equation.

$$r_n(x, y) = i(x + x_n, y + y_n) - i(x, y) \qquad (2)$$

Here, r_n(x, y) is the pixel value of the local arithmetic processing result corresponding to the n-th reference pixel,
i(x, y) is the pixel value of the target pixel,
i(x + x_n, y + y_n) is the pixel value of the n-th reference pixel, and
(x_n, y_n) is the position of the n-th reference pixel relative to the target pixel.

FIG. 4 shows an example of the positional relationship between the target pixel and the reference pixels. The relative positions (x_n, y_n) are, for example, (x₀, y₀) = (0, -1), (x₁, y₁) = (-1, 0), and (x₂, y₂) = (1, 1) in FIGS. 4(a), 4(b), and 4(c), respectively.
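
As a minimal sketch of Equation (2) with the three relative positions of FIG. 4, the following Python computes one difference map per reference pixel. The image values, the `area` tuple layout, and the function name are hypothetical, and the sketch assumes the processing target area lies far enough inside the image that every reference pixel exists.

```python
# Relative positions (x_n, y_n) taken from the FIG. 4 example.
OFFSETS = [(0, -1), (-1, 0), (1, 1)]


def local_differences(image, area, offsets=OFFSETS):
    """Return one difference map r_n per offset, following Equation (2).

    `area` is (x_left, y_top, x_right, y_bottom), inclusive. ASSUMPTION: the area
    lies far enough inside the image that every reference pixel exists.
    """
    x0, y0, x1, y1 = area
    return [
        [[image[y + dy][x + dx] - image[y][x] for x in range(x0, x1 + 1)]
         for y in range(y0, y1 + 1)]
        for dx, dy in offsets
    ]


# Hypothetical 4x4 grayscale image, indexed as image[y][x].
image = [
    [10, 12, 11, 9],
    [13, 15, 14, 8],
    [7, 16, 18, 6],
    [5, 4, 3, 2],
]
r = local_differences(image, area=(1, 1, 2, 2))
for n, r_n in enumerate(r):
    print(f"r_{n} =", r_n)
```

Each r_n has the size of the processing target area, matching the description of the results 102a-102c as two-dimensional data.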

The binarization processing units 106a-106c binarize the local arithmetic processing results 102a-102c and generate binarization processing results 103a-103c, each of which is two-dimensional data. For example, the binarization processing units 106a-106c perform the step function processing of the following equation as the binarization processing.

$$b_n(x, y) = \begin{cases} 1 & \text{if } r_n(x, y) \ge 0 \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

Here, b_n(x, y) is the pixel value of the binarization processing result corresponding to the n-th reference pixel.
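
The step function of Equation (3) could be applied element by element as sketched below; the difference maps are hypothetical values, not data from the embodiment. Note that a difference of exactly zero maps to 1.

```python
def binarize(r_maps):
    """Apply the step function of Equation (3) to every element of every map."""
    return [[[1 if value >= 0 else 0 for value in row] for row in r_n] for r_n in r_maps]


# Hypothetical local arithmetic processing results r_0..r_2 for a 2x2 area.
r_maps = [
    [[-3, 0], [2, -4]],
    [[-2, 1], [-9, -2]],
    [[3, -8], [0, 5]],
]
b_maps = binarize(r_maps)
print(b_maps)
```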

The feature data generation unit 107 performs generation processing in which the multipliers 107a-107c multiply the binarization processing results 103a-103c by the weight coefficients w₀-w₂, respectively, and the adder 107d adds the products to generate feature data 104, which is two-dimensional data. The set of weight coefficients w₀-w₂ used for the processing target area 108 is hereinafter called the "weight coefficient group". The following equation generates the feature data 104 from the binarization processing results 103a-103c.

$$v(x, y) = \sum_{n=0}^{N-1} b_n(x, y) \cdot w_n \qquad (4)$$

Here, v(x, y) is the pixel value of the feature data 104,
w_n is the weight coefficient for the binarization processing result corresponding to the n-th reference pixel, and
N is the number of binarization processing results.

The signal processing unit shown in FIG. 2 is an example with N = 3, and b₀(x, y) to b₂(x, y) are the pixel values of the binarization processing results 103a-103c.
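
Equation (4) for the N = 3 case can be sketched as follows. The binary maps and the weight values are hypothetical; in the embodiment the weights come from the learning unit 109.

```python
def feature_data(b_maps, weights):
    """Combine N binary maps into one scalar map, following Equation (4)."""
    height, width = len(b_maps[0]), len(b_maps[0][0])
    return [
        [sum(w * b[y][x] for b, w in zip(b_maps, weights)) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical binarization results b_0..b_2 (N = 3) and weights w_0..w_2.
b_maps = [
    [[0, 1], [1, 0]],
    [[0, 1], [0, 0]],
    [[1, 0], [1, 1]],
]
weights = [0.5, -1.0, 2.0]
v = feature_data(b_maps, weights)
print(v)
```

The output has one scalar per pixel, so the feature data keeps the dimension of the processing target area regardless of N.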

The learning unit 109 determines, by learning, a weight coefficient group w₀-w₂ suited to the pattern identification target; details are described later.

The LBP obtains a scalar value by multiplying the n-th bit of the bit string by 2ⁿ and summing the products. In the signal processing unit of the embodiment, the binarization processing results 103a-103c are equivalent to the bit string before encoding in the LBP. The scalar feature data 104 is obtained by multiplying each bit by the corresponding coefficient of the weight coefficient group w₀-w₂, which has been learned to suit the pattern identification target, and adding the products. In other words, the weight coefficient group w₀-w₂ corresponds to the conversion parameter that converts a bit string into a scalar value.
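
Seen this way, the LBP is the special case of the weighted sum in which the weights are fixed to powers of two. The sketch below converts the FIG. 1 bit string with both kinds of weights; the "learned" weights are made-up numbers for illustration, and eight bits are used here only to mirror the LBP example (FIG. 2 uses N = 3).

```python
def bits_to_scalar(bits, weights):
    """Weighted sum of bits; bits[n] and weights[n] belong to the n-th reference pixel."""
    return sum(b * w for b, w in zip(bits, weights))


# Bits s_0..s_7 of the FIG. 1 example ('00111010' lists bit 7 first).
bits = [int(c) for c in reversed("00111010")]

lbp_weights = [2 ** n for n in range(8)]  # fixed weights used by the LBP
learned_weights = [0.3, -1.2, 0.8, 0.1, 2.0, -0.4, 0.9, 1.5]  # hypothetical values

lbp_value = bits_to_scalar(bits, lbp_weights)
learned_value = bits_to_scalar(bits, learned_weights)
print(lbp_value, learned_value)
```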

[Learning unit]
The learning unit 109 determines the weight coefficient group w₀-w₂ by learning so that the feature data 104, which is the output of the signal processing unit shown in FIG. 2, represents a feature amount effective for the pattern identification target.

A configuration example of the learning unit 109 is described with reference to the block diagram of FIG. 5. The learning unit 109 is based on a Siamese learner (see Non-Patent Document 2). From pairs of input data and label information indicating the classes of those data, a Siamese learner learns a conversion parameter that makes the distance between feature data small for data of the same class and large for data of different classes.

The database (DB) 406 stores a data group for face authentication consisting of face images and the person IDs corresponding to them. A person ID is identification information that specifies the person shown in a face image and is represented, for example, by an integer value. For example, the values 0, 1, 2, … are assigned as person IDs in the order in which persons are registered in the DB 406. It is also preferable to associate character string data such as a name or nickname with each person ID.

The face images registered in the DB 406 are preferably images that have been converted so that both eyes are aligned horizontally and the image has a predetermined size. In addition, so that the signal processing outputs feature amounts robust against various variations, the face images desirably include a wide range of variations in the pan and tilt of the face, facial expression, lighting conditions, and so on.

The image pair selection unit 407 selects from the DB 406 a pair of face images to be used for learning. Each pair is selected at random from all the face images stored in the DB 406. The image data 401a and 401b of the selected face images are input to the feature extraction units 408a and 408b, respectively. The image pair selection unit 407 also sets a label 405 whose value is 0 when the person IDs of the selected face images are the same and 1 when they differ. The label 405 is used when the loss calculation unit 404 calculates the loss.
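
The pair selection and labelling could look like the following sketch. The database layout (a list of `(person_id, image)` tuples), the file names, and the choice to draw two distinct entries are assumptions made for illustration; the text only states that pairs are chosen at random from all stored images.

```python
import random


def select_pair(database, rng=random):
    """Pick two distinct entries at random and label the pair.

    `database` is a list of (person_id, image) tuples. The label is 0 when both
    images show the same person and 1 otherwise, as described for label 405.
    ASSUMPTION: the two images of a pair are distinct entries.
    """
    (id_a, image_a), (id_b, image_b) = rng.sample(database, 2)
    label = 0 if id_a == id_b else 1
    return image_a, image_b, label


# Hypothetical database: person IDs 0, 1, 2 with placeholder image names.
database = [(0, "a0.png"), (0, "a1.png"), (1, "b0.png"), (2, "c0.png")]
image_a, image_b, label = select_pair(database, random.Random(0))
print(image_a, image_b, label)
```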

The feature extraction units 408a and 408b apply processing equivalent to that of the signal processing unit shown in FIG. 2 to the input face images 401a and 401b and generate feature data 104. Because the feature extraction units 408a and 408b have the same configuration and share the weight coefficient group 402, identical input face images yield identical feature data.

The distance calculation unit 403 calculates the distance between the two sets of feature data generated by the feature extraction units 408a and 408b. This embodiment uses, as the distance measure, the L1 norm between the vectors obtained by treating each set of feature data as a vector. For example, if the size of the feature data is W × H, the vector has W × H dimensions. The following equation calculates the L1 norm between the feature data generated by the feature extraction units 408a and 408b.

$$E(w) = \lVert v_1(w) - v_2(w) \rVert_1 \qquad (5)$$

Here, E(w) is the L1 norm between the feature data,
w is the weight coefficient group,
v_m(w) is the feature data generated from input image m, and
m is the index within the image pair.

In Equation (5), the weight coefficient group w is a vector whose elements are the weight coefficients used in the signal processing of this embodiment. The feature data v and the L1 norm E are both written as functions of w because their values change with the weight coefficients. The feature data v₁(w) and v₂(w) are generated from the face images 401a and 401b, respectively. That is, Equation (5) calculates the L1 norm between the one-dimensional vector formed by arranging the elements of the first feature data v₁(w) and the one-dimensional vector formed by arranging the elements of the second feature data v₂(w).
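
A minimal sketch of Equation (5): both feature maps are flattened into W × H-dimensional vectors and the absolute differences are summed. The 2 × 2 feature values are hypothetical.

```python
def l1_distance(v1, v2):
    """L1 norm between two feature maps of equal size, following Equation (5)."""
    flat1 = [value for row in v1 for value in row]  # W*H-dimensional vector
    flat2 = [value for row in v2 for value in row]
    return sum(abs(a - b) for a, b in zip(flat1, flat2))


# Hypothetical 2x2 feature data generated from the two images of a pair.
v1 = [[2.0, -0.5], [2.5, 2.0]]
v2 = [[1.5, 0.5], [2.5, 0.0]]
distance = l1_distance(v1, v2)
print(distance)
```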

The distance measure is not limited to the L1 norm between vectors; other distance measures such as the Euclidean distance or the cosine distance may be used.

The loss calculation unit 404 calculates the loss from the L1 norm calculated by the distance calculation unit 403 and the label 405 generated by the image pair selection unit 407. The following equation calculates the loss L(w) from the L1 norm and the label 405.

$$L(w) = (1 - Y)\,\frac{2}{Q}\,E(w)^2 + Y \cdot 2Q \cdot \exp\!\left(-\frac{2.77}{Q}\,E(w)\right) \qquad (6)$$

Here, Y is the label 405 (value 0: same person ID; value 1: different person IDs), and
Q is the upper limit (a set value) of the L1 norm E(w).

When the face images 401a and 401b have the same person ID, the label 405 is Y = 0, and the loss L(w) is small when the L1 norm E(w) is small and large when E(w) is large (under Equation (6) the loss grows with the square of the distance). That is, for face images of the same person, the smaller the distance between the feature data, the smaller the loss L(w).

When the face images 401a and 401b have different person IDs, the label 405 is Y = 1, and the loss L(w) is large when the L1 norm E(w) is small and small when E(w) is large (under Equation (6) the loss decays exponentially with the distance). That is, for face images of different persons, the larger the distance between the feature data, the smaller the loss L(w).
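
The behavior of Equation (6) for the two label values can be checked numerically with a short sketch. The value Q = 10 is a hypothetical setting chosen only for the printout.

```python
import math


def contrastive_loss(distance, label, q):
    """Loss of Equation (6); label 0 = same person, 1 = different persons."""
    same_term = (1 - label) * (2.0 / q) * distance ** 2
    diff_term = label * 2.0 * q * math.exp(-2.77 * distance / q)
    return same_term + diff_term


Q = 10.0  # hypothetical upper limit of the L1 norm
for e in (0.0, 5.0, 10.0):
    print(e, contrastive_loss(e, 0, Q), contrastive_loss(e, 1, Q))
```

For Y = 0 the printed loss rises from zero as the distance grows; for Y = 1 it starts at 2Q and falls toward zero.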

Once the loss L(w) has been calculated from a pair of face images, the coefficient updating unit 409 updates the weight coefficient group 402 by gradient descent based on the loss L(w). The weight coefficient group 402 must be initialized before learning starts: the coefficient updating unit 409 either initializes it with random numbers or sets a weight coefficient group w obtained by previous learning as its initial value and performs additional relearning.

The following equation expresses the update of the i-th element of the weight coefficient group w by gradient descent.

$$w'_i = w_i - \rho \cdot \frac{\partial L(w)}{\partial w_i} \qquad (7)$$

Here, w_i is the i-th element before the update,
w'_i is the i-th element after the update, and
ρ is the update coefficient.
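
As a brief illustration of Equation (7), the update is an element-wise step against the gradient. The weights, gradient values, and update coefficient below are hypothetical.

```python
def update_weights(weights, gradient, rho):
    """One gradient descent step per Equation (7): w'_i = w_i - rho * dL/dw_i."""
    return [w - rho * g for w, g in zip(weights, gradient)]


weights = [0.5, -1.0, 2.0]   # hypothetical current weights
gradient = [0.2, -0.4, 0.0]  # hypothetical dL/dw_i values
updated = update_weights(weights, gradient, rho=0.1)
print(updated)
```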

Updating w_i by Equation (7) requires ∂L(w)/∂w_i. Because the loss L(w) depends on w_i only through the L1 norm E(w), the chain rule of partial differentiation is applied to rewrite ∂L(w)/∂w_i as follows.

$$\frac{\partial L(w)}{\partial w_i} = \frac{\partial L(w)}{\partial E(w)} \cdot \frac{\partial E(w)}{\partial w_i} \qquad (8)$$

In Equation (8), ∂L(w)/∂E(w) is obtained by partially differentiating Equation (6) with respect to the L1 norm E(w). In the same way that ∂L(w)/∂w_i was decomposed, ∂E(w)/∂w_i can be rewritten as follows.

$$\frac{\partial E(w)}{\partial w_i} = \sum_j \frac{\partial E(w)}{\partial v_j} \cdot \frac{\partial v_j}{\partial w_i} \qquad (9)$$

Here, v_j is the j-th element of the feature data v(w).

In Equation (9), ∂E(w)/∂v_j is obtained by partially differentiating Equation (5) with respect to v_j. Note that ∂E(w)/∂v_j must be calculated both for the v_j contained in the feature data v₁(w) of the face image 401a and for the v_j contained in the feature data v₂(w) of the face image 401b. Equation (9) sums the decomposed terms over all elements v_j of the feature data v(w), because the L1 norm E(w) depends on w_i through every element v_j of the feature data v(w).

Also in Equation (9), ∂v_j/∂w_i is obtained by partially differentiating Equation (4) with respect to w_i. The ∂L(w)/∂w_i obtained in this way is used to update w_i by Equation (7).
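
For reference, the three partial derivatives named above can be written out explicitly from Equations (4) to (6). The notation b_{m,i}(j), meaning the i-th binarization result of image m at pixel j, and v_{m,j}, meaning the j-th element of v_m(w), is introduced here only for this worked step; the expressions for ∂E/∂v hold where v_{1,j} ≠ v_{2,j}.

$$\frac{\partial L(w)}{\partial E(w)} = (1 - Y)\,\frac{4}{Q}\,E(w) - Y \cdot 2 \cdot 2.77 \cdot \exp\!\left(-\frac{2.77}{Q}\,E(w)\right)$$

$$\frac{\partial E(w)}{\partial v_{1,j}} = \operatorname{sgn}(v_{1,j} - v_{2,j}), \qquad \frac{\partial E(w)}{\partial v_{2,j}} = -\operatorname{sgn}(v_{1,j} - v_{2,j})$$

$$\frac{\partial v_{m,j}}{\partial w_i} = b_{m,i}(j)$$

Substituting into Equations (8) and (9) and adding the contributions of both images gives

$$\frac{\partial L(w)}{\partial w_i} = \frac{\partial L(w)}{\partial E(w)} \sum_j \operatorname{sgn}(v_{1,j} - v_{2,j})\,\bigl(b_{1,i}(j) - b_{2,i}(j)\bigr)$$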

There are two kinds of ∂E(w)/∂v_j: those calculated from the feature data v₁(w) and those calculated from the feature data v₂(w). The weight coefficient group w is updated using the former ∂E(w)/∂v_j and then updated using the latter ∂E(w)/∂v_j.

The update of the weight coefficient group w by gradient descent described above is executed once for each pair of face images. For example, when M pairs (M ≥ 1) are selected, the update of the weight coefficient group w by Equation (7) is executed M times.
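
The whole update for one image pair (Equations (4) to (9)) can be sketched as below. This is an illustration under stated assumptions, not the embodiment's implementation: each image is represented directly by its binarization results as `bits[j][n]` (pixel j, reference pixel n), the example values are hypothetical and contain no pixel where the two features tie, and both gradient contributions are evaluated at the pre-update weights before the two successive updates are applied.

```python
import math


def features(bits, w):
    """v_j = sum_n b_n(j) * w_n for every pixel j (Equation (4)); bits[j][n]."""
    return [sum(b * wn for b, wn in zip(pixel, w)) for pixel in bits]


def loss(bits1, bits2, w, label, q):
    """Equations (5) and (6) for one image pair."""
    e = sum(abs(a - b) for a, b in zip(features(bits1, w), features(bits2, w)))
    return (1 - label) * (2 / q) * e ** 2 + label * 2 * q * math.exp(-2.77 * e / q)


def train_step(bits1, bits2, w, label, q, rho):
    """Update w once for one pair via Equations (7)-(9)."""
    v1, v2 = features(bits1, w), features(bits2, w)
    e = sum(abs(a - b) for a, b in zip(v1, v2))
    dl_de = (1 - label) * (4 / q) * e - label * 2 * 2.77 * math.exp(-2.77 * e / q)
    sign = [(a > b) - (a < b) for a, b in zip(v1, v2)]  # dE/dv1_j; dE/dv2_j = -sign
    n = len(w)
    grad1 = [dl_de * sum(s * p[i] for s, p in zip(sign, bits1)) for i in range(n)]
    grad2 = [dl_de * sum(-s * p[i] for s, p in zip(sign, bits2)) for i in range(n)]
    # Update with the v1 contribution, then with the v2 contribution.
    # ASSUMPTION: both contributions are evaluated at the pre-update weights.
    w = [wi - rho * g for wi, g in zip(w, grad1)]
    w = [wi - rho * g for wi, g in zip(w, grad2)]
    return w, [a + b for a, b in zip(grad1, grad2)]


# Hypothetical binarization results for two 4-pixel images (N = 3 bits per pixel).
bits1 = [(1, 0, 1), (0, 1, 1), (1, 1, 0), (0, 0, 1)]
bits2 = [(1, 1, 0), (0, 1, 0), (0, 1, 0), (1, 0, 1)]
w = [0.5, -1.0, 2.0]
new_w, grad = train_step(bits1, bits2, w, label=0, q=10.0, rho=0.01)
print(new_w, grad)
```

Repeating `train_step` for M selected pairs corresponds to the M updates described above. The returned total gradient can be compared against a finite-difference estimate of `loss` as a sanity check.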

[Configuration of information processing apparatus]
The block diagram of FIG. 6 shows a configuration example of the information processing apparatus of the embodiment.

Using the RAM 505 as work memory, the CPU 503 executes the OS and various programs stored in the ROM 504 and the data storage unit 501, controls the components described below via the system bus 509, and implements the functions described below. The data storage unit 501 consists of a hard disk drive (HDD), a disk drive, a memory card, a USB memory, or the like, and holds image data, programs, and other data on a recording medium. The programs stored in the data storage unit 501 include the programs and data that implement the signal processing unit and the learning unit 109 described above and the programs and data that execute the face authentication described below.

The display unit 507 consists of a video card and a monitor and, under control of the CPU 503, displays images before and after image processing, a graphical user interface (GUI), and the like.

The input unit 506 has a keyboard, a pointing device, a touch panel overlaid on the monitor of the display unit 507, and the like, and accepts user instructions. When the information processing apparatus of the embodiment is applied to a device such as a digital camera or a printer, the input unit 506 corresponds to buttons, dials, a numeric keypad, a touch panel, and the like. A configuration in which a software keyboard is displayed on the monitor and user instructions are entered by operating the touch panel is of course also possible.

The communication unit 502 is an interface for communication between devices via a wired or wireless network.

The CPU 503 inputs the image data to be face-authenticated from an external imaging device via, for example, a serial interface such as USB provided in the input unit 506. Alternatively, the image data to be face-authenticated may be input via the communication unit 502 from an imaging device, an information processing apparatus, or a server on the network. The DB 406 shown in FIG. 5 is stored in the data storage unit 501 or in a server on the network.

The information processing apparatus of the embodiment can be realized by supplying a general-purpose computer with programs that implement the signal processing unit and the learning unit described above and the face authentication processing described later.

[Face authentication processing]
The face authentication processing of the embodiment will be described with reference to the flowchart of FIG. 7. The processing shown in FIG. 7 is executed by the CPU 503.

When the face authentication processing starts, the CPU 503 executes initialization processing (S601) that loads into a predetermined area of the RAM 505 the information used for the signal processing described above, such as the processing target area 108, the relative positions of the reference pixels, the weight coefficient group, and the transformation matrix described later.

FIG. 8 shows an example of the memory storage format of the information on the processing target area 108, the relative positions of the reference pixels, and the weight coefficient group. The RAM 505 stores the following information. As the information on the processing target area, the coordinates (Xlt, Ylt) and (Xrb, Yrb) of two diagonally opposite vertices of the processing target area 108 are stored. Next, the number N of reference pixels, which corresponds to the number of local calculation results, is stored. Next, the relative positions (x₀, y₀), (x₁, y₁), …, (x_N-1, y_N-1) of the N reference pixels are stored. Finally, the weight coefficient group w₀, w₁, …, w_N-1 for the N reference pixels is stored.
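As a non-limiting illustration, the storage format described above can be sketched in Python as follows. The names `SignalParams` and `unpack` are hypothetical and introduced only for this sketch, and the flat ordering (region, N, N offsets, N weights) is an assumed layout that merely follows the order in which the items are listed above; the actual layout of FIG. 8 may differ.

```python
from dataclasses import dataclass


@dataclass
class SignalParams:  # hypothetical container, not part of the embodiment
    region: tuple    # (Xlt, Ylt, Xrb, Yrb): two diagonal vertices of area 108
    offsets: list    # [(x_0, y_0), ..., (x_{N-1}, y_{N-1})] relative positions
    weights: list    # [w_0, ..., w_{N-1}] weight coefficient group


def unpack(flat):
    """Parse a flat sequence laid out as: region, N, N offsets, N weights."""
    x_lt, y_lt, x_rb, y_rb, n = flat[:5]
    n = int(n)
    if len(flat) != 5 + 3 * n:
        raise ValueError("length does not match the stored reference pixel count N")
    coords = flat[5:5 + 2 * n]
    offsets = [(coords[2 * i], coords[2 * i + 1]) for i in range(n)]
    weights = list(flat[5 + 2 * n:])
    return SignalParams((x_lt, y_lt, x_rb, y_rb), offsets, weights)


# Example: a 40x40 area with N = 2 reference pixels (left and right neighbours).
params = unpack([10, 20, 50, 60, 2, -1, 0, 1, 0, 0.25, 0.75])
```

Storing N ahead of the variable-length parts lets a reader of the block validate its length before interpreting the offsets and weights.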

As the initial values of the weight coefficient group, for example, random numbers or a weight coefficient group obtained by earlier learning are set. The storage format shown in FIG. 8 for the information on the processing target area 108, the relative positions of the reference pixels, and the weight coefficient group is only an example; any storage method may be used as long as the CPU 503 can identify these items.

Next, the CPU 503 determines whether an operation mode instruction has been input via the input unit 506 (S602). When an operation mode instruction is input, the processing branches according to the instructed operation mode (S603), and when the processing for that mode ends, the processing returns to step S601.

The embodiment has three operation modes: (a) learning mode, (b) identification mode, and (c) registration mode. In the learning mode, the learning processing described above is executed (S604). In the identification mode, face authentication processing is executed using the signal processing described above (S605). In the registration mode, processing that creates the registration data used for face authentication is executed using the signal processing described above (S606).

The face images used for the learning processing are assumed to have been created by the following procedure before the learning processing and stored, in association with person IDs, in the DB 406 allocated in the data storage unit 501 or the like.

The CPU 503 loads the image data of a face image to be learned from the data storage unit 501 or the like into a predetermined area of the RAM 505 and converts the loaded image data into an 8-bit unsigned luminance image. It then detects a face area in the image data using a face detection method, resizes the face image to a predetermined size, and stores the image data of the resized face image in the DB 406 in association with the person ID.

It is further preferable to detect the positions of parts such as the eyes, nose, and mouth using a dynamic appearance model, a dynamic shape model, or the like, and, based on the detected positions, to transform the image so that both eyes are aligned horizontally and the face image has the predetermined size. In addition, so that the signal processing outputs feature amounts that are robust against various kinds of variation, the face images desirably include a wide range of variation in the inclination of the face in the pan and tilt directions, facial expression, illumination conditions, and the like.

The preparatory work of creating the face images used for the learning processing can also be performed by an external image processing apparatus or the like. In this case, the CPU 503 inputs the externally created face images and person IDs via a network or a recording medium and stores them in the DB 406.

● Learning mode
The learning processing (S604) will be described with reference to the flowchart of FIG. 9.

When the learning mode is instructed, the CPU 503 initializes a counter p, which is used to determine completion of learning, to p = 0 (S611) and determines, based on the count value of the counter p, whether learning has been repeated a predetermined number of times K (≥ 1) (S612). That is, if p < K, learning is determined to be incomplete and the processing proceeds to step S613; when p = K, learning is determined to be complete, and the processing returns to step S602 to wait for an operation mode instruction.

If learning is incomplete, the CPU 503 randomly selects the first face image to be used for learning from the DB 406 and loads the image data 401a of the selected face image and the person ID of that face image into a predetermined area of the RAM 505 (S613). It then extracts feature data from the image data 401a of the first face image by the signal processing described above (S614).

Next, the CPU 503 selects the second face image to be used for learning from the DB 406 and loads the image data 401b of the selected face image and the person ID of that face image into a predetermined area of the RAM 505 (S615). The second face image is selected at random from the face images other than the first face image. The CPU 503 then extracts feature data from the image data 401b of the second face image by the signal processing described above (S616).

Next, the CPU 503 determines whether the person ID of the first face image matches the person ID of the second face image and generates the label 405 (S617). As described above, the label Y = 0 is generated if the person IDs match, and the label Y = 1 is generated if they do not.

Next, the CPU 503 calculates the loss L(w) using the two sets of feature data and the label Y (S618) and, using the loss L(w), updates the weight coefficient group stored in the RAM 505 by gradient descent (S619). It then increments the counter p (S620) and returns the processing to step S612.
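One iteration of steps S617 to S619 can be sketched as follows, purely as an illustration under stated assumptions. The loss L(w) itself is defined earlier in the description and is not reproduced here; this sketch substitutes a common margin-based pairwise (contrastive) loss, L = ½D² for Y = 0 and L = ½·max(0, m − D)² for Y = 1, where D is the Euclidean distance between the two feature vectors. The margin `margin`, the learning rate `lr`, and the function names are hypothetical. Each feature element is taken to be the weighted sum of the N binarization results at one pixel, so the features are linear in w and the gradient can be written in closed form.

```python
import math


def features(bits, w):
    """bits: one row of N binary values per pixel; feature = sum_n w[n] * b[n]."""
    return [sum(wn * bn for wn, bn in zip(w, row)) for row in bits]


def distance(bits_a, bits_b, w):
    return math.dist(features(bits_a, w), features(bits_b, w))


def sgd_step(bits_a, bits_b, y, w, margin=1.0, lr=0.1):
    """One gradient-descent update of w for a pair with label y (0: same, 1: different)."""
    diff = [a - b for a, b in zip(features(bits_a, w), features(bits_b, w))]
    d = math.sqrt(sum(v * v for v in diff))
    # Gradient of 0.5 * D^2 with respect to w[n].
    g = [sum(dv * (ra[n] - rb[n]) for dv, ra, rb in zip(diff, bits_a, bits_b))
         for n in range(len(w))]
    if y == 0:
        scale = 1.0                    # pull same-class features together
    elif 0.0 < d < margin:
        scale = -(margin - d) / d      # push different-class features apart
    else:
        scale = 0.0                    # already beyond the margin (or D = 0)
    return [wn - lr * scale * gn for wn, gn in zip(w, g)]


# Two tiny "images" of 2 pixels with N = 3 binarization results each.
bits_a = [[1, 0, 1], [0, 1, 1]]
bits_b = [[0, 0, 1], [1, 1, 0]]
w0 = [0.3, 0.2, 0.1]
d0 = distance(bits_a, bits_b, w0)
d_same = distance(bits_a, bits_b, sgd_step(bits_a, bits_b, 0, w0))
d_diff = distance(bits_a, bits_b, sgd_step(bits_a, bits_b, 1, w0))
```

With these values, an update with Y = 0 shortens the distance between the pair and an update with Y = 1 lengthens it, which is the behaviour the learning unit 109 is described as aiming for.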

● Identification mode
The identification processing (S605) will be described with reference to the flowchart of FIG. 10.

When the identification mode is instructed, the CPU 503 loads the image data to be face-authenticated into a predetermined area of the RAM 505 (S631) and extracts a face image from the image data (S632). If the image data contains the faces of several persons, several face images are extracted.

Next, as preprocessing, the CPU 503 detects the positions of parts such as the eyes, nose, and mouth in the extracted face image using a dynamic appearance model, a dynamic shape model, or the like, and, based on the detected positions, transforms the image so that both eyes are aligned horizontally and the face image has the predetermined size (S633).

Next, the CPU 503 extracts feature data from the image data of the face image (from one of them when several face images have been extracted) by the signal processing described above (S634) and reduces the dimension of the feature data (S635). Hereinafter, the feature data after dimension reduction is called a "projection vector".

Dimension reduction extracts from the feature data only the information that is effective for identification, thereby reducing the amount of computation in the subsequent processing. It may be performed using a transformation matrix determined in advance by a method such as principal component analysis or locality preserving projection.

The transformation matrix is an arrangement of the basis vectors that define the vector space after dimension reduction. Using the transformation matrix, the feature vector obtained by arranging the feature data in a single column is projected from the original space onto the space defined by the basis vectors. The transformation matrix is stored in the ROM 504 or the data storage unit 501, or is provided as part of a program, and is loaded into the RAM 505 in the initialization processing (S601).
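The projection described above amounts to one inner product per basis vector. A minimal sketch follows, assuming the transformation matrix is held as a list of basis vectors (one per output dimension); the function name `project` and the example values are hypothetical, and any mean subtraction that a method such as principal component analysis might apply beforehand is omitted.

```python
def project(feature_vector, basis):
    """Project a feature vector onto each basis vector (rows of the matrix)."""
    return [sum(b * f for b, f in zip(row, feature_vector)) for row in basis]


# A 3-dimensional feature vector reduced to 2 dimensions.
basis = [[1.0, 0.0, 0.0],
         [0.0, 0.5, 0.5]]
projection = project([1.0, 2.0, 3.0], basis)
```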

Next, the CPU 503 executes identification processing that collates the projection vector with the registration data stored in the DB 406 (S636). The registration data is, for example, data that holds the projection vector of the image data of a registered face image (hereinafter, "registered vector") together with the person ID of that face image, and serves as the collation data for the identification processing. Character string data such as the person's name or nickname and the image data of the face image are registered in the DB 406 in association with the person ID.

In the identification processing, the person ID of the face image to be identified is determined based on the similarity between the projection vector and the registered vectors and on a predetermined threshold. Here, the similarity is described as the Euclidean distance between vectors in the feature space after dimension reduction. In this case, the smaller the distance, the more similar the projection vector and the registered vector are judged to be, and the face image corresponding to a registered vector with a smaller distance is judged to be more similar to the face image to be identified.

The CPU 503 calculates the distance between the projection vector and every registered vector and sorts the registered vectors in ascending order of distance. After sorting, the distance between the first registered vector and the projection vector is taken as the minimum distance Dmin and compared with a predetermined threshold Dth. If the minimum distance is less than or equal to the threshold (Dmin ≤ Dth), the face image to be identified is determined to be a registered face image whose person ID is the one corresponding to the first registered vector. If the minimum distance is greater than the threshold (Dmin > Dth), the face image to be identified is determined to be unregistered. For an unregistered face image, for example, a predetermined ID value that represents an unregistered person is used as the person ID of the face image to be identified.
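The collation just described can be sketched as follows. This is an illustrative sketch only: the function name `identify`, the sentinel `UNREGISTERED_ID`, and the representation of the registration data as (person ID, registered vector) pairs are assumptions made for the example.

```python
import math

UNREGISTERED_ID = -1  # hypothetical ID value standing for "unregistered person"


def identify(projection, registered, d_th):
    """Return (person_id, d_min) for the projection vector.

    registered: list of (person_id, registered_vector) pairs.
    d_th: threshold Dth on the Euclidean distance.
    """
    if not registered:
        return UNREGISTERED_ID, math.inf
    # Distance to every registered vector, sorted in ascending order.
    ranked = sorted(
        ((math.dist(projection, vec), pid) for pid, vec in registered),
        key=lambda pair: pair[0],
    )
    d_min, best_id = ranked[0]
    if d_min <= d_th:          # Dmin <= Dth: registered person
        return best_id, d_min
    return UNREGISTERED_ID, d_min  # Dmin > Dth: unregistered


registered = [(1, [0.0, 0.0]), (2, [3.0, 4.0])]
hit = identify([0.0, 1.0], registered, d_th=1.5)
miss = identify([10.0, 10.0], registered, d_th=1.5)
```

Sorting follows the description literally; when only the closest entry is needed, taking the minimum directly would give the same result.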

Next, the CPU 503 stores the person ID obtained by the identification processing in the RAM 505 in association with the face image to be identified (S637). Each face image stored in the RAM 505 is saved together with the information needed to display the face authentication result (for example, the position and size of the face image within the image data to be face-authenticated).

Based on the determination in step S638, the CPU 503 repeats steps S634 to S637 until the identification processing has been completed for all face images extracted from the image data to be face-authenticated. When the identification processing has been completed for all face images, the CPU 503 outputs the identification results as the face authentication result (S639) and returns the processing to step S602 to wait for an operation mode instruction.

FIG. 11 shows an output example of the face authentication result. Based on the image data to be face-authenticated, the position and size of each face image, and the person ID of each face image, all held in the RAM 505, an image showing the face authentication result as in FIG. 11 is generated and displayed on the display unit 507. In the output example of FIG. 11, the area of each face image in the image data is indicated by a rectangular frame, and a character string related to the face image is displayed above the frame.

The output of the face authentication result is not limited to display on the display unit 507; the image data to be face-authenticated, the positions and sizes of the face images, the person IDs of the face images, and the like may be associated with one another and saved in the data storage unit 501. The face authentication result may also be transmitted to an external device via the communication unit 502.

An example has been described in which the face authentication result is output after the identification processing has been completed for all face images extracted from the image data to be face-authenticated, but the result may instead be output each time the identification processing for one face image is completed.

An example has also been described in which, to raise identification accuracy, the preprocessing (S633) transforms the image so that both eyes are aligned horizontally and the face image has the predetermined size. However, the preprocessing (S633) may be omitted, for example, when system requirements call for faster face authentication or fewer resources for the face authentication processing and some loss of accuracy is acceptable.

● Registration mode
As described above, the registration data is, for example, data that holds the projection vector (registered vector) of the image data of a registered face image together with the person ID of that face image. Preferably, character string data such as a name or nickname is registered in association with the person ID.

The registration processing (S606) will be described with reference to the flowchart of FIG. 12.

When the registration mode is instructed, the CPU 503 loads the image data containing the face image to be registered into a predetermined area of the RAM 505 (S651), extracts a face image from the image data (S652), and displays on the display unit 507 an image in which the extracted face image is enclosed in a rectangular frame (S653). If the image data contains the faces of several persons, several face images are extracted.

Referring to the displayed image, the user operates the input unit 506 to select the face image to be registered (S654). The user can select more than one face image. If no face image that the user wishes to register is present, the user can instruct input of the next image data.

Next, in the same way as steps S633 to S635 of the identification mode, the CPU 503 preprocesses the image data of the face image selected by the user (S655), extracts feature data from the image data of the face image (from one of them when several face images have been selected) (S656), and generates a projection vector by reducing the dimension of the feature data (S657).

Next, the CPU 503 displays the already registered person IDs (or the character string data associated with them) on the display unit 507 (S658). Preferably, the face image corresponding to each person ID (or character string data) is also displayed. Referring to the display, the user operates the input unit 506 either to designate the person ID (or character string data) that appears to correspond to the selected face image or to input that no person ID (or character string data) corresponding to the selected face image exists (S659).

The CPU 503 determines whether a person ID has been designated (S660) and, if the user has input that no person ID corresponding to the selected face image exists, issues a new person ID (S661). It then stores in the DB 406 registration data in which the projection vector is associated with the designated or newly issued person ID (S662).

Based on the determination in step S663, the CPU 503 repeats steps S656 to S662 until the registration processing has been completed for all face images selected by the user. When the registration processing has been completed for all face images, the processing returns to step S602 to wait for an operation mode instruction.

The above describes an example in which a single device or program executes the learning processing, the identification processing, and the registration processing, as a convenient way of showing how the flow of the learning processing differs from that of the face authentication processing. However, a configuration is also possible in which the weight coefficient group is prepared by learning in advance and a separate device or program refers to it to execute the identification processing and the registration processing.

In this way, for a feature amount generated by converting a bit string that represents magnitude relationships between pixels, as in LBP, into a scalar value, learning the parameters used for that conversion (the weight coefficient group) makes it possible to extract a feature amount that is low-dimensional and suited to the target of pattern identification.

Information processing according to a second embodiment of the present invention will be described below. In the second embodiment, components that are substantially the same as in the first embodiment are given the same reference numerals, and their detailed description is omitted.

The signal processing unit of the second embodiment will be described with reference to the block diagram of FIG. 13.

The binary tree processing unit 901 applies binary tree processing to the processing target area 108 and generates binarization processing results 902a-902c, each of which is two-dimensional data. The feature data generation unit 107 multiplies the binarization processing results 902a-902c by the weight coefficients w₀-w₂, respectively, and adds the products to generate the feature data 104, which is also two-dimensional data. The processing that generates the feature data 104 from the binarization processing results 902a-902c and the method of learning the weight coefficient group are the same as in the first embodiment.
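The multiply-and-add step performed by the feature data generation unit 107 can be sketched as below. This is a minimal sketch under the assumption that each binarization processing result is held as a two-dimensional list of 0/1 values of the same shape; the function name `feature_map` and the example values are hypothetical.

```python
def feature_map(planes, weights):
    """Weighted sum of N binary planes, evaluated independently at each (x, y)."""
    height, width = len(planes[0]), len(planes[0][0])
    return [
        [sum(wt * plane[y][x] for wt, plane in zip(weights, planes))
         for x in range(width)]
        for y in range(height)
    ]


# Three 1x2 binary planes (standing in for 902a-902c) and weights w0-w2.
planes = [[[1, 0]], [[0, 1]], [[1, 1]]]
feature = feature_map(planes, [0.5, -1.0, 2.0])
```

If the weights are fixed to powers of two, the same sum packs the bits into an integer code in the manner of conventional LBP; the learned weights replace that fixed encoding.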

The binary tree processing will be described with reference to FIG. 14. The second embodiment uses a complete binary tree, in which every node is either a leaf or has two children and all leaves are at the same depth. The binary tree shown in FIG. 14 is a complete binary tree with a depth of 3 and 8 leaves.

For the input data, the binary tree processing determines in order, from the root node 1001 until one of the leaf nodes 1004a-1004h is reached, which of the two child nodes to branch to. In the second embodiment, the relative position of the reference pixel in equation (2) is set in advance for each node that has child nodes (1001, 1002a-1002b, 1003a-1003d).

When data is input, the difference r_n between the pixel of interest and the reference pixel is calculated by equation (2) for each pixel of interest (x, y) (local calculation processing). The branch destination is then determined from the result of the step function processing of equation (3) (that is, binarization processing that decides which child node to branch to). In the example shown in FIG. 14, the processing proceeds to the left child node when the result of the step function processing is '0' and to the right child node when it is '1'.

When one of the leaf nodes 1004a-1004h is reached, a binary string is generated by arranging all the determination results in order. In the example indicated by the broken-line arrows in FIG. 14, the binary string is '001'. The elements of the generated binary string are used as the pixel values at the coordinates (x, y) of the binarization processing results 902a-902c shown in FIG. 13.
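The traversal from the root to a leaf can be sketched for a single pixel of interest as follows. Equation (2) is not reproduced in this part of the description, so the sketch assumes the difference is taken as (reference pixel value − pixel-of-interest value), consistent with the step-function convention that a reference value at or above the pixel of interest yields '1'. The function name `tree_bits`, the heap-style node numbering, and the example offsets are hypothetical.

```python
def tree_bits(image, x, y, offsets, depth=3):
    """Return the list of branch decisions for the pixel of interest (x, y).

    offsets: (dx, dy) reference-pixel position for each internal node, stored
             in heap order (root at index 0, children of i at 2i+1 and 2i+2).
    """
    node = 0
    bits = []
    for _ in range(depth):
        dx, dy = offsets[node]
        r = image[y + dy][x + dx] - image[y][x]   # assumed form of the local calculation
        b = 1 if r >= 0 else 0                    # step function
        bits.append(b)
        node = 2 * node + 1 + b                   # '0' -> left child, '1' -> right child
    return bits


image = [[5, 9, 1],
         [7, 4, 3],
         [2, 6, 8]]
# Seven internal nodes for a depth-3 tree (1 + 2 + 4).
offsets = [(-1, 0), (0, -1), (1, 0), (1, 1), (0, 1), (-1, -1), (1, -1)]
bits = tree_bits(image, 1, 1, offsets)
```

Only one node per level is evaluated, so each pixel costs as many comparisons as the tree is deep, while the reference pixel used at each level depends on the earlier decisions.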

In this way, by learning the weights applied to the branch determination results at each level of the binary tree, a feature amount that is low-dimensional and suited to the target of pattern identification can be extracted.

[Modifications]
The above describes an example in which the signal processing is applied to two-dimensional image data, but the present invention can also be applied to data of three or more dimensions or to one-dimensional data such as an audio signal. That is, for input data of any dimensionality, arithmetic processing is performed using a plurality of data in the vicinity of the data of interest in the processing target area, and the signal processing described above is applied.

The above describes an example with a single processing target area, but the number of processing target areas is not limited to one; a plurality of processing target areas may be set, and a different weight coefficient group may be set for each processing target area.

An example of setting a plurality of processing target areas will be described with reference to FIG. 15. FIG. 15 shows an example in which four processing target areas are set: a left-eye area 1201, a right-eye area 1202, a nose area 1203, and a mouth area 1204. Applying the learning method described above to these four processing target areas makes it possible to determine a weight coefficient group appropriate to each area. However, for areas with similar characteristics, such as the left-eye area 1201 and the right-eye area 1202, a weight coefficient group may be learned without distinguishing between them, and the resulting weight coefficient group may be applied to both areas.

The above describes an example that uses the step function of equation (3) for the binarization processing, but the pulse function given by the following equation may be used instead.
if (|t| ≥ Th)
p(t) = '1';
else
p(t) = '0'; …(10)
Here, t is the local calculation result, and
Th is a predetermined threshold.

With the step function, the output is '1' if the local calculation result is ≥ 0 and '0' if it is < 0. That is, when the local calculation is a magnitude comparison between the pixel of interest and a reference pixel, as described in the first and second embodiments, the output is '1' if the reference pixel value ≥ the value of the pixel of interest and '0' if the reference pixel value < the value of the pixel of interest.

With the pulse function, by contrast, the output is '1' if the absolute value of the local calculation result is ≥ the threshold and '0' if it is < the threshold. That is, the output is '1' if the absolute difference between the value of the pixel of interest and the reference pixel value is greater than or equal to the threshold and '0' if it is less than the threshold. The pulse function is therefore effective when the size of the difference between pixel values matters more than which pixel value is larger.
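The contrast between the two binarization functions can be seen on a handful of local calculation results. The sketch below is illustrative only; the function names `step` and `pulse`, the sample differences, and the threshold value 10 are assumptions made for the example.

```python
def step(t):
    """Step function: '1' when the local calculation result is non-negative."""
    return 1 if t >= 0 else 0


def pulse(t, th):
    """Pulse function of equation (10): '1' when |t| is at least the threshold Th."""
    return 1 if abs(t) >= th else 0


differences = [-12, -3, 0, 4, 20]          # example local calculation results
step_bits = [step(t) for t in differences]
pulse_bits = [pulse(t, 10) for t in differences]
```

The step function separates the values by sign only, whereas the pulse function marks the two large differences (−12 and 20) regardless of sign.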

Alternatively, the ratio of two pixel values may be calculated as the local calculation processing and binarized using the pulse function. Even if the overall level of the image data changes uniformly (for example, the overall brightness is scaled up or down), the ratio of pixel values does not change, so using the ratio has the advantage that the same processing result is obtained regardless of such a change in overall level.
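A small numerical check illustrates this property. The sketch assumes the level change is a uniform multiplicative gain applied to every pixel; the function name `ratio` and the pixel values are hypothetical.

```python
def ratio(target, reference):
    """Local calculation as a ratio of two pixel values (target assumed non-zero)."""
    return reference / target


target, reference, gain = 40, 100, 2
# (difference, ratio) before and after scaling both pixels by the same gain.
before = (reference - target, ratio(target, reference))
after = (gain * reference - gain * target, ratio(gain * target, gain * reference))
```

The difference doubles with the gain while the ratio is unchanged. Note that this holds for a multiplicative gain; an additive brightness offset would change the ratio.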

The above describes an example in which the local calculation processing compares the pixel values of the pixel of interest and a reference pixel in the input image data, but the processing is not limited to this. Two reference pixels located at predetermined positions relative to the pixel of interest may be compared without using the pixel of interest itself. FIG. 16 shows an example of comparing two reference pixels. Alternatively, one of the four arithmetic operations (addition, subtraction, multiplication, division) may be performed between the pixel values of the pixel of interest and the reference pixel and a predetermined fixed value. The absolute values of the pixel values of the pixel of interest and the reference pixel may also be calculated.

In the above description, the local calculation process compares pixel values of the input image data, but the average of the pixel values in an m×m pixel region may be used instead of the pixel value of a single pixel.
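Using region averages in place of single pixel values might look like the following Python sketch. The text does not say how the m×m region is positioned, so anchoring each block at its top-left corner is an assumption. The function name `block_mean`, the sample image, and the step-function comparison are likewise hypothetical.

```python
def block_mean(image, y, x, m):
    """Mean of the m x m block whose top-left corner is at (y, x).

    Assumption: the block lies entirely inside the image.
    """
    total = sum(image[y + i][x + j] for i in range(m) for j in range(m))
    return total / (m * m)


# Hypothetical 4x4 image.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

m = 2
target_mean = block_mean(image, 0, 0, m)     # block standing in for the target pixel
reference_mean = block_mean(image, 2, 2, m)  # block standing in for a reference pixel

# Compare the two block means exactly as single pixel values would be compared.
bit = 1 if reference_mean - target_mean >= 0 else 0
print(target_mean, reference_mean, bit)
```

Averaging over a block makes the comparison less sensitive to noise in individual pixels, at the cost of spatial resolution.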

[Other Examples]
The present invention can also be realized by executing the following processing. That is, software (a program) that realizes the functions of the above-described embodiments is supplied to a system or apparatus via a network or various recording media, and a computer (or a CPU, MPU, or the like) of the system or apparatus reads out and executes the program.

105a-105c ... arithmetic processing unit, 106a-106c ... binarization processing unit, 107 ... feature data generation unit, 109 ... learning unit

Claims

An information processing apparatus comprising:
arithmetic means for performing arithmetic processing using each of a plurality of data in the vicinity of attention data in a processing target area in input data;
binarization means for binarizing the arithmetic processing result corresponding to each of the plurality of data;
generation means for performing generation processing that generates feature data for the attention data from the binarization processing results corresponding to each of the plurality of data, using a conversion parameter; and
learning means for learning the conversion parameter so as to reduce the distance between feature data generated from input data of the same class and increase the distance between feature data generated from input data of different classes.

2. The information processing apparatus according to claim 1, wherein the calculation unit calculates a difference between the attention data and each of the plurality of data having a predetermined positional relationship with respect to the attention data.

2. The information processing apparatus according to claim 1, wherein the calculation unit calculates a difference between the plurality of pieces of data having a predetermined positional relationship with respect to the attention data.

The computing means calculates a difference between the attention data and one of the plurality of data in a predetermined positional relationship with respect to the attention data at each node of the binary tree,
2. The information processing apparatus according to claim 1, wherein the binarization unit determines a branch destination as the binarization process based on the difference at each node of the binary tree.

5. The information processing apparatus according to claim 1, wherein the binarization unit performs the binarization process using a step function.

5. The information processing apparatus according to claim 1, wherein the binarization unit performs the binarization process using a pulse function.

7. The information processing apparatus according to claim 1, wherein the generation unit performs, as the generation processing, multiplication of each binarization processing result by the conversion parameter and addition of the multiplication results.

7. The information processing apparatus according to claim 1, wherein there are a plurality of processing target areas, and the generation unit applies the same conversion parameter or different conversion parameters to the plurality of processing target areas.

9. The information processing apparatus according to claim 8, wherein the generation unit applies the same conversion parameter to a region having similar characteristics among the plurality of processing target regions.

10. The information processing apparatus according to claim 1, wherein the conversion parameter is a weighting factor corresponding to each of the plurality of data.

11. The information processing apparatus according to claim 1, wherein the learning means includes:
selection means for randomly selecting first data and second data different from the first data from a plurality of data, and generating label information indicating the relationship between the class of the first data and the class of the second data;
means for generating first feature data from the first data and second feature data from the second data by the arithmetic processing, the binarization processing, and the generation processing;
distance calculating means for calculating a distance between the first feature data and the second feature data;
loss calculating means for calculating a loss based on the distance and the label information; and
an update unit that updates the conversion parameter based on the loss.

12. The information processing apparatus according to claim 11, wherein the selection means generates the label information indicating the same class when the identification information of the first data and that of the second data are the same, and generates the label information indicating different classes when the identification information of the first data and that of the second data are different.

13. The information processing apparatus according to claim 11 or claim 12, wherein the distance calculating means calculates, as the distance, a norm between a one-dimensional vector in which the elements of the first feature data are arranged and a one-dimensional vector in which the elements of the second feature data are arranged.

14. The information processing apparatus according to any one of claims 11 to 13, wherein the loss calculating means calculates a loss that is proportional to the distance when the label information indicates the same class, and calculates a loss that is inversely proportional to the distance when the label information indicates different classes.

15. The information processing apparatus according to claim 11, wherein the update unit updates the conversion parameter using the loss and a gradient descent method.

An information processing method for an information processing apparatus having arithmetic means, binarization means, generation means, and learning means, the method comprising:
the arithmetic means performing arithmetic processing using each of a plurality of data in the vicinity of attention data in a processing target area in input data;
the binarization means binarizing the arithmetic processing result corresponding to each of the plurality of data;
the generation means generating feature data for the attention data from the binarization processing results corresponding to each of the plurality of data, using a conversion parameter; and
the learning means learning the conversion parameter so as to reduce the distance between feature data generated from input data of the same class and increase the distance between feature data generated from input data of different classes.

16. An authentication device that performs face authentication processing using the information processing device according to any one of claims 1 to 15.

18. The authentication apparatus according to claim 17, comprising a storage unit that stores image data and collation data of face images associated with identification information, an input unit that inputs a user instruction and image data, and an authentication unit, wherein the authentication unit:
when a user instruction indicating a learning mode is input, updates the conversion parameter based on the image data stored in the storage unit, using the information processing apparatus;
when a user instruction indicating a registration mode is input, extracts a face image included in the input image data, extracts feature data from the image data of the face image using the information processing apparatus, and registers collation data obtained by reducing the dimensions of the feature data, together with the image data of the face image, in the storage unit in association with identification information; and
when a user instruction indicating an identification mode is input, extracts a face image included in the input image data, extracts feature data from the image data of the face image using the information processing apparatus, and performs the face authentication processing using data obtained by reducing the dimensions of the feature data and the collation data stored in the storage unit.

An authentication processing method of an authentication apparatus that has a storage unit that stores image data and collation data of face images associated with identification information, an input unit that inputs a user instruction and image data, and an authentication unit, and that performs face authentication processing using the information processing apparatus according to claim 1, wherein the authentication unit:
when a user instruction indicating a learning mode is input, updates the conversion parameter based on the image data stored in the storage unit, using the information processing apparatus;
when a user instruction indicating a registration mode is input, extracts a face image included in the input image data, extracts feature data from the image data of the face image using the information processing apparatus, and registers collation data obtained by reducing the dimensions of the feature data, together with the image data of the face image, in the storage unit in association with identification information; and
when a user instruction indicating an identification mode is input, extracts a face image included in the input image data, extracts feature data from the image data of the face image using the information processing apparatus, and performs the face authentication processing using data obtained by reducing the dimensions of the feature data and the collation data stored in the storage unit.

16. A program for causing a computer to function as each unit of the information processing apparatus according to any one of claims 1 to 15.

19. A program for causing a computer to function as each unit of the authentication device according to claim 17 or 18.

A computer-readable recording medium on which the program according to claim 20 or 21 is recorded.