JP2014203135A

JP2014203135A - Signal processor, signal processing method, and signal processing system

Info

Publication number: JP2014203135A
Application number: JP2013076454A
Authority: JP
Inventors: 大介中嶋; Daisuke Nakajima
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2013-04-01
Filing date: 2013-04-01
Publication date: 2014-10-27
Anticipated expiration: 2033-04-01
Also published as: JP6137916B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique that improves identification accuracy when the identification target is specialized. SOLUTION: A signal processor includes: generation means for generating processing result data by performing spatial filter processing on input data; first encoding means for generating encoding result data by applying, to the processing result data, predetermined encoding processing that uses a discontinuous function; second encoding means for generating approximate encoding result data by applying, to the processing result data, approximate encoding processing in which the discontinuous function is approximated by a continuous function; update means for updating the weighting coefficients of the spatial filter processing based on the approximate encoding result data; and control means for performing control such that the processing result data is supplied to the second encoding means when the update means updates the weighting coefficients, and to the first encoding means otherwise.

Description

The present invention relates to a signal processing technique for extracting feature amounts suitable for pattern identification from image data or the like.

Image processing that detects a specific pixel pattern appearing in an image, or distinguishes it from other patterns, is known. An example of the former is face detection (detecting face-like patterns in an image), and an example of the latter is face authentication (identifying an individual from a detected face). In face authentication, for example, face images are converted into feature amount data and registered in advance. A given face image (a partial image of a face region) is then converted into feature amount data in the same way, its similarity to the feature amounts of the registered face images is determined, and the individual is identified according to the result.

When face authentication is performed on snapshots taken with a digital camera or the like, the face images include images captured under various illumination conditions. To identify faces correctly even across different illumination conditions, the feature amount used for pattern identification should be robust against variations of the pixel pattern caused by the illumination conditions. LBP (Local Binary Pattern) has been proposed as a feature amount with this property (Non-Patent Document 2).

Feature amounts obtained by applying several tens of Gabor wavelet filters to an input image and extracting LBP from each filtered output have also been proposed. For example, LGBP (Local Gabor Binary Pattern) is proposed in Non-Patent Documents 1 and 6.

According to Non-Patent Document 6, face authentication accuracy is higher when the feature amount is LBP extracted from the result of applying Gabor wavelet filtering to the input image (LGBP) than when it is LBP extracted directly from the input image.

On the other hand, the CNN (Convolutional Neural Network) of Non-Patent Document 4 has been proposed as a method that learns spatial filters suited to the identification target and extracts feature amounts using the learned filters. A CNN extracts feature amounts by applying spatial filter processing hierarchically to an input image. The spatial filters are generally learned by error backpropagation. Error backpropagation is a supervised learning method: it learns from pairs of training data and the correct CNN output for that training data. In other words, the coefficients of the spatial filters are updated so that the error between the CNN output for the training data and the correct data becomes small.

As described above, LGBP is widely used in face authentication and similar tasks as a feature amount that is effective for pattern identification. However, the Gabor wavelet filters used in LGBP produce a large number of dimensions and a large amount of data, so the processing load is high. In addition, Gabor wavelet filters were not originally designed for identifying any specific pattern. When the goal is to identify a specific pattern (for example, face images), a spatial filter more appropriate than the Gabor wavelet filter may therefore exist. As also described above, a CNN can design spatial filters suited to the identification target through learning. It is therefore conceivable to replace the Gabor wavelet filters with spatial filters designed by learning on the actual identification target (for example, face images).

Non-Patent Document 1: W. Zhang, S. Shan, W. Gao, X. Chen, and H. Zhang, "Local Gabor Binary Pattern Histogram Sequence (LGBPHS): A Novel Non-Statistical Model for Face Representation and Recognition", Proc. IEEE International Conference on Computer Vision, pp. 768-791, 2005.
Non-Patent Document 2: T. Ojala, M. Pietikainen, and D. Harwood, "A Comparative Study of Texture Measures with Classification Based on Featured Distributions", Pattern Recognition, Vol. 29, pp. 51-59, 1996.
Non-Patent Document 3: Ichiro Murase, Shunichi Kaneko, and Satoru Igarashi, "Robust Image Matching by Incremental Sign Correlation", IEICE Transactions D-II, Vol. J83-D-II, No. 5, pp. 1323-1331, 2000.
Non-Patent Document 4: Y. LeCun, K. Kavukcuoglu, and C. Farabet, "Convolutional Networks and Applications in Vision", Proc. IEEE International Symposium on Circuits and Systems, pp. 253-256, 2010.
Non-Patent Document 5: S. Chopra, R. Hadsell, and Y. LeCun, "Learning a similarity metric discriminatively, with application to face verification", Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 539-546, 2005.
Non-Patent Document 6: Z. Lei, S. Liao, R. He, M. Pietikainen, and S. Z. Li, "Gabor Volume Based Local Binary Pattern for Face Representation and Recognition", Proc. IEEE International Conference on Automatic Face & Gesture Recognition, pp. 1-6, 2008.

However, applying error backpropagation, the learning method for CNNs, requires every processing element in the chain of image processing applied to the input image to be a continuous (differentiable) function. Encodings such as LBP and the incremental sign include a step function, which is discontinuous, so a network containing them cannot be trained by error backpropagation. As a result, a CNN that would make it possible to extract feature amounts suitable for pattern identification cannot be optimized by learning.

The present invention has been made in view of the above problems, and its object is to provide a technique that makes it possible to extract feature amounts suitable for pattern identification from image data or the like.

To solve the above problems, the signal processing device of the present invention has the following configuration. The signal processing device includes: generation means for generating processing result data by performing spatial filter processing on input data; first encoding means for generating encoding result data by applying, to the processing result data, predetermined encoding processing that uses a discontinuous function; second encoding means for generating approximate encoding result data by applying, to the processing result data, approximate encoding processing in which the discontinuous function is approximated by a continuous function; update means for updating the weighting coefficients of the spatial filter processing based on the approximate encoding result data; and control means for performing control such that the processing result data is supplied to the second encoding means when the update means updates the weighting coefficients, and to the first encoding means otherwise.
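The switching performed by the control means can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the `updating_weights` flag, the steepness parameter `k`, and the choice of 0.5·(1 + tanh(k·d)) as the continuous approximation of the step function are all hypothetical.

```python
import numpy as np

def encode_step(diff):
    """First encoding: discontinuous step function (1 where diff >= 0, else 0)."""
    return (diff >= 0).astype(np.float64)

def encode_approx(diff, k=5.0):
    """Second encoding: a continuous stand-in for the step function.

    0.5 * (1 + tanh(k * diff)) is an assumed choice; k (hypothetical) sets the steepness.
    """
    return 0.5 * (1.0 + np.tanh(k * diff))

def encode_approx_grad(diff, k=5.0):
    """Derivative of encode_approx with respect to diff; it is defined everywhere."""
    return 0.5 * k * (1.0 - np.tanh(k * diff) ** 2)

def encode(diff, updating_weights):
    """Route data to the approximate encoder only while filter weights are being updated."""
    return encode_approx(diff) if updating_weights else encode_step(diff)

diff = np.array([-0.8, -0.1, 0.0, 0.1, 0.8])
codes = encode(diff, updating_weights=False)   # used outside of learning
soft = encode(diff, updating_weights=True)     # used while learning
grads = encode_approx_grad(diff)               # usable by error backpropagation
```

The step function has a zero derivative almost everywhere and none at the jump, so it passes no useful error signal backwards. The continuous stand-in has a nonzero derivative at every input, which is what makes the weight update possible.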

According to the present invention, it is possible to provide a technique that makes it possible to extract feature amounts suitable for pattern identification from image data or the like.

FIG. 1 is a diagram explaining the outline of signal processing in the first embodiment.
FIG. 2 is a diagram explaining the encoding process in the first embodiment.
FIG. 3 is a diagram explaining the approximate encoding process in the first embodiment.
FIG. 4 is a conceptual diagram of the CNN learner in the first embodiment.
FIG. 5 is a diagram showing the configuration of the data processing apparatus in the first embodiment.
FIG. 6 is a flowchart showing the operation of each mode of the data processing apparatus in the first embodiment.
FIG. 7 is a diagram explaining the relationship between the coefficient of the tanh function and its shape.
FIG. 8 is a diagram explaining the process of extracting LBP from input pixel values.
FIG. 9 is a diagram explaining the process of extracting the incremental sign from input pixel values.
FIG. 10 is a diagram explaining an example of storing the CNN network configuration in memory.
FIG. 11 is a diagram explaining an example of storing the CNN filter coefficients in memory.
FIG. 12 is a diagram showing an example of the CNN network configuration.
FIG. 13 is a diagram showing an example of a face authentication result image.
FIG. 14 is a diagram showing the configuration of the data processing apparatus in the second embodiment.
FIG. 15 is a block diagram showing the configuration of the signal processing unit in the second embodiment.
FIG. 16 is a diagram showing the configuration of the data processing system according to the third embodiment.
FIG. 17 is a diagram showing the configuration of the client device in the third embodiment.
FIG. 18 is a block diagram showing the configuration of the feature extraction unit in the third embodiment.
FIG. 19 is a diagram showing a pulse function and a Gaussian function that approximates it.

Preferred embodiments of the present invention are described in detail below with reference to the drawings. The following embodiments are merely examples and are not intended to limit the scope of the present invention.

(First embodiment)
As a first embodiment of the information processing apparatus according to the present invention, a data processing apparatus that extracts feature amounts suitable for face authentication from a face image is described below as an example. Here, face authentication means processing that identifies an individual by comparing the feature amount extracted from an input face image with registration data created in advance. Although the first embodiment describes an application to feature extraction in face authentication, the present invention is also applicable to feature extraction in other kinds of pattern identification.

<1. Overview of signal processing>
FIG. 1 is a diagram explaining the outline of signal processing in the first embodiment. Reference numeral 100 denotes an input image. Reference numeral 109 denotes CNN (Convolutional Neural Network) processing, which is executed on the input image 100. Reference numeral 110 denotes an encoding process, which executes predetermined encoding on the CNN processing result data. The CNN processing 109 and the encoding process 110 (LBP and incremental sign) are described in detail below.

<1.1. CNN (Convolutional Neural Networks) processing>
A CNN is a neural network that extracts feature amounts by propagating an input image in the forward direction and applying convolution operations with a plurality of different spatial filters. Here, the two-dimensional data that stores a convolution result is called a "feature extraction plane".

Next, each feature extraction plane is divided into non-overlapping local regions, and the average value of each local region is calculated (integration processing). Averaging improves robustness against small geometric variations (translation, rotation, and so on) of the identification target in the input image. Here, the two-dimensional data generated by this processing is called an "integration plane". Convolution operations with a plurality of different spatial filters are then applied again to the integration planes to generate further feature extraction planes. By hierarchically repeating feature extraction and integration in this way, the CNN extracts the desired feature amounts.
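The integration processing can be sketched as block averaging. This is an illustrative sketch only: the function name `integrate` and the 2 × 2 region size are assumptions, since the size of the local regions is not stated here.

```python
import numpy as np

def integrate(plane, block=2):
    """Average each non-overlapping block x block region of a 2-D plane."""
    h, w = plane.shape
    h2, w2 = h // block, w // block
    trimmed = plane[:h2 * block, :w2 * block]  # drop rows/columns that do not fill a block
    return trimmed.reshape(h2, block, w2, block).mean(axis=(1, 3))

feature_plane = np.arange(16, dtype=np.float64).reshape(4, 4)
integration_plane = integrate(feature_plane)   # 4 x 4 plane -> 2 x 2 plane
```

Because each output value depends only on the mean of its region, shifting a feature by a pixel inside a region changes the output little, which is the robustness the paragraph above describes.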

Reference numerals 101a to 101c denote the feature extraction planes of the first layer, which are two-dimensional data storing the results of convolution with two-dimensional spatial filters. Reference numeral 105 represents the input-output relationship of the two-dimensional convolution operation that calculates the feature extraction plane 101c from the input image 100. Equation (1) is the convolution formula for generating the feature extraction plane 101c from the input image 100.
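Equation (1) itself is not reproduced in this text. A form consistent with the symbol definitions that accompany it is the usual two-dimensional weighted sum shown below. The summation limits (offsets counted from 0 up to the filter size) are an assumption and may differ from the original equation.

```latex
\[
u(x, y) = \sum_{r=0}^{\mathit{height}-1} \; \sum_{c=0}^{\mathit{width}-1} i(x + c,\; y + r)\, w(c, r) \tag{1}
\]
```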

i(x, y): input pixel value at coordinates (x, y)
u(x, y): calculation result at coordinates (x, y)
w(c, r): filter coefficient applied at coordinates (x + c, y + r)
width, height: filter size
Here, i(x, y) corresponds to a pixel value (luminance value) of the input image 100. The result of applying nonlinear processing, such as a hyperbolic tangent (tanh) function, to the u(x, y) obtained here becomes the pixel value of the feature extraction plane 101c. The planes 101a and 101b are generated in the same way by convolution on the input image 100. The spatial filters used to generate 101a to 101c each have different coefficients.
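The generation of the first-layer feature extraction planes can be sketched as follows. This is a minimal sketch under assumptions: filter offsets are counted from the top-left corner, only positions where the filter fits entirely inside the image are computed (no padding), and the 3 × 3 filter size, the random coefficients, and the function names are hypothetical.

```python
import numpy as np

def convolve_plane(image, weights):
    """u(x, y) = sum over (c, r) of i(x + c, y + r) * w(c, r), valid region only."""
    height, width = weights.shape
    out_h = image.shape[0] - height + 1
    out_w = image.shape[1] - width + 1
    u = np.zeros((out_h, out_w))
    for r in range(height):
        for c in range(width):
            # arrays are indexed [row, column] = [y, x]
            u += weights[r, c] * image[r:r + out_h, c:c + out_w]
    return u

def feature_extraction_plane(image, weights):
    """Apply the nonlinearity (tanh here) to the convolution result."""
    return np.tanh(convolve_plane(image, weights))

rng = np.random.default_rng(0)
image = rng.random((8, 8))                                  # stand-in for input image 100
filters = [rng.standard_normal((3, 3)) for _ in range(3)]   # one filter per plane 101a-c
planes = [feature_extraction_plane(image, w) for w in filters]
```

Each filter yields its own plane, so three filters with different coefficients give three different first-layer planes from the same input image.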

Reference numerals 102a to 102c denote integration planes, which are two-dimensional data storing the results of integration processing. Reference numeral 106 represents the input-output relationship of the integration processing that calculates the integration plane 102c from the feature extraction plane 101c. Reference numeral 107 represents the input-output relationship of the two-dimensional convolution operation that calculates the feature extraction plane 103c from the integration plane 102c. Reference numerals 103a to 103c denote the feature extraction planes of the third layer. Each is the sum of the values obtained by applying nonlinear processing, such as a tanh function, to the convolution outputs for all of the integration planes 102a to 102c of the preceding layer. Accordingly, in the example shown in FIG. 1, nine different spatial filters are used to generate the feature extraction planes 103a to 103c from the integration planes 102a to 102c.

The parameters determined by learning are the coefficients of the spatial filters used to generate the feature extraction planes. In the example shown in FIG. 1, three spatial filters are used to generate the feature extraction planes 101a to 101c from the input image 100, and nine are used to generate the feature extraction planes 103a to 103c from the integration planes 102a to 102c. The coefficients of these 12 spatial filters in total are determined by learning.

<1.2. Encoding process>
The encoding process 110 encodes the CNN processing result data (the feature extraction planes 103a to 103c in the example shown in FIG. 1) based on the magnitude relationship between a pixel of interest (region of interest) and a reference pixel (reference region). Examples of such encoding are LBP and the incremental sign.

<1.2.1. LBP>
FIG. 8 is a diagram explaining the process of extracting LBP from input pixel values. LBP encodes the pixel value of the pixel of interest (x, y) and the pixel values of the eight reference pixels (x + x_n, y + y_n) surrounding it into the feature amount calculated by equation (2).
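Equations (2) and (3) are not reproduced in this text. Based on the symbol definitions and on the description of equation (3) as a step function, they can be reconstructed as shown below. The function name s and the weights 2^n (the weighting commonly used for LBP) are assumptions.

```latex
\[
\mathrm{LBP}(x, y) = \sum_{n=0}^{7} s\bigl(i(x + x_n,\; y + y_n) - i(x, y)\bigr)\, 2^{n} \tag{2}
\]
\[
s(u) =
\begin{cases}
1 & (u \ge 0) \\
0 & (u < 0)
\end{cases} \tag{3}
\]
```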

Here,
i(x, y): input pixel value at coordinates (x, y)
LBP(x, y): LBP at coordinates (x, y)
(x_n, y_n): relative position of the n-th reference pixel with respect to the pixel of interest
x_n ∈ {−1, 0, 1}, y_n ∈ {−1, 0, 1}, x_n^2 + y_n^2 ≠ 0
The function applied in equation (2) to the difference between each reference pixel value and the pixel-of-interest value is defined by equation (3).

In the example shown in FIG. 8, the positions (x_n, y_n) start from the pixel to the left of the pixel of interest and proceed counterclockwise around it. Specifically,
(x_0, y_0) = (−1, 0)
(x_1, y_1) = (−1, 1)
(x_2, y_2) = (0, 1)
...
(x_7, y_7) = (−1, −1)

Equation (3) is a step function: it is 1 when the reference pixel value is greater than or equal to the pixel-of-interest value, and 0 otherwise. Because LBP expresses only the magnitude relationship between the pixel of interest and the reference pixels, two patterns can be treated as identical even when pixel values vary with the illumination conditions, as long as the brightness ordering between the position of interest and the reference positions does not change.
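The LBP computation can be sketched as follows. This is an illustrative sketch with assumptions labelled: the bit weights 2**n follow the commonly used LBP definition, the offsets for n = 3 to 6 are inferred from the stated counterclockwise order (only n = 0, 1, 2 and 7 are given explicitly), and border pixels are simply skipped.

```python
import numpy as np

# Offsets (x_n, y_n), n = 0..7: start at the left neighbour and go counterclockwise
# (y grows downward). n = 0, 1, 2, 7 are from the text; n = 3..6 are inferred.
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def lbp(image):
    """LBP code for every interior pixel; border pixels are skipped."""
    h, w = image.shape
    center = image[1:h - 1, 1:w - 1]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for n, (dx, dy) in enumerate(OFFSETS):
        ref = image[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # step function on (reference - center), placed at bit n (assumed weight 2**n)
        codes |= (ref >= center).astype(np.uint8) << np.uint8(n)
    return codes

patch = np.array([[5, 9, 1],
                  [4, 6, 7],
                  [2, 6, 3]])
code = lbp(patch)
brighter = lbp(2 * patch + 10)  # brightness change that preserves the pixel ordering
```

Since `2 * patch + 10` keeps every greater-or-equal relation between pixels, `brighter` equals `code`, which illustrates the robustness to illumination described above.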

<1.2.2. Incremental sign>
FIG. 9 is a diagram explaining the process of extracting the incremental sign from input pixel values. Applying the incremental sign of Non-Patent Document 3 instead of the LBP described above gives the same effect as LBP. The incremental sign at coordinates (x, y) is calculated by equation (4).
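Equation (4) is not reproduced in this text. Given the symbol definitions that accompany it and the statement that LBP restricted to n = 0 corresponds to the incremental sign, it can be reconstructed as below, where s denotes the step function of equation (3). The name s is an assumption.

```latex
\[
\mathrm{IS}(x, y) = s\bigl(i(x + x_0,\; y + y_0) - i(x, y)\bigr) \tag{4}
\]
```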

i(x, y): input pixel value at coordinates (x, y)
IS(x, y): incremental sign at coordinates (x, y)
(x_0, y_0): relative position of the reference pixel with respect to the pixel of interest
x_0 ∈ {−1, 0, 1}, y_0 ∈ {−1, 0, 1}, x_0^2 + y_0^2 ≠ 0
Comparing equation (2) with equation (4) shows that LBP restricted to n = 0 corresponds to the incremental sign. In the example shown in FIG. 9, the pixel whose relative position with respect to the pixel of interest is (x_0, y_0) = (0, −1) is used as the reference pixel.
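The incremental sign can be sketched in a few lines. This is a minimal sketch: the function name and the handling of borders (pixels whose reference would fall outside the image are dropped) are assumptions.

```python
import numpy as np

def incremental_sign(image, offset=(0, -1)):
    """IS(x, y) = 1 if i(x + x0, y + y0) >= i(x, y), else 0.

    Only pixels whose reference pixel lies inside the image are encoded.
    """
    x0, y0 = offset
    h, w = image.shape
    ys = slice(max(0, -y0), h - max(0, y0))
    xs = slice(max(0, -x0), w - max(0, x0))
    ref_ys = slice(ys.start + y0, ys.stop + y0)
    ref_xs = slice(xs.start + x0, xs.stop + x0)
    return (image[ref_ys, ref_xs] >= image[ys, xs]).astype(np.uint8)

image = np.array([[3, 8, 2],
                  [5, 1, 2],
                  [4, 6, 9]])
signs = incremental_sign(image)                    # reference pixel directly above, as in FIG. 9
signs_left = incremental_sign(image, (-1, 0))      # reference pixel to the left
```

Changing `offset` changes the direction along which pixel values are compared, which is how a single reference position gives the code a directional character.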

<1.2.3. Example of the encoding process>
FIG. 2 is a diagram explaining the encoding process (first encoding process) in the first embodiment. Components that also appear in FIG. 1 carry the same reference numerals. To keep the description simple, an example using the incremental sign is described here. LBP can be applied in the same way as the incremental sign.

The encoding result data 104a to 104c are two-dimensional data storing the results of applying the encoding process to the feature extraction planes 103a to 103c, which are the outputs of the CNN. Reference numeral 108 represents the input-output relationship of the encoding process that calculates the encoding result data 104c from the feature extraction plane 103c.

Reference numerals 202a to 202c denote pixel comparison processes, which compare the pixel values of the pixels of interest 203a to 203c on the feature extraction planes 103a to 103c with the pixel values of the reference pixels 204a to 204c. Here, the difference obtained by subtracting the pixel value of the pixel of interest 203a to 203c from that of the reference pixel 204a to 204c is calculated.

Reference numerals 201a to 201c denote comparison result data, which are two-dimensional data storing the results of processing the feature extraction planes 103a to 103c with the pixel comparison processes 202a to 202c. The comparison result data 201a to 201c are feature amounts whose directional characteristics differ according to the relative positions of the reference pixels 204a to 204c used for the comparison. For example, when the comparison result data 201b is generated, the pixel immediately above the pixel of interest 203b is used as the reference pixel 204b, so the generated comparison result data 201b is a feature amount whose directional characteristic corresponds to changes in pixel value in the vertical direction. Here, to generate feature amounts with various directional characteristics, the relative positions of the reference pixels 204a to 204c are set to different directions (horizontal, vertical, and diagonal).

The encoding result data 104a to 104c are two-dimensional data storing the results of processing the comparison result data 201a to 201c with the step function processing 205. The step function processing 205 takes each pixel value of the comparison result data 201a to 201c as input and evaluates the step function of equation (3). Equation (5) is the formula for generating the encoding result data 104a to 104c from the feature extraction planes 103a to 103c.
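Equation (5) is not reproduced in this text. Combining the pixel comparison (reference pixel value minus pixel-of-interest value) with the step function of equation (3) gives the following reconstruction, where the name s for the step function is an assumption.

```latex
\[
v(x, y) = s\bigl(u(x + x_0,\; y + y_0) - u(x, y)\bigr) \tag{5}
\]
```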

u(x, y): pixel value at coordinates (x, y) on the plane of the preceding layer
v(x, y): calculation result at coordinates (x, y)
(x_0, y_0): relative position of the reference pixel with respect to the pixel of interest
In the example shown in FIG. 2, the relative positions (x_0, y_0) of the reference pixels 204a to 204c are (−1, 0), (0, −1), and (1, −1), respectively.

The encoding result data 104a to 104c express only the magnitude relationship between the pixel values of the pixels of interest 203a to 203c and the pixel values of the reference pixels 204a to 204c on the feature extraction planes 103a to 103c. Therefore, even when the pixel patterns of the feature extraction planes 103a to 103c vary because of a change in illumination conditions, the pixel values of the encoding result data 104a to 104c do not change unless the magnitude relationship between the pixel of interest 203a to 203c and the reference pixel 204a to 204c is reversed.
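The two stages of the encoding process (pixel comparison, then step function) and the invariance just described can be sketched as follows. This is an illustrative sketch: the integer-valued random planes stand in for the feature extraction planes 103a to 103c, the function name is hypothetical, and the brightness change is applied directly to those planes as a gain and offset.

```python
import numpy as np

# Reference-pixel offsets (x0, y0) for planes a, b, c, as given for FIG. 2.
OFFSETS = [(-1, 0), (0, -1), (1, -1)]

def encode_plane(u, offset):
    """v(x, y) = step(u(x + x0, y + y0) - u(x, y)) on the interior of the plane."""
    x0, y0 = offset
    h, w = u.shape
    center = u[1:h - 1, 1:w - 1]
    ref = u[1 + y0:h - 1 + y0, 1 + x0:w - 1 + x0]
    diff = ref - center                      # pixel comparison: reference minus pixel of interest
    return (diff >= 0).astype(np.uint8)      # step function

rng = np.random.default_rng(1)
planes = [rng.integers(0, 256, size=(6, 6)) for _ in OFFSETS]   # stand-ins for 103a-c
codes = [encode_plane(u, off) for u, off in zip(planes, OFFSETS)]

# A positive gain and an offset change the pixel values but not their ordering.
relit = [encode_plane(2 * u + 10, off) for u, off in zip(planes, OFFSETS)]
unchanged = all(np.array_equal(a, b) for a, b in zip(codes, relit))
```

Here `unchanged` is true because the gain and offset preserve every ordering between a pixel of interest and its reference pixel. A change that reversed some of those orderings would alter the corresponding codes.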

<2. Determination of filter coefficients by learning>
In the first embodiment, the CNN filter coefficients are determined by learning so that the output of the signal processing shown in FIG. 1 becomes a feature amount that is effective for the pattern identification target.

<2.1. Learner>
FIG. 4 is a conceptual diagram of the CNN learner in the first embodiment. Here, a known Siamese learner (Non-Patent Document 5) is used. A Siamese learner learns from pairs of input data and a label indicating whether the two inputs of each pair belong to the same class. Specifically, the CNN is trained so that the distance between CNN outputs is small for input data of the same class and large for input data of different classes.

The learning database 406 is a database that stores learning data. Here, learning data means data that includes a face image and the person ID corresponding to that face image. The person ID identifies the person corresponding to the face image and is represented, for example, by an integer value. For example, person IDs are assigned the values 0, 1, 2, and so on in the order of registration in the database. Character string data such as a name or nickname may also be associated with the person ID. The face images are preferably images that have been converted so that both eyes are aligned horizontally and the image has a predetermined size. For the output of the signal processing to become a feature amount that is robust against various variations, the face images should include a wide range of variations in face orientation in the pan and tilt directions, facial expression, illumination conditions, and so on.

The image pair selection 407 selects, from the learning database 406, the pair of face images to be used for learning. The pair is selected at random each time from all the face images stored in the learning database 406. The selected face images 401a and 401b are input to the CNN processes 109a and 109b, respectively. The label 405 is set to 0 when the person IDs of the two selected face images are the same and to 1 when they differ. The label 405 is used by the error (Loss) calculation 404 when it calculates the Loss.
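The pair selection and labelling described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `select_pair` and the list-of-tuples layout of the database are hypothetical, and drawing two distinct entries is an assumption taken from the later description of step S607.

```python
import random

def select_pair(database, rng=random):
    """Pick two distinct entries at random and derive the label.

    database: list of (face_image, person_id) tuples (hypothetical layout).
    Returns (image_a, image_b, label) with label 0 for the same person
    and 1 for different persons.
    """
    i, j = rng.sample(range(len(database)), 2)  # two distinct indices
    (image_a, id_a), (image_b, id_b) = database[i], database[j]
    label = 0 if id_a == id_b else 1
    return image_a, image_b, label
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which is convenient when comparing training runs.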

The CNN processes 109a and 109b execute CNN processing with the same network configuration as the CNN process 109 shown in FIG. 1. They apply CNN processing to the face images 401a and 401b selected by the image pair selection 407 and generate CNN processing result data. The CNN processes 109a and 109b share the same filter coefficients 408.

The encoding processes 110a and 110b execute the same processing as the encoding process 110 described with reference to FIG. 1. They encode the CNN processing result data generated by the CNN processes 109a and 109b and generate encoding process result data. Because the CNN processes 109a and 109b and the encoding processes 110a and 110b have the same configuration and the CNN processes share the same filter coefficients, the two encoding processes generate identical encoding process result data whenever their input images are identical.

The distance calculation 403 calculates the distance between the two sets of encoding process result data generated by the encoding processes 110a and 110b. The distance measure used here is the L1 norm between the two results when each is treated as a vector. For example, if each plane of the encoding process result data has size W × H and there are N planes, the vector has W × H × N dimensions. Other distance measures, such as the Euclidean distance or the cosine distance, may be used instead. Expression (6) calculates the L1 norm between the encoding process result data generated by the encoding processes 110a and 110b.

w: filter coefficients of the CNN
E(w): L1 norm between the encoding process result data
v_n(w): encoding process result data generated from input image n (n: index within the image pair)
Here, w is a vector whose elements are the CNN filter coefficients.
Both v and E are functions of w because their values change with the CNN filter coefficients. v_1(w) and v_2(w) are the encoding process result data generated from the face images 401a and 401b, respectively.
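The formula image for Expression (6) is not reproduced in this text. From the definitions above, the L1 norm between the two encoding results can be written as follows. The element index j is notation introduced here, and this is a reconstruction from the surrounding description rather than the original rendering.

```latex
E(w) = \lVert v_1(w) - v_2(w) \rVert_1
     = \sum_{j} \bigl\lvert v_{1j}(w) - v_{2j}(w) \bigr\rvert
\tag{6}
```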

The Loss calculation 404 calculates the Loss from the L1 norm calculated by the distance calculation 403 and the label 405 generated by the image pair selection 407. Expression (7) calculates the Loss from the L1 norm and the label 405.

Y: label value (0: the person IDs of the image pair are the same; 1: the person IDs of the image pair differ)
L(w): error (Loss)
Q: a constant set to the upper limit of E(w)

When the face images 401a and 401b have the same person ID, Y = 0 is input as the label 405. In this case the Loss L(w) is small when E(w) is small and large when E(w) is large. That is, for the same person, the smaller the distance between the encoding process result data, the smaller the Loss.

When the face images 401a and 401b have different person IDs, Y = 1. In this case the Loss L(w) is large when E(w) is small and small when E(w) is large. That is, for different persons, the larger the distance between the encoding process result data, the smaller the Loss.
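The behavior described for Y = 0 and Y = 1 can be illustrated with a short sketch. The formula image for Expression (7) is not reproduced in this text, so the loss below is an assumption: it uses the form published by Chopra, Hadsell and LeCun for Siamese face verification, which has the same inputs (E, Y, Q) and the same qualitative behavior. The function names are hypothetical.

```python
import math

def l1_distance(v1, v2):
    """L1 norm between two equally long encoding result vectors."""
    return sum(abs(a - b) for a, b in zip(v1, v2))

def siamese_loss(e, y, q):
    """Assumed loss: grows with e when y == 0, shrinks with e when y == 1.

    e: L1 distance E(w); y: label (0 same person, 1 different);
    q: constant set to the upper limit of E(w).
    """
    same_term = (1 - y) * (2.0 / q) * e * e
    diff_term = y * 2.0 * q * math.exp(-2.77 * e / q)
    return same_term + diff_term

# Three-element toy vectors: the distance is |1-1| + |-1-1| + |1-(-1)| = 4.
e = l1_distance([1, -1, 1], [1, 1, -1])
```

With this form, a same-person pair is penalized quadratically as its distance grows, while a different-person pair is penalized most when its distance is near zero.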

Through the processing described above, the Loss is calculated from a pair of face images. The CNN filter coefficients 408 are then updated by error backpropagation based on the calculated Loss. The update procedure is described below.

<2.2. Learning by error backpropagation>
Error backpropagation updates the parameters by gradient descent in order to minimize an error function. Here, the error function is L(w) and the parameters are the filter coefficients w. w must be initialized before learning starts, and here it is initialized with random numbers. Alternatively, it may be initialized with the coefficients of a known spatial filter such as a Gabor wavelet filter or a Sobel filter, or with filter coefficients obtained by previous learning, in which case additional re-learning is performed. Expression (8) shows how the i-th element of w is updated by gradient descent.

w_i: i-th element of w before the update
w_i': i-th element of w after the update
ρ: update coefficient
To update w_i with Expression (8), ∂L(w)/∂w_i must be obtained. Because L(w) depends on w_i only through E(w), the chain rule for partial derivatives is applied, and ∂L(w)/∂w_i is rewritten as shown in Expression (9).

Here, ∂L(w)/∂E(w) is obtained by partially differentiating Expression (7) with respect to E(w). In the same way that ∂L(w)/∂w_i was decomposed, ∂E(w)/∂w_i can be rewritten as shown in Expression (10).

v_j: j-th element of v(w)
Here, ∂E(w)/∂v_j is obtained by partially differentiating Expression (6) with respect to v_j. It must be calculated separately for the elements v_1j of v_1(w) and the elements v_2j of v_2(w).

Expression (10) sums the decomposed terms over all elements v_j of the encoding process result data v(w), because E(w) depends on w_i through every element v_j of v(w).

∂v_j/∂w_i can be further rewritten as shown in Expression (11).

∂v_j/∂u_j is obtained by partially differentiating Expression (5), and ∂u_j/∂w_i is obtained by partially differentiating Expression (4) with respect to w_i.
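The formula images for Expressions (8) to (11) are not reproduced in this text. From the surrounding description they can be written as below. This is a reconstruction, and the minus sign in Expression (8) assumes the usual gradient descent convention with a positive update coefficient ρ.

```latex
\begin{align}
w_i' &= w_i - \rho \, \frac{\partial L(w)}{\partial w_i} \tag{8} \\
\frac{\partial L(w)}{\partial w_i}
  &= \frac{\partial L(w)}{\partial E(w)} \, \frac{\partial E(w)}{\partial w_i} \tag{9} \\
\frac{\partial E(w)}{\partial w_i}
  &= \sum_{j} \frac{\partial E(w)}{\partial v_j} \, \frac{\partial v_j}{\partial w_i} \tag{10} \\
\frac{\partial v_j}{\partial w_i}
  &= \frac{\partial v_j}{\partial u_j} \, \frac{\partial u_j}{\partial w_i} \tag{11}
\end{align}
```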

As described above, error backpropagation calculates the partial derivatives ∂L(w)/∂E(w), ∂E(w)/∂v_j, and so on, one after another in the direction opposite to that in which the Loss is calculated from the input face images, as indicated by the broken-line arrows in FIG. 4. This finally yields ∂L(w)/∂w_i, and the filter coefficients are then updated according to Expression (8).

When backpropagating from the distance calculation 403 to the encoding processes 110a and 110b, ∂E(w)/∂v_j splits into a part corresponding to v_1(w) and a part corresponding to v_2(w), and each part is backpropagated separately. The CNN filter coefficients 408 are updated first by the backpropagation for v_1(w) and then by the backpropagation for v_2(w).
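The chain of partial derivatives can be checked numerically on a toy model. The sketch below makes several assumptions that are not in the text: a single scalar coefficient w, a one-tap "filter" u = w·x standing in for Expression (4), tanh(K·u) as the differentiable encoding, and the Chopra-style loss as a stand-in for Expression (7). Unlike the procedure above, it sums the contributions of both branches into one gradient instead of updating for v_1(w) and then for v_2(w).

```python
import math

K = 2.0  # slope coefficient of tanh (hypothetical value)
Q = 2.0  # upper limit of E here: |v1 - v2| <= 2 for tanh outputs

def forward(w, x1, x2, y):
    """Return (v1, v2, E, L) for the toy one-coefficient model."""
    v1, v2 = math.tanh(K * w * x1), math.tanh(K * w * x2)
    e = abs(v1 - v2)
    loss = (1 - y) * (2 / Q) * e * e + y * 2 * Q * math.exp(-2.77 * e / Q)
    return v1, v2, e, loss

def gradient(w, x1, x2, y):
    """dL/dw assembled factor by factor with the chain rule."""
    v1, v2, e, _ = forward(w, x1, x2, y)
    dL_dE = (1 - y) * (4 / Q) * e - y * 2 * 2.77 * math.exp(-2.77 * e / Q)
    s = 1.0 if v1 >= v2 else -1.0            # dE/dv1 = s, dE/dv2 = -s
    dE_dw = (s * K * (1 - v1 * v1) * x1      # branch through v1
             - s * K * (1 - v2 * v2) * x2)   # branch through v2
    return dL_dE * dE_dw

def numeric_gradient(w, x1, x2, y, h=1e-6):
    """Central finite difference, used only to check the analytic result."""
    return (forward(w + h, x1, x2, y)[3] - forward(w - h, x1, x2, y)[3]) / (2 * h)

# One gradient descent step for a same-person pair (y = 0).
w, x1, x2, rho = 0.3, 1.0, -0.5, 0.01
w_new = w - rho * gradient(w, x1, x2, 0)
```

Comparing `gradient` with `numeric_gradient` at a few points is a cheap way to catch a sign or indexing error before applying the same decomposition to real filter coefficients.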

<2.3. Approximate encoding process>
As described above, learning the CNN filter coefficients 408 by error backpropagation requires every operation to be a differentiable function. However, the step function process 205 in the encoding processes 110a and 110b uses the step function shown in Expression (3). The step function is discontinuous at t = 0 and therefore cannot be differentiated there. As it stands, ∂v_j/∂u_j in Expression (11) therefore cannot be calculated.

In the first embodiment, the step function in the step function process 205 is therefore replaced with, and approximated by, a differentiable continuous function. The tanh function is used here, but any continuous function that approximates the step function sufficiently well may be used; for example, the sigmoid function may be used instead. An encoding process in which the step function is approximated by a continuous function is referred to here as an "approximate encoding process".

FIG. 3 is a diagram illustrating the approximate encoding process (second encoding process) in the first embodiment. It corresponds to the encoding process of FIG. 2 with the step function replaced by the tanh function. Components that are the same as in FIG. 2 are given the same reference numerals, and their description is omitted.

The approximate encoding process result data 302a to 302c are two-dimensional data that store the results of processing the comparison process result data 201a to 201c with the tanh function process 301. The tanh function process 301 generates the approximate encoding process result data 302a to 302c by evaluating the tanh function with each pixel value of the comparison process result data 201a to 201c as input.

Expression (12) generates the approximate encoding process result data 302a to 302c from the feature extraction planes 103a to 103c by the approximate encoding process shown in FIG. 3.

u(x, y): pixel value of the plane in the previous layer at coordinates (x, y)
ṽ(x, y): operation result at coordinates (x, y)
k: coefficient that determines the slope of the tanh function

FIG. 7 is a diagram illustrating the relationship between the coefficient k of the tanh function and its shape. The slope of the tanh function increases as k increases. Setting k >> 1 gives the tanh function characteristics close to those of the step function.

If k is too large, however, the derivative of the tanh function becomes very large near t = 0 and close to 0 elsewhere. In the learning process this means that the filter coefficients are updated by large amounts near t = 0 and hardly at all elsewhere, which makes it difficult for learning to converge. If k is set too small, on the other hand, the error between the tanh function and the step function becomes large. Experiments showed that k = 0.5 to 4.0 gives more suitable results.
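The trade-off in choosing k can be made concrete with a few numbers. The sketch below assumes the elementwise form tanh(k·t) for the approximation, since the formula image for Expression (12) is not reproduced in this text; the function names are hypothetical. The derivative of tanh(k·t) with respect to t is k·(1 − tanh²(k·t)).

```python
import math

def approx_encode(t, k):
    """Smooth stand-in for the step function: tanh with slope coefficient k."""
    return math.tanh(k * t)

def approx_encode_grad(t, k):
    """Derivative of tanh(k * t) with respect to t."""
    return k * (1.0 - math.tanh(k * t) ** 2)

# Gradient at t = 0 and at t = 1 for a small, a moderate, and a very large k.
for k in (0.5, 4.0, 50.0):
    print(k, approx_encode_grad(0.0, k), approx_encode_grad(1.0, k))
```

At t = 0 the gradient equals k itself, so a very large k produces a large update there, while away from t = 0 the gradient collapses toward zero, which is the convergence problem described above.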

<3. Configuration of data processing apparatus>
FIG. 5 is a diagram illustrating the configuration of the data processing apparatus in the first embodiment. The following describes a data processing apparatus that can execute both the learning of the CNN filter coefficients described above and pattern identification processing based on the determined filter coefficients.

The data storage unit 501 holds image data and is typically a hard disk, flexible disk, optical storage medium (CD, DVD), semiconductor storage medium (memory cards of various standards, USB memory), or the like. Besides image data, the data storage unit 501 can also store programs and other data. Alternatively, part of the RAM 505 described later may be used as the data storage unit 501, or the data storage unit may be configured virtually by using, through the communication unit 502 described later, the storage device of a device connected via the communication unit 502.

The display unit 507 displays images before and after image processing, or images such as a GUI; a CRT, a liquid crystal display, or the like is generally used. It may also be a display device external to the apparatus and connected by a cable or the like.

The input unit 506 is a device through which the user enters instructions and data, and includes a keyboard and a pointing device. Examples of the pointing device include a mouse, a trackball, a trackpad, and a tablet. When the data processing apparatus is applied to equipment such as a known digital camera or printer, the input unit may instead consist of buttons, dials, and the like. The keyboard may also be implemented in software (a software keyboard), with characters entered by operating buttons, dials, or one of the pointing devices mentioned above.

Alternatively, the display unit 507 and the input unit 506 may be a single device, as in a known touch screen device. In that case, input through the touch screen is treated as input from the input unit 506.

Reference numeral 503 denotes a CPU, which executes each of the processes described above and controls the operation of the entire data processing apparatus. The ROM 504 and the RAM 505 provide the CPU 503 with the programs, data, work areas, and the like required for that processing. When a program required for the processing described later is stored in the data storage unit 501 or in the ROM 504, it is first read into the RAM 505 and then executed. Although FIG. 5 shows a configuration with only one CPU (CPU 503), a plurality of CPUs may be provided.

The communication unit 502 is an interface (I/F) for communication between devices. It may use a wired communication method such as a known wired LAN standard typified by the IEEE 802.3 series, USB (Universal Serial Bus), IEEE 1284, IEEE 1394, or a telephone line. Alternatively, it may use a wireless communication method such as infrared (IrDA), a known wireless LAN standard typified by the IEEE 802.11 series, Bluetooth (registered trademark), or UWB (Ultra Wide Band).

Although FIG. 5 shows the input unit 506, the data storage unit 501, and the display unit 507 all contained in one apparatus, each may be configured as a separate unit. In that case, the units are connected so that they can communicate with one another by a known communication method. Components other than those described above may also be present.

<4. Processing flow of data processing apparatus>
FIG. 6 is a flowchart showing the operation of the data processing apparatus in each mode in the first embodiment. The processing shown in the flowchart is carried out by the CPU 503 executing various programs.

In step S601, the operation mode is determined. The operation mode is designated, for example, by the user through the input unit 506. There are three operation modes: (a) learning mode, (b) identification mode, and (c) registration data creation mode. The processing flow of the data processing apparatus in each mode is described below.

<4.1. (a) Learning mode>
The learning mode is the operation mode in which learning is executed. The face images used for learning are assumed to have been created by the following procedure before the learning process and stored in the data storage unit 501 in association with the corresponding person IDs. First, image data stored in the data storage unit 501 is read into the RAM 505. Next, the image data in the RAM 505 is converted into an 8-bit unsigned luminance image. A face area is then detected by a known face detection method, and the face image, resized to a predetermined size, is stored in the data storage unit 501. Preferably, the positions of facial organs such as the eyes, nose, and mouth are detected, and the image is converted on the basis of the detected positions so that both eyes are aligned horizontally and the image has a predetermined size. A known Active Appearance Model, Active Shape Model, or the like can be used to detect the organ positions.

So that the feature amount is robust against variations, the face images should cover a wide range of face orientations in the pan and tilt directions, facial expressions, illumination conditions, and the like. The above processing may be performed by the CPU 503, or the result of the same processing executed by an external device may be stored in the data storage unit 501 via the communication unit 502.

In step S602, the learning count counter p is initialized to p = 0. The counter p is used in step S603 to determine whether learning is complete.

In step S603, it is determined whether learning has been completed for the number of repetitions specified in advance. The number of repetitions is set beforehand to M (M is an integer of 1 or more), and step S603 determines whether p < M holds. If it does not hold, learning is determined to be complete, and the processing of the flowchart in FIG. 6 ends. If it holds, learning is determined not to be complete, and the processing proceeds to steps S604 to S611.

In step S604, the first face image to be used for learning is selected at random from the data storage unit 501 and stored in the RAM 505. The person ID associated with the selected face image is also stored in the RAM 505.

In step S605, CNN processing is executed on the face image selected in step S604 to generate CNN processing result data. The CNN network configuration and filter coefficients are assumed to be stored in the RAM 505. They may of course be stored in the data storage unit 501 or the ROM 504 instead; in that case, they are first read into the RAM 505 before the CNN processing is executed.

The format in which the CNN network configuration and filter coefficients are stored in the RAM 505, and the operation of the CPU 503 when it executes the CNN processing, are described in order below.

FIG. 10 is a diagram illustrating an example of how the CNN network configuration is stored in memory. A three-layer CNN network configuration is shown, but the configuration is not limited to three layers and in general has L layers (L is an integer of 1 or more). For each layer, three items of data form one set: the processing content ("feature extraction" or "integration") 1001, the number of planes to be generated 1002, and an address 1003. L such sets are held in the RAM 505 in order from the first layer. At the position indicated by each address 1003, a two-dimensional array 1004a to 1004c is held that marks, for every combination of a plane in the previous layer and a plane in the current layer, ○ where there is a connection and × where there is not.

FIG. 12 is a diagram that visualizes the CNN network configuration shown in FIG. 10. The method of storing the CNN network configuration in the RAM 505 is not limited to the example shown in FIG. 10; any format may be used as long as the CPU 503 can identify the processing in each layer, the number of planes generated in each layer, and the connections between layers.

FIG. 11 is a diagram illustrating an example of how the CNN filter coefficients are stored in memory. The size and coefficients of each spatial filter are held in the RAM 505 as a one-dimensional array. A one-dimensional array in which the head addresses of the filter coefficients are arranged in order is also held in the RAM 505. The RAM 505 stores the spatial filters required to generate all the feature extraction planes.

For example, in the example shown in FIG. 10, a total of 12 spatial filters are held: the 3 spatial filters required to generate the feature extraction planes of the first layer from the input image, and the 9 spatial filters required to generate the feature extraction planes of the third layer from the integration planes of the second layer. They are held in the RAM 505 in that order, with the spatial filters for the first layer followed by those for the third layer.

Initial values are assumed to have been set for the filter coefficients in advance. The initial values may be random numbers, filter coefficients obtained by previous learning, or the coefficients of a known spatial filter such as a Gabor wavelet filter or a Sobel filter. The method of storing the CNN filter coefficients in the RAM 505 is not limited to the example shown in FIG. 11; any format may be used as long as the CPU 503 can identify the size and coefficients of each spatial filter.

The CPU 503 executes processing on each plane of the previous layer in accordance with the two-dimensional array, stored at the position indicated by the address 1003, that represents the connections between layers. The number of connection entries read for each layer is determined by the number of planes in the previous layer (1 when the previous layer is the input image) and the number of planes in the current layer. For example, in the third layer the previous layer has 3 planes and the current layer has 4 planes, so 3 × 4 = 12 connection entries are read.
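The per-layer records and connection arrays described above might be represented as in the following sketch. The class and function names are hypothetical, the address 1003 is replaced by a direct reference to the array, and the particular ○/× pattern is invented for illustration, since FIG. 10 is not reproduced in this text. The pattern is chosen only so that the counts match the description: 3 × 4 = 12 connection entries in the third layer, of which 9 are connected.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Layer:
    op: str                         # processing content 1001
    num_planes: int                 # number of planes to generate 1002
    connections: List[List[bool]]   # array 1004: rows = previous planes, columns = current planes

network = [
    Layer("feature_extraction", 3, [[True, True, True]]),  # previous layer is the input image
    Layer("integration", 3, [[True, False, False],
                             [False, True, False],
                             [False, False, True]]),
    Layer("feature_extraction", 4, [[True, True, False, True],
                                    [True, False, True, True],
                                    [False, True, True, True]]),
]

def entries_read(layer: Layer) -> int:
    """Connection entries read for a layer: previous planes x current planes."""
    return len(layer.connections) * layer.num_planes

def num_spatial_filters(net: List[Layer]) -> int:
    """One spatial filter per connected pair in each feature extraction layer."""
    return sum(row.count(True)
               for layer in net if layer.op == "feature_extraction"
               for row in layer.connections)
```

Keeping the connection table as a plain nested list keeps the sketch close to the two-dimensional array of the description; a real implementation could equally store it as a bit mask.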

The CPU 503 executes the processing indicated by the processing content 1001 on each plane of the previous layer. When the processing content 1001 is "feature extraction", the spatial filter shown in FIG. 11 is read from the RAM 505 and spatial filter processing is executed. To read the k-th filter coefficients, the width (width) and height (height) of the spatial filter, stored at address k and address k + 1 respectively, are read first. Then, on the basis of the width and height values, the spatial filter coefficients are read in order starting from address k + 2 to create a two-dimensional spatial filter of size width × height. When the processing content 1001 is "integration", the integration processing described earlier is executed.
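Reading one spatial filter from the flat layout can be sketched as follows. The names `memory`, `head_addresses`, and `read_filter` are hypothetical, `k` is treated here as an index into the head-address array, and row-major ordering of the coefficients is an assumption; the text only states that width, height, and then the coefficients are stored consecutively.

```python
def read_filter(memory, head_addresses, k):
    """Rebuild the k-th spatial filter as a list of rows.

    memory: flat list holding, per filter, width, height, then width*height coefficients.
    head_addresses: start index of each filter inside memory.
    """
    addr = head_addresses[k]
    width, height = memory[addr], memory[addr + 1]
    start = addr + 2
    coeffs = memory[start:start + width * height]
    # Assumed row-major order: each row holds `width` consecutive coefficients.
    return [coeffs[r * width:(r + 1) * width] for r in range(height)]

# Two toy filters stored back to back.
memory = [3, 2, 1, 0, -1, 2, 0, -2,   # filter 0: width 3, height 2
          2, 2, 1, 1, -1, -1]         # filter 1: width 2, height 2
head_addresses = [0, 8]
first = read_filter(memory, head_addresses, 0)
```

Storing the size next to the coefficients lets filters of different sizes share one array, at the cost of needing the separate head-address table for random access.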

In step S606, approximate encoding processing using the tanh function shown in Expression (12) is executed on the CNN processing result data generated in step S605 to generate approximate encoding process result data.

In step S607, the second face image to be used for learning is selected from the data storage unit 501 and stored in the RAM 505. The second face image is selected at random from the face images other than the first face image selected in step S604. The person ID of the selected face image is also stored in the RAM 505. Then, as for the first face image, CNN processing and approximate encoding processing using the tanh function are executed as in steps S605 and S606 to generate approximate encoding process result data.

In step S608, it is checked whether the two person IDs stored in steps S604 and S607 match. A label of "0" is generated if they match and "1" if they do not.

In step S609, the Loss is calculated from the approximate encoding process result data generated from the two face images and the label generated in step S608. First, the L1 norm is calculated from the two sets of approximate encoding process result data according to Expression (6). Next, the Loss is calculated from the L1 norm and the label according to Expression (7).

In step S610, the CNN filter coefficients stored in the RAM 505 are updated by the error backpropagation described above, using the Loss calculated in step S609. In step S611, the learning count counter p is incremented, and the processing returns to step S603.
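The control flow of steps S602 to S611 can be summarized in a short skeleton. It is a sketch only: the function name and the three callables (`encode` for CNN processing plus approximate encoding, `compute_loss`, and `update_filters`) are hypothetical placeholders supplied by the caller, and the sample layout as (image, person ID) tuples is an assumption.

```python
import random

def run_learning_mode(samples, encode, compute_loss, update_filters, M, rng=random):
    """Repeat the pair-based update M times and return the final counter value."""
    p = 0                                              # S602: initialize counter
    while p < M:                                       # S603: stop once p reaches M
        i = rng.randrange(len(samples))                # S604: first image at random
        image1, id1 = samples[i]
        v1 = encode(image1)                            # S605, S606
        others = [n for n in range(len(samples)) if n != i]
        j = rng.choice(others)                         # S607: second image, excluding the first
        image2, id2 = samples[j]
        v2 = encode(image2)                            # S605, S606 again
        label = 0 if id1 == id2 else 1                 # S608
        loss = compute_loss(v1, v2, label)             # S609
        update_filters(loss)                           # S610
        p += 1                                         # S611
    return p
```

Passing the processing stages in as callables keeps the loop independent of how the CNN, the encoding, and the update are actually implemented.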

＜４．２．（ｂ）識別モード＞ <4.2. (b) Identification mode>
識別モードは、信号処理結果を用いてパターン識別を実行するための動作モードである。パターン識別結果は、例えば、顔認証処理などに利用される。顔認証に使用する顔画像は、以下の手順に従って顔認証処理に先立ち作成され、ＲＡＭ５０５に保存されているものとする。まず、データ保存部５０１に保存されている画像データをＲＡＭ５０５に格納する。次に、ＲＡＭ５０５にある画像データを８ｂｉｔ符号なし輝度画像に変換する。そして、公知の顔検出手法により顔領域を検出し、予め定めたサイズにリサイズした顔画像をＲＡＭ５０５に保存する。このとき、顔認証の結果を表示するための情報として、検出された顔領域の元の画像における位置、大きさを顔画像に関連付けてＲＡＭ５０５に保存する。または、同様の処理を外部装置により実行した結果を通信部５０２を介してＲＡＭ５０５に保存してもよい。 The identification mode is an operation mode for performing pattern identification using the signal processing result. The pattern identification result is used for face authentication processing, for example. Assume that a face image used for face authentication is created prior to face authentication processing according to the following procedure and stored in the RAM 505. First, the image data stored in the data storage unit 501 is stored in the RAM 505. Next, the image data in the RAM 505 is converted into an 8-bit unsigned luminance image. Then, a face area is detected by a known face detection method, and a face image resized to a predetermined size is stored in the RAM 505. At this time, the position and size of the detected face area in the original image are stored in the RAM 505 in association with the face image as information for displaying the result of face authentication. Alternatively, the result of executing the same processing by the external device may be stored in the RAM 505 via the communication unit 502.

ステップＳ６１２では、読み込んだ顔画像に対して前処理を実行する。具体的には、公知のActive Appearance Model、Active Shape Model等を用いて、顔の器官位置を検出し、検出した器官位置に基づいて両目が水平に並び、かつ予め定められたサイズとなるように画像変換する。 In step S612, preprocessing is performed on the read face image. Specifically, the positions of the facial organs are detected using a known Active Appearance Model, Active Shape Model, or the like, and the image is transformed based on the detected positions so that both eyes are aligned horizontally and the face has a predetermined size.

ステップＳ６０５では、上述の学習モードの場合と同様、前処理した顔画像に対してＣＮＮ処理を実行してＣＮＮ処理結果データを生成する。 In step S605, CNN processing result data is generated by performing CNN processing on the preprocessed face image, as in the case of the learning mode described above.

ステップＳ６１３では、ステップＳ６０５において生成したＣＮＮ処理結果データに対して、式（５）に示した増分符号化処理を実行する。すべての注目画素に対する増分符号を並べたベクトルを識別に使用する特徴量とする。 In step S613, the incremental encoding process shown in Expression (5) is performed on the CNN process result data generated in step S605. A vector in which increment codes for all pixels of interest are arranged is used as a feature amount used for identification.
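
The difference between the incremental encoding used here and the tanh-based approximate encoding used in the learning mode (step S606) can be illustrated with a one-dimensional sketch. Expressions (5) and (12) are not reproduced in this part of the description, so the choice of the right-hand neighbour as the comparison pixel, the mapping of tanh to the range 0..1, and the `gain` parameter are all assumptions made for illustration.

```python
import math

def increment_code(row):
    # Hard (discontinuous) code: 1 if the right-hand neighbour is at least
    # as large as the pixel of interest, otherwise 0.
    return [1 if row[i + 1] >= row[i] else 0 for i in range(len(row) - 1)]

def tanh_code(row, gain=10.0):
    # Smooth surrogate: a scaled tanh of the same difference, mapped to (0, 1).
    # It is differentiable everywhere, so gradients can pass through it.
    return [0.5 * (1.0 + math.tanh(gain * (row[i + 1] - row[i])))
            for i in range(len(row) - 1)]
```

For large differences the two codes agree closely; they differ mainly near a difference of zero, where the hard code jumps and the smooth code passes through 0.5.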

ステップＳ６１４では、ステップＳ６１３において生成した特徴量の次元を削減する。特徴量から識別に効果的な情報のみを抽出するよう次元を削減することにより、後段の処理における計算量を少なくすることができる。次元削減は、公知のPrincipal Component AnalysisやLocality Preserving Projection等を用い、予め決定しておいた変換行列を用いて変換すれば良い。ここで、変換行列とは次元削減後のベクトル空間を規定する基底ベクトルを並べたものである。変換行列を用いて、特徴量を一列に並べた特徴ベクトルを、元の空間から基底ベクトルが規定する空間へと射影する。変換行列は、ＲＯＭ５０４やデータ保存部５０１にデータあるいはプログラムの一部として格納されており、予めＲＡＭ５０５に読み込んでおく。ＣＰＵ５０３はそれを参照しながら次元削減処理を実行する。 In step S614, the dimension of the feature amount generated in step S613 is reduced. Reducing the dimension so that only information effective for identification is extracted from the feature amount lowers the amount of calculation in the subsequent processing. The dimension reduction may be performed by applying a transformation matrix determined in advance with a known method such as Principal Component Analysis or Locality Preserving Projection. Here, the transformation matrix is an array of basis vectors that define the vector space after dimension reduction. Using the transformation matrix, the feature vector in which the feature amounts are arranged in a line is projected from the original space onto the space defined by the basis vectors. The transformation matrix is stored in the ROM 504 or the data storage unit 501 as data or as part of a program, and is read into the RAM 505 in advance. The CPU 503 executes the dimension reduction processing while referring to it.
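
The projection in step S614 amounts to taking the inner product of the feature vector with each basis vector. A minimal sketch, assuming the transformation matrix is held as a list of basis vectors (the name `project` and this layout are hypothetical, and the mean subtraction that PCA normally applies is omitted):

```python
def project(feature, basis):
    """Project a feature vector onto a lower-dimensional space.

    feature: list of numbers (the code vector from S613).
    basis: list of basis vectors, each the same length as feature.
    Returns one coefficient per basis vector.
    """
    return [sum(b * f for b, f in zip(vec, feature)) for vec in basis]
```

For example, projecting the 4-dimensional code `[1, 0, 1, 1]` with two basis vectors yields a 2-dimensional projection vector, so every later distance computation touches two values instead of four.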

ステップＳ６１５では、ステップＳ６１４で求めた次元削減の後の特徴量を用いて識別処理を実行する。ここでは、次元削減の後の特徴ベクトルのことを”射影ベクトル”と呼ぶ。識別処理では、射影ベクトルと登録データとを照合する。 In step S615, identification processing is executed using the feature amount after dimension reduction obtained in step S614. Here, the feature vector after dimension reduction is called a “projection vector”. In the identification process, the projection vector and registered data are collated.

登録データとは、例えば登録ベクトルと対応する人物ＩＤとから成るデータのことである。なお、好ましくは名前やニックネーム等の文字列データを人物ＩＤに関連付けて記憶する。登録データはデータ保存部５０１に格納されており、顔認証処理に先立ってＲＡＭ５０５に読み込まれる。登録データの作成方法については後述する。 The registration data is data including, for example, a registration vector and a corresponding person ID. Preferably, character string data such as a name and a nickname is stored in association with the person ID. The registered data is stored in the data storage unit 501 and is read into the RAM 505 prior to face authentication processing. A method of creating registration data will be described later.

識別処理では、射影ベクトルと登録ベクトルとの類似度と、予め指定した閾値を基に入力顔画像の人物ＩＤを決定する。ここで、類似度は、次元削減後の特徴空間におけるベクトル間のユークリッド距離として説明する。この場合、距離が小さいほど射影ベクトルと登録ベクトルは似たベクトルであると解釈できるので、距離が小さい登録ベクトル（の基となった画像）ほど入力顔画像に類似しているといえる。 In the identification process, the person ID of the input face image is determined based on the similarity between the projection vector and the registered vectors and on a threshold value specified in advance. Here, the similarity is described as the Euclidean distance between vectors in the feature space after dimension reduction. In this case, the smaller the distance, the more alike the projection vector and the registered vector are, so a registered vector with a smaller distance (more precisely, the image from which it was derived) can be said to be more similar to the input face image.

まず、射影ベクトルとすべての登録ベクトルとの距離を計算し、距離が小さい順に登録ベクトルをソートする。次に、射影ベクトルとソート後に先頭にある登録ベクトルとの距離（最小距離）と、予め設定した閾値とを比較する。 First, the distance between the projection vector and all registered vectors is calculated, and the registered vectors are sorted in ascending order of distance. Next, the distance (minimum distance) between the projection vector and the first registered vector after sorting is compared with a preset threshold value.

最小距離が閾値以下である場合、入力顔画像の人物ＩＤは、ソート後に先頭にある登録ベクトルの人物ＩＤとする。一方、最小距離が閾値よりも大きい場合、入力画像の人物は登録されていないと判定する。この場合、例えば予めシステムで定めておいた非登録人物に対応するＩＤ値を入力顔画像の人物ＩＤとする。 When the minimum distance is equal to or smaller than the threshold, the person ID of the input face image is set to the person ID of the registered vector at the head after sorting. On the other hand, when the minimum distance is greater than the threshold, the person in the input image is determined not to be registered. In this case, for example, an ID value that the system has defined in advance for non-registered persons is set as the person ID of the input face image.
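
The matching procedure of step S615 can be sketched as follows. The data layout (`registered` as a list of `(vector, person_id)` pairs), the function name, and the sentinel value `UNREGISTERED_ID` are assumptions for illustration; the description only says that the system defines some ID value for non-registered persons.

```python
import math

UNREGISTERED_ID = -1  # hypothetical ID reserved for non-registered persons

def identify(projection, registered, threshold):
    """Return the person ID of the closest registered vector, or
    UNREGISTERED_ID when even the closest one is farther than threshold.

    registered: list of (vector, person_id) pairs.
    """
    if not registered:
        return UNREGISTERED_ID
    # Sort registered vectors in ascending order of Euclidean distance.
    ranked = sorted(registered, key=lambda e: math.dist(projection, e[0]))
    best_vector, best_id = ranked[0]
    if math.dist(projection, best_vector) <= threshold:
        return best_id
    return UNREGISTERED_ID
```

Only the head of the sorted list is used for the decision, so taking the minimum would be enough; the full sort is kept here because it follows the text, and it is useful if a ranked candidate list is to be shown.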

ステップＳ６１６では、Ｓ６１５により得られた人物ＩＤをＲＡＭ５０５に保存されている顔認証の結果を表示するための情報（顔領域の位置・サイズ）に関連付けて保存する。 In step S616, the person ID obtained in S615 is stored in association with information (face area position / size) for displaying the result of face authentication stored in the RAM 505.

以上の処理を元の画像から検出されたすべての顔画像それぞれに対して実行する。すべての顔画像に対する顔認証処理が完了した場合、識別結果を出力する。識別結果出力の一例として、ＲＡＭ５０５に保存されている元画像、顔領域の位置・サイズ情報、顔領域の人物ＩＤを基に顔認証結果画像を作成し、表示部５０７に表示することが考えられる。 The above processing is executed for each of all face images detected from the original image. When face authentication processing for all face images is completed, an identification result is output. As an example of the identification result output, it is conceivable that a face authentication result image is created based on the original image stored in the RAM 505, the position / size information of the face area, and the person ID of the face area, and displayed on the display unit 507.

図１３は、顔認証の結果画像の一例を示す図である。図１３の例では、各顔領域を矩形の枠で表示し、さらにその上部にその顔領域の人物ＩＤまたは関連付けた文字列を表示している。他の出力方法として、顔領域の位置・サイズ情報、顔領域の人物ＩＤなどを元画像と関連付けてデータ保存部５０１に保存する方法も考えられる。また、データ保存部５０１に保存するのではなく、通信部５０２を介して外部の機器に同様の情報を送信するよう構成しても良い。 FIG. 13 is a diagram illustrating an example of a face authentication result image. In the example of FIG. 13, each face area is displayed as a rectangular frame, and the person ID of the face area or the associated character string is displayed above the face area. As another output method, a method in which the position / size information of the face area, the person ID of the face area, and the like are associated with the original image and stored in the data storage unit 501 is also conceivable. Further, instead of storing the data in the data storage unit 501, the same information may be transmitted to an external device via the communication unit 502.

上述の説明では全ての顔画像について処理が終了した時に結果を出力するとしたが、顔画像１枚に対する処理が完了する度に上記出力処理を実行するように構成してもよい。また、ここでは、ステップＳ６１２において、顔画像中の両目が水平に並び、かつ予め定められたサイズとなるように画像変換するとした。識別精度を高めるためには、このような画像変換を行うことが好ましいが、速度向上やリソース削減を図る必要がある場合などには当該画像変換処理を省略してもよい。 In the above description, the results are output when the processing is completed for all the face images. However, the output processing may be executed every time processing for one face image is completed. Here, in step S612, image conversion is performed so that both eyes in the face image are horizontally aligned and have a predetermined size. In order to increase the identification accuracy, it is preferable to perform such image conversion. However, when it is necessary to improve speed or reduce resources, the image conversion process may be omitted.

＜４．３．（ｃ）登録データ作成モード＞ <4.3. (c) Registration data creation mode>
登録データ作成モードは、信号処理結果を用いて顔認証において使用される基準パターンとなる登録データを作成するための動作モードである。ここで、登録データとは、登録ベクトルと、登録ベクトルに対応する人物ＩＤとから成るデータのことである。 The registration data creation mode is an operation mode for creating registration data to be a reference pattern used in face authentication using a signal processing result. Here, the registration data is data including a registration vector and a person ID corresponding to the registration vector.

ステップＳ６１７では、登録データの作成に使用する顔画像を選択する。まず、データ保存部５０１に保存されている画像データをＲＡＭ５０５に格納する。次に、ＲＡＭ５０５にある画像データから、公知の顔検出手法により顔領域を検出し、検出された顔領域を矩形の枠で示した画像を表示部５０７に表示する。ユーザは、それらの顔領域の中から登録したい顔領域を入力部５０６を通して選択する。選択された顔領域の画像は、予め定めたサイズにリサイズし、ＲＡＭ５０５に保存する。登録したい顔領域が存在しなければ、次の画像を表示する指示を入力する。 In step S617, a face image used to create registration data is selected. First, the image data stored in the data storage unit 501 is stored in the RAM 505. Next, a face area is detected from the image data in the RAM 505 by a known face detection method, and an image showing the detected face area with a rectangular frame is displayed on the display unit 507. The user selects a face area to be registered from the face areas through the input unit 506. The selected face area image is resized to a predetermined size and stored in the RAM 505. If there is no face area to be registered, an instruction to display the next image is input.

上述の識別モードで説明したステップＳ６１２〜Ｓ６１４と同様の処理を通して、選択した顔画像から次元削減後の特徴量を生成する。これを登録ベクトルとしてＲＡＭ５０５に保存する。 A feature amount after dimension reduction is generated from the selected face image through processing similar to steps S612 to S614 described in the identification mode. This is stored in the RAM 505 as a registered vector.

ステップＳ６１８では、登録ベクトルと人物ＩＤを関連付け、データ保存部５０１に格納する。登録ベクトルと人物ＩＤを関連付ける手順を以下に説明する。 In step S618, the registered vector and the person ID are associated and stored in the data storage unit 501. A procedure for associating the registered vector with the person ID will be described below.

まず、既にデータ保存部５０１に格納されている人物ＩＤもしくは人物ＩＤに関連付けられた文字列データを表示部５０７に表示する。好ましくは、データ保存部５０１には登録データと合わせて顔画像を保存しておき、人物ＩＤもしくは文字列データとともに顔画像を表示する。 First, the person ID already stored in the data storage unit 501 or character string data associated with the person ID is displayed on the display unit 507. Preferably, the face image is stored in the data storage unit 501 together with the registration data, and the face image is displayed together with the person ID or character string data.

次に、その中でＳ６１７において選択した顔画像に該当すると思われる人物ＩＤもしくは文字列データをユーザが入力部５０６を介して指定する。そして、指定された人物ＩＤを登録ベクトルに関連付けてデータ保存部５０１に保存する。一方、該当する人物ＩＤもしくは文字列データが存在しなければ、その旨を入力部５０６を介して入力する。この場合、登録ベクトルに新たな人物ＩＤを関連付け、データ保存部５０１に保存する。 Next, the user designates a person ID or character string data that is considered to correspond to the face image selected in S617 through the input unit 506. Then, the specified person ID is stored in the data storage unit 501 in association with the registered vector. On the other hand, if there is no corresponding person ID or character string data, the fact is input via the input unit 506. In this case, a new person ID is associated with the registered vector and stored in the data storage unit 501.
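
The association in step S618 can be sketched as a small store keyed by person ID. The dictionary layout, the function name, and the rule for allocating a new ID (largest existing ID plus one) are assumptions; the description only states that a new person ID is associated when the user indicates that no existing one applies.

```python
def register(store, vector, chosen_id=None):
    """Attach a registered vector to a person ID.

    store: dict mapping person_id -> list of registered vectors.
    chosen_id: ID the user picked, or None when no existing ID applies.
    Returns the person ID the vector was stored under.
    """
    if chosen_id is None:
        # Hypothetical allocation rule: next integer after the largest ID.
        chosen_id = max(store, default=0) + 1
    store.setdefault(chosen_id, []).append(vector)
    return chosen_id
```

Keeping a list of vectors per ID allows several reference images to be registered for the same person.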

以上説明したとおり、第１実施形態によれば、不連続関数（微分不可能な関数）を連続関数に近似して学習を行う。これにより、ＬＢＰや増分符号などの符号化処理を含む処理系に対し、ＣＮＮによる学習を行うことが可能となる。その結果、画像データ等からパターン識別に利用される特徴量をより好適に抽出可能とすることが可能となる。なお、第１実施形態では、３つのモード（学習モード、識別モード、登録データ作成モード）の処理を１つの装置で行うよう説明したが、それぞれを別体の複数の装置により行っても良い。 As described above, according to the first embodiment, learning is performed by approximating a discontinuous function (non-differentiable function) to a continuous function. Thereby, it is possible to perform learning by CNN for a processing system including encoding processing such as LBP and incremental code. As a result, it is possible to more suitably extract a feature amount used for pattern identification from image data or the like. In the first embodiment, the processing in the three modes (learning mode, identification mode, and registration data creation mode) has been described as being performed by one device, but each may be performed by a plurality of separate devices.

（第２実施形態） (Second Embodiment)
第２実施形態では、主要な処理をハードウェア回路を用いて構成する例について説明する。すなわち、第２実施形態においては、主要な処理をＡＳＩＣやＤＳＰなどのハードウェア回路である信号処理部１４０１で実行する点が主に異なる。 In the second embodiment, an example in which main processing is configured using a hardware circuit will be described. That is, the second embodiment is mainly different in that main processing is executed by a signal processing unit 1401 that is a hardware circuit such as an ASIC or a DSP.

＜データ処理装置の構成＞ <Configuration of data processing apparatus>
図１４は、第２実施形態におけるデータ処理装置の構成を示す図である。なお、図５と同じ構成要素については同じ番号を付与し、ここでは説明を省略する。なお、以下では、第１実施形態と異なる部分についてのみ説明する。上述のように、第２実施形態では、主要な処理を信号処理部１４０１で実行し、ＣＰＵ５０３はもっぱら信号処理部１４０１の制御に用いられる。 FIG. 14 is a diagram illustrating a configuration of a data processing device according to the second embodiment. The same components as those in FIG. 5 are given the same reference numbers, and their description is omitted here. In the following, only the parts different from the first embodiment will be described. As described above, in the second embodiment, main processing is executed by the signal processing unit 1401, and the CPU 503 is exclusively used for control of the signal processing unit 1401.

ＣＰＵ５０３は、まず、ユーザが入力部５０６を通して入力した動作モードを読み込む。ここで、動作モードとは信号処理部１４０１の動作を決める情報である。ここでは、動作モードとして”識別モード”と”学習モード”の２種類があるものとする。 First, the CPU 503 reads an operation mode input by the user through the input unit 506. Here, the operation mode is information that determines the operation of the signal processing unit 1401. Here, it is assumed that there are two types of operation modes, “identification mode” and “learning mode”.

上述した第１実施形態と同様に、”識別モード”とは、入力画像に対して図１に示した信号処理を実行することにより、入力画像からパターン識別に使用する特徴量を抽出する動作モードである。一方、”学習モード”とは、図１に示した信号処理におけるＣＮＮのフィルタ係数を学習する動作モードである。 As in the first embodiment described above, the “identification mode” is an operation mode in which the feature amount used for pattern identification is extracted from the input image by performing the signal processing shown in FIG. 1 on the input image. On the other hand, the “learning mode” is an operation mode for learning the CNN filter coefficients in the signal processing shown in FIG. 1.

ＣＰＵ５０３は、入力された動作モードに応じて信号処理部１４０１が必要とするデータをＲＡＭ５０５に格納する。動作モードとして”識別モード”が選択されている場合、ＣＰＵ５０３は、データ保存部５０１に保存されている画像データから、第１実施形態と同様に公知の顔検出手法により顔領域を検出する。さらに両目が水平に並びかつ予め定めたサイズに画像変換した顔画像をＲＡＭ５０５に格納する。一方、動作モードとして”学習モード”が選択されている場合、データ保存部５０１に格納されているすべての学習用の顔画像の中から、選択した２枚の顔画像と顔画像に関連付けられている人物ＩＤをＲＡＭ５０５に格納する。 The CPU 503 stores data required by the signal processing unit 1401 in the RAM 505 according to the input operation mode. When “identification mode” is selected as the operation mode, the CPU 503 detects a face area from the image data stored in the data storage unit 501 by a known face detection method, as in the first embodiment. It then stores in the RAM 505 a face image that has been transformed so that both eyes are aligned horizontally and the face has a predetermined size. On the other hand, when “learning mode” is selected as the operation mode, the CPU 503 selects two face images from among all the learning face images stored in the data storage unit 501 and stores them in the RAM 505 together with the person IDs associated with them.

ＣＰＵ５０３は、必要なデータをＲＡＭ５０５に格納した後、選択された動作モードの情報を信号処理部１４０１に送信する。そして、信号処理部１４０１から処理が完了したことを示す信号を受信すると、次の動作モードの情報を読み込み、上述した処理を繰り返し実行する。 The CPU 503 stores necessary data in the RAM 505 and then transmits information on the selected operation mode to the signal processing unit 1401. When a signal indicating that the processing is completed is received from the signal processing unit 1401, information on the next operation mode is read, and the above-described processing is repeatedly executed.

＜信号処理部の構成＞ <Configuration of signal processing unit>
図１５は、第２実施形態における信号処理部の構成を示すブロック図である。信号処理部１４０１は、ユーザから受け付けた動作モードの指定に応じて処理経路を切り替える。これにより、特徴抽出処理と学習を同一の回路で実現する。以下では、信号処理部１４０１内の各ブロックについて図１５を参照して詳細に説明する。 FIG. 15 is a block diagram illustrating a configuration of a signal processing unit in the second embodiment. The signal processing unit 1401 switches the processing path according to the designation of the operation mode received from the user. Thereby, feature extraction processing and learning are realized by the same circuit. Hereinafter, each block in the signal processing unit 1401 will be described in detail with reference to FIG. 15.

１５０８は顔画像、ＣＮＮのネットワーク構成、ＣＮＮのフィルタ係数、ｔａｎｈ符号化処理後の近似符号化処理結果データという４種類のデータを格納するメモリであり、公知のＲＡＭ，レジスタなどにより構成する。ここで、顔画像は、第１実施形態と同様に、両目が水平に並び、かつ予め定めたサイズに画像変換したものであるとする。また、ＣＮＮのネットワーク構成、ＣＮＮのフィルタ係数は、例えば、それぞれ第１実施形態の図１０、図１１で説明した形式により格納されるものとする。 Reference numeral 1508 denotes a memory that stores four types of data: a face image, the CNN network configuration, the CNN filter coefficients, and the approximate encoding processing result data after tanh encoding processing. It is configured from a known RAM, registers, or the like. Here, as in the first embodiment, the face image is assumed to have been transformed so that both eyes are aligned horizontally and the face has a predetermined size. In addition, it is assumed that the CNN network configuration and the CNN filter coefficients are stored, for example, in the formats described in FIGS. 10 and 11 of the first embodiment, respectively.

１５０１は制御部であり、信号処理部１４０１内の各ブロックの動作を制御する。制御部１５０１は、まず、ＣＮＮ処理部１５０２が使用するＣＮＮのネットワーク構成及びフィルタ係数をＲＡＭ５０５から読み込み、メモリ１５０８に格納する。そして、動作モードの指定をＣＰＵ５０３から受信するまで待機し、動作モードの指定を受信することをトリガとして、以下に説明する処理を開始する。 Reference numeral 1501 denotes a control unit that controls the operation of each block in the signal processing unit 1401. First, the control unit 1501 reads the network configuration and filter coefficients of the CNN used by the CNN processing unit 1502 from the RAM 505 and stores them in the memory 1508. It then waits until the designation of the operation mode is received from the CPU 503, and starts the processing described below with the reception of that designation as a trigger.

・識別モード Identification mode
制御部１５０１は、まず、スイッチ１５０３に対してＣＮＮ処理部１５０２の出力を増分符号化処理部１５０４に入力するように指示する信号を送信する。次に、特徴抽出の対象となる顔画像をＲＡＭ５０５から読み込み、メモリ１５０８に格納する。続いて、制御部１５０１はＣＮＮ処理部１５０２に、メモリ１５０８に格納した顔画像に対するＣＮＮ処理を実行するように指示する信号を送信する。 The control unit 1501 first transmits a signal instructing the switch 1503 to input the output of the CNN processing unit 1502 to the incremental encoding processing unit 1504. Next, a face image to be subjected to feature extraction is read from the RAM 505 and stored in the memory 1508. Subsequently, the control unit 1501 transmits a signal instructing the CNN processing unit 1502 to execute CNN processing on the face image stored in the memory 1508.

増分符号化処理部１５０４から符号化処理結果データを受信すると、そのデータをＲＡＭ５０５に格納する。ここで、通信部５０２を介して外部のデータ処理装置に符号化処理結果データを送信するように構成してもよい。さらに、制御部１５０１は処理が完了したことを示す信号をＣＰＵ５０３に送信し、次の動作モードの指定を受信するまで待機する。 When encoding process result data is received from the incremental encoding processing unit 1504, the data is stored in the RAM 505. Here, the encoding process result data may be transmitted to an external data processing apparatus via the communication unit 502. Further, the control unit 1501 transmits a signal indicating that the processing is completed to the CPU 503 and waits until receiving the designation of the next operation mode.

信号処理回路を顔認証に適用する場合、ＲＡＭ５０５に格納した符号化処理結果データを特徴量として、ＣＰＵ５０３により次元削減処理、識別処理を実行するよう構成する。もちろん、次元削減処理、識別処理を合わせて信号処理部１４０１などのハードウェア回路として構成しても良い。 When the signal processing circuit is applied to face authentication, the CPU 503 is configured to execute dimension reduction processing and identification processing using the encoding processing result data stored in the RAM 505 as a feature amount. Of course, the dimension reduction process and the identification process may be combined to form a hardware circuit such as the signal processing unit 1401.

・学習モード Learning mode
制御部１５０１は、まず、スイッチ１５０３にＣＮＮ処理部１５０２の出力をｔａｎｈ符号化処理部１５０５に入力するように指示する信号を送信する。次に、ＲＡＭ５０５に格納されている２枚の学習用の顔画像を読み込み、メモリ１５０８に格納する。また、これらの顔画像に関連付けられている人物ＩＤも合わせて読み込む。 The control unit 1501 first transmits a signal instructing the switch 1503 to input the output of the CNN processing unit 1502 to the tanh encoding processing unit 1505. Next, the two learning face images stored in the RAM 505 are read and stored in the memory 1508. The person IDs associated with these face images are also read.

制御部１５０１は、続いて、ＣＮＮ処理部１５０２に対してメモリ１５０８に格納した２枚の顔画像のうち、１枚目の顔画像に対するＣＮＮ処理を実行するように指示する信号を送信する。ｔａｎｈ符号化処理部１５０５から近似符号化処理が完了したことを示す信号を受信すると、２枚目の顔画像に対するＣＮＮ処理を実行するように指示する信号をＣＮＮ処理部１５０２に送信する。 Subsequently, the control unit 1501 transmits a signal instructing the CNN processing unit 1502 to execute CNN processing on the first face image among the two face images stored in the memory 1508. When a signal indicating that the approximate encoding process has been completed is received from the tanh encoding processing unit 1505, a signal instructing to execute the CNN process on the second face image is transmitted to the CNN processing unit 1502.

制御部１５０１は、ｔａｎｈ符号化処理部１５０５から２枚目の顔画像に対する近似符号化処理が完了したことを示す信号を受信すると、先だって受信した２つの人物ＩＤを比較する。そして、人物ＩＤが同じであれば”０”、異なるならば”１”というラベルを生成してＬｏｓｓ計算部１５０６に送信する。 When the control unit 1501 receives from the tanh encoding processing unit 1505 a signal indicating that the approximate encoding process for the second face image has been completed, it compares the two person IDs it read earlier. It then generates a label of “0” if the person IDs are the same and “1” if they differ, and transmits the label to the Loss calculation unit 1506.

フィルタ係数更新部１５０７からフィルタ係数の更新が完了したことを示す信号を受信すると、更新後のフィルタ係数をメモリ１５０８から読み込み、ＲＡＭ５０５に格納する。そして、処理が完了したことを示す信号をＣＰＵ５０３に送信し、次の動作モードの指定を受信するまで待機する。 When a signal indicating that the filter coefficient update has been completed is received from the filter coefficient update unit 1507, the updated filter coefficient is read from the memory 1508 and stored in the RAM 505. Then, a signal indicating that the process is completed is transmitted to the CPU 503 and waits until the next operation mode designation is received.

再び、図１５に示された各部の説明に戻る。ＣＮＮ処理部１５０２は、制御部１５０１からの指示を受けて、メモリ１５０８に格納されているＣＮＮのネットワーク構成及びフィルタ係数を読み込む。そして、メモリ１５０８に格納された顔画像を参照して、ステップＳ６０５と同様のＣＮＮ処理を実行する。１枚の顔画像に対するＣＮＮ処理が完了すると、次の指示を受けるまで待機する。 We now return to the description of each part shown in FIG. 15. In response to an instruction from the control unit 1501, the CNN processing unit 1502 reads the CNN network configuration and filter coefficients stored in the memory 1508. Then, referring to the face image stored in the memory 1508, it executes the same CNN processing as in step S605. When the CNN processing for one face image is completed, it waits until the next instruction is received.

スイッチ１５０３は、制御部１５０１からの指示に応じて、ＣＮＮ処理部１５０２が生成したＣＮＮ処理結果データを増分符号化処理部１５０４もしくはｔａｎｈ符号化処理部１５０５のいずれかに振り分けるスイッチである。 The switch 1503 is a switch that distributes the CNN processing result data generated by the CNN processing unit 1502 to either the incremental encoding processing unit 1504 or the tanh encoding processing unit 1505 in accordance with an instruction from the control unit 1501.
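
In software terms, the switch 1503 behaves like a dispatcher that hands the CNN output to one of two encoders depending on the operation mode. The sketch below is only an analogy for the hardware path selection; the function name, the mode strings, and passing the encoders in as callables are assumptions.

```python
def route_cnn_output(mode, cnn_result, increment_encoder, tanh_encoder):
    """Send the CNN result to the encoder that matches the operation mode.

    identification -> incremental (discontinuous) encoder, as for unit 1504
    learning       -> tanh (continuous) encoder, as for unit 1505
    """
    if mode == "identification":
        return increment_encoder(cnn_result)
    if mode == "learning":
        return tanh_encoder(cnn_result)
    raise ValueError(f"unknown operation mode: {mode!r}")
```

Because both paths share the same CNN stage, the coefficients learned through the tanh path are the ones used when the incremental path is selected.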

増分符号化処理部１５０４は、ＣＮＮ処理結果データに対して、第１実施形態におけるステップＳ６１３と同様の増分符号化処理を実行し、符号化処理結果データを制御部１５０１に送信する。 The incremental encoding processing unit 1504 performs the same incremental encoding processing as that in step S613 in the first embodiment on the CNN processing result data, and transmits the encoding processing result data to the control unit 1501.

ここでは、先に述べたように入力画像は予め定められたサイズに変換された顔画像である。また、ＣＮＮのネットワーク構成及び空間フィルタのサイズは予め定めておく。したがって、ＣＮＮ処理部１５０２が生成するＣＮＮ処理結果データのサイズは予め計算することが可能である。ここでは、増分符号化処理部１５０４は内部にレジスタ等を保持するよう構成し、制御部１５０１がそのレジスタにＣＮＮ出力結果データサイズに関する情報を予め設定しておくものとする。増分符号化処理部１５０４は、レジスタ等に保存したＣＮＮ処理結果データのサイズ分の画像に対する符号化処理が完了すると、符号化処理が完了したことを示す信号を制御部１５０１に送信する。 Here, as described above, the input image is a face image converted to a predetermined size. The network configuration of the CNN and the size of the spatial filter are determined in advance. Therefore, the size of the CNN processing result data generated by the CNN processing unit 1502 can be calculated in advance. Here, it is assumed that the incremental encoding processing unit 1504 is configured to hold a register or the like therein, and the control unit 1501 sets information related to the CNN output result data size in the register in advance. The incremental encoding processing unit 1504 transmits a signal indicating that the encoding process has been completed to the control unit 1501 when the encoding process for the image of the size of the CNN processing result data stored in the register or the like is completed.

ｔａｎｈ符号化処理部１５０５は、ＣＮＮ処理結果データに対して第１実施形態におけるステップＳ６０６と同様の近似符号化処理を実行し、近似符号化処理結果データをメモリ１５０８に格納する。ここでは、ｔａｎｈ符号化処理部１５０５はレジスタ等を保持しており、制御部１５０１はそこにＣＮＮ出力結果データサイズに関する情報を予め設定しておくものとする。ｔａｎｈ符号化処理部１５０５は、レジスタ等に保存したＣＮＮ処理結果データのサイズ分の画像に対する近似符号化処理が完了すると、近似符号化処理が完了したことを示す信号を制御部１５０１に送信する。 The tanh encoding processing unit 1505 performs the approximate encoding process similar to step S606 in the first embodiment on the CNN process result data, and stores the approximate encoding process result data in the memory 1508. Here, it is assumed that the tanh encoding processing unit 1505 holds a register and the like, and the control unit 1501 presets information related to the CNN output result data size therein. The tanh encoding processing unit 1505 transmits a signal indicating that the approximate encoding process is completed to the control unit 1501 when the approximate encoding process is completed on the image of the size of the CNN processing result data stored in the register or the like.

Ｌｏｓｓ計算部１５０６は、制御部１５０１からラベル情報を受信すると、メモリ１５０８に格納されている近似符号化処理結果データとラベル情報とを用いて、第１実施形態におけるステップＳ６０９と同様のＬｏｓｓを計算する。そして、結果をフィルタ係数更新部１５０７に送信する。 When the Loss calculation unit 1506 receives the label information from the control unit 1501, it calculates the Loss in the same way as in step S609 of the first embodiment, using the approximate encoding processing result data stored in the memory 1508 and the label information. It then transmits the result to the filter coefficient update unit 1507.

フィルタ係数更新部１５０７は、Ｌｏｓｓ計算部１５０６からＬｏｓｓを受信すると、第１実施形態におけるステップＳ６１０と同様に、メモリ１５０８に格納されているＣＮＮのフィルタ係数を更新する。フィルタ係数更新部１５０７は、フィルタ係数の更新が完了すると、更新完了を示す信号を制御部１５０１に送信する。なお、ここではメモリ１５０８が信号処理回路内に含まれるような構成を示したが、信号処理回路と接続した外部のＲＡＭ等を使用するように構成しても構わない。 When receiving the loss from the loss calculation unit 1506, the filter coefficient update unit 1507 updates the filter coefficient of the CNN stored in the memory 1508, similarly to step S610 in the first embodiment. When the filter coefficient update unit 1507 completes the update of the filter coefficient, the filter coefficient update unit 1507 transmits a signal indicating the completion of the update to the control unit 1501. Note that although a configuration in which the memory 1508 is included in the signal processing circuit is shown here, an external RAM or the like connected to the signal processing circuit may be used.

以上説明したように、第２実施形態では、図１に示す信号処理の主要部分をハードウェア回路として実現する例について説明した。信号処理の主要部分をハードウェア回路により構成することで一般に処理の高速化が達成される。なお、信号処理部１４０１の全てをハードウェア回路とする必要は無く、一部の処理についてはソフトウェア処理により実現しても良い。 As described above, in the second embodiment, the example in which the main part of the signal processing shown in FIG. 1 is realized as a hardware circuit has been described. Generally, high-speed processing is achieved by configuring the main part of signal processing with hardware circuits. Note that it is not necessary for all of the signal processing unit 1401 to be hardware circuits, and some of the processing may be realized by software processing.

（第３実施形態） (Third Embodiment)
第３実施形態では、学習処理とパターン識別処理を別体の機器で実行する信号処理システムについて説明する。図１６は、第３実施形態によるデータ処理システムの構成を示す図である。図１６に示したデータ処理システムでは、サーバ装置１６０１が学習処理を実行し、クライアント装置１６０２がパターン識別処理を実行するように構成されている。なお、サーバ装置１６０１は、好適には公知のクラウドコンピューティングのような分散処理システムにより構成するとよい。また、図１６では１台のクライアント装置のみを示しているが、１台のサーバ装置に対し複数のクライアント装置が接続される構成としても良い。 In the third embodiment, a signal processing system in which learning processing and pattern identification processing are executed by separate devices will be described. FIG. 16 is a diagram showing a configuration of a data processing system according to the third embodiment. In the data processing system shown in FIG. 16, the server device 1601 executes learning processing, and the client device 1602 executes pattern identification processing. Note that the server device 1601 is preferably configured by a distributed processing system such as publicly known cloud computing. In FIG. 16, only one client device is shown, but a configuration may be adopted in which a plurality of client devices are connected to one server device.

＜システム内の各装置の構成＞ <Configuration of each device in the system>
・サーバ装置 Server device
サーバ装置１６０１は、論理的には、図５に示したデータ処理装置の構成と同一の構成で良く、図６に示した処理を実行する。ただし、第３実施形態ではサーバ装置１６０１は学習処理のみを実行するため、識別モードや登録データ作成モードの処理を省略するよう構成しても構わない。なお、サーバ装置１６０１は、学習処理を行うため、ＣＮＮ処理部（第２の生成手段）を有する。 The server device 1601 may logically have the same configuration as that of the data processing device shown in FIG. 5, and executes the processing shown in FIG. 6. However, in the third embodiment, since the server device 1601 executes only the learning process, the process in the identification mode or the registration data creation mode may be omitted. Note that the server device 1601 includes a CNN processing unit (second generation unit) for performing learning processing.

サーバ装置１６０１は、自身の持つ通信部５０２を介してクライアント装置１６０２から学習処理のリクエスト信号を受信すると、第１実施形態におけるステップＳ６０２〜Ｓ６１１に示される学習処理を実行するよう構成する。 When the server apparatus 1601 receives a learning process request signal from the client apparatus 1602 via its own communication unit 502, the server apparatus 1601 is configured to execute the learning process shown in steps S602 to S611 in the first embodiment.

ここで、学習処理のリクエスト信号にＣＮＮのフィルタ係数を含めるように構成することも可能である。この場合、受信したフィルタ係数をＲＡＭ５０５に格納し、そのフィルタ係数を初期値として学習処理を実行するように構成する。また、学習処理のリクエスト信号に人物ＩＤを含めるように構成することも可能である。この場合、人物ＩＤはサーバ装置１６０１とクライアント装置１６０２で共有しており、サーバ装置１６０１は受信した人物ＩＤの顔画像に対する識別精度が高くなるような学習処理を実行するように構成する。さらに、学習処理のリクエスト信号に顔画像を含めるように構成することも可能である。この場合、サーバ装置１６０１は受信した顔画像に対する識別精度が高くなるような学習処理を実行するように構成する。 Here, the CNN filter coefficient may be included in the learning process request signal. In this case, the received filter coefficient is stored in the RAM 505, and the learning process is executed with the filter coefficient as an initial value. It is also possible to configure the person ID to be included in the request signal for the learning process. In this case, the server device 1601 and the client device 1602 share the person ID, and the server device 1601 is configured to execute a learning process that increases the identification accuracy for the face image of the received person ID. Furthermore, it is also possible to configure to include a face image in the request signal for the learning process. In this case, the server apparatus 1601 is configured to execute a learning process that increases the identification accuracy for the received face image.

When the learning process is completed, the server device 1601 transmits the learned (updated) filter coefficients to the client device 1602 via the communication unit 502.

Although the server device 1601 has been described here as performing the learning process when triggered by reception of a request signal from the client device 1602, the server device 1601 may instead be configured to perform the learning process autonomously. In this case, for example, the learning process is triggered when a certain time has elapsed since the last learning process. When the learning process ends, the server device 1601 may initiate communication with the client device 1602 and transmit the resulting filter coefficients. Alternatively, when the client device 1602 requests the filter coefficients, the server device 1601 may transmit the latest filter coefficients available at that time.
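The elapsed-time trigger can be sketched as follows. This is an illustrative Python outline only: the class `AutonomousLearner`, its method names, and the `learn` and `send` callbacks are assumptions and are not defined in the specification.

```python
import time


class AutonomousLearner:
    """Hypothetical server-side scheduler for autonomous learning."""

    def __init__(self, interval_seconds, clock=time.monotonic):
        self.interval_seconds = interval_seconds
        self.clock = clock
        self.last_learning_time = clock()
        self.latest_coefficients = None

    def should_learn(self):
        # Trigger once the configured time has passed since the last learning run.
        return self.clock() - self.last_learning_time >= self.interval_seconds

    def run_if_due(self, learn, send):
        """Run learn() and push the result with send() when learning is due."""
        if not self.should_learn():
            return False
        self.latest_coefficients = learn()
        self.last_learning_time = self.clock()
        # The server initiates communication and transmits the new coefficients.
        send(self.latest_coefficients)
        return True
```

Keeping `latest_coefficients` on the object also covers the alternative in which the client asks for the coefficients and the server returns whatever is newest at that moment.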

・Client device
FIG. 17 is a diagram illustrating the configuration of the client device in the third embodiment. Components that are the same as those in FIG. 5 are given the same reference numbers, and their description is omitted here. Reference numeral 1701 denotes a feature extraction unit, which executes the same processing as in the identification mode described in the first and second embodiments. Note that the client device 1602 includes a CNN processing unit (first generation unit) in order to perform identification processing.

FIG. 18 is a block diagram illustrating the configuration of the feature extraction unit in the third embodiment. Compared with the configuration of the signal processing unit 1401 shown in FIG. 15, the feature extraction unit 1701 omits the components needed only for processing in the learning mode. The client device 1602 uses the feature extraction unit 1701 to perform feature extraction on the face image stored in the RAM 505, and the CPU 503 executes pattern identification processing using the generated encoding processing result data.

<System processing flow>
When updating the CNN filter coefficients, the client device 1602 transmits a learning process request signal to the server device 1601 via the communication unit 502. Alternatively, it transmits the CNN filter coefficients stored in the RAM 505 of the client device 1602, or a face image or person ID included in the registration data. In this case, the transmission of the filter coefficients, the face image, or the person ID may itself serve as the learning process request signal.

After transmitting the learning process request signal to the server device 1601, the client device 1602 may wait until it receives the filter coefficients from the server device 1601. Alternatively, it may continue pattern identification processing with the pre-update CNN filter coefficients instead of waiting. In this case, for example, the server device 1601 may initiate communication with the client device 1602 at a point such as when the learning process completes. Alternatively, triggered by a user instruction on the client device 1602 side, a timer interrupt, or the like, the client device 1602 may ask the server device 1601 whether the learning process has completed, and receive the filter coefficients if it has.

Upon receiving the updated filter coefficients from the server device 1601 via the communication unit 502, the client device 1602 stores them in the RAM 505.
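The non-blocking variant of this flow, in which the client keeps identifying with the old coefficients and asks the server whether learning has finished, might look like the sketch below. The class `PollingClient` and the `query_server` callback are hypothetical names, and returning `None` to mean "learning not yet complete" is an assumed convention.

```python
class PollingClient:
    """Hypothetical client that keeps using old coefficients until new ones arrive."""

    def __init__(self, coefficients):
        # Stands in for the filter coefficients held in the RAM 505.
        self.coefficients = list(coefficients)

    def poll(self, query_server):
        """Ask the server once; store the updated coefficients if learning is done.

        query_server() is assumed to return the updated coefficients,
        or None while the learning process is still running.
        """
        updated = query_server()
        if updated is None:
            return False  # keep identifying with the pre-update coefficients
        self.coefficients = list(updated)
        return True
```

A timer interrupt or a user instruction would simply call `poll` again; identification continues to read `coefficients` in the meantime.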

As described above, in the third embodiment, the learning process is omitted from the client device and executed in the server device instead, which reduces the implementation burden (device cost and the like) on the client device. Further, when a plurality of client devices 1602 are connected to the server device 1601, the filter coefficients learned by the server device 1601 can easily be shared among the clients. That is, learning is performed more efficiently, and an appropriate filter can be determined more quickly. Furthermore, the client devices that execute pattern identification processing can share the same filter, so that the same pattern identification result can be obtained across them.

(Modification)
The above embodiments have been described on the assumption that the data to be processed is two-dimensional image data (a face image). However, the present invention is also applicable to one-dimensional data such as an audio signal, and likewise to data of three or more dimensions.

In the above embodiments, a step function is used as the discontinuous function applied to the result of comparing the target pixel and the reference pixel in the CNN processing result data. However, the discontinuous function is not limited to the step function, and a pulse function can also be used. The step function binarizes ("0" or "1") according to the relative magnitude of the two values being compared. In contrast, the pulse function binarizes ("0" or "1") according to the absolute value of the difference between the two values being compared. FIG. 19 is a diagram illustrating a pulse function, which is a discontinuous function, and a Gaussian function that approximates the pulse function.
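The two discontinuous functions and their continuous approximations can be compared in a short Python sketch. The specific conventions here are assumptions for illustration: the step function takes the difference between the two compared values and outputs 1 for a non-negative difference, the pulse function outputs 1 when the absolute difference is within a hypothetical `width`, and `gain` and `sigma` are free parameters not taken from the specification.

```python
import math


def step(diff):
    """Discontinuous step: 1 if the difference is non-negative, else 0."""
    return 1 if diff >= 0 else 0


def sigmoid(diff, gain=1.0):
    """Continuous approximation of step, written via tanh for numerical stability."""
    return 0.5 * (1.0 + math.tanh(0.5 * gain * diff))


def pulse(a, b, width=1.0):
    """Discontinuous pulse: 1 if |a - b| is within width, else 0 (assumed polarity)."""
    return 1 if abs(a - b) <= width else 0


def gaussian(a, b, sigma=1.0):
    """Continuous approximation of pulse, peaking at 1 when a == b."""
    d = a - b
    return math.exp(-(d * d) / (2.0 * sigma * sigma))
```

Because `sigmoid` and `gaussian` are differentiable everywhere, they are the kind of function that a gradient-based update of the filter coefficients can use, while `step` and `pulse` give the binary codes used outside of learning.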

Furthermore, in the above embodiments, the pixel value of one target pixel is compared with the pixel value of one reference pixel in the CNN processing result data, but the comparison is not limited to single-pixel values. For example, the average of the pixel values in an (m × m) pixel region may be used instead of the pixel value of one pixel.
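A region-average comparison of this kind can be sketched in plain Python. The helper names `region_mean` and `compare_regions`, the (row, column) convention for the top-left corner of each region, and the choice to output 1 when the reference mean is at least the target mean are all assumptions for illustration.

```python
def region_mean(data, top, left, m):
    """Mean of the m x m block of data whose top-left corner is (top, left)."""
    total = 0.0
    for row in data[top:top + m]:
        total += sum(row[left:left + m])
    return total / (m * m)


def compare_regions(data, target, reference, m):
    """Step-encode two regions: 1 if the reference mean >= the target mean, else 0.

    target and reference are (top, left) corners; the caller is assumed
    to keep both m x m regions inside the bounds of data.
    """
    t = region_mean(data, target[0], target[1], m)
    r = region_mean(data, reference[0], reference[1], m)
    return 1 if r >= t else 0
```

With m = 1 this reduces to the single-pixel comparison described in the embodiments.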

(Other embodiments)
The present invention can also be realized by the following processing: software (a program) that implements the functions of the above-described embodiments is supplied to a system or apparatus via a network or various storage media, and a computer (or a CPU, MPU, or the like) of the system or apparatus reads out and executes the program.

Claims

A signal processing apparatus comprising:
generating means for generating processing result data by executing spatial filter processing on input data;
first encoding means for executing a predetermined encoding process using a discontinuous function on the processing result data to generate encoding processing result data;
second encoding means for executing, on the processing result data, an approximate encoding process in which the discontinuous function is approximated by a continuous function, to generate approximate encoding processing result data;
updating means for updating a weighting coefficient of the spatial filter processing based on the approximate encoding processing result data; and
control means for performing control such that the processing result data is provided to the second encoding means when the weighting coefficient is updated by the updating means, and to the first encoding means in other cases.

The signal processing apparatus according to claim 1, wherein the other cases include at least one of a case of performing pattern identification and a case of registering a reference pattern used for the pattern identification.

The signal processing apparatus according to claim 1, further comprising pattern identification means for performing pattern identification based on the encoding processing result data.

The signal processing apparatus according to claim 1, wherein the predetermined encoding process compares a value in a region of interest in the processing result data with values in one or more reference regions at predetermined relative positions with respect to the region of interest to generate comparison processing result data, and performs an operation using the discontinuous function on the comparison processing result data.

The signal processing apparatus according to claim 1, wherein the spatial filter processing is CNN (Convolutional Neural Networks) processing, and the weighting coefficient is a filter coefficient in the CNN processing.

The signal processing apparatus according to claim 5, wherein the updating unit updates a weighting coefficient of the spatial filter processing using an error back propagation method.

The signal processing apparatus according to claim 1, wherein the discontinuous function is a step function, and the continuous function is a tanh function or a sigmoid function.

The signal processing apparatus according to claim 1, wherein the discontinuous function is a pulse function, and the continuous function is a Gaussian function.

A signal processing method comprising:
a generation step of generating processing result data by executing spatial filter processing on input data;
a first encoding step of executing a predetermined encoding process using a discontinuous function on the processing result data to generate encoding processing result data;
a second encoding step of executing, on the processing result data, an approximate encoding process in which the discontinuous function is approximated by a continuous function, to generate approximate encoding processing result data;
an update step of updating a weighting coefficient of the spatial filter processing based on the approximate encoding processing result data; and
a control step of performing control such that the second encoding step is executed when the weighting coefficient is updated, and the first encoding step is executed in other cases.

A program for causing a computer to function as each means of the signal processing apparatus according to any one of claims 1 to 7.

A signal processing system including a client device and a server device, wherein
the client device has:
first generation means for generating processing result data by executing spatial filter processing on input data;
first encoding means for executing a predetermined encoding process using a discontinuous function on the processing result data generated by the first generation means to generate encoding processing result data; and
pattern identifying means for performing pattern identification based on the encoding processing result data,
the server device has:
second generation means for generating processing result data by executing the spatial filter processing on input data;
second encoding means for executing, on the processing result data generated by the second generation means, an approximate encoding process in which the discontinuous function is approximated by a continuous function, to generate approximate encoding processing result data; and
updating means for updating a weighting coefficient of the spatial filter processing of the second generation means based on the approximate encoding processing result data,
and, in addition,
the server device includes transmission means for transmitting the weighting coefficient updated by the updating means to the client device, and the client device includes setting means for setting the weighting coefficient received from the server device as the weighting coefficient of the spatial filter processing of the first generation means.