JP5772442B2

JP5772442B2 - Image processing apparatus and image processing program

Info

Publication number: JP5772442B2
Application number: JP2011207382A
Authority: JP
Inventors: Masanori Sekino (関野 雅則)
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2011-09-22
Filing date: 2011-09-22
Publication date: 2015-09-02
Anticipated expiration: 2031-09-22
Also published as: JP2013069132A

Description

The present invention relates to an image processing apparatus and an image processing program.

Techniques exist for recognizing images.
As related art, Patent Document 1, for example, aims at highly accurate and reliable optical character recognition. It discloses a hierarchical network having a plurality of layers that perform parallel constrained feature detection for local feature extraction and a plurality of fully connected layers for dimensionality reduction, with character classification also performed in the fully connected layers. Each parallel constrained feature detection layer consists of a plurality of constrained feature maps and a corresponding plurality of kernels, a given kernel is associated directly with a single constrained feature map, and undersampling is performed between layers.

Patent Document 2, for example, aims to use neural network techniques to make useful decisions in identifying letters and digits by having a neural network "learn" to recognize each letter or digit accurately across various forms and sizes. It observes that accuracy in recognizing many letters and digits is hardly sacrificed if the network is trained to identify properties belonging to an "invariant" category of each letter or digit. Accordingly, instead of requiring the network to recognize every variation of the identified invariants (form, position, size, and so on), it discloses a generalized, restricted description of invariant segments that requires far fewer sample data inputs and only a small amount of processing of information about unknown letters or digits.

Patent Document 3, for example, addresses the problem of providing a pattern recognition method that can recognize not only two-dimensional objects but also three-dimensional objects subject to changes in three-dimensional rotation, size, and illumination. Elements such as weight sharing and pooling stages are the same as in prior methods, but it focuses on a new method for determining the optimal feature detection units at the intermediate stages of the hierarchical network. It also proposes a new approach to training hierarchical networks by statistical means, in which new feature detection stages are learned (incrementally), considerably reducing the effort required for complex pattern recognition compared with the prior art. Because this learning is unsupervised, no teacher signal is needed and the recognition architecture can be preconfigured for a specific recognition scenario; only the final classification step requires supervised training, which considerably reduces the effort of applying the method to recognition tasks.

Patent Document 1: JP 05-006463 A
Patent Document 2: JP 07-064941 A
Patent Document 3: JP 2002-373333 A

An object of the present invention is to provide an image processing apparatus and an image processing program that, when identifying an image, reduce identification errors caused by normalization processing compared with a configuration that lacks the present features.

The gist of the present invention for achieving this object lies in the following inventions.
The invention of claim 1 is an image processing apparatus comprising: image receiving means for receiving an image; a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means; a plurality of rectification processing means for performing rectification processing on the results of the convolution processing means; a plurality of normalization processing means for performing normalization processing on the results of the rectification processing means; a plurality of feature extraction means for extracting feature quantities of the image by performing sub-sampling processing on the results of the normalization processing means; and identification means for identifying the image on the basis of the feature quantities extracted by the feature extraction means, wherein the normalization processing means normalizes the difference between the results of two of the rectification processing means by dividing it by a divisor that is a value based on those results.
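As a rough illustration of the processing order in claim 1 (rectification, then pairwise normalization, then sub-sampling), the following is a minimal Python sketch on two tiny convolution responses. The claim says only that the divisor is "a value based on" the two rectified results; using their sum plus a small constant `eps` is our own assumption, as are the function names, absolute-value rectification, and non-overlapping 2x2 average sub-sampling.

```python
from typing import List

Map = List[List[float]]


def rectify(m: Map) -> Map:
    # Absolute-value rectification (assumed; squaring would also qualify).
    return [[abs(v) for v in row] for row in m]


def pair_normalize(a: Map, b: Map, eps: float = 1e-8) -> Map:
    # Element-wise (a - b) / (a + b + eps); the divisor form is an assumption.
    return [[(x - y) / (x + y + eps) for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def subsample(m: Map, k: int = 2) -> Map:
    # Non-overlapping k x k average pooling (hypothetical sub-sampling choice).
    h, w = len(m), len(m[0])
    return [[sum(m[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
             for j in range(0, w - w % k, k)]
            for i in range(0, h - h % k, k)]


# Two hypothetical convolution responses with mixed signs.
conv_a = [[1.0, -3.0], [2.0, 0.0]]
conv_b = [[-1.0, 1.0], [2.0, 4.0]]

# Claim 1 order: rectify -> normalize the pair -> sub-sample.
feature = subsample(pair_normalize(rectify(conv_a), rectify(conv_b)))
print(feature)
```

With these inputs the single output value is approximately -0.125.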

The invention of claim 2 is an image processing apparatus comprising: image receiving means for receiving an image; a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means; a plurality of rectification processing means for performing rectification processing on the results of the convolution processing means; a plurality of sub-sampling processing means for performing sub-sampling processing on the results of the rectification processing means; a plurality of feature extraction means for extracting feature quantities of the image by performing normalization processing on the results of the sub-sampling processing means; and identification means for identifying the image on the basis of the feature quantities extracted by the feature extraction means, wherein the normalization processing means normalizes the difference between the results of two of the sub-sampling processing means by dividing it by a divisor that is a value based on the results of two of the rectification processing means.

The invention of claim 3 is an image processing apparatus comprising: image receiving means for receiving an image; a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means; a plurality of rectification processing means for performing rectification processing on the results of the convolution processing means; a plurality of sub-sampling processing means for performing sub-sampling processing on the results of the rectification processing means; a plurality of feature extraction means for extracting feature quantities of the image by performing normalization processing on the results of the sub-sampling processing means; and identification means for identifying the image on the basis of the feature quantities extracted by the feature extraction means, wherein the normalization processing means normalizes the difference between the results of two of the sub-sampling processing means by dividing it by a divisor that is a value based on those results.

The invention of claim 4 is an image processing apparatus comprising: image receiving means for receiving an image; a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means; a plurality of rectification processing means for performing rectification processing on the results of the convolution processing means; a plurality of sub-sampling processing means for performing sub-sampling processing on the results of the rectification processing means; a plurality of feature extraction means for extracting feature quantities of the image by performing normalization processing on the results of the sub-sampling processing means; and identification means for identifying the image on the basis of the feature quantities extracted by the feature extraction means, wherein the normalization processing means normalizes the difference between the results of two of the sub-sampling processing means by dividing it by the sum of those results.
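Claims 2 to 4 move the normalization after sub-sampling. The hypothetical Python sketch below uses the claim 4 form, dividing the difference by the sum, and counts the divisions performed. Normalizing the 2x2 sub-sampled maps needs 4 divisions, whereas normalizing the full 4x4 rectified maps would need 16, which is one plausible reading of the computation savings stated for these claims. Absolute-value rectification, average sub-sampling, `eps`, the sample data, and all names are assumptions; the claim 2 variant, whose divisor comes from the rectified results, is not shown.

```python
from typing import List, Tuple

Map = List[List[float]]


def rectify(m: Map) -> Map:
    # Absolute-value rectification (assumed).
    return [[abs(v) for v in row] for row in m]


def subsample(m: Map, k: int = 2) -> Map:
    # Non-overlapping k x k average pooling (assumed form of sub-sampling).
    h, w = len(m), len(m[0])
    return [[sum(m[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
             for j in range(0, w - w % k, k)]
            for i in range(0, h - h % k, k)]


def diff_over_sum(a: Map, b: Map, eps: float = 1e-8) -> Tuple[Map, int]:
    # Claim-4-style normalization (a - b) / (a + b); also returns the division count.
    out = [[(x - y) / (x + y + eps) for x, y in zip(ra, rb)]
           for ra, rb in zip(a, b)]
    return out, sum(len(row) for row in out)


# Hypothetical 4x4 convolution responses for two channels.
conv_a = [[float((i * 4 + j) % 5 - 2) for j in range(4)] for i in range(4)]
conv_b = [[float((i + j) % 3 - 1) for j in range(4)] for i in range(4)]

# Claims 2-4 order: rectify -> sub-sample -> normalize.
feat, n_div = diff_over_sum(subsample(rectify(conv_a)), subsample(rectify(conv_b)))
# For comparison: normalizing before sub-sampling touches every position.
_, n_div_full = diff_over_sum(rectify(conv_a), rectify(conv_b))
print(n_div, n_div_full)  # 4 16
```

Because both inputs are non-negative after rectification, every normalized value lies in [-1, 1].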

The invention of claim 5 is an image processing apparatus comprising: image receiving means for receiving an image; a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means; a plurality of rectification processing means for performing rectification processing on the results of the convolution processing means; a plurality of addition means for adding the results of a plurality of the rectification processing means; a plurality of sub-sampling processing means for performing sub-sampling processing on the results of the addition means; a plurality of averaging processing means for performing averaging processing on the results of the sub-sampling processing means; a plurality of feature extraction means for extracting feature quantities of the image by performing normalization processing on the results of the averaging processing means; and identification means for identifying the image on the basis of the feature quantities extracted by the feature extraction means, wherein the processing by the rectification processing means, the addition means, the sub-sampling processing means, and the averaging processing means is weighted generalized mean processing, and the normalization processing means normalizes the difference between the results of two of the averaging processing means by dividing it by the sum of those results.

The invention of claim 6 is the image processing apparatus according to claim 5, wherein the rectification processing performed by the rectification processing means raises the absolute value of its input to the r-th power (r being a positive real number), and the averaging processing performed by the averaging processing means raises its input to the r′-th power.
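Claims 5 and 6 treat rectification, addition, sub-sampling, and averaging together as weighted generalized mean processing. A weighted power mean of inputs x_i with weights w_i is (Σ w_i |x_i|^r / Σ w_i)^(1/r). The Python sketch below assumes r′ = 1/r, which the claim text does not state, and folds the addition across channels and the sub-sampling over positions into a single weighted sum over one flat list of inputs. The inputs, uniform weights, `eps`, and function names are hypothetical.

```python
from typing import Sequence


def weighted_power_mean(xs: Sequence[float], ws: Sequence[float], r: float) -> float:
    """Weighted generalized (power) mean of |x|, under the assumptions above.

    rectify: |x| ** r              (r > 0, as in claim 6)
    add:     weighted sum over all inputs (channels and positions flattened)
    average: divide by total weight, then raise to r' = 1 / r (assumed)
    """
    if r <= 0:
        raise ValueError("r must be a positive real number")
    total_w = sum(ws)
    s = sum(w * abs(x) ** r for x, w in zip(xs, ws))
    return (s / total_w) ** (1.0 / r)


def diff_over_sum(a: float, b: float, eps: float = 1e-8) -> float:
    # Normalize two averaged results by their sum, as in claim 5.
    return (a - b) / (a + b + eps)


window_a = [3.0, -4.0]  # hypothetical responses feeding one averaged output
window_b = [1.0, 1.0]
w = [1.0, 1.0]          # hypothetical uniform weights
ma = weighted_power_mean(window_a, w, r=2.0)  # root mean square: sqrt(12.5)
mb = weighted_power_mean(window_b, w, r=2.0)  # 1.0
print(ma, mb, diff_over_sum(ma, mb))
```

With r = 1 this reduces to the mean absolute value and with r = 2 to the root mean square; as r grows, the power mean approaches the maximum of |x_i|, so r can be read as interpolating between average-like and max-like pooling.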

The invention of claim 7 is an image processing program that causes a computer to function as: image receiving means for receiving an image; a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means; a plurality of rectification processing means for performing rectification processing on the results of the convolution processing means; a plurality of normalization processing means for performing normalization processing on the results of the rectification processing means; a plurality of feature extraction means for extracting feature quantities of the image by performing sub-sampling processing on the results of the normalization processing means; and identification means for identifying the image on the basis of the feature quantities extracted by the feature extraction means, wherein the normalization processing means normalizes the difference between the results of two of the rectification processing means by dividing it by a divisor that is a value based on those results.

The invention of claim 8 is an image processing program that causes a computer to function as: image receiving means for receiving an image; a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means; a plurality of rectification processing means for performing rectification processing on the results of the convolution processing means; a plurality of sub-sampling processing means for performing sub-sampling processing on the results of the rectification processing means; a plurality of feature extraction means for extracting feature quantities of the image by performing normalization processing on the results of the sub-sampling processing means; and identification means for identifying the image on the basis of the feature quantities extracted by the feature extraction means, wherein the normalization processing means normalizes the difference between the results of two of the sub-sampling processing means by dividing it by a divisor that is a value based on the results of two of the rectification processing means.

The invention of claim 9 is an image processing program that causes a computer to function as: image receiving means for receiving an image; a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means; a plurality of rectification processing means for performing rectification processing on the results of the convolution processing means; a plurality of sub-sampling processing means for performing sub-sampling processing on the results of the rectification processing means; a plurality of feature extraction means for extracting feature quantities of the image by performing normalization processing on the results of the sub-sampling processing means; and identification means for identifying the image on the basis of the feature quantities extracted by the feature extraction means, wherein the normalization processing means normalizes the difference between the results of two of the sub-sampling processing means by dividing it by a divisor that is a value based on those results.

The invention of claim 10 is an image processing program that causes a computer to function as: image receiving means for receiving an image; a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means; a plurality of rectification processing means for performing rectification processing on the results of the convolution processing means; a plurality of sub-sampling processing means for performing sub-sampling processing on the results of the rectification processing means; a plurality of feature extraction means for extracting feature quantities of the image by performing normalization processing on the results of the sub-sampling processing means; and identification means for identifying the image on the basis of the feature quantities extracted by the feature extraction means, wherein the normalization processing means normalizes the difference between the results of two of the sub-sampling processing means by dividing it by the sum of those results.

The invention of claim 11 is an image processing program that causes a computer to function as: image receiving means for receiving an image; a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means; a plurality of rectification processing means for performing rectification processing on the results of the convolution processing means; a plurality of addition means for adding the results of a plurality of the rectification processing means; a plurality of sub-sampling processing means for performing sub-sampling processing on the results of the addition means; a plurality of averaging processing means for performing averaging processing on the results of the sub-sampling processing means; a plurality of feature extraction means for extracting feature quantities of the image by performing normalization processing on the results of the averaging processing means; and identification means for identifying the image on the basis of the feature quantities extracted by the feature extraction means, wherein the processing by the rectification processing means, the addition means, the sub-sampling processing means, and the averaging processing means is weighted generalized mean processing, and the normalization processing means normalizes the difference between the results of two of the averaging processing means by dividing it by the sum of those results.

According to the image processing apparatus of claim 1, identification errors caused by normalization processing when identifying an image can be reduced compared with a configuration that lacks the present features.

According to the image processing apparatus of claim 2, the amount of computation for the subtraction and division in the normalization processing when identifying an image can be reduced compared with a configuration that lacks the present features.

According to the image processing apparatus of claim 3, the amount of computation in the normalization processing when identifying an image can be reduced compared with a configuration that lacks the present features.

According to the image processing apparatus of claim 4, the amount of computation in the normalization processing when identifying an image can be reduced compared with a configuration that lacks the present features.

According to the image processing apparatus of claim 5, the learning time for image identification can be shortened compared with a configuration that lacks the present features.

According to the image processing apparatus of claim 6, the learning time for image identification can be shortened compared with a configuration that lacks the present features.

According to the image processing program of claim 7, identification errors caused by normalization processing when identifying an image can be reduced compared with a configuration that lacks the present features.

According to the image processing program of claim 8, the amount of computation for the subtraction and division in the normalization processing when identifying an image can be reduced compared with a configuration that lacks the present features.

According to the image processing program of claim 9, the amount of computation in the normalization processing when identifying an image can be reduced compared with a configuration that lacks the present features.

According to the image processing program of claim 10, the amount of computation in the normalization processing when identifying an image can be reduced compared with a configuration that lacks the present features.

According to the image processing program of claim 11, the learning time for image identification can be shortened compared with a configuration that lacks the present features.

FIG. 1 is a conceptual module configuration diagram of a configuration example of the first embodiment.
FIG. 2 is a conceptual module configuration diagram of an overall configuration example of the first embodiment.
FIG. 3 is a conceptual module configuration diagram of a configuration example of the second embodiment.
FIG. 4 is a conceptual module configuration diagram of a configuration example of the third embodiment.
FIG. 5 is a conceptual module configuration diagram of a configuration example of the fourth embodiment.
FIG. 6 is a conceptual module configuration diagram of a configuration example of the fifth embodiment.
FIG. 7 is a conceptual module configuration diagram of a configuration example of the sixth embodiment.
FIG. 8 is a block diagram showing an example hardware configuration of a computer that implements the present embodiments.
FIG. 9 is a conceptual module configuration diagram of an overall configuration example of a CNN identification device.
FIG. 10 is a conceptual module configuration diagram of an example feature extraction layer that performs rectification processing and normalization processing.
FIG. 11 is an explanatory diagram showing an example of normalization processing.
FIG. 12 is an explanatory diagram showing an example of normalization processing.
FIG. 13 is an explanatory diagram showing an example of normalization processing.

Before the present embodiments are described, their premise, or an image processing apparatus that uses them, is explained. This explanation is intended to make the embodiments easier to understand.
There are techniques, so-called recognition techniques, for classifying an image into one of a plurality of classes. Some of them learn an identification device from pairs of images and correct classes in order to improve the recognition rate. Specifically, the technique concerned here is a hierarchical network that identifies an input pattern (image).
In particular, it is known that an image can be identified using a convolutional neural network (hereinafter, CNN), which is a type of hierarchical neural network [LeCun, Yann, et al. "Gradient-Based Learning Applied to Document Recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.].

FIG. 9 is a conceptual module configuration diagram of an overall configuration example of a CNN identification device.
Each feature extraction layer of the CNN identification device (feature extraction layer 910, feature extraction layer 920) consists of a convolution layer and a sub-sampling layer. The input to each layer is a three-dimensional feature map made up of feature quantities arranged on a plurality of two-dimensional planes, and the output is a new feature map. The image 900 received by the CNN identification device is also treated as a kind of feature map.
The convolution layer convolves weights with the received feature quantities (a two-dimensional digital filter, for example, is used for the convolution), and the sub-sampling layer sub-samples the convolution responses. Together they convert the received feature quantities into feature quantities that are invariant to local variations in position.
The feature extraction layer is repeated a plurality of times (FIG. 9 shows two feature extraction layers, but there may be three or more), and the classifier 930 receives the output.
The classifier 930 identifies the class to which the image 900 belongs from the received feature quantities. The classifier 930 is generally a multilayer perceptron or an RBF (Radial Basis Function) network.
The convolution weights of the feature extraction layers (feature extraction layer 910, feature extraction layer 920) are generally learned by error backpropagation using pairs of images and correct classes.
References include, for example, JP-A-5-6463, JP-A-7-64941, JP-A-2002-373333, JP-A-2005-215988, and JP-A-2010-157118.
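A minimal Python sketch of one feature extraction layer as described for FIG. 9: filter each input map with a 2-D kernel, sum over input maps, then sub-sample. Like most CNN code it computes cross-correlation (the kernel is not flipped). Summing across input channels, non-overlapping 2x2 average sub-sampling, the example kernel, and the function names are our assumptions, not details from the text.

```python
from typing import List

Map = List[List[float]]


def conv2d_valid(x: Map, k: Map) -> Map:
    # 'Valid' 2-D filtering of one map with one kernel (cross-correlation;
    # flipping k would give true convolution).
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def subsample_avg(m: Map, s: int = 2) -> Map:
    # Non-overlapping s x s average pooling; incomplete edge blocks are dropped.
    return [[sum(m[i + a][j + b] for a in range(s) for b in range(s)) / (s * s)
             for j in range(0, len(m[0]) - s + 1, s)]
            for i in range(0, len(m) - s + 1, s)]


def feature_extraction_layer(maps: List[Map], kernels: List[List[Map]]) -> List[Map]:
    # kernels[o][c]: hypothetical weight for output map o and input map c.
    out = []
    for per_out in kernels:
        acc = None
        for x, k in zip(maps, per_out):
            r = conv2d_valid(x, k)
            # Sum the filtered responses over input maps.
            acc = r if acc is None else [[p + q for p, q in zip(ra, rb)]
                                         for ra, rb in zip(acc, r)]
        out.append(subsample_avg(acc))
    return out


image = [[float(i + j) for j in range(5)] for i in range(5)]  # 1-channel 5x5 ramp
edge = [[-1.0, 1.0], [-1.0, 1.0]]                              # horizontal gradient
layer1 = feature_extraction_layer([image], [[edge]])
print(layer1)
```

On the 5x5 ramp image the gradient kernel gives a constant 4x4 response of 2.0, so the single output map after sub-sampling is [[2.0, 2.0], [2.0, 2.0]]. Stacking calls to `feature_extraction_layer` corresponds to repeating the layer as in FIG. 9.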

In recent years, to further improve identification accuracy, rectification processing (Rectification) and normalization processing (Divisive Normalization/Contrast Normalization) have been added to the feature extraction layers of CNNs [LeCun, Yann, Koray Kavukcuoglu, and Clément Farabet. "Convolutional Networks and Applications in Vision." Proceedings of 2010 IEEE International Symposium on Circuits and Systems (2010): 253-256.]. Hereinafter, "CNN" refers to a CNN to which this rectification processing and normalization processing have been added.

FIG. 10 is a conceptual module configuration diagram of an example of a feature extraction layer that performs rectification processing and normalization processing. FIG. 10 illustrates a case with two convolution processing modules 1010 (and two of each associated module), but there may be three or more.
The feature extraction layer includes a plurality of convolution processing modules 1010, a plurality of rectification processing modules 1020, a normalization processing module 1030, and a plurality of sub-sampling processing modules 1040. That is, it is a conventional CNN with rectification processing and normalization processing added.
The convolution processing modules 1010a and 1010b are connected to the rectification processing modules 1020a and 1020b, respectively. They receive images or feature maps 1000A, 1000B, and 1000C and perform convolution processing on them.
The rectification processing module 1020a is connected to the convolution processing module 1010a, the average calculation processing module 1032, the difference calculation processing module 1034a, and the standard deviation calculation processing module 1036. In the conventional CNN, features cancel each other out when convolution responses that mix positive and negative values are sub-sampled together. This rectification process is added to suppress that cancellation and thereby improve the recognition rate. Specific examples of rectification include taking the absolute value and squaring.
The rectification processing module 1020b is connected to the convolution processing module 1010b, the average calculation processing module 1032, the difference calculation processing module 1034b, and the standard deviation calculation processing module 1036, and performs processing equivalent to that of the rectification processing module 1020a.
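A few numbers show the cancellation that rectification is meant to prevent. The following minimal Python sketch assumes that sub-sampling is a plain average over a window, because the text does not fix the sub-sampling scheme. The response values and helper names are hypothetical.

```python
def subsample_mean(values):
    """Average a window of responses (assumed sub-sampling scheme)."""
    return sum(values) / len(values)


def rectify_abs(values):
    """Rectification by taking the absolute value."""
    return [abs(v) for v in values]


def rectify_square(values):
    """Rectification by squaring."""
    return [v * v for v in values]


# Hypothetical convolution responses in one sub-sampling window, mixed in sign.
window = [0.75, -0.75, 0.25, -0.25]

print(subsample_mean(window))                  # 0.0: positive and negative responses cancel
print(subsample_mean(rectify_abs(window)))     # 0.5
print(subsample_mean(rectify_square(window)))  # 0.3125
```

Without rectification, the window reports no response at all, even though it contains strong filter activity.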

The normalization processing module 1030 includes an average calculation processing module 1032, difference calculation processing modules 1034a and 1034b, a standard deviation calculation processing module 1036, and division processing modules 1038a and 1038b. This normalization process is added to improve the recognition rate. Even when the feature values input to the convolution shift uniformly, it suppresses the resulting fluctuation so that stable feature values are obtained.
The average calculation processing module 1032 is connected to the rectification processing modules 1020a and 1020b and to the difference calculation processing modules 1034a and 1034b. It calculates the average of all the values after rectification and passes that average to the difference calculation processing module 1034a and the others.
The difference calculation processing module 1034a is connected to the rectification processing module 1020a, the average calculation processing module 1032, and the division processing module 1038a. It subtracts the average calculated by the average calculation processing module 1032 from the result of the rectification processing module 1020a.
The difference calculation processing module 1034b is connected to the rectification processing module 1020b, the average calculation processing module 1032, and the division processing module 1038b, and performs processing equivalent to that of the difference calculation processing module 1034a.
The standard deviation calculation processing module 1036 is connected to the rectification processing modules 1020a and 1020b and to the division processing modules 1038a and 1038b. It calculates the standard deviation of all the values after rectification and passes it to the division processing module 1038a and the others.
The division processing module 1038a is connected to the difference calculation processing module 1034a, the standard deviation calculation processing module 1036, and the sub-sampling processing module 1040a. It divides the result of the difference calculation processing module 1034a by the standard deviation calculated by the standard deviation calculation processing module 1036 (that is, the standard deviation is the divisor).
The division processing module 1038b is connected to the difference calculation processing module 1034b, the standard deviation calculation processing module 1036, and the sub-sampling processing module 1040b, and performs processing equivalent to that of the division processing module 1038a.
The sub-sampling processing module 1040a is connected to the division processing module 1038a and sub-samples the result of the division processing module 1038a. The result is output either to the next feature extraction layer, where it becomes that layer's image or feature maps 1000A, 1000B, 1000C, and so on, or to a classifier (also called a classification layer).
The sub-sampling processing module 1040b is connected to the division processing module 1038b and performs processing equivalent to that of the sub-sampling processing module 1040a.
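The conventional normalization described above can be sketched in Python. It subtracts the mean of all rectified values, divides by their standard deviation, and then sub-samples. The sketch makes several assumptions:

* feature maps are one-dimensional;
* the statistics are taken over every value of every map;
* the population standard deviation is used;
* sub-sampling is average pooling over non-overlapping windows.

The function names and data are hypothetical.

```python
import math


def conventional_normalize(maps):
    """Normalize every rectified map with the mean and standard deviation of all maps."""
    values = [v for m in maps for v in m]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    # A constant input gives std == 0, so a real implementation would need a guard here.
    return [[(v - mean) / std for v in m] for m in maps]


def subsample(m, size=2):
    """Average pooling over non-overlapping windows of `size` values."""
    return [sum(m[i:i + size]) / size for i in range(0, len(m) - size + 1, size)]


rectified = [[1.0, 1.0, 3.0, 3.0],   # output of rectification module "a"
             [3.0, 3.0, 1.0, 1.0]]   # output of rectification module "b"
normalized = conventional_normalize(rectified)
pooled = [subsample(m) for m in normalized]
print(normalized)  # [[-1.0, -1.0, 1.0, 1.0], [1.0, 1.0, -1.0, -1.0]]
print(pooled)      # [[-1.0, 1.0], [1.0, -1.0]]
```

Because the mean and standard deviation are shared by all maps, every value of every map contributes to the normalization of every other map.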

In the normalization performed by the normalization processing module 1030, the received feature values are scaled so that they fall within a certain range (for example, [−1, 1]) and are then output. FIG. 11 is an explanatory diagram illustrating an example of normalization processing. The graph in FIG. 11A shows the values received by the normalization processing module 1030, and the graph in FIG. 11B shows the values it outputs. In this case, the values are enlarged.
For example, suppose the values received by the normalization processing module 1030 have a small variance. Normalization then enlarges them greatly, so the output values depend strongly on the mean of the received values, which is subtracted before the division.
Further, if some of the values received by the normalization processing module 1030 are large (for example, because of noise), all of the output values change greatly.
FIGS. 12 and 13 are explanatory diagrams showing examples of normalization in such cases. The graph in FIG. 12A shows values received by the normalization processing module 1030, where a large value appears at the beginning because of noise or the like. The mean of these values is therefore higher than the mean calculated from the remaining values. The graph in FIG. 12B shows the values that the normalization processing module 1030 outputs in this case.
The graph in FIG. 13A shows values received by the normalization processing module 1030. Unlike FIG. 12A, it contains no large value caused by noise, but apart from the first part it is similar to the graph in FIG. 12A. The mean of these values is lower than that in FIG. 12A. The graph in FIG. 13B shows the values that the normalization processing module 1030 outputs in this case. Thus, although the input graphs in FIG. 12A and FIG. 13A are similar, the output graphs in FIG. 12B and FIG. 13B differ. In other words, the output of the normalization process depends on the mean of the received values.
As described above, in the feature extraction layer of a CNN, variation in some of the values received by the normalization process (some of the convolution filter responses) can cause large variation across all of the output feature values. That is, stable feature values may not be obtained when the input image varies.
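The effect described for FIGS. 12 and 13 can be reproduced numerically. The sketch below assumes the conventional normalization: it subtracts the mean of all received values and divides by their population standard deviation. The two input sequences are hypothetical values that differ only in their first element.

```python
import math


def conventional_normalize(values):
    """Subtract the mean of all values and divide by their (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


clean = [1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0]  # no outlier (cf. FIG. 13A)
noisy = [9.0] + clean[1:]                         # one large value at the start (cf. FIG. 12A)

out_clean = conventional_normalize(clean)
out_noisy = conventional_normalize(noisy)

# Positions 1..7 receive identical inputs, yet every one of their outputs changes.
for k in range(1, len(clean)):
    print(k, round(out_clean[k], 3), round(out_noisy[k], 3))
```

At the positions holding the value 2.0, the output even flips from positive to negative, solely because of the single outlier at position 0.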

Hereinafter, examples of various preferred embodiments for realizing the present invention will be described with reference to the drawings.
<First Embodiment>
FIG. 1 is a conceptual module configuration diagram of a configuration example according to the first embodiment.
A module generally refers to a logically separable component such as software (a computer program) or hardware. Accordingly, a module in the present embodiment refers not only to a module in a computer program but also to a module in a hardware configuration. The present embodiment therefore also describes a computer program for causing a computer to function as these modules. Such a program may cause a computer to execute each procedure, to function as each means, or to realize each function. The present embodiment likewise describes a system and a method. For convenience of explanation, the terms “store” and “cause to store” and equivalent terms are used. When the embodiment is a computer program, these terms mean storing in a storage device, or controlling so that data is stored in a storage device. Modules may correspond one-to-one with functions. In an implementation, however, one module may consist of one program, several modules may consist of one program, or conversely one module may consist of several programs. Several modules may be executed by one computer, and one module may be executed by several computers in a distributed or parallel environment. One module may also include other modules. Hereinafter, “connection” refers not only to physical connection but also to logical connection (exchange of data, instructions, reference relationships between data, and so on). “Predetermined” means determined before the target process. This covers the time before the processing of the present embodiment starts. It also covers any time after that processing has started but before the target process, with the value determined according to the situation or state at that time or up to that time.
A statement of the form “if A, do B” means “determine whether A holds, and do B if it is determined that A holds”, except where that determination is unnecessary.
A system or apparatus may be configured by connecting a plurality of computers, hardware units, devices, and the like through communication means such as a network (including one-to-one communication connections). It also includes the case where it is realized by a single computer, hardware unit, device, or the like. “Apparatus” and “system” are used as synonyms. Of course, “system” does not include what is merely a social “mechanism” (social system) that is an artificial arrangement.
For each process performed by a module, or for each of several processes performed within a module, the target information is read from the storage device, the process is performed, and the result is written to the storage device. Descriptions of reading from the storage device before processing and writing to it after processing may therefore be omitted. The storage device here may include a hard disk, RAM (Random Access Memory), an external storage medium, a storage device accessed via a communication line, a register in a CPU (Central Processing Unit), and the like.

The image processing apparatus according to the first embodiment extracts features for identifying an image. As shown in the example of FIG. 1, it includes the convolution processing module 110, the rectification processing module 120, the normalization processing module 130, and the sub-sampling processing module 140.
This image processing apparatus is a feature extraction layer. It works on pairs of feature values obtained by rectifying the responses of two convolution filters. For each pair, it normalizes the difference within the pair by dividing it by a value based on the feature values in that pair, and the sub-sampled result is used as the output feature value.

The convolution processing modules 110a, 110b, 110c, and 110d are connected to the rectification processing modules 120a, 120b, 120c, and 120d, respectively. The convolution processing module 110 receives images or feature maps 100A, 100B, and 100C. In the first feature extraction layer it receives an image. In the second and subsequent layers it receives the feature values output by the preceding feature extraction layer. It then performs convolution on the received image or feature values, in the same way as the convolution processing module 1010 illustrated in FIG. 10.
The rectification processing module 120a is connected to the convolution processing module 110a, the difference calculation processing module 132a, and the divisor calculation processing module 134a. It performs rectification in the same way as the rectification processing module 1020 illustrated in FIG. 10. The rectification processing module 120b and the others perform the same processing.

The normalization processing module 130 includes difference calculation processing modules 132a and 132c, divisor calculation processing modules 134a and 134c, and division processing modules 136a and 136c, and performs normalization processing.
The difference calculation processing module 132a is connected to the rectification processing modules 120a and 120b and to the division processing module 136a.
The divisor calculation processing module 134a is connected to the rectification processing modules 120a and 120b and to the division processing module 136a.
The division processing module 136a is connected to the difference calculation processing module 132a, the divisor calculation processing module 134a, and the sub-sampling processing module 140a.
The normalization processing module 130 takes the results of two rectification processing modules 120 (for example, the pair of rectification processing modules 120a and 120b). It normalizes the difference between those results by dividing it by a value based on them (for example, their mean, standard deviation, mode, or median). The difference between the results means subtracting one result from the other, and either the rectification processing module 120a or 120b may serve as the minuend.
In this feature extraction layer, a variation in the results (filter responses) of some convolution processing modules 110 affects the output feature values only within the pair concerned. A few filter responses therefore cannot cause large fluctuations across all of the output feature values.
Because the difference between the paired results is taken, the CNN's advantage is also retained: stable feature values, with fluctuation suppressed, are obtained even when the feature values received by the convolution processing module 110 shift substantially up (or down).
In particular, when the values of the image or feature map are 0, the feature values output by the feature extraction layer are also 0. The downstream classifier is therefore informed that the received image or feature map values were 0.
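A minimal sketch of the pairwise normalization follows, under these assumptions:

* rectified maps are one-dimensional;
* pairs are formed from consecutive maps, (a, b) and (c, d);
* the divisor is the per-position mean of the pair, one of the divisor choices listed above;
* a small constant `eps` is added to the divisor so that an all-zero input yields 0 rather than 0/0.

The `eps`, the map values, and the function names are hypothetical.

```python
import math


def pairwise_normalize(t_a, t_b, eps=1e-8):
    """(t_a - t_b) divided by a divisor computed only from the same pair (here, its mean)."""
    return [(a - b) / ((a + b) / 2 + eps) for a, b in zip(t_a, t_b)]


def normalize_layer(maps, eps=1e-8):
    """Pair consecutive rectified maps (0, 1), (2, 3), ... and normalize each pair independently."""
    return [pairwise_normalize(maps[i], maps[i + 1], eps) for i in range(0, len(maps) - 1, 2)]


a = [1.0, 2.0, 1.0, 2.0]
b = [2.0, 1.0, 2.0, 1.0]
c = [3.0, 1.0, 1.0, 3.0]
d = [1.0, 3.0, 3.0, 1.0]

out_clean = normalize_layer([a, b, c, d])
out_noisy = normalize_layer([[50.0] + a[1:], b, c, d])  # one large response in map a

print(out_clean[1] == out_noisy[1])      # True: pair (c, d) is unaffected
print(normalize_layer([[0.0] * 4] * 4))  # all-zero input gives all-zero output

# Scaling every input by 10 leaves the output (almost) unchanged, since the divisor scales too.
scaled = normalize_layer([[10 * v for v in m] for m in (a, b, c, d)])
print(all(math.isclose(x, y, rel_tol=1e-6)
          for p, q in zip(out_clean, scaled) for x, y in zip(p, q)))  # True
```

With a per-position divisor, the outlier changes only its own position in pair (a, b). If the divisor were computed over a local window, neighbouring positions of that pair would also change, but no other pair would.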

The sub-sampling processing module 140a is connected to the division processing module 136a. It sub-samples the result of the normalization processing module 130, thereby extracting the feature values of the images or feature maps 100A, 100B, and 100C. It then outputs them to the next feature extraction layer or to the classifier. The classifier identifies the images or feature maps 100A, 100B, and 100C based on the received feature values.

FIG. 2 is a conceptual module configuration diagram of an overall configuration example of the first embodiment. It stacks three of the feature extraction layers illustrated in FIG. 1: feature extraction layer 1: 210, feature extraction layer 2: 230, and feature extraction layer 3: 250. Their outputs are feature map A: 220, feature map B: 240, and feature map C: 260, respectively. Of course, the number of layers is not limited to three; there may be two, or three or more. The feature extraction layers (image processing apparatuses) of the second to sixth embodiments can also be used as the feature extraction layers illustrated in FIG. 2.

Let x be the feature values received by the feature extraction layer illustrated in FIG. 1 and w the convolution filter. The output feature value s of the convolution process is then given by:
[Equation 1]
where i is the index of the output feature map, p is the index of the input feature map, j and k are two-dimensional coordinates within a feature map, and q and r are two-dimensional coordinates within the convolution filter.

Let the size of the input feature map of the convolution be M × N, the size of the output feature map M' × N', and the size of the convolution filter Q × R. These sizes satisfy the following relationship:
[Equation 2]
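The formula itself is not shown here, so the following sketch of the convolution and size relationship above assumes the common "valid" form s[i][j][k] = Σ_p Σ_q Σ_r w[i][p][q][r] · x[p][j+q][k+r], with stride 1 and no padding. Under that assumption M' = M − Q + 1 and N' = N − R + 1. Like most CNN implementations, it computes a cross-correlation, so the filter is not flipped. The function name and data are hypothetical.

```python
def conv2d_valid(x, w):
    """s[i][j][k] = sum over p, q, r of w[i][p][q][r] * x[p][j + q][k + r].

    x: P input maps, each M x N; w: one P x Q x R filter per output map i.
    Stride 1, no padding, so each output map is (M - Q + 1) x (N - R + 1).
    """
    P, M, N = len(x), len(x[0]), len(x[0][0])
    Q, R = len(w[0][0]), len(w[0][0][0])
    s = []
    for w_i in w:  # one output map per filter
        s.append([
            [sum(w_i[p][q][r] * x[p][j + q][k + r]
                 for p in range(P) for q in range(Q) for r in range(R))
             for k in range(N - R + 1)]
            for j in range(M - Q + 1)
        ])
    return s


x = [[[1, 2, 3, 4, 5],
      [6, 7, 8, 9, 10],
      [11, 12, 13, 14, 15],
      [16, 17, 18, 19, 20]]]   # P = 1 input map, M x N = 4 x 5
w = [[[[1, 0, 0],
       [0, 0, -1]]]]           # one filter, P x Q x R = 1 x 2 x 3

s = conv2d_valid(x, w)
print(len(s[0]), len(s[0][0]))  # 3 3  (M' = 4 - 2 + 1, N' = 5 - 3 + 1)
print(s[0][0])                  # [-7, -7, -7]
```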

In the feature extraction layer illustrated in FIG. 1, let s be the output feature value of the convolution process and t the output feature value of the rectification process. The rectification is then expressed as:
[Equation 3]
where i is the index of the feature map, and j and k are two-dimensional coordinates within the feature map.

In the feature extraction layer illustrated in FIG. 1, let t be the output feature value of the rectification process and v the output feature value of the divisor calculation process. The divisor is then calculated as:
[Equation 4]
where:
* i is the index of the output feature map;
* p is the index of the input feature map;
* j and k are two-dimensional coordinates within the feature map;
* q and r are indices over the local region used to calculate the divisor;
* w and β are predetermined parameters of a weighted generalized mean.
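A weighted generalized (power) mean has the form (Σ w·t^β)^(1/β), with weights summing to 1. The sketch below is one reading of the divisor calculation, under these assumptions:

* the mean runs over both rectified maps of a pair and over a Q × R local window (stride 1, no padding);
* the weights w[q][r] are shared by the two maps and rescaled to sum to 1 overall;
* rectified values are non-negative.

β = 1 gives a weighted arithmetic mean and β = 2 a weighted root mean square. The names and data are hypothetical.

```python
import math


def generalized_mean_divisor(t_a, t_b, weights, beta):
    """v[j][k] = (sum over p in {a, b}, q, r of w'[q][r] * t_p[j + q][k + r] ** beta) ** (1 / beta).

    w' is `weights` rescaled so that the weights over both maps and the window sum to 1.
    """
    Q, R = len(weights), len(weights[0])
    total = 2 * sum(sum(row) for row in weights)  # the window is applied to two maps
    M, N = len(t_a), len(t_a[0])
    v = []
    for j in range(M - Q + 1):
        row = []
        for k in range(N - R + 1):
            acc = sum(weights[q][r] * t[j + q][k + r] ** beta
                      for t in (t_a, t_b) for q in range(Q) for r in range(R))
            row.append((acc / total) ** (1.0 / beta))
        v.append(row)
    return v


t_a = [[1.0, 1.0], [1.0, 1.0]]
t_b = [[3.0, 3.0], [3.0, 3.0]]
print(generalized_mean_divisor(t_a, t_b, [[1.0]], beta=1))  # [[2.0, 2.0], [2.0, 2.0]]
print(generalized_mean_divisor(t_a, t_b, [[1.0]], beta=2))  # about sqrt(5) = 2.236 everywhere
```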

In the feature extraction layer illustrated in FIG. 1, let t be the output feature value of the rectification process and u the output feature value of the normalization process. The normalization is then expressed as:
[Equation 5]
where i is the index of the output feature map, and j and k are two-dimensional coordinates within the feature map.

In the feature extraction layer illustrated in FIG. 1, let u be the output feature value of the normalization process and y the output of the feature extraction layer. The sub-sampling is then expressed as:
[Equation 6]
where:
* i is the index of the output feature map;
* j and k are two-dimensional coordinates within the feature map;
* q and r are indices over the local region;
* w is a predetermined sub-sampling weight.
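Weighted sub-sampling of this kind can be sketched as a weighted sum over local windows. The sketch assumes non-overlapping Q × R windows (stride equal to the window size), which the text does not state. With all weights equal to 1/(QR), it reduces to average pooling. The names and data are hypothetical.

```python
def weighted_subsample(u, weights):
    """y[j][k] = sum over q, r of weights[q][r] * u[j*Q + q][k*R + r] (non-overlapping windows)."""
    Q, R = len(weights), len(weights[0])
    M, N = len(u), len(u[0])
    return [[sum(weights[q][r] * u[j * Q + q][k * R + r] for q in range(Q) for r in range(R))
             for k in range(N // R)]
            for j in range(M // Q)]


u = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
avg = [[0.25, 0.25],
       [0.25, 0.25]]
print(weighted_subsample(u, avg))  # [[3.5, 5.5], [11.5, 13.5]]
```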

The classifier obtains the final output z from the output x of feature map C: 260 illustrated in FIG. 2 by the following calculation:
[Equation 7]
where w denotes the weights to be learned. The output subscript c denotes a class into which the input image may be identified. At identification time, the class c with the largest z_c is the identification result.
The weights are learned by error backpropagation, which minimizes the cross-entropy error over a set of pairs of input images and correct identification classes.
Accordingly, the classifier may consist of one fully connected layer and a Softmax function. Alternatively, it may consist of one fully connected layer and a radial basis function (RBF). The feature extraction layers and the classifier may learn their connection weights from a set of pairs of images and correct classes. The connection weights may be learned by error backpropagation.
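The Softmax variant of the classifier, together with the cross-entropy error used for training, can be sketched as follows. The sketch assumes a fully connected layer without a bias term, with z_c = exp(a_c) / Σ_c' exp(a_c') and a_c = Σ_n w[c][n]·x[n]. The weights, input, and function names are hypothetical, and the backpropagation step itself is not shown.

```python
import math


def classify(x, w):
    """One fully connected layer followed by Softmax; returns z with z[c] for each class c."""
    a = [sum(w_c[n] * x[n] for n in range(len(x))) for w_c in w]
    m = max(a)  # subtracting the maximum avoids overflow and does not change the Softmax
    e = [math.exp(a_c - m) for a_c in a]
    total = sum(e)
    return [e_c / total for e_c in e]


def cross_entropy(z, correct_class):
    """Cross-entropy error for one sample whose correct class index is `correct_class`."""
    return -math.log(z[correct_class])


x = [1.0, 2.0]                            # flattened features from the last feature map
w = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # 3 classes x 2 features
z = classify(x, w)
predicted = max(range(len(z)), key=lambda c: z[c])  # class with the largest z_c
print(predicted)                                    # 1
print(cross_entropy(z, 1) < cross_entropy(z, 0))    # True
```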

<Second Embodiment>
FIG. 3 is a conceptual module configuration diagram of a configuration example according to the second embodiment. This embodiment reduces the amount of subtraction and division in the normalization process without changing the processing result of the first embodiment. To do so, the sub-sampling process of the feature extraction layer illustrated in FIG. 1 is moved from after the normalization process to before it.
As shown in the example of FIG. 3, the second embodiment includes a convolution processing module 310, a rectification processing module 320, a sub-sampling processing module 330, and a normalization processing module 340.
The convolution processing module 310 performs processing equivalent to that of the convolution processing module 110.
The rectification processing module 320 is connected to the convolution processing module 310, the sub-sampling processing module 330, and the divisor calculation processing module 344. It performs processing equivalent to that of the rectification processing module 120 and passes its result to the sub-sampling processing module 330 and the divisor calculation processing module 344.
The sub-sampling processing module 330 is connected to the rectification processing module 320 and the difference calculation processing module 342. It sub-samples the result of the rectification processing module 320.

The normalization processing module 340 includes difference calculation processing modules 342a and 342c, divisor calculation processing modules 344a and 344c, and division processing modules 346a and 346c. It normalizes the results of the sub-sampling processing modules 330, thereby extracting the feature values of the images or feature maps 100A, 100B, and 100C. It then outputs them to the next feature extraction layer or to the classifier. The classifier identifies the images or feature maps 100A, 100B, and 100C based on the received feature values.
The difference calculation processing module 342a is connected to the sub-sampling processing modules 330a and 330b and to the division processing module 346a.
The divisor calculation processing module 344a is connected to the rectification processing modules 320a and 320b and to the division processing module 346a.
The division processing module 346a is connected to the difference calculation processing module 342a and the divisor calculation processing module 344a.
The normalization processing module 340 takes the results of two sub-sampling processing modules 330 (for example, the pair of sub-sampling processing modules 330a and 330b). It normalizes the difference between those results by dividing it by a value based on the results of the two rectification processing modules 320 (for example, their mean, standard deviation, mode, or median). The difference between the results means subtracting one result from the other, and either sub-sampling processing module 330a or 330b may serve as the minuend.
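A small example checks why moving sub-sampling ahead of normalization need not change the result. The sketch assumes:

* maps are one-dimensional;
* sub-sampling is average pooling over non-overlapping windows, which is a linear operation;
* the divisor takes a single value per pooling window.

Under those assumptions, the pooled normalized difference equals the normalized difference of the pooled maps. Meanwhile, the number of subtractions and divisions falls by the window size. The names and data are hypothetical.

```python
def pool(m, size):
    """Average pooling over non-overlapping windows of `size` values."""
    return [sum(m[i:i + size]) / size for i in range(0, len(m), size)]


def first_embodiment(t_a, t_b, v_win, size):
    """Normalize at every position, then sub-sample: one subtraction and division per position."""
    v = [vw for vw in v_win for _ in range(size)]  # divisor repeated across each window
    u = [(a - b) / vv for a, b, vv in zip(t_a, t_b, v)]
    return pool(u, size)


def second_embodiment(t_a, t_b, v_win, size):
    """Sub-sample first, then normalize: one subtraction and division per window."""
    return [(pa - pb) / vw for pa, pb, vw in zip(pool(t_a, size), pool(t_b, size), v_win)]


t_a = [1.0, 3.0, 2.0, 6.0]
t_b = [2.0, 1.0, 4.0, 0.0]
v_win = [2.0, 4.0]  # hypothetical divisor, one value per window of 2
print(first_embodiment(t_a, t_b, v_win, 2))   # [0.25, 0.5]
print(second_embodiment(t_a, t_b, v_win, 2))  # [0.25, 0.5]
```

If the divisor varied within a pooling window, the two orders would generally give different results. The equivalence therefore rests on the divisor being shared across each window.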

In this case, (Expression 4) to (Expression 6) can be rewritten as (Expression 8).

Here, t is the output feature of the rectification process, u is the output of the sub-sampling process, v is the divisor of the normalization process, and y is the output of the feature extraction layer. The indices i and p identify feature maps, j and k are two-dimensional coordinates within a feature map, and q and r are two-dimensional coordinates of the convolution filter. Finally, w and β are parameters of the weighted generalized mean and are given in advance.
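(Expression 8) itself is not reproduced here, so the following Python sketch only illustrates the second-embodiment computation described above, and it rests on stated assumptions. It takes sub-sampling to be non-overlapping average pooling. It takes the divisor v to be a weighted generalized mean with exponent β over the two rectified maps t inside each pooling window, using hypothetical uniform weights w. The function names and the small `eps` guard against division by zero are illustrative additions, not part of the source.

```python
from typing import List

Map = List[List[float]]  # a 2-D feature map, indexed [j][k]


def subsample(t: Map, size: int) -> Map:
    """Assumed sub-sampling: non-overlapping size x size average pooling."""
    rows, cols = len(t) // size, len(t[0]) // size
    return [
        [
            sum(t[j * size + q][k * size + r] for q in range(size) for r in range(size))
            / (size * size)
            for k in range(cols)
        ]
        for j in range(rows)
    ]


def divisor(maps: List[Map], size: int, beta: float) -> Map:
    """Weighted generalized mean of the rectified maps t_p over each pooling window.

    Hypothetical choice: uniform weights w = 1 / (number of maps * size**2).
    """
    w = 1.0 / (len(maps) * size * size)
    rows, cols = len(maps[0]) // size, len(maps[0][0]) // size
    v = []
    for j in range(rows):
        row = []
        for k in range(cols):
            acc = sum(
                w * t[j * size + q][k * size + r] ** beta
                for t in maps
                for q in range(size)
                for r in range(size)
            )
            row.append(acc ** (1.0 / beta))
        v.append(row)
    return v


def normalize(t_a: Map, t_b: Map, size: int = 2, beta: float = 2.0, eps: float = 1e-12) -> Map:
    """y = (u_a - u_b) / v, with v computed from the rectified (pre-sub-sampling) maps."""
    u_a, u_b = subsample(t_a, size), subsample(t_b, size)
    v = divisor([t_a, t_b], size, beta)
    return [
        [(u_a[j][k] - u_b[j][k]) / (v[j][k] + eps) for k in range(len(v[0]))]
        for j in range(len(v))
    ]


t_a = [[1.0, 1.0], [1.0, 1.0]]
t_b = [[0.0, 0.0], [0.0, 0.0]]
print(normalize(t_a, t_b, beta=1.0))  # a single output close to 2.0
```

A generalized mean is homogeneous of degree one. As a result, in this sketch, multiplying both rectified maps by the same positive constant leaves y unchanged, apart from the effect of `eps`.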

<Third Embodiment>
FIG. 4 is a conceptual module configuration diagram of a configuration example according to the third embodiment. The third embodiment modifies the second embodiment to reduce the computation needed for the divisor in the normalization process. The input to the divisor calculation is changed from the feature amounts before sub-sampling to the feature amounts after sub-sampling.
As shown in the example of FIG. 4, the third embodiment includes a convolution processing module 410, a rectification processing module 420, a sub-sampling processing module 430, and a normalization processing module 440.
The convolution processing module 410 performs processing equivalent to that of the convolution processing module 310.
The rectification processing module 420 is connected to the convolution processing module 410 and the sub-sampling processing module 430, and performs processing equivalent to that of the rectification processing module 320.
The sub-sampling processing module 430 is connected to the rectification processing module 420, the difference calculation processing module 442, and the divisor calculation processing module 444. It performs processing equivalent to that of the sub-sampling processing module 330, but passes its result to both the difference calculation processing module 442 and the divisor calculation processing module 444.

The normalization processing module 440 includes difference calculation processing modules 442a and 442c, divisor calculation processing modules 444a and 444c, and division processing modules 446a and 446c. It normalizes the outputs of the plurality of sub-sampling processing modules 430 to extract the feature amounts of the images or feature maps 100A, 100B, and 100C. It then outputs these feature amounts either to the next feature extraction layer or to a discriminator. The discriminator identifies the images or feature maps 100A, 100B, and 100C based on the received feature amounts.
The difference calculation processing module 442a is connected to the sub-sampling processing module 430a, the sub-sampling processing module 430b, and the division processing module 446a.
The divisor calculation processing module 444a is connected to the sub-sampling processing module 430a, the sub-sampling processing module 430b, and the division processing module 446a.
The division processing module 446a is connected to the difference calculation processing module 442a and the divisor calculation processing module 444a.
In the normalization performed by the normalization processing module 440, the difference between the outputs of two sub-sampling processing modules 430 (for example, the pair 430a and 430b) is divided by a divisor computed from those same outputs (for example, their mean, standard deviation, mode, or median). The difference is obtained by subtracting one output from the other, and either sub-sampling processing module 430a or 430b may supply the minuend.

In this case, (Expression 8) is rewritten as (Expression 9).

Here, t is the output feature of the rectification process, u is the output of the sub-sampling process, v is the divisor of the normalization process, and y is the output of the feature extraction layer. The indices i and p identify feature maps, j and k are two-dimensional coordinates within a feature map, and q and r are two-dimensional coordinates of the convolution filter. Finally, β is a parameter of the weighted generalized mean and is given in advance.
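The third embodiment changes only where the divisor comes from. (Expression 9) is not reproduced here, so the following Python sketch relies on two assumptions: sub-sampling is average pooling, and the divisor is an equal-weight generalized mean. Under those assumptions, it computes v from the two sub-sampled values at each output position instead of from the rectified maps. With a size × size pooling window, each divisor in this sketch combines 2 values instead of 2·size² rectified values, which illustrates the reduction in computation described above. The function names and `eps` are hypothetical.

```python
from typing import List

Map = List[List[float]]  # a 2-D feature map, indexed [j][k]


def subsample(t: Map, size: int) -> Map:
    """Assumed sub-sampling: non-overlapping size x size average pooling."""
    rows, cols = len(t) // size, len(t[0]) // size
    return [
        [
            sum(t[j * size + q][k * size + r] for q in range(size) for r in range(size))
            / (size * size)
            for k in range(cols)
        ]
        for j in range(rows)
    ]


def generalized_mean(values: List[float], beta: float) -> float:
    """Equal-weight generalized (power) mean of order beta; equal weights are an assumption."""
    return (sum(x ** beta for x in values) / len(values)) ** (1.0 / beta)


def normalize_post(t_a: Map, t_b: Map, size: int = 2, beta: float = 2.0, eps: float = 1e-12) -> Map:
    """Third-embodiment sketch: the divisor v is computed from the sub-sampled maps u only."""
    u_a, u_b = subsample(t_a, size), subsample(t_b, size)
    return [
        # Each divisor combines 2 sub-sampled values here, versus 2 * size**2
        # rectified values when the divisor is taken before sub-sampling.
        [(a - b) / (generalized_mean([a, b], beta) + eps) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(u_a, u_b)
    ]


t_a = [[4.0, 0.0], [0.0, 0.0]]
t_b = [[0.0, 0.0], [0.0, 0.0]]
print(normalize_post(t_a, t_b))  # a single output close to sqrt(2)
```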

<Fourth Embodiment>
FIG. 5 is a conceptual module configuration diagram of a configuration example according to the fourth embodiment. To reduce the computation required for the normalization process, the fourth embodiment sets β = 1 in (Expression 9) of the third embodiment. With β = 1, the β-th power and the (1/β)-th power no longer need to be computed.
As shown in the example of FIG. 5, the fourth embodiment includes a convolution processing module 510, a rectification processing module 520, a sub-sampling processing module 530, and a normalization processing module 540.
The convolution processing module 510 performs processing equivalent to that of the convolution processing module 410.
The rectification processing module 520 performs processing equivalent to that of the rectification processing module 420.
The sub-sampling processing module 530 performs processing equivalent to that of the sub-sampling processing module 430.

The normalization processing module 540 includes difference calculation processing modules 542a and 542c, sum calculation processing modules 544a and 544c, and division processing modules 546a and 546c. It normalizes the outputs of the plurality of sub-sampling processing modules 530 to extract the feature amounts of the images or feature maps 100A, 100B, and 100C. It then outputs these feature amounts either to the next feature extraction layer or to a discriminator. The discriminator identifies the images or feature maps 100A, 100B, and 100C based on the received feature amounts.
The difference calculation processing module 542a is connected to the sub-sampling processing module 530a, the sub-sampling processing module 530b, and the division processing module 546a.
The sum calculation processing module 544a is connected to the sub-sampling processing module 530a, the sub-sampling processing module 530b, and the division processing module 546a.
The division processing module 546a is connected to the difference calculation processing module 542a and the sum calculation processing module 544a.
In the normalization performed by the normalization processing module 540, the difference between the outputs of two sub-sampling processing modules 530 (for example, the pair 530a and 530b) is divided by the sum of those outputs. The difference is obtained by subtracting one output from the other, and either sub-sampling processing module 530a or 530b may supply the minuend.

In this case, (Expression 9) is rewritten as (Expression 10).

Here, t is the output feature of the rectification process, u is the output of the sub-sampling process, and y is the output of the feature extraction layer. The indices i and p identify feature maps, j and k are two-dimensional coordinates within a feature map, and q and r are two-dimensional coordinates of the convolution filter.
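With β = 1, the normalization reduces to a difference divided by a sum. The Python sketch below applies it to maps u that have already been sub-sampled. The name `normalize_sum` and the `eps` guard against 0 / 0 are illustrative assumptions. For non-negative inputs, |u_a - u_b| ≤ u_a + u_b, so every output lies in [-1, 1], and swapping the two inputs flips the sign of the output.

```python
from typing import List

Map = List[List[float]]  # a 2-D feature map, indexed [j][k]


def normalize_sum(u_a: Map, u_b: Map, eps: float = 1e-12) -> Map:
    """Fourth-embodiment sketch: y = (u_a - u_b) / (u_a + u_b).

    With beta = 1 there is no beta-th power or (1/beta)-th root, so each output
    costs one subtraction, one addition, and one division. eps is an assumed
    guard for positions where both inputs are zero.
    """
    return [
        [(a - b) / (a + b + eps) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(u_a, u_b)
    ]


u_a = [[3.0, 0.0], [2.0, 5.0]]
u_b = [[1.0, 0.0], [2.0, 0.0]]
print(normalize_sum(u_a, u_b))  # values close to [[0.5, 0.0], [0.0, 1.0]]
```

Dividing by the sum instead of an equal-weight mean (u_a + u_b) / 2 only rescales y by a constant factor of 2.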

<Fifth Embodiment>
FIG. 6 is a conceptual module configuration diagram of a configuration example according to the fifth embodiment. The sub-sampling process of the third and fourth embodiments can be extended to a generalized mean whose inputs are the outputs of a plurality of convolution processing modules.
As shown in the example of FIG. 6, the fifth embodiment includes a convolution processing module 610, a rectification + sub-sampling processing module 620, and a normalization processing module 630.
The convolution processing module 610 performs processing equivalent to that of the convolution processing module 510.

The rectification + sub-sampling processing module 620 includes rectification processing modules 622a, 622b, 622c, 622d, 622e, and 622f, sum calculation processing modules 624a and 624d, sub-sampling processing modules 626a and 626d, and averaging processing modules 628a and 628d.
The rectification processing module 622a is connected to the convolution processing module 610a and the sum calculation processing module 624a. It rectifies the output of the convolution processing module 610a, for example by raising the absolute value of that output to the r-th power.
The sum calculation processing module 624a is connected to the rectification processing modules 622a, 622b, and 622c and the sub-sampling processing module 626a. It adds the outputs of the plurality of rectification processing modules 622 (the r rectification processing modules 622a, 622b, ..., 622c).
The sub-sampling processing module 626a is connected to the sum calculation processing module 624a and the averaging processing module 628a. The sub-sampling processing module 626a performs sub-sampling processing on the processing result by the sum calculation processing module 624a.
The averaging processing module 628a is connected to the sub-sampling processing module 626a, the difference calculation processing module 632, and the sum calculation processing module 634. The averaging processing module 628a performs an averaging process on the processing result obtained by the sub-sampling processing module 626a. For example, the r-th root of the processing result by the sub-sampling processing module 626a is calculated.
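Taken together, modules 622, 624, 626, and 628 take the r-th root of a sub-sampled sum of |s|^r. Up to a constant factor, this is a generalized (power) mean of order r, taken over several convolution outputs and over the sub-sampling window. The minimal Python sketch below assumes average pooling for the sub-sampling step. Its function names are hypothetical.

```python
from typing import List

Map = List[List[float]]  # a 2-D feature map, indexed [j][k]


def average_pool(t: Map, size: int) -> Map:
    """Assumed sub-sampling: non-overlapping size x size average pooling."""
    rows, cols = len(t) // size, len(t[0]) // size
    return [
        [
            sum(t[j * size + dq][k * size + dr] for dq in range(size) for dr in range(size))
            / (size * size)
            for k in range(cols)
        ]
        for j in range(rows)
    ]


def rectify_subsample(responses: List[Map], r: float, size: int) -> Map:
    """Sketch of modules 622 (rectify), 624 (sum), 626 (sub-sample), and 628 (average)."""
    rows, cols = len(responses[0]), len(responses[0][0])
    # 622 + 624: raise each |s| to the r-th power and add across the convolution outputs.
    summed = [
        [sum(abs(s[j][k]) ** r for s in responses) for k in range(cols)]
        for j in range(rows)
    ]
    # 626: sub-sample the summed map.
    pooled = average_pool(summed, size)
    # 628: the r-th root makes the chain a power mean of order r, up to a constant factor.
    return [[x ** (1.0 / r) for x in row] for row in pooled]


s1 = [[3.0, -3.0], [3.0, 3.0]]
s2 = [[4.0, 4.0], [-4.0, 4.0]]
print(rectify_subsample([s1, s2], r=2.0, size=2))  # [[5.0]]
```

With a single response and r = 1, this reduces to plain average pooling of |s|. Larger values of r pull the result toward the largest magnitude in the window.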

The normalization processing module 630 includes a difference calculation processing module 632, a sum calculation processing module 634, and a division processing module 636. It normalizes the outputs of the plurality of averaging processing modules 628 to extract the feature amounts of the images or feature maps 100A, 100B, and 100C. It then outputs these feature amounts either to the next feature extraction layer or to a discriminator. The discriminator identifies the images or feature maps 100A, 100B, and 100C based on the received feature amounts.
The difference calculation processing module 632 is connected to the averaging processing module 628a, the averaging processing module 628d, and the division processing module 636.
The sum calculation processing module 634 is connected to the averaging processing module 628a, the averaging processing module 628d, and the division processing module 636.
The division processing module 636 is connected to the difference calculation processing module 632 and the sum calculation processing module 634.
In the normalization performed by the normalization processing module 630, the difference between the outputs of two averaging processing modules 628 (for example, 628a and 628d) is divided by the sum of those outputs. The difference is obtained by subtracting one output from the other, and either averaging processing module 628a or 628d may supply the minuend.

In this case, (Expression 3) and (Expression 10) are rewritten as (Expression 11).

Here, s is the output feature of the convolution process, u is the output of the rectification and sub-sampling process, and y is the output of the feature extraction layer. The indices i and p identify feature maps, j and k are two-dimensional coordinates within a feature map, and q and r are two-dimensional coordinates of the convolution filter. Finally, w and β are parameters of the weighted generalized mean and are given in advance.
The normalization process may instead use a weighted generalized mean of its inputs as the divisor (the first expression of (Expression 11)).

<Sixth Embodiment>
FIG. 7 is a conceptual module configuration diagram of a configuration example according to the sixth embodiment. The sixth embodiment aims to produce an output that is invariant to local phase changes in the image. To do so, it modifies the fifth embodiment by feeding two convolution filter responses into the sub-sampling process and setting r = 2 (see the visual energy model in Adelson, Edward H., and James R. Bergen. "Spatiotemporal energy models for the perception of motion." Journal of the Optical Society of America A 2.2 (1985): 284-299).
This design shortens learning time and improves the recognition rate. The local invariance that the convolution filters would otherwise have to acquire through learning is built into the structure in advance, which makes it easier to learn.
As shown in the example of FIG. 7, the sixth embodiment includes a convolution processing module 710, a rectification + sub-sampling processing module 720, and a normalization processing module 730.
The convolution processing module 710 performs processing equivalent to that of the convolution processing module 610.

The rectification + sub-sampling processing module 720 includes rectification processing modules 722a, 722b, 722c, and 722d, sum calculation processing modules 724a and 724d, sub-sampling processing modules 726a and 726d, and averaging processing modules 728a and 728d.
The rectification processing module 722a is connected to the convolution processing module 710a and the sum calculation processing module 724a. It takes the output of the convolution processing module 710a as input and rectifies it by raising the absolute value of that input to the r-th power. Here, r is a positive integer, preferably r = 2, in which case the square is computed.
The sum calculation processing module 724a is connected to the rectification processing module 722a, the rectification processing module 722b, and the sub-sampling processing module 726a. The sum calculation processing module 724a adds the processing results of the two rectification processing modules 722 (rectification processing modules 722a and 722b).
The sub-sampling processing module 726a is connected to the sum calculation processing module 724a and the averaging processing module 728a. The sub-sampling processing module 726a performs sub-sampling processing on the processing result by the sum calculation processing module 724a.
The averaging processing module 728a is connected to the sub-sampling processing module 726a, the difference calculation processing module 732, and the sum calculation processing module 734. It averages by raising the output of the sub-sampling processing module 726a to the r′-th power. Preferably r′ = 1/r, so that, for example, a square root is taken when r = 2.
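The phase invariance that the sixth embodiment relies on can be checked numerically. In the Python sketch below, a hypothetical even/odd filter pair plays the role of the two convolution filters that feed modules 722a and 722b. The pair is a cosine and a sine, each covering whole periods. The test signal is a cosine whose phase varies. Squaring each response (r = 2), summing the squares, and taking the square root (r′ = 1/r) gives the same value, n/2, for every phase. The even response on its own varies as (n/2)·cos(phase). Sub-sampling is omitted because only one position is evaluated. All names are illustrative.

```python
import math
from typing import List, Tuple


def quadrature_filters(n: int, cycles: int) -> Tuple[List[float], List[float]]:
    """Hypothetical even/odd filter pair: a cosine and a sine spanning `cycles` full periods."""
    omega = 2.0 * math.pi * cycles / n
    even = [math.cos(omega * i) for i in range(n)]
    odd = [math.sin(omega * i) for i in range(n)]
    return even, odd


def response(signal: List[float], filt: List[float]) -> float:
    """Filter response at a single position (one correlation-style output sample)."""
    return sum(x * f for x, f in zip(signal, filt))


def energy(signal: List[float], even: List[float], odd: List[float], r: float = 2.0) -> float:
    """Rectify (|s|^r), sum the two responses, then raise to the r'-th power with r' = 1/r."""
    s_even, s_odd = response(signal, even), response(signal, odd)
    return (abs(s_even) ** r + abs(s_odd) ** r) ** (1.0 / r)


n, cycles = 16, 2
even, odd = quadrature_filters(n, cycles)
omega = 2.0 * math.pi * cycles / n
for phase in (0.0, 0.7, math.pi / 2, 2.5):
    signal = [math.cos(omega * i + phase) for i in range(n)]
    print(f"phase={phase:.2f}  even={response(signal, even):+.3f}  energy={energy(signal, even, odd):.3f}")
```

The constant energy follows from the orthogonality of the sampled cosine and sine over whole periods. With a windowed (for example, Gabor-like) pair, the invariance would be only approximate.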

The normalization processing module 730 includes a difference calculation processing module 732, a sum calculation processing module 734, and a division processing module 736. It normalizes the outputs of the two averaging processing modules 728 to extract the feature amounts of the images or feature maps 100A, 100B, and 100C. It then outputs these feature amounts either to the next feature extraction layer or to a discriminator. The discriminator identifies the images or feature maps 100A, 100B, and 100C based on the received feature amounts.
The difference calculation processing module 732 is connected to the averaging processing module 728a, the averaging processing module 728d, and the division processing module 736.
The sum calculation processing module 734 is connected to the averaging processing module 728a, the averaging processing module 728d, and the division processing module 736.
The division processing module 736 is connected to the difference calculation processing module 732 and the sum calculation processing module 734.
In the normalization performed by the normalization processing module 730, the difference between the outputs of two averaging processing modules 728 (for example, 728a and 728d) is divided by the sum of those outputs. The difference is obtained by subtracting one output from the other, and either averaging processing module 728a or 728d may supply the minuend.

In this case, (Expression 11) is rewritten as (Expression 12).

Here, s is the output feature of the convolution process, u is the output of the rectification and sub-sampling process, and y is the output of the feature extraction layer. The indices i and p identify feature maps, j and k are two-dimensional coordinates within a feature map, and q and r are two-dimensional coordinates of the convolution filter. Finally, w is a parameter of the weighted generalized mean and is given in advance.
Alternatively, the normalization process may use the second expression of (Expression 12).

A hardware configuration example of the image processing apparatus according to the above embodiments is described with reference to FIG. 8. The configuration shown in FIG. 8 is implemented by, for example, a personal computer (PC). It includes a data reading unit 817, such as a scanner, and a data output unit 818, such as a printer.

The CPU (Central Processing Unit) 801 is a control unit. It executes processing according to a computer program that describes the execution sequence of the modules described in the above embodiments, such as the convolution processing module 110, the rectification processing module 120, the normalization processing module 130, the sub-sampling processing module 140, and the feature extraction layer 1: 210.

A ROM (Read Only Memory) 802 stores programs, operation parameters, and the like used by the CPU 801. A RAM (Random Access Memory) 803 stores programs used during execution by the CPU 801 and parameters that change as appropriate during that execution. These components are connected to each other by a host bus 804, such as a CPU bus.

The host bus 804 is connected via a bridge 805 to an external bus 806, such as a PCI (Peripheral Component Interconnect/Interface) bus.

A keyboard 808 and a pointing device 809, such as a mouse, are input devices operated by an operator. The display 810 is, for example, a liquid crystal display or a CRT (Cathode Ray Tube), and displays various kinds of information as text or images.

An HDD (Hard Disk Drive) 811 contains a hard disk, drives it, and records or reproduces information and the programs executed by the CPU 801. The hard disk stores the target images, recognition results, learning results, and the like. It also stores various other computer programs, such as data processing programs.

The drive 812 reads data or a program recorded on a mounted removable recording medium 813, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory. It supplies the data or program to the RAM 803, which is connected via the interface 807, the external bus 806, the bridge 805, and the host bus 804. The removable recording medium 813 can also be used as a data recording area in the same way as the hard disk.

The connection port 814 is a port for connecting an external connection device 815 and has a connection unit such as USB or IEEE 1394. It is connected to the CPU 801 and other components via the interface 807, the external bus 806, the bridge 805, the host bus 804, and so on. The communication unit 816 is connected to a communication line and performs data communication with external devices. The data reading unit 817 is, for example, a scanner that reads documents. The data output unit 818 is, for example, a printer that outputs document data.

The hardware configuration of the image processing apparatus shown in FIG. 8 is only one example. The present embodiments are not limited to it, and any configuration capable of executing the modules described in the embodiments may be used. For example, some modules may be implemented in dedicated hardware, such as an application-specific integrated circuit (ASIC). Some modules may also reside in an external system connected via a communication line. In addition, several of the systems shown in FIG. 8 may be connected to each other via communication lines and operate in cooperation. The configuration may also be incorporated into a copier, a fax machine, a scanner, a printer, or a multifunction device (an image processing apparatus with two or more of the functions of a scanner, printer, copier, fax machine, and so on).

The embodiments described above may be combined, for example by adding a module from one embodiment to another or by replacing one module with another. The techniques described in the background art, including those in the references cited in the "Mode for Carrying Out the Invention", may also be adopted as the processing of each module.

The program described above may be provided stored on a recording medium, or it may be provided by communication means. In that case, the program may be regarded, for example, as an invention of a "computer-readable recording medium on which the program is recorded".
The “computer-readable recording medium on which a program is recorded” refers to a computer-readable recording medium on which a program is recorded, which is used for program installation, execution, program distribution, and the like.
The recording medium is, for example, a digital versatile disc (DVD), which is a standard established by the DVD Forum, such as “DVD-R, DVD-RW, DVD-RAM,” and DVD + RW. Standard “DVD + R, DVD + RW, etc.”, compact disc (CD), read-only memory (CD-ROM), CD recordable (CD-R), CD rewritable (CD-RW), Blu-ray disc ( Blu-ray Disc (registered trademark), magneto-optical disk (MO), flexible disk (FD), magnetic tape, hard disk, read-only memory (ROM), electrically erasable and rewritable read-only memory (EEPROM (registered trademark)) )), Flash memory, Random access memory (RAM) SD (Secure Digital) memory card and the like.
The program or a part of the program may be recorded on the recording medium for storage or distribution. Also, by communication, for example, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a wired network used for the Internet, an intranet, an extranet, etc., or wireless communication It may be transmitted using a transmission medium such as a network or a combination of these, or may be carried on a carrier wave.
Furthermore, the program may be a part of another program, or may be recorded on a recording medium together with a separate program. Moreover, it may be divided and recorded on a plurality of recording media. Further, it may be recorded in any manner as long as it can be restored, such as compression or encryption.

DESCRIPTION OF SYMBOLS
100 ... Image or feature map
110 ... Convolution processing module
120 ... Rectification processing module
130 ... Normalization processing module
132 ... Difference calculation processing module
134 ... Divisor calculation processing module
136 ... Division processing module
140 ... Sub-sampling processing module
200 ... Image
210 ... Feature extraction layer 1
220 ... Feature map A
230 ... Feature extraction layer 2
240 ... Feature map B
250 ... Feature extraction layer 3
260 ... Feature map C
270 ... Output
300 ... Image or feature map
310 ... Convolution processing module
320 ... Rectification processing module
330 ... Sub-sampling processing module
340 ... Normalization processing module
342 ... Difference calculation processing module
344 ... Divisor calculation processing module
346 ... Division processing module
400 ... Image or feature map
410 ... Convolution processing module
420 ... Rectification processing module
430 ... Sub-sampling processing module
440 ... Normalization processing module
442 ... Difference calculation processing module
444 ... Divisor calculation processing module
446 ... Division processing module
500 ... Image or feature map
510 ... Convolution processing module
520 ... Rectification processing module
530 ... Sub-sampling processing module
540 ... Normalization processing module
542 ... Difference calculation processing module
544 ... Sum calculation processing module
546 ... Division processing module
600 ... Image or feature map
610 ... Convolution processing module
620 ... Rectification + sub-sampling processing module
622 ... Rectification processing module
624 ... Sum calculation processing module
626 ... Sub-sampling processing module
628 ... Averaging processing module
630 ... Normalization processing module
632 ... Difference calculation processing module
634 ... Sum calculation processing module
636 ... Division processing module
700 ... Image or feature map
710 ... Convolution processing module
720 ... Rectification + sub-sampling processing module
722 ... Rectification processing module
724 ... Sum calculation processing module
726 ... Sub-sampling processing module
728 ... Averaging processing module
730 ... Normalization processing module
732 ... Difference calculation processing module
734 ... Sum calculation processing module
736 ... Division processing module
900 ... Image
910 ... Feature extraction layer
920 ... Feature extraction layer
930 ... Classifier
1000 ... Image or feature map
1010 ... Convolution processing module
1020 ... Rectification processing module
1030 ... Normalization processing module
1032 ... Average calculation processing module
1034 ... Difference calculation processing module
1036 ... Standard deviation calculation processing module
1038 ... Division processing module
1040 ... Sub-sampling processing module
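The 1000-series reference signs name a normalization module (1030) built from average calculation, difference calculation, standard deviation calculation, and division modules. Read in that order, the chain plausibly subtracts a mean and divides by a standard deviation. The Python sketch below shows that computation. Several choices are illustrative assumptions, not details taken from this patent: the whole input is treated as one neighborhood, the standard deviation is the population one, a small `eps` guards against division by zero, the function name is hypothetical, and the mapping of lines to module numbers is inferred only from the module names.

```python
import math


def mean_std_normalize(values, eps=1e-8):
    """Subtract the mean of `values`, then divide by their standard deviation."""
    n = len(values)
    mean = sum(values) / n                            # cf. average calculation (1032)
    diffs = [v - mean for v in values]                # cf. difference calculation (1034)
    std = math.sqrt(sum(d * d for d in diffs) / n)    # cf. standard deviation (1036)
    return [d / (std + eps) for d in diffs]           # cf. division (1038)


normalized = mean_std_normalize([1.0, 2.0, 3.0])      # roughly [-1.2247, 0.0, 1.2247]
```

This chain divides each value by a statistic of its own neighborhood. The claims below instead divide the difference between two maps by a value derived from that pair.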

Claims

An image processing apparatus comprising:
image receiving means for receiving an image;
a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means;
a plurality of rectification processing means for performing rectification processing on the processing results of the convolution processing means;
a plurality of normalization processing means for performing normalization processing on the processing results of the rectification processing means;
a plurality of feature extraction means for extracting feature amounts of the image by performing sub-sampling processing on the processing results of the normalization processing means; and
identification means for identifying the image based on the feature amounts extracted by the feature extraction means,
wherein the normalization processing means normalizes the difference between the processing results of two of the rectification processing means by dividing the difference by a divisor that is a value based on those processing results.
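The claim above fixes the order convolution, rectification, normalization, sub-sampling. Its normalization divides the difference of two rectified maps by a value derived from those maps. The Python sketch below runs that order on 1-D signals. The following are illustrative assumptions: the absolute-value rectification, the sum `a + b` as the divisor (claims 4 and 5 name the sum; claim 1 only requires a value based on the processing results), the `eps` guard, non-overlapping average pooling as the sub-sampling, and every function name.

```python
def convolve1d(signal, kernel):
    """'Valid' 1-D correlation (kernel not flipped, no padding), as CNN layers usually compute it."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def rectify(values):
    """Rectification, assumed here to be the absolute value."""
    return [abs(v) for v in values]


def pair_normalize(a, b, eps=1e-8):
    """Divide the difference of two rectified maps by their sum (assumed divisor)."""
    return [(x - y) / (x + y + eps) for x, y in zip(a, b)]


def subsample(values, size=2):
    """Non-overlapping average pooling; a trailing partial window is dropped."""
    return [sum(values[i:i + size]) / size
            for i in range(0, len(values) - size + 1, size)]


def claim1_layer(signal, kernel_a, kernel_b):
    """Convolution -> rectification -> normalization -> sub-sampling for one map pair."""
    a = rectify(convolve1d(signal, kernel_a))
    b = rectify(convolve1d(signal, kernel_b))
    return subsample(pair_normalize(a, b))


features = claim1_layer([0, 1, 2, 3, 4], [1, -1], [1, 1])   # roughly [-0.25, -0.708]
```

Each "two rectification processing means" pair is modeled here as two kernels applied to the same input. How the actual apparatus pairs its maps is not specified in this section.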

An image processing apparatus comprising:
image receiving means for receiving an image;
a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means;
a plurality of rectification processing means for performing rectification processing on the processing results of the convolution processing means;
a plurality of sub-sampling processing means for performing sub-sampling processing on the processing results of the rectification processing means;
a plurality of feature extraction means for extracting feature amounts of the image by performing normalization processing on the processing results of the sub-sampling processing means; and
identification means for identifying the image based on the feature amounts extracted by the feature extraction means,
wherein the normalization processing means normalizes the difference between the processing results of two of the sub-sampling processing means by dividing the difference by a divisor that is a value based on the processing results of two of the rectification processing means.

An image processing apparatus comprising:
image receiving means for receiving an image;
a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means;
a plurality of rectification processing means for performing rectification processing on the processing results of the convolution processing means;
a plurality of sub-sampling processing means for performing sub-sampling processing on the processing results of the rectification processing means;
a plurality of feature extraction means for extracting feature amounts of the image by performing normalization processing on the processing results of the sub-sampling processing means; and
identification means for identifying the image based on the feature amounts extracted by the feature extraction means,
wherein the normalization processing means normalizes the difference between the processing results of two of the sub-sampling processing means by dividing the difference by a divisor that is a value based on those processing results.

An image processing apparatus comprising:
image receiving means for receiving an image;
a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means;
a plurality of rectification processing means for performing rectification processing on the processing results of the convolution processing means;
a plurality of sub-sampling processing means for performing sub-sampling processing on the processing results of the rectification processing means;
a plurality of feature extraction means for extracting feature amounts of the image by performing normalization processing on the processing results of the sub-sampling processing means; and
identification means for identifying the image based on the feature amounts extracted by the feature extraction means,
wherein the normalization processing means normalizes the difference between the processing results of two of the sub-sampling processing means by dividing the difference by the sum of those processing results as the divisor.
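Claim 4 moves sub-sampling ahead of normalization and names the divisor explicitly: the sum of the two sub-sampled results. Rectified values are non-negative, so |x − y| ≤ x + y, and every normalized output therefore stays within [−1, 1] regardless of the input's overall contrast. The Python sketch below makes several illustrative assumptions: non-overlapping average pooling as the sub-sampling, a small `eps` that maps an all-zero window to 0, and hypothetical function names.

```python
def avg_pool(values, size=2):
    """Non-overlapping average pooling; a trailing partial window is dropped."""
    return [sum(values[i:i + size]) / size
            for i in range(0, len(values) - size + 1, size)]


def subsample_then_normalize(rect_a, rect_b, size=2, eps=1e-8):
    """Pool two rectified (non-negative) maps, then compute (x - y) / (x + y + eps)."""
    if any(v < 0 for v in rect_a + rect_b):
        raise ValueError("inputs are expected to be rectified (non-negative)")
    pa = avg_pool(rect_a, size)   # sub-sampling of the first map
    pb = avg_pool(rect_b, size)   # sub-sampling of the second map
    # Difference divided by the sum of the two sub-sampled results.
    return [(x - y) / (x + y + eps) for x, y in zip(pa, pb)]


result = subsample_then_normalize([1.0, 3.0, 0.0, 0.0], [1.0, 1.0, 2.0, 2.0])  # roughly [0.333, -1.0]
```

In the usage line, the pooled values are [2, 0] and [1, 2]. The outputs are therefore (2 − 1)/(2 + 1) and (0 − 2)/(0 + 2), up to the `eps` term.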

An image processing apparatus comprising:
image receiving means for receiving an image;
a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means;
a plurality of rectification processing means for performing rectification processing on the processing results of the convolution processing means;
a plurality of addition means for adding the processing results of the plurality of rectification processing means;
a plurality of sub-sampling processing means for performing sub-sampling processing on the processing results of the addition means;
a plurality of averaging processing means for performing averaging processing on the processing results of the sub-sampling processing means;
a plurality of feature extraction means for extracting feature amounts of the image by performing normalization processing on the processing results of the averaging processing means; and
identification means for identifying the image based on the feature amounts extracted by the feature extraction means,
wherein the processing performed by the rectification processing means, the addition means, the sub-sampling processing means, and the averaging processing means together constitutes weighted generalized averaging processing, and
the normalization processing means normalizes the difference between the processing results of two of the averaging processing means by dividing the difference by the sum of those processing results as the divisor.

The image processing apparatus according to claim 5, wherein the rectification processing performed by the rectification processing means raises the absolute value of its input to the r-th power (r being a positive real number), and
the averaging processing performed by the averaging processing means raises its input to the power 1/r.
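Claims 5 and 6 split a weighted generalized (power) mean across four stages. Rectification raises |x| to the power r. Addition forms weighted window sums. Sub-sampling keeps a subset of window positions. Averaging divides by the total weight and applies an outer exponent. The Python sketch below takes that outer exponent to be 1/r, which is what makes the result a weighted power mean. The following are illustrative assumptions: the 1-D signal, stride-based sub-sampling, positive weights, and the function name.

```python
def generalized_mean_pool(values, weights, r, stride=1):
    """Weighted power mean over sliding windows: (sum w*|x|^r / sum w) ** (1/r)."""
    if r <= 0:
        raise ValueError("r must be a positive real number")
    if any(w <= 0 for w in weights):
        raise ValueError("weights are assumed to be positive")
    k = len(weights)
    # Rectification: raise the absolute value of each input to the r-th power.
    rectified = [abs(v) ** r for v in values]
    # Addition: weighted sum over every window position.
    sums = [sum(w * rectified[i + j] for j, w in enumerate(weights))
            for i in range(len(values) - k + 1)]
    # Sub-sampling: keep every `stride`-th window position.
    kept = sums[::stride]
    # Averaging: divide by the total weight, then raise to the power 1/r.
    total_w = sum(weights)
    return [(s / total_w) ** (1.0 / r) for s in kept]
```

With r = 1 the stage computes a weighted average of |x|. As r grows, the output approaches the maximum of |x| in each window, so r interpolates between average-like and max-like pooling.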

An image processing program for causing a computer to function as:
image receiving means for receiving an image;
a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means;
a plurality of rectification processing means for performing rectification processing on the processing results of the convolution processing means;
a plurality of normalization processing means for performing normalization processing on the processing results of the rectification processing means;
a plurality of feature extraction means for extracting feature amounts of the image by performing sub-sampling processing on the processing results of the normalization processing means; and
identification means for identifying the image based on the feature amounts extracted by the feature extraction means,
wherein the normalization processing means normalizes the difference between the processing results of two of the rectification processing means by dividing the difference by a divisor that is a value based on those processing results.

An image processing program for causing a computer to function as:
image receiving means for receiving an image;
a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means;
a plurality of rectification processing means for performing rectification processing on the processing results of the convolution processing means;
a plurality of sub-sampling processing means for performing sub-sampling processing on the processing results of the rectification processing means;
a plurality of feature extraction means for extracting feature amounts of the image by performing normalization processing on the processing results of the sub-sampling processing means; and
identification means for identifying the image based on the feature amounts extracted by the feature extraction means,
wherein the normalization processing means normalizes the difference between the processing results of two of the sub-sampling processing means by dividing the difference by a divisor that is a value based on the processing results of two of the rectification processing means.

An image processing program for causing a computer to function as:
image receiving means for receiving an image;
a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means;
a plurality of rectification processing means for performing rectification processing on the processing results of the convolution processing means;
a plurality of sub-sampling processing means for performing sub-sampling processing on the processing results of the rectification processing means;
a plurality of feature extraction means for extracting feature amounts of the image by performing normalization processing on the processing results of the sub-sampling processing means; and
identification means for identifying the image based on the feature amounts extracted by the feature extraction means,
wherein the normalization processing means normalizes the difference between the processing results of two of the sub-sampling processing means by dividing the difference by a divisor that is a value based on those processing results.

An image processing program for causing a computer to function as:
image receiving means for receiving an image;
a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means;
a plurality of rectification processing means for performing rectification processing on the processing results of the convolution processing means;
a plurality of sub-sampling processing means for performing sub-sampling processing on the processing results of the rectification processing means;
a plurality of feature extraction means for extracting feature amounts of the image by performing normalization processing on the processing results of the sub-sampling processing means; and
identification means for identifying the image based on the feature amounts extracted by the feature extraction means,
wherein the normalization processing means normalizes the difference between the processing results of two of the sub-sampling processing means by dividing the difference by the sum of those processing results as the divisor.

An image processing program for causing a computer to function as:
image receiving means for receiving an image;
a plurality of convolution processing means for performing convolution processing on the image received by the image receiving means;
a plurality of rectification processing means for performing rectification processing on the processing results of the convolution processing means;
a plurality of addition means for adding the processing results of the plurality of rectification processing means;
a plurality of sub-sampling processing means for performing sub-sampling processing on the processing results of the addition means;
a plurality of averaging processing means for performing averaging processing on the processing results of the sub-sampling processing means;
a plurality of feature extraction means for extracting feature amounts of the image by performing normalization processing on the processing results of the averaging processing means; and
identification means for identifying the image based on the feature amounts extracted by the feature extraction means,
wherein the processing performed by the rectification processing means, the addition means, the sub-sampling processing means, and the averaging processing means together constitutes weighted generalized averaging processing, and
the normalization processing means normalizes the difference between the processing results of two of the averaging processing means by dividing the difference by the sum of those processing results as the divisor.