JP6414458B2

JP6414458B2 - Arithmetic processing unit

Info

Publication number: JP6414458B2
Application number: JP2014255058A
Authority: JP
Inventors: 顕一蓑谷; 智章尾崎
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2014-12-17
Filing date: 2014-12-17
Publication date: 2018-10-31
Anticipated expiration: 2034-12-17
Also published as: JP2016115248A

Description

The present invention relates to an arithmetic processing device.

Arithmetic processing devices that execute operations using a neural network in which a plurality of processing layers are hierarchically connected have conventionally been proposed. In arithmetic processing devices that perform image recognition in particular, the so-called convolutional neural network (CNN) plays a central role.

Japanese Patent No. 5184824

This type of convolutional neural network has a structure in which convolution layers and pooling layers are connected alternately. In a convolution layer, so-called convolution processing using learned filters is applied to the input data received from the previous layer, which extracts the feature amounts contained in the input data. In the pooling layer that follows the convolution layer, the maximum or average value is output from each neighborhood of several pixels in the feature map obtained by the convolution processing, so that the response becomes invariant to minute changes in the feature amounts. In other words, minute changes in the feature amounts are absorbed.

A convolutional neural network extracts higher-order feature amounts by repeating the processing of such convolution layers and pooling layers. However, pooling is a resolution reduction process: the resolution of the output data decreases gradually as the process is repeated. As the resolution reduction is repeated, the resolution of the output data may therefore become insufficient, and there is a concern that the position of the detection target in the input image can no longer be identified accurately.

An object of the present invention is therefore to accurately identify the position of the detection target in an input image, in an arithmetic processing device that realizes arithmetic processing by a neural network, even when the resolution of the data decreases as the resolution reduction processing is repeated.

An arithmetic processing device according to the present invention executes operations using a neural network in which a plurality of processing layers are hierarchically connected, and includes arithmetic processing means, storage means, and restoration means. The arithmetic processing means executes at least convolution processing and resolution reduction processing on the input data received from the previous layer. The storage means stores the position information of the pixel data that shows the maximum value among the pixel data indicating a feature during the resolution reduction processing. The restoration means identifies the position of the detection target in the input data on the basis of the position information stored by the storage means. When a plurality of input data are received from the previous layer, the storage means identifies, for each input data, the pixel data showing the maximum value during the resolution reduction processing as feature pixel data, and stores the position information of the feature pixel data showing the largest value among these feature pixel data. The device repeats the convolution processing and the resolution reduction processing over multiple layers while storing the position information of the pixel data showing the maximum value during the resolution reduction processing.

According to this configuration, even when the resolution of the data decreases as the resolution reduction processing is repeated, the position of the detection target in the input image can be identified accurately on the basis of the position information stored by the storage means.

FIG. 1 is a diagram conceptually showing a configuration example of a convolutional neural network.
FIG. 2 is a diagram visually showing the flow of arithmetic processing in the intermediate layer (part 1).
FIG. 3 is a diagram visually showing the flow of arithmetic processing in the intermediate layer (part 2).
FIG. 4 is a diagram showing examples of general arithmetic expressions and functions used for feature extraction processing.
FIG. 5 is a block diagram schematically showing a configuration example of an arithmetic processing device according to one embodiment.
FIG. 6 is a diagram showing an example of the data resolution in each pooling layer.
FIG. 7 is a diagram visually showing an example of position information storage processing by the storage processing unit.
FIG. 8 is a diagram visually showing an example of position information storage processing by the storage processing unit over multiple layers.
FIG. 9 is a diagram visually showing an example of rectangular frame correction processing by the correction processing unit.
FIG. 10 is a diagram visually showing an example of processing for storing position information from a plurality of feature maps.
FIG. 11 is a diagram visually showing an example of rectangular frame correction processing based on position information obtained from a plurality of feature maps.

Hereinafter, an embodiment of the arithmetic processing device will be described with reference to the drawings.
(Neural network)
FIG. 1 conceptually shows the configuration of the neural network, in this case a convolutional neural network, applied to the arithmetic processing device 100 described in detail later. The convolutional neural network N is used for image recognition, in which a predetermined shape or pattern is recognized from image data D1 given as input data, and has an intermediate layer Na and a fully connected layer Nb. The intermediate layer Na consists of a plurality of feature extraction layers Na1, Na2, Na3, ... connected hierarchically. Each of the feature extraction layers Na1, Na2, Na3, ... includes a convolution layer C and a pooling layer P.

Next, the flow of processing in the intermediate layer Na will be described. As illustrated in FIG. 2, in the first feature extraction layer Na1, the arithmetic processing device scans the input image data D1 in units of a predetermined size, for example by raster scanning. It then applies well-known feature extraction processing to the scanned data to extract the feature amounts contained in the input image. The first feature extraction layer Na1 extracts relatively simple, individual feature amounts, such as a linear feature extending in the horizontal direction or a linear feature extending in an oblique direction.

In the second feature extraction layer Na2, the arithmetic processing device scans the input data received from the preceding feature extraction layer Na1 in units of a predetermined size, for example by raster scanning. It then applies well-known feature extraction processing to the scanned data to extract the feature amounts contained in the input image. The second feature extraction layer Na2 integrates the plurality of feature amounts extracted by the first feature extraction layer Na1, taking into account their spatial relationships, and thereby extracts higher-order composite feature amounts.

In the third feature extraction layer Na3, the arithmetic processing device scans the input data received from the preceding feature extraction layer Na2 in units of a predetermined size, for example by raster scanning. It then applies well-known feature extraction processing to the scanned data to extract the feature amounts contained in the input image. The third feature extraction layer Na3 integrates the plurality of feature amounts extracted by the second feature extraction layer Na2, taking into account their spatial relationships, and thereby extracts still higher-order composite feature amounts. By repeating feature extraction through a plurality of feature extraction layers in this way, the arithmetic processing device performs image recognition of the detection target object contained in the image data D1.

By repeating the processing of the feature extraction layers Na1, Na2, Na3, ... in the intermediate layer Na, the arithmetic processing device extracts the various feature amounts contained in the input image data D1 at progressively higher order. The arithmetic processing device then outputs the result of the processing in the intermediate layer Na to the fully connected layer Nb as intermediate operation result data.

The fully connected layer Nb combines the plurality of intermediate operation result data obtained from the intermediate layer Na and outputs the final operation result data. Specifically, the fully connected layer Nb combines the intermediate operation result data and performs product-sum operations on the combined result with varying weight coefficients, thereby outputting the final operation result data, that is, image data in which the detection target contained in the input image data D1 has been recognized. The portions where the product-sum result is large are recognized as part or all of the detection target.

Next, the flow of feature extraction processing by the arithmetic processing device will be described. As illustrated in FIG. 3, the arithmetic processing device includes a plurality of arithmetic blocks, each of which executes arithmetic processing. The arithmetic processing device scans the input data Dn received from the preceding feature extraction layer in units of a predetermined size, in this case the 5 × 5 pixels shown hatched in the figure. The size is not limited to 5 × 5 pixels and can be changed as appropriate.

The arithmetic processing device performs a well-known convolution operation on each block of scanned data. It then applies well-known activation processing to the convolved data and takes the result as the output of the convolution layer C. Next, it applies well-known pooling processing to each of the output data Cn1, Cn2, Cn3, ... of the convolution layer C in units of a predetermined size, in this case 2 × 2 pixels, and takes the result as the output of the pooling layer P. Finally, it passes the output data Pn1, Pn2, Pn3, ... of the pooling layer P to the feature extraction layer of the next layer. The pooling size is not limited to 2 × 2 pixels and can be changed as appropriate.
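The convolution, activation, and pooling sequence described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the function names are hypothetical, ReLU is chosen as one of the activation functions the text mentions, the window slides with a stride of one pixel without padding, and the "convolution" is the sliding product-sum commonly used in CNNs.

```python
def convolve2d(image, kernel, bias=0.0):
    """Sliding-window product-sum (no padding, stride 1) plus a bias."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw)) + bias
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Activation step: negative responses are clamped to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping size x size max pooling; incomplete edge groups are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def feature_extraction_layer(image, kernel, bias=0.0):
    """One convolution layer C followed by one pooling layer P."""
    return max_pool(relu(convolve2d(image, kernel, bias)))
```

For a 6 × 6 input of ones and a 5 × 5 kernel of ones, the convolution yields a 2 × 2 map and the 2 × 2 pooling reduces it to a single value.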

FIG. 4 shows, for reference, general examples of the convolution function used for convolution processing, the functions used for activation processing, and the functions used for pooling processing. The convolution function yj adds a predetermined bias value Bj to the sum of the outputs yi of the immediately preceding layer, each multiplied by a weight coefficient wij obtained by learning. For activation processing, the well-known logistic sigmoid function, the ReLU function (Rectified Linear Units), or another nonlinear function is used. For pooling processing, the well-known max pooling function, which outputs the maximum value of the input data, or the well-known average pooling function, which outputs the average value of the input data, is used.
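FIG. 4 itself is not reproduced here. Written out in their standard textbook forms, which match the verbal description above, the functions are as follows. The symbols x, k, and G (a group of pixels pooled together) and the names p_max and p_avg are notation introduced for this sketch, not taken from the figure.

```latex
\begin{aligned}
y_j &= \sum_i w_{ij}\, y_i + B_j && \text{(convolution: weighted sum plus bias)}\\
f(x) &= \frac{1}{1 + e^{-x}} && \text{(logistic sigmoid activation)}\\
f(x) &= \max(0,\, x) && \text{(ReLU activation)}\\
p_{\max} &= \max_{k \in G} x_k && \text{(max pooling)}\\
p_{\mathrm{avg}} &= \frac{1}{|G|} \sum_{k \in G} x_k && \text{(average pooling)}
\end{aligned}
```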

(One embodiment)
The arithmetic processing device 100 illustrated in FIG. 5 includes a convolution processing unit 101, an activation processing unit 102, and a pooling processing unit 103. Although not shown, the arithmetic processing device 100 includes a plurality of arithmetic blocks 104, each containing a convolution processing unit 101, an activation processing unit 102, and a pooling processing unit 103. The arithmetic block 104 is an example of the arithmetic processing means. The arithmetic processing device 100 also includes a storage processing unit 105, a restoration processing unit 106, and a correction processing unit 107. Each processing unit may be realized virtually by software, realized by hardware, or realized by a mixture of software and hardware.

The convolution processing unit 101 executes well-known convolution processing on the input data received from the previous layer and outputs the result to the activation processing unit 102. The activation processing unit 102 executes well-known activation processing on the result from the convolution processing unit 101 and outputs its result to the pooling processing unit 103. The pooling processing unit 103 executes well-known pooling processing on the result from the activation processing unit 102 and outputs its result. In this case, therefore, the arithmetic block 104 performs pooling processing as the resolution reduction processing.

The storage processing unit 105 is an example of the storage means, and stores the position information of pixel data that indicates a feature during pooling processing in a storage medium (not shown). In this case, the storage processing unit 105 stores the position information of the pixel data that shows the "maximum value" during pooling processing. The storage medium may be, for example, a semiconductor memory. The restoration processing unit 106 is an example of the restoration means, and identifies the position of the detection target in the input data, that is, the input image, on the basis of the position information stored by the storage processing unit 105. The correction processing unit 107 corrects the position of the detection target identified by the restoration processing unit 106.

As illustrated in FIG. 6, the convolutional neural network N can extract higher-order feature amounts by repeating the processing of the convolution layer C and the pooling layer P. However, the resolution of the data decreases gradually as the pooling processing is repeated. In the example shown in FIG. 6, the data resolution in the second pooling layer P2 falls to 1/4 of that in the first pooling layer P1, and the data resolution in the third pooling layer P3 falls to 1/16 of that in the first pooling layer P1. The information needed to identify the position of the detection target in the input image data D1 therefore becomes scarcer in later layers, and there is a concern that the position of the detection target in the input image can no longer be identified accurately.

The arithmetic processing device 100 is therefore designed so that the position of the detection target in the input image can be identified accurately even when the resolution of the data decreases as the pooling processing is repeated.
As illustrated in FIG. 7, the arithmetic processing device 100 uses the storage processing unit 105 to store the position information of the pixel data that shows the maximum value during pooling processing. Specifically, the storage processing unit 105 assigns position information "1" to "4" to each pixel group G processed by a single pooling operation, in this case each group of 2 × 2 = 4 pixels. During pooling processing, the storage processing unit 105 stores the position information of the pixel data showing the maximum value among these four pixels. Suppose here that, among the four pixels in the first layer, the pixel with position information "4" shows the maximum value. The storage processing unit 105 then stores the fact that the position information of the pixel showing the maximum value in the first layer is "4" and passes it to the second layer.

In the second layer, with the resolution lower than in the first layer, position information "1" to "4" is again assigned to each group of four pixels processed by a single pooling operation. In this case, the pixel with position information "2" in the second layer carries the information "2[4]", which indicates that the position information of the pixel that showed the maximum value in the previous layer was "4". The number in brackets in "2[4]" is the position information of the pixel that showed the maximum value in the previous layer. In this way, as illustrated in FIG. 8, the arithmetic processing device 100 repeats the processing of the convolution layer C and the pooling layer P over multiple layers while storing the position information of the pixel data that shows the maximum value during pooling processing.
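The storage of position information during pooling can be sketched as follows. This is a minimal illustration rather than the device's actual implementation. It assumes that the positions "1" to "4" are numbered in raster order within each 2 × 2 group (the figure that defines the numbering is not reproduced here), that ties go to the first maximum, and that the function name is hypothetical.

```python
def max_pool_with_positions(feature_map, size=2):
    """Max pooling that also records where each maximum was found.

    Returns (pooled, positions). positions[r][c] is the 1-based raster
    index, within its size x size group, of the pixel holding the maximum
    (assumed numbering: 1 2 / 3 4 for a 2 x 2 group).
    """
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    pooled, positions = [], []
    for r in range(out_h):
        pooled_row, position_row = [], []
        for c in range(out_w):
            group = [feature_map[r * size + i][c * size + j]
                     for i in range(size) for j in range(size)]
            best = max(range(len(group)), key=group.__getitem__)
            pooled_row.append(group[best])
            position_row.append(best + 1)  # store as "1".."4"
        pooled.append(pooled_row)
        positions.append(position_row)
    return pooled, positions


layer1_input = [
    [0.1, 0.2, 0.0, 0.5],
    [0.3, 0.9, 0.4, 0.1],
    [0.7, 0.0, 0.2, 0.3],
    [0.1, 0.2, 0.8, 0.6],
]
pooled1, positions1 = max_pool_with_positions(layer1_input)
pooled2, positions2 = max_pool_with_positions(pooled1)
```

Here `positions1` is `[[4, 2], [1, 3]]` and `positions2` is `[[1]]`. In the bracket notation of the text, the surviving pixel of the second layer would be written "1[4]": position "1" in the second layer, fed by position "4" in the first. The convolution and activation steps between the two pooling steps are omitted to keep the sketch short.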

The arithmetic processing device 100 then uses the restoration processing unit 106 to identify the position of the detection target in the input image data D1, in other words in the input image, on the basis of the position information stored during the processing of each layer. As illustrated in FIG. 9, the arithmetic processing device 100 sets, for example on the basis of the position information J3 stored in the third layer, a rectangular frame W3 that encloses the region of the input image in which the detection target is estimated to exist. The rectangular frame W3 is the smallest frame that contains the position information group made up of the plurality of position information J3 stored in the third layer.
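One way to picture the restoration step is to follow the stored positions from a deep layer back to coordinates in the first layer's input, and then take the smallest rectangle that encloses the recovered points. The sketch below makes several simplifying assumptions that are not stated in the source: positions are 1-based raster indices within each pooling group, pooling steps are treated as if applied back to back (any size change from the intervening convolutions is ignored), and both function names are hypothetical.

```python
def trace_to_input(position_maps, top_cell, size=2):
    """Follow stored max positions from the deepest layer back to the input.

    position_maps[0] belongs to the first pooling layer and
    position_maps[-1] to the deepest one. top_cell is a (row, col) cell of
    the deepest pooled output. Returns (row, col) in the first layer's input.
    """
    r, c = top_cell
    for positions in reversed(position_maps):
        p = positions[r][c] - 1        # back to a 0-based raster index
        r = r * size + p // size       # row inside the finer grid
        c = c * size + p % size        # column inside the finer grid
    return r, c


def minimal_frame(points):
    """Smallest axis-aligned frame (top, left, bottom, right) enclosing points."""
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    return min(rows), min(cols), max(rows), max(cols)


# Position maps of a two-layer example: 4 x 4 input, pooled to 2 x 2, then 1 x 1.
position_maps = [[[4, 2], [1, 3]], [[1]]]
source_pixel = trace_to_input(position_maps, (0, 0))
frame = minimal_frame([(1, 1), (3, 2), (2, 0)])
```

With these inputs, `source_pixel` is `(1, 1)` and `frame` is `(1, 0, 3, 2)`.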

However, although the rectangular frame W3 based on the third-layer position information J3 contains the detection target, it also contains a large area in which the detection target does not exist. The position of the detection target in the input image is therefore not identified with good accuracy. The arithmetic processing device 100 therefore uses the correction processing unit 107 to correct the rectangular frame W3 set by the restoration processing unit 106. Specifically, on the basis of the position information J2 stored in the second layer, the arithmetic processing device 100 corrects the rectangular frame W3 to a rectangular frame W2 smaller than the rectangular frame W3. The rectangular frame W2 is the smallest frame that contains the position information group made up of the plurality of position information J2 stored in the second layer. Compared with the rectangular frame W3, the rectangular frame W2 contains less area in which the detection target does not exist. The accuracy with which the position of the detection target in the input image is identified is thereby improved.

However, even the rectangular frame W2 based on the second-layer position information J2 still contains some area in which the detection target does not exist. The arithmetic processing device 100 therefore uses the correction processing unit 107 to correct the rectangular frame W2 further. Specifically, on the basis of the position information J1 stored in the first layer, the arithmetic processing device 100 corrects the rectangular frame W2 to a rectangular frame W1 smaller than the rectangular frame W2. The rectangular frame W1 is the smallest frame that contains the position information group made up of the plurality of position information J1 stored in the first layer. Compared with the rectangular frame W2, the rectangular frame W1 contains still less area in which the detection target does not exist. The accuracy with which the position of the detection target in the input image is identified is thereby improved further.
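The coarse-to-fine correction from W3 to W2 to W1 can be illustrated with a small sketch. It is one possible reading of the procedure, not the implementation described in the patent. It assumes that position information stored at a deeper layer localizes the target only to a larger block of input pixels (`cell_size` pixels on a side), and that each correction keeps the overlap of the current frame with the frame derived from the next finer layer. All names and the example cell coordinates are hypothetical.

```python
def frame_from_cells(cells, cell_size):
    """Smallest input-pixel frame (top, left, bottom, right), inclusive,
    enclosing grid cells that each cover cell_size x cell_size pixels."""
    tops = [r * cell_size for r, _ in cells]
    lefts = [c * cell_size for _, c in cells]
    bottoms = [(r + 1) * cell_size - 1 for r, _ in cells]
    rights = [(c + 1) * cell_size - 1 for _, c in cells]
    return min(tops), min(lefts), max(bottoms), max(rights)


def refine(frame, finer_frame):
    """Shrink frame to its overlap with the frame from a finer layer."""
    return (max(frame[0], finer_frame[0]), max(frame[1], finer_frame[1]),
            min(frame[2], finer_frame[2]), min(frame[3], finer_frame[3]))


def coarse_to_fine(cells_per_layer, cell_sizes):
    """Both arguments are ordered from the deepest layer to the first layer.

    Returns the list of frames after each step, e.g. [W3, W2, W1].
    """
    frame = frame_from_cells(cells_per_layer[0], cell_sizes[0])
    history = [frame]
    for cells, size in zip(cells_per_layer[1:], cell_sizes[1:]):
        frame = refine(frame, frame_from_cells(cells, size))
        history.append(frame)
    return history


frames = coarse_to_fine(
    [
        [(0, 0), (0, 1), (1, 0), (1, 1)],   # third-layer cells, 4 px each
        [(1, 1), (1, 2), (2, 1), (2, 2)],   # second-layer cells, 2 px each
        [(2, 3), (3, 2), (3, 3), (4, 3)],   # first-layer cells, 1 px each
    ],
    [4, 2, 1],
)
```

In this example the frames shrink from `(0, 0, 7, 7)` to `(2, 2, 5, 5)` and then to `(2, 2, 4, 3)`, mirroring the way each finer layer removes area that does not contain the target.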

In actual processing by a convolutional neural network, a plurality of input data are received from the previous layer as feature maps. The arithmetic processing device 100 is therefore configured so that the storage processing unit 105 identifies, for each feature map, the pixel data showing the maximum value during pooling processing as feature pixel data. The storage processing unit 105 is further configured to store the position information of the feature pixel data showing the largest value among these feature pixel data.

FIG. 10 shows a case in which two feature maps M1 and M2 are input to the second layer. In the feature map M1, the feature pixel data Ds showing the maximum value among the pixel group contained in the region R1 has a pixel value of "2.7", and in the feature map M2, the feature pixel data Ds showing the maximum value among the pixel group contained in the region R1 has a pixel value of "0.3". The storage processing unit 105 therefore stores "2.7", the largest of these feature pixel values, and passes it to the subsequent layer.

Likewise, in the feature map M1, the feature pixel data Ds showing the maximum value among the pixel group contained in the region R2 has a pixel value of "0.4", and in the feature map M2, the feature pixel data Ds showing the maximum value among the pixel group contained in the region R2 has a pixel value of "3.4". The storage processing unit 105 therefore stores "3.4", the largest of these feature pixel values, and passes it to the subsequent layer.
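The comparison across feature maps can be sketched as a two-step selection: find the feature pixel in each map, then keep the largest of them together with its position. The sketch below reuses the maxima quoted in the text (2.7 and 0.3 for region R1, 0.4 and 3.4 for region R2). All other pixel values, the region layout, and the function names are made up for illustration.

```python
def feature_pixel(feature_map, region):
    """Largest pixel in region = (top, left, height, width).

    Returns (value, (row, col)).
    """
    top, left, height, width = region
    candidates = [(feature_map[r][c], (r, c))
                  for r in range(top, top + height)
                  for c in range(left, left + width)]
    return max(candidates, key=lambda item: item[0])


def select_across_maps(feature_maps, region):
    """Compare the per-map feature pixels and keep the largest one.

    Returns (map_index, (row, col), value).
    """
    per_map = [feature_pixel(fmap, region) for fmap in feature_maps]
    winner = max(range(len(per_map)), key=lambda m: per_map[m][0])
    value, position = per_map[winner]
    return winner, position, value


# Hypothetical 2 x 4 maps; only the regional maxima come from the text.
m1 = [[2.7, 0.1, 0.4, 0.0],
      [0.2, 0.0, 0.1, 0.3]]
m2 = [[0.1, 0.3, 0.2, 0.5],
      [0.0, 0.2, 3.4, 0.1]]
r1 = (0, 0, 2, 2)   # left half of each map
r2 = (0, 2, 2, 2)   # right half of each map

choice_r1 = select_across_maps([m1, m2], r1)
choice_r2 = select_across_maps([m1, m2], r2)
```

Here `choice_r1` is `(0, (0, 0), 2.7)`, so the first map wins in region R1, and `choice_r2` is `(1, (1, 2), 3.4)`, so the second map wins in region R2.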

In this way, the arithmetic processing device 100 compares feature amounts across a plurality of feature maps, stores the position information from the feature map in which the feature is extracted most prominently, and passes it to the subsequent layer. The feature map M1 is an example of a feature map obtained with a filter trained to extract features of objects that extend mainly in the horizontal direction in the input image. The feature map M2 is an example of a feature map obtained with a filter trained to extract features of objects that extend mainly in the vertical direction in the input image.

The feature maps processed by the arithmetic processing device 100 are not limited to those illustrated in FIG. 10. When comparing feature amounts across a plurality of feature maps, the arithmetic processing device 100 is preferably configured to normalize the feature maps with a normalization processing unit (not shown). That is, the arithmetic processing device 100 should apply well-known normalization processing to the data after pooling, converting it into normalized data in a predetermined reference format, before comparing the feature amounts. With this configuration, the feature amounts can be compared with higher accuracy.

FIG. 11 shows an example of correcting the rectangular frame that surrounds the region of the input image in which the detection target is estimated to exist, for the case where position information obtained from a plurality of feature maps is mixed. The arithmetic processing device 100 first sets a rectangular frame W3 based on, for example, position information J3a from feature map M1 and position information J3b from feature map M2, both stored at the third layer. The rectangular frame W3 is the smallest frame that contains the position information group consisting of the pieces of position information J3a and J3b stored at the third layer. The arithmetic processing device 100 can then correct the rectangular frame W3 to a rectangular frame W2 based on position information J2a from feature map M1 and position information J2b from feature map M2, both stored at the second layer. The rectangular frame W2 is the smallest frame that contains the position information group consisting of the pieces of position information J2a and J2b stored at the second layer. Finally, the arithmetic processing device 100 can correct the rectangular frame W2 to a rectangular frame W1 based on position information J1a from feature map M1 and position information J1b from feature map M2, both stored at the first layer. The rectangular frame W1 is the smallest frame that contains the position information group consisting of the pieces of position information J1a and J1b stored at the first layer.
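
The layer-by-layer correction W3, W2, W1 described above can be sketched as follows. This is an illustration under stated assumptions, not the device's actual implementation: every stored position is assumed to be already expressed as an (x, y) pair in input-image coordinates, and the names `min_enclosing_rect`, `refine_rect`, and the sample coordinates are hypothetical.

```python
def min_enclosing_rect(positions):
    """Smallest axis-aligned frame (x_min, y_min, x_max, y_max)
    that contains every (x, y) position in the group."""
    xs = [x for x, _ in positions]
    ys = [y for _, y in positions]
    return (min(xs), min(ys), max(xs), max(ys))


def refine_rect(positions_by_layer):
    """Compute one frame per layer, from the deepest layer to the first.

    positions_by_layer maps a layer number to the position group stored
    at that layer. Each later entry in the result corrects (replaces)
    the frame before it, mirroring W3 -> W2 -> W1.
    """
    frames = []
    for layer in sorted(positions_by_layer, reverse=True):
        frames.append((layer, min_enclosing_rect(positions_by_layer[layer])))
    return frames


# Hypothetical coordinates standing in for J3a/J3b, J2a/J2b, J1a/J1b.
stored = {
    3: [(8, 8), (16, 12)],
    2: [(10, 9), (15, 12)],
    1: [(11, 10), (14, 11)],
}

frames = refine_rect(stored)
final_layer, final_frame = frames[-1]  # the frame playing the role of W1
```

In this sample the frame tightens at each step because positions stored at shallower layers come from higher-resolution data.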

According to the arithmetic processing device 100, even when the resolution of the data decreases as the pooling process is repeated, the position of the detection target in the input image can be identified accurately based on the position information stored by the storage processing unit 105 during the processing of each layer.
Further, according to the arithmetic processing device 100, when a plurality of feature maps are input from the preceding layer, the pixel data indicating the maximum value in the pooling process for each feature map is specified as feature pixel data, and the position information of the feature pixel data indicating the largest value among these pieces of feature pixel data is stored. As a result, the position of the detection target can be identified based on the position information obtained from the feature map that extracts the feature most prominently, which further improves the identification accuracy.
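
The selection described above, a per-map maximum followed by the largest of those maxima, can be sketched as follows. The sketch assumes non-overlapping square pooling windows, feature maps of equal size, and one stored position per window given as (map index, row, column) in the coordinates of the layer's input. None of these details are fixed by the description, and the name `pool_with_positions` is hypothetical.

```python
def pool_with_positions(feature_maps, size=2):
    """Max-pool several same-sized 2-D feature maps window by window.

    Returns (pooled, positions):
      pooled[k]       is the max-pooled version of feature map k
      positions[i][j] is (map_index, row, col) of the feature pixel with
                      the largest value among all maps for window (i, j)
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    pooled = [[] for _ in feature_maps]
    positions = []
    for top in range(0, height - size + 1, size):
        pooled_rows = [[] for _ in feature_maps]
        position_row = []
        for left in range(0, width - size + 1, size):
            best = None  # (value, map_index, row, col)
            for k, fm in enumerate(feature_maps):
                # Feature pixel of map k: the maximum inside this window
                # (the first one found wins when values are tied).
                value, r, c = max(
                    ((fm[rr][cc], rr, cc)
                     for rr in range(top, top + size)
                     for cc in range(left, left + size)),
                    key=lambda t: t[0],
                )
                pooled_rows[k].append(value)
                # Keep only the largest feature pixel across all maps.
                if best is None or value > best[0]:
                    best = (value, k, r, c)
            position_row.append(best[1:])
        for k, row in enumerate(pooled_rows):
            pooled[k].append(row)
        positions.append(position_row)
    return pooled, positions


# Two hypothetical 4x4 feature maps from the preceding layer.
m1 = [[1, 2, 0, 0], [3, 4, 0, 1], [0, 0, 5, 6], [0, 0, 7, 8]]
m2 = [[9, 0, 0, 0], [0, 0, 0, 2], [1, 1, 0, 0], [1, 1, 0, 3]]

pooled, positions = pool_with_positions([m1, m2])
```

Here the top-left window stores a position from the second map, whose maximum 9 exceeds the first map's 4, while the bottom-right window stores a position from the first map.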

Further, according to the arithmetic processing device 100, the rectangular frame W once identified, that is, the position of the detection target, can be corrected further based on the position information stored hierarchically over multiple layers. This further improves the accuracy with which the position of the detection target is identified.
The present invention is not limited to the embodiments described above and can be applied to various embodiments without departing from its gist. For example, the arithmetic block 104 may be configured to perform, as the resolution reduction processing, a subsampling process similar to the pooling process. The subsampling process reduces the output data by skipping a plurality of pieces of pixel data, that is, by not reading them. It is therefore a resolution reduction process in which the resolution of the output data decreases as the process is repeated.
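
As a minimal sketch of the subsampling variant, assuming "skipping" means reading every `step`-th pixel in both directions (the description does not specify the skipping pattern), the resolution reduction could look like this. The name `subsample` and the sample map are hypothetical.

```python
def subsample(feature_map, step=2):
    """Keep every `step`-th row and every `step`-th column;
    all other pixel data is skipped (never read into the output)."""
    return [row[::step] for row in feature_map[::step]]


# Hypothetical 4x4 feature map.
fm = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

once = subsample(fm)     # 4x4 -> 2x2
twice = subsample(once)  # 2x2 -> 1x1: resolution falls with each repetition
```

Unlike max pooling, the kept pixel is chosen by position rather than by value, so there is no maximum to locate within each skipped block.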

In the drawings, 100 denotes the arithmetic processing device, 104 the arithmetic processing block (arithmetic processing means), 105 the storage processing unit (storage means), 106 the restoration processing unit (restoring means), and 107 the correction processing unit (correcting means).

Claims

An arithmetic processing apparatus (100) that executes operations using a neural network in which a plurality of processing layers are hierarchically connected, the apparatus comprising:
arithmetic processing means (104) for executing at least convolution processing and resolution reduction processing on input data input from the preceding layer;
storage means (105) for storing position information of pixel data indicating a maximum value among pixel data indicating features during the resolution reduction processing; and
restoring means (106) for identifying the position of a detection target in the input data based on the position information stored in the storage means,
wherein, when a plurality of the input data are input from the preceding layer, the storage means specifies, for each of the input data, the pixel data indicating a maximum value during the resolution reduction processing as feature pixel data, and stores the position information of the feature pixel data indicating the largest value among the plurality of feature pixel data, and
wherein the convolution processing and the resolution reduction processing are repeatedly executed over a plurality of layers while the position information of the pixel data indicating a maximum value in the resolution reduction processing is stored.

The arithmetic processing apparatus according to claim 1, further comprising correcting means for correcting the position of the detection target identified by the restoring means.