JP6561877B2 - Arithmetic processing unit

Info

Publication number: JP6561877B2
Application number: JP2016038956A
Authority: JP
Inventors: 顕一蓑谷; 智章尾崎
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2016-03-01
Filing date: 2016-03-01
Publication date: 2019-08-21
Anticipated expiration: 2036-03-01
Also published as: WO2017149971A1; JP2017156941A

Description

The present invention relates to an arithmetic processing device.

Arithmetic processing devices that execute operations using a neural network in which a plurality of processing layers are hierarchically connected have conventionally been considered. In particular, in arithmetic processing devices that perform image recognition, the so-called convolutional neural network (CNN) plays a central role.

Japanese Patent No. 5184824

In a conventional convolutional neural network, higher-order feature quantities are extracted by executing convolution operation processing, activation processing, and pooling processing on a plurality of different operation result data obtained by the previous layer, that is, on feature quantity extraction result data. Furthermore, applying normalization processing to the result data of the pooling processing can improve the recognition rate of the feature quantities, so that the feature quantity extraction processing can be performed even more advantageously.

When normalization processing is applied to the processing result data, the conventional approach first reads data from memory at the time of the convolution operation processing, performs the convolution operation processing and the pooling processing on the read data, and stores the operation result data in memory. Then, within the same processing layer, the data stored in the memory is read out again and normalization processing is performed on it. That is, in the prior art, data must be read from memory twice within the same processing layer, which causes a large latency.

Accordingly, an object of the present invention is to suppress latency in an arithmetic processing device that implements arithmetic processing by a neural network, by improving the configuration used to perform normalization processing.

The arithmetic processing device according to the present invention executes operations using a neural network in which a plurality of processing layers are hierarchically connected, and includes a convolution operation processing unit, a pooling processing unit, an integration processing unit, a statistical processing unit, and a normalization processing unit. The convolution operation processing unit executes convolution operation processing on input data input from the previous layer. The pooling processing unit executes pooling processing on the processing result data of the convolution operation processing unit. The integration processing unit integrates (accumulates) the processing result data of the pooling processing unit before the operation of the next layer is started. The statistical processing unit calculates the average value and the standard deviation of the data integrated by the integration processing unit. When the operation of the next layer is started, the normalization processing unit executes normalization processing on the processing result data output by the pooling processing unit in the previous layer, using the average value and the standard deviation calculated by the statistical processing unit in the previous layer.

With this configuration, normalization processing can be performed at the input stage of the arithmetic processing in the next layer, using the statistical values obtained during the arithmetic processing of the previous layer. It is therefore unnecessary to read data from memory twice within the same processing layer, and latency can be suppressed.

A diagram conceptually showing a configuration example of a convolutional neural network
A diagram visually showing the flow of arithmetic processing in the intermediate layer (part 1)
A diagram visually showing the flow of arithmetic processing in the intermediate layer (part 2)
A diagram showing general arithmetic expressions and functions used for feature quantity extraction processing
A block diagram schematically showing a configuration example of the arithmetic processing device according to the first embodiment
A diagram showing an example of a normalization function
A diagram visually showing the flow of arithmetic processing by the arithmetic processing device
A block diagram schematically showing a configuration example of the arithmetic processing device according to the second embodiment
A diagram showing an example of the subtraction formula and the division formula that make up the normalization processing
A block diagram schematically showing a configuration example of the arithmetic processing device according to the third embodiment
A diagram showing an example of sampling the processing result data of the pooling processing unit
A block diagram schematically showing a configuration example of the arithmetic processing device according to the fourth embodiment
A diagram showing an example of the processing result data of the pooling processing unit in binary
A block diagram schematically showing a configuration example of the arithmetic processing device according to the fifth embodiment (part 1)
A block diagram schematically showing a configuration example of the arithmetic processing device according to the fifth embodiment (part 2)
A diagram showing an example of a nonlinear activation function
A block diagram schematically showing a configuration example of the arithmetic processing device according to the sixth embodiment

Hereinafter, a plurality of embodiments of an arithmetic processing device will be described with reference to the drawings. In each embodiment, substantially identical elements are denoted by the same reference numerals, and their description is omitted.

(Neural network)
FIG. 1 conceptually shows the configuration of the neural network, in this case a convolutional neural network, that is applied to the arithmetic processing devices 10, 20, 30, 40, 50, and 60 described in detail later. The convolutional neural network N is applied to image recognition technology that recognizes predetermined shapes and patterns from image data D1, which is the input data, and has an intermediate layer Na and a fully connected layer Nb. The intermediate layer Na has a configuration in which a plurality of feature quantity extraction processing layers Na1, Na2, ... are hierarchically connected. Each of the feature quantity extraction processing layers Na1, Na2, ... includes a convolution layer C and a pooling layer P.

Next, the flow of processing in the intermediate layer Na will be described. As illustrated in FIG. 2, in the first feature quantity extraction processing layer Na1, the arithmetic processing device scans the input image data D1 in units of a predetermined size, for example by raster scanning. It then extracts a plurality of feature quantities contained in the input image by applying well-known feature quantity extraction processing to the scanned data. The first feature quantity extraction processing layer Na1 extracts relatively simple, individual feature quantities, such as linear features extending in the horizontal direction or linear features extending in an oblique direction. At this time, the arithmetic processing device generates a plurality of feature maps, each corresponding to one of the plurality of features contained in the input image.

In the second feature quantity extraction processing layer Na2, the arithmetic processing device scans the input data input from the preceding feature quantity extraction processing layer Na1 in units of a predetermined size, for example by raster scanning. It then extracts a plurality of feature quantities contained in the input image by applying well-known feature quantity extraction processing to the scanned data. The second feature quantity extraction processing layer Na2 extracts higher-order, composite feature quantities by integrating the plurality of feature quantities extracted by the first feature quantity extraction processing layer Na1 while taking into account, for example, their spatial positional relationships. At this time, the arithmetic processing device generates a plurality of feature maps, each corresponding to one of the plurality of features contained in the input image.

In the third feature quantity extraction processing layer Na3, the arithmetic processing device scans the input data input from the preceding feature quantity extraction processing layer Na2 in units of a predetermined size, for example by raster scanning. It then extracts a plurality of feature quantities contained in the input image by applying well-known feature quantity extraction processing to the scanned data. The third feature quantity extraction processing layer Na3 extracts still higher-order, composite feature quantities by integrating the plurality of feature quantities extracted by the second feature quantity extraction processing layer Na2 while taking into account, for example, their spatial positional relationships. At this time, the arithmetic processing device generates a plurality of feature maps, each corresponding to one of the plurality of features contained in the input image. By repeating the feature quantity extraction processing across the plurality of feature quantity extraction processing layers in this way, the arithmetic processing device performs image recognition of the detection target object contained in the image data D1.

By repeating the processing of the plurality of feature quantity extraction processing layers Na1, Na2, Na3, ... in the intermediate layer Na, the arithmetic processing device extracts the various feature quantities contained in the input image data D1 at progressively higher order. The arithmetic processing device then outputs the result obtained by the processing of the intermediate layer Na to the fully connected layer Nb as intermediate operation result data.

The fully connected layer Nb combines the plurality of intermediate operation result data obtained from the intermediate layer Na and outputs the final operation result data. Specifically, the fully connected layer Nb combines the plurality of intermediate operation result data obtained from the intermediate layer Na and then performs product-sum operations on the combined result while varying the weighting coefficients, thereby outputting the final operation result data, that is, image data in which the detection target contained in the input image data D1 has been recognized. At this time, a portion where the value of the product-sum operation result is large is recognized as part or all of the detection target.

Next, the flow of the feature quantity extraction processing by the arithmetic processing device will be described. As illustrated in FIG. 3, the arithmetic processing device scans the input data Dn input from the feature quantity extraction processing layer of the previous layer in units of a predetermined size, in this case with the 3 × 3 pixel filter size indicated by hatching in the figure. The pixel size is not limited to 3 × 3 pixels and can be changed as appropriate, for example to 5 × 5 pixels.

The arithmetic processing device then performs a well-known convolution operation on each piece of scanned data. It performs well-known activation processing on the data after the convolution operation and takes the result as the output of the convolution layer C. Next, the arithmetic processing device performs well-known pooling processing on the output data Cn of the convolution layer C in units of a predetermined size, in this case 2 × 2 pixels, and takes the result as the output of the pooling layer P. The arithmetic processing device then outputs the output data Pn of the pooling layer P to the feature quantity extraction processing layer of the next layer. The pixel size is not limited to 2 × 2 pixels and can be changed as appropriate.

Here, the arithmetic processing device applies well-known normalization processing to the data Pn1, Pn2, ... after the pooling processing, thereby converting the pooling data Pn into normalized data Nn1, Nn2, ... in a predetermined reference format before outputting it to the next layer. This allows the pooling data Pn to be output to the next layer in a more uniform format. Accordingly, the recognition rate of the feature quantities can be improved, and the feature quantity extraction processing can be performed even more advantageously. In each of the embodiments described later, the configuration of the arithmetic processing device for performing this normalization processing has been improved.

FIG. 4 shows general examples of the convolution function used for the convolution operation processing, the functions used for the activation processing, and the functions used for the pooling processing. The convolution function Yij accumulates the values obtained by multiplying the output Xij of the immediately preceding layer by the weighting coefficients Wp,q obtained by learning. Here, "N" indicates the pixel size processed in one cycle of the convolution operation processing; for example, when the pixel size of one operation cycle is 3 × 3 pixels, the value of N is 2. The convolution function Yij may also add a predetermined bias value to the accumulated value. Various functions can be adopted as the convolution function, as long as they are capable of a product-sum operation that can also support fully connected processing. For the activation processing, the well-known logistic sigmoid function, the ReLU function (Rectified Linear Units), or the like is used. For the pooling processing, the well-known max pooling function, which outputs the maximum value of the input data, the well-known average pooling function, which outputs the average value of the input data, or the like is used.
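As a rough illustration of the three kinds of function described for FIG. 4, the following Python sketch implements a product-sum convolution, the ReLU activation, and max pooling on nested lists. FIG. 4 itself is not reproduced in this text, so the index form Y[i][j] = sum over p, q of W[p][q] * X[i+p][j+q] (with p and q running from 0 to N), the function names, and the window strides are assumptions made for illustration only.

```python
def convolve(x, w, bias=0.0):
    """Product-sum over a k x k window at every valid position of x.

    Assumed form: Y[i][j] = sum_{p,q} W[p][q] * X[i+p][j+q] + bias,
    where p and q run from 0 to N and N = k - 1 (N = 2 for a 3 x 3 filter).
    """
    k = len(w)
    rows = len(x) - k + 1
    cols = len(x[0]) - k + 1
    return [
        [
            sum(w[p][q] * x[i + p][j + q] for p in range(k) for q in range(k)) + bias
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def relu(y):
    """ReLU activation: negative values are clamped to zero."""
    return [[max(0.0, v) for v in row] for row in y]


def max_pool(y, size=2):
    """Max pooling over non-overlapping size x size windows."""
    return [
        [
            max(y[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(y[0]) - size + 1, size)
        ]
        for i in range(0, len(y) - size + 1, size)
    ]
```

Chaining the three functions as `max_pool(relu(convolve(x, w)))` mirrors the order convolution, activation, pooling described for one feature quantity extraction processing layer.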

According to the convolutional neural network N described above, higher-order feature quantities can be extracted by repeating the processing of the convolution layer C and the processing of the pooling layer P. Next, a plurality of embodiments of an arithmetic processing device to which this convolutional neural network N is applied will be described. In the drawings for each embodiment, the arithmetic processing device performing the processing of the n-th layer is shown by solid lines, and the arithmetic processing device performing the processing of the (n+1)-th layer, which is the next layer, is shown by two-dot chain lines. Likewise, an operation block performing the processing of the n-th layer is shown by solid lines, and an operation block performing the processing of the (n+1)-th layer is shown by two-dot chain lines.

(First embodiment)
The arithmetic processing device 10 illustrated in FIG. 5 includes a convolution operation processing unit 11, a pooling processing unit 12, an integration processing unit 13, a statistical processing unit 14, and a normalization processing unit 15. The convolution operation processing unit 11 executes well-known convolution operation processing on the input data input from the previous layer and outputs the processing result data to the pooling processing unit 12. Note that the arithmetic processing device 10 executes well-known activation processing on the processing result data of the convolution operation processing unit 11 by means of an activation processing unit (not shown) before outputting it to the pooling processing unit 12. The pooling processing unit 12 executes well-known pooling processing on the processing result data input from the convolution operation processing unit 11 and outputs the processing result data to the normalization processing unit 15 that operates during the arithmetic processing of the next layer.

The integration processing unit 13 integrates the processing result data output by the pooling processing unit 12 before the arithmetic processing in the next layer is started. The statistical processing unit 14 calculates the average value and the standard deviation of the data integrated by the integration processing unit 13, that is, of the processing result data output by the pooling processing unit 12. The normalization processing unit 15 includes a subtractor, a divider, and the like (not shown). When the arithmetic processing in the next layer is started, it executes well-known normalization processing on the processing result data output by the pooling processing unit 12 in the previous layer, using the average value and the standard deviation calculated by the statistical processing unit 14 in the previous layer. The normalization processing unit 15 executes the normalization processing based on, for example, the normalization function shown in FIG. 6.
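The division of labour among the integration, statistical, and normalization processing units can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the text does not say which running quantities are accumulated, so keeping a running sum and a running sum of squares is an assumption, and because FIG. 6 is not reproduced here the normalization function is assumed to be the common form (x - mean) / std. The names `Accumulator`, `normalize`, and the small constant `eps` are hypothetical.

```python
import math


class Accumulator:
    """Running sums gathered while the pooled values of layer n stream out."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, value):
        # Role of the integration processing unit: accumulate each pooled value.
        self.count += 1
        self.total += value
        self.total_sq += value * value

    def statistics(self):
        # Role of the statistical processing unit: mean and standard deviation
        # derived from the accumulated sums, with no second pass over the data.
        mean = self.total / self.count
        variance = max(self.total_sq / self.count - mean * mean, 0.0)
        return mean, math.sqrt(variance)


def normalize(values, mean, std, eps=1e-8):
    # Role of the normalization processing unit at the input of layer n + 1.
    # eps is an assumed guard against division by zero.
    return [(v - mean) / (std + eps) for v in values]
```

In this model the statistics are already available when layer n + 1 starts, so the pooled data only has to be read once, as input to the next layer.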

In the arithmetic processing device 10, as illustrated in FIG. 7, normalization processing is performed at the input stage of the arithmetic processing in the next layer, using the statistical values obtained during the arithmetic processing of the previous layer. It is therefore unnecessary to read data from memory twice within the same processing layer, and the normalization processing can be pipelined across two consecutive layers. This speeds up the processing and suppresses latency.

(Second embodiment)
The arithmetic processing device 20 illustrated in FIG. 8 includes a convolution operation processing unit 21, a pooling processing unit 22, an integration processing unit 23, a statistical processing unit 24, and a normalization processing unit 25. The normalization processing unit 25 includes a subtraction processing unit 25a and a division processing unit 25b.

The subtraction processing unit 25a subtracts the average value calculated by the statistical processing unit 24 in the previous layer from the processing result data output by the pooling processing unit 22 in the previous layer, based on the subtraction formula (1) illustrated in FIG. 9. The convolution operation processing unit 21 executes well-known convolution operation processing on the processing result data output by the subtraction processing unit 25a. The division processing unit 25b divides the processing result data output by the convolution operation processing unit 21 by the standard deviation calculated by the statistical processing unit 24 in the previous layer, based on the division formula (2) illustrated in FIG. 9.

In a configuration in which convolution operation processing is performed after normalization processing consisting of subtraction and division, as many division circuits are required as there are data input to the normalization processing unit, so the circuit scale becomes large. In the arithmetic processing device 20, the subtraction part of the normalization processing is performed first, convolution operation processing is performed on that result, and the division part of the normalization processing is then performed on the convolution result. In convolution operation processing, a plurality of data are input whereas a single datum is output. Because the arithmetic processing device 20 performs the division after the convolution operation processing, the number of division circuits can be reduced.
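The reordering works because the product-sum at the core of a convolution is linear, so dividing every input by the standard deviation gives the same result as dividing the single output once. The Python sketch below compares the two orderings for one 3 × 3 window flattened to nine values. The function names are hypothetical, and the sketch assumes that no bias is added before the division (a bias would have to be added afterwards to keep the two orderings equal).

```python
def dot(window, weights):
    """Product-sum of one filter window with its weights."""
    return sum(w * x for w, x in zip(weights, window))


def normalize_then_convolve(window, weights, mean, std):
    # Subtraction and division applied to every input: one division per input.
    return dot([(x - mean) / std for x in window], weights)


def convolve_then_divide(window, weights, mean, std):
    # Subtraction per input, then a single division applied to the output.
    return dot([x - mean for x in window], weights) / std


window = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
weights = [1.0, 0.0, -1.0, 2.0, 0.0, -2.0, 1.0, 0.0, -1.0]
a = normalize_then_convolve(window, weights, mean=5.0, std=2.0)
b = convolve_then_divide(window, weights, mean=5.0, std=2.0)
```

For this window the first ordering performs nine divisions and the second performs one, while `a` and `b` agree up to floating-point rounding.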

(Third embodiment)
The arithmetic processing device 30 illustrated in FIG. 10 includes a convolution operation processing unit 31, a pooling processing unit 32, an integration processing unit 33, a statistical processing unit 34, a normalization processing unit 35, and a sampling processing unit 36. As illustrated in FIG. 11, the sampling processing unit 36 samples part of the processing result data output by the pooling processing unit 32 and outputs the sampled data to the integration processing unit 33. The integration processing unit 33 integrates only the data sampled by the sampling processing unit 36, that is, only part of the processing result data of the pooling processing unit 32. The statistical processing unit 34 then calculates the average value and the standard deviation based only on that part of the processing result data of the pooling processing unit 32 integrated by the integration processing unit 33.

In the arithmetic processing device 30, the average value and the standard deviation are calculated from only part, rather than all, of the processing result data of the pooling processing unit 32 and are used for the normalization processing. Normalization processing of sufficiently high accuracy can be performed even with statistical values based on only part of the processing result data of the pooling processing unit 32. In addition, the arithmetic processing device 30 reduces the processing load on the integration processing unit 33.
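A small Python sketch can show how statistics taken from a subset of the pooled values compare with those taken from all of them. The sampling pattern of FIG. 11 is not reproduced in this text, so taking every `stride`-th value is purely an assumption, and the synthetic data and helper names are hypothetical. How close the two sets of statistics are depends on the data; the figures here illustrate the idea and are not a measured result from the patent.

```python
from statistics import fmean, pstdev


def sample_every(values, stride):
    """Keep every `stride`-th pooled value (assumed sampling pattern)."""
    return values[::stride]


def mean_and_std(values):
    """Average and population standard deviation of the given values."""
    return fmean(values), pstdev(values)


pooled = [float(v) for v in range(100)]  # stand-in for pooling output
sampled = sample_every(pooled, 4)        # only 25 values are accumulated
full_stats = mean_and_std(pooled)
sampled_stats = mean_and_std(sampled)
```

With a stride of 4 the accumulation work drops to a quarter, and for this synthetic input the sampled mean and standard deviation stay close to the full ones.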

(Fourth embodiment)
The arithmetic processing device 40 illustrated in FIG. 12 includes a convolution operation processing unit 41, a pooling processing unit 42, an integration processing unit 43, a statistical processing unit 44, a normalization processing unit 45, a right shift processing unit 46, and a left shift processing unit 47. The right shift processing unit 46 is provided between the pooling processing unit 42 and the integration processing unit 43; it shifts the processing result data of the pooling processing unit 42 to the right by a predetermined number of bits and outputs it to the integration processing unit 43. The left shift processing unit 47 is provided between the integration processing unit 43 and the statistical processing unit 44; it shifts the processing result data of the integration processing unit 43 to the left by a predetermined number of bits and outputs it to the statistical processing unit 44.

The number of bits by which the right shift processing unit 46 shifts the data to the right is the same as the number of bits by which the left shift processing unit 47 shifts the data to the left. Consequently, as illustrated in FIG. 13, the integration processing unit 43 integrates only the upper predetermined bits of the processing result data of the pooling processing unit 42. In other words, in the arithmetic processing device 40, the integration processing unit 43 integrates only the upper predetermined bits, rather than all bits, of the processing result data of the pooling processing unit 42. This configuration reduces the processing load on the integration processing unit 43.
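The effect of the paired shifts can be sketched in Python with integer data. The sketch assumes non-negative integer pooled values and a hypothetical function name; the bit widths and the example values are illustrative and are not taken from FIG. 13.

```python
def accumulate_upper_bits(values, shift):
    """Sum only the upper bits of each value, then restore the scale.

    Assumes non-negative integers. The low `shift` bits of every value are
    discarded before accumulation, so the result approximates the exact sum.
    """
    total = 0
    for v in values:
        total += v >> shift   # right shift unit: drop the low-order bits
    return total << shift     # left shift unit: same bit count, scale restored


pooled = [0b10110111, 0b01101001, 0b11110000]   # 183, 105, 240
approx = accumulate_upper_bits(pooled, 4)
exact = sum(pooled)
```

The accumulator only has to handle the upper bits, at the cost of an error smaller than `len(values) * 2**shift`; for the three example values the approximate sum is 512 against an exact sum of 528.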

(Fifth embodiment)
The arithmetic processing device 50 illustrated in FIG. 14 includes a plurality of operation blocks 500, each having a convolution operation processing unit 51, a pooling processing unit 52, an integration processing unit 53, a statistical processing unit 54, and a normalization processing unit 55. FIG. 14 shows only one operation block 500. Each operation block 500 is provided with a weight adjustment processing unit 56. The weight adjustment processing unit 56 identifies the weighting coefficient with the largest value among the weighting coefficients that the convolution operation processing units 51 of the operation blocks 500 use when executing their convolution operation processing. The weight adjustment processing unit 56 then divides the weighting coefficients used by the convolution operation processing unit 51 by the absolute value of that identified maximum value. In this way, the weight adjustment processing unit 56 adjusts the weighting coefficients so that they fall within the range of −1 to 1.

The arithmetic processing device 50 identifies the maximum value of the weighting coefficients that the plurality of convolution operation processing units 51 use when executing their convolution operation processing, divides the weighting coefficients used by the plurality of convolution operation processing units 51 by the absolute value of the identified maximum value, and thereby adjusts the weighting coefficients to the range of −1 to 1. Keeping the weighting coefficients within a predetermined range in this way also keeps the processing result data of the convolution operation processing, which is calculated using those weighting coefficients, within a predetermined range and suppresses fluctuation of the processing result data. The processing result data can therefore be represented by fixed-point numbers, whose representable range is narrower than that of floating-point numbers, and processing with fixed-point numbers enables faster and accurate operations.
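The weight adjustment can be sketched in Python as below. One interpretation is made explicit here: the sketch divides by the largest magnitude among the weighting coefficients, which is what guarantees that every adjusted coefficient lies in the range −1 to 1. The function name `scale_weights` and the handling of an all-zero weight set are assumptions for illustration.

```python
def scale_weights(weights):
    """Scale weighting coefficients into the range [-1, 1].

    Returns the scaled coefficients together with the scale factor, so the
    factor can be stored and reused later.
    """
    scale = max(abs(w) for w in weights)  # assumed: largest magnitude
    if scale == 0:
        return list(weights), 1.0         # assumed handling of all-zero weights
    return [w / scale for w in weights], scale


scaled, scale = scale_weights([0.5, -4.0, 2.0])
```

For the example weights the scale factor is 4.0 and the adjusted coefficients are 0.125, −1.0, and 0.5, all of which fit a fixed-point format with a narrow integer part.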

As illustrated in FIG. 15, the arithmetic processing device 50 may further include a storage unit 57 and a multiplication processing unit 58. In this configuration, the storage unit 57 stores the absolute value of the maximum weighting coefficient identified by the weight adjustment processing unit 56. The multiplication processing unit 58 is constituted by, for example, a multiplier; it multiplies the processing result data output from the convolution operation processing unit 51 by the absolute value of the maximum weighting coefficient stored in the storage unit 57, and outputs the product to the activation processing unit 59. The activation processing unit 59 executes a well-known activation process on the input processing result data and outputs the result to the pooling processing unit 52.

This configuration is particularly effective when the activation function used for the activation process is nonlinear. With a nonlinear activation function such as the one illustrated in FIG. 16, the amount of change in the processing result near the processing result R1, obtained when the weighting coefficients of the convolution operation are not adjusted, differs from the amount of change near the processing result R2, obtained when they are adjusted. The arithmetic processing device 50 therefore multiplies the processing result data R2, obtained with the adjusted weighting coefficients, by the absolute value of the maximum value used in the adjustment, so that, as indicated by the arrow r, the processing result data of the convolution operation processing is returned to the processing result data R1 obtained without adjustment. This makes it possible to perform a fast and accurate convolution operation with the adjusted weighting coefficients while approximating the processing result data to what would be obtained without adjustment, so the influence of the adjustment can be suppressed even when the activation function is nonlinear.
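The effect of multiplying the result back before activation can be illustrated with a small Python sketch. It rests on the fact that a convolution is linear in its weights, so scaling the weights by 1/s scales the output by 1/s. The helper names `conv1d` and `activate`, the use of tanh as the nonlinear activation, the one-dimensional data, and the sample values are assumptions made for illustration and do not come from the embodiment.

```python
import math


def conv1d(data, kernel):
    """Valid-mode 1-D sliding dot product, standing in for the convolution unit."""
    n = len(data) - len(kernel) + 1
    return [sum(data[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(n)]


def activate(x):
    """tanh, used here as an example of a nonlinear activation (assumption)."""
    return math.tanh(x)


data = [0.2, -0.4, 0.9, 0.1, -0.7]       # illustrative input
weights = [0.5, -3.0, 1.5]               # illustrative unadjusted weights

scale = max(abs(w) for w in weights)     # value the storage unit would hold
adjusted = [w / scale for w in weights]  # weights adjusted into [-1, 1]

# Path without adjustment (corresponds to R1).
reference = [activate(v) for v in conv1d(data, weights)]
# Adjusted weights, result multiplied back before activation (R2 returned to R1).
restored = [activate(v * scale) for v in conv1d(data, adjusted)]
# Adjusted weights with no multiplication: the activation sees a different input.
unrestored = [activate(v) for v in conv1d(data, adjusted)]
```

In floating point, `restored` matches `reference` up to rounding, whereas `unrestored` differs noticeably because tanh responds differently around the scaled-down values.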

(Sixth embodiment)
The arithmetic processing device 60 illustrated in FIG. 17 includes a plurality of arithmetic blocks 600, each including a convolution operation processing unit 61, a pooling processing unit 62, an integration processing unit 63, and a normalization processing unit 65. Each arithmetic block 600 is provided with an addition processing unit 66 and a storage unit 67. In addition, one of the plurality of arithmetic blocks 600, in this case the most upstream arithmetic block 600, is provided with a statistical processing unit 64. For convenience of explanation, the lower side of the figure is defined as the downstream side and the upper side of the figure as the upstream side.

The addition processing unit 66 adds the data input from the arithmetic block 600 downstream of itself to the data input from the integration processing unit 63 of its own arithmetic block 600, and outputs the sum to the storage unit 67. The storage unit 67 stores the data input from the addition processing unit 66 and outputs that data to the addition processing unit 66 of the arithmetic block 600 upstream of itself. The storage unit 67 of the most upstream arithmetic block 600 outputs its stored data to the statistical processing unit 64. The statistical processing unit 64 calculates the average value and the standard deviation of the input data and outputs them to the normalization processing unit 65 of each arithmetic block 600. In other words, the statistical processing unit 64 calculates the average value and the standard deviation of the data calculated by the plurality of integration processing units 63 and provides them to each of the plurality of normalization processing units 65.
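The chained accumulation and the shared statistics can be sketched as follows. This is a minimal software analogy under assumptions not stated in the source: each integration processing unit is assumed to hand over a hypothetical `(count, sum, sum_of_squares)` tuple, from which the mean and standard deviation can be derived, and the names `integrate`, `chain_accumulate`, and `statistics` are invented for illustration.

```python
import math


def integrate(values):
    """Hypothetical integration result of one block: (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))


def chain_accumulate(block_totals):
    """Carry a running total from the most downstream block to the most upstream.

    block_totals is ordered upstream-first.
    """
    carried = (0, 0.0, 0.0)  # nothing arrives at the most downstream block
    for own in reversed(block_totals):  # walk downstream -> upstream
        # Addition unit: own block's data plus the data from the downstream block.
        carried = tuple(a + b for a, b in zip(own, carried))
        # The storage unit would hold `carried` and forward it upstream.
    return carried


def statistics(total):
    """Single shared statistics step: mean and standard deviation of all data."""
    n, s, sq = total
    mean = s / n
    variance = max(sq / n - mean * mean, 0.0)  # clamp tiny negative rounding error
    return mean, math.sqrt(variance)


blocks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # illustrative pooled outputs
mean, std = statistics(chain_accumulate([integrate(b) for b in blocks]))

# Every block's normalization step uses the same shared mean and deviation.
normalized = [[(v - mean) / std for v in b] for b in blocks]
```

Only one call to `statistics` is needed regardless of the number of blocks, which mirrors the idea of providing a single statistical processing unit whose output is shared.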

According to the arithmetic processing device 60, in a configuration that includes a plurality of arithmetic blocks 600, the statistical processing unit 64 is not provided in every arithmetic block 600 but in only one arithmetic block 600, and the statistical values output by that statistical processing unit 64 are shared by the plurality of normalization processing units 65. With this configuration, even when a plurality of arithmetic blocks 600 are provided, the number of statistical processing units 64, whose circuit scale is relatively large, can be limited to one, so the device as a whole can be made more compact and less costly.

(Other embodiments)
The present invention is not limited to the embodiments described above and can be applied to various embodiments without departing from its gist. For example, the embodiments described above may be combined as appropriate.

In the drawings, 10, 20, 30, 40, 50, and 60 denote arithmetic processing devices; 11, 21, 31, 41, 51, and 61 denote convolution operation processing units; 12, 22, 32, 42, 52, and 62 denote pooling processing units; 13, 23, 33, 43, 53, and 63 denote integration processing units; 14, 24, 34, 44, 54, and 64 denote statistical processing units; 15, 25, 35, 45, 55, and 65 denote normalization processing units; 25a denotes a subtraction processing unit; 25b denotes a division processing unit; 56 denotes a weight adjustment processing unit; and 500 and 600 denote arithmetic blocks.

Claims

1. An arithmetic processing device (10, 20, 30, 40, 50, 60) that executes operations using a neural network in which a plurality of processing layers are hierarchically connected, the arithmetic processing device comprising:
a convolution operation processing unit (11, 21, 31, 41, 51, 61) that executes convolution operation processing on input data input from the previous layer;
a pooling processing unit (12, 22, 32, 42, 52, 62) that executes pooling processing on the processing result data of the convolution operation processing unit;
an integration processing unit (13, 23, 33, 43, 53, 63) that integrates the processing result data of the pooling processing unit before the operation of the next layer is started;
a statistical processing unit (14, 24, 34, 44, 54, 64) that calculates an average value and a standard deviation of the data integrated by the integration processing unit; and
a normalization processing unit (15, 25, 35, 45, 55, 65) that, when the operation of the next layer is started, executes normalization processing on the processing result data output by the pooling processing unit in the previous layer, using the average value and the standard deviation calculated by the statistical processing unit in the previous layer.

2. The arithmetic processing device according to claim 1, wherein
the normalization processing unit (25) includes a subtraction processing unit (25a) and a division processing unit (25b),
the subtraction processing unit subtracts the average value calculated by the statistical processing unit (24) in the previous layer from the processing result data output by the pooling processing unit (22) in the previous layer,
the convolution operation processing unit (21) executes convolution operation processing on the processing result data output by the subtraction processing unit, and
the division processing unit divides the processing result data output by the convolution operation processing unit by the standard deviation calculated by the statistical processing unit (24) in the previous layer.

3. The arithmetic processing device according to claim 1 or 2, wherein the integration processing unit (33) integrates a part of the processing result data of the pooling processing unit (32).

4. The arithmetic processing device according to any one of claims 1 to 3, wherein the integration processing unit (43) integrates predetermined high-order bits of the processing result data of the pooling processing unit (42).

5. The arithmetic processing device according to claim 1, comprising a plurality of arithmetic blocks (600) each including the convolution operation processing unit (61), the pooling processing unit (62), the integration processing unit (63), and the normalization processing unit (65),
wherein the statistical processing unit (64) calculates an average value and a standard deviation of the data calculated by the plurality of integration processing units (63).

6. The arithmetic processing device according to any one of claims 1 to 5, comprising a plurality of arithmetic blocks (500) each including the convolution operation processing unit (51), the pooling processing unit (52), the integration processing unit (53), and the normalization processing unit (55),
and further comprising a weight adjustment processing unit (56) that identifies the maximum value of the weighting coefficients used when the plurality of convolution operation processing units each execute the convolution operation processing, and divides the weighting coefficients used by the plurality of convolution operation processing units by the absolute value of the identified maximum value.

7. The arithmetic processing device according to claim 6, further comprising a multiplication processing unit (58) that multiplies the operation result data, calculated by the convolution operation processing unit (51) using the weighting coefficients adjusted by the weight adjustment processing unit, by the absolute value of the maximum value of the weighting coefficients.