JP2016099707A

JP2016099707A - Arithmetic processing unit

Info

Publication number: JP2016099707A
Application number: JP2014234487A
Authority: JP
Inventors: 智章尾崎; Tomoaki Ozaki; 顕一蓑谷; Kenichi Minoya
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2014-11-19
Filing date: 2014-11-19
Publication date: 2016-05-30
Anticipated expiration: 2034-11-19
Also published as: JP6365258B2

Abstract

PROBLEM TO BE SOLVED: To reduce the overall circuit scale of an arithmetic processing device that performs arithmetic processing by a neural network, by building the circuit that realizes the fully connected layer from the circuit that realizes the intermediate layer. SOLUTION: An arithmetic processing device 100 moves a search window over intermediate operation result data obtained from the intermediate layer and sequentially reads the image data contained in the same region as unit image data. It applies arithmetic processing by a convolution operation processing unit 101 to the unit image data to generate unit operation result data, and accumulates the unit operation result data to generate cumulative operation result data. It stores the unit operation result data and the cumulative operation result data in a storage unit 108 and a storage unit 109 while alternately switching between them, then selects the cumulative operation result data stored in the storage unit 108 or the storage unit 109 and outputs it as final operation result data. SELECTED DRAWING: Figure 5

Description

The present invention relates to an arithmetic processing device.

Arithmetic processing devices that execute operations using a neural network in which a plurality of processing layers are hierarchically connected have been considered for some time. In arithmetic processing devices that perform image recognition in particular, the so-called convolutional neural network (CNN) plays a central role.

Japanese Patent No. 5184824

In a convolutional neural network, intermediate layer processing and fully connected layer processing are applied in sequence to the input image data, yielding final operation result data in which the object contained in the image is recognized. In the intermediate layer, a plurality of feature extraction processing layers are hierarchically connected, and each processing layer performs convolution operation processing, activation processing, and pooling processing on the input data supplied from the preceding layer. By repeating the processing of each layer in this way, the intermediate layer extracts the features contained in the input image data at a high dimension and outputs the result as intermediate operation result data. The fully connected layer combines the plurality of intermediate operation result data obtained from the intermediate layer and outputs the final operation result data.

Thus, in a convolutional neural network, the processing performed in the intermediate layer differs from the processing performed in the fully connected layer. For this reason, conventional configurations provide a circuit that realizes the intermediate layer and a separate circuit that realizes the fully connected layer, which has the problem of increasing the circuit scale of the arithmetic processing device as a whole.

An object of the present invention is therefore to reduce the overall circuit scale of an arithmetic processing device that performs arithmetic processing by a neural network, by building the circuit that realizes the fully connected layer from the circuit that realizes the intermediate layer.

The arithmetic processing device according to the present invention performs the fully connected layer processing, which follows the intermediate layer processing, as follows. The arithmetic processing device scans a search window divided into a plurality of regions over the intermediate operation result data obtained by a series of feature extraction processes performed by the convolution operation means, the activation means, and the pooling means, that is, over the intermediate operation result data obtained by the intermediate layer processing, and sequentially reads the image data contained in the same region as unit image data. The arithmetic processing device then sequentially generates unit operation result data by applying arithmetic processing by the convolution operation means to the unit image data as it is read. The arithmetic processing device also accumulates the sequentially generated unit operation result data to generate cumulative operation result data. It repeats the process of storing the unit operation result data and the cumulative operation result data in the first storage means and the second storage means while alternately switching between them. Finally, the arithmetic processing device selects the cumulative operation result data stored in the first storage means or the second storage means and outputs it as the final operation result data, that is, as the final operation result data of the fully connected layer processing.

In other words, the arithmetic processing device performs the fully connected layer processing using part of the circuit that realizes the intermediate layer, namely the convolution operation means. There is therefore no need to provide a circuit that realizes the fully connected layer separately from the circuit that realizes the intermediate layer, and the overall circuit scale of the arithmetic processing device that performs arithmetic processing by a neural network can be reduced.

Furthermore, when the fully connected layer processing is performed over multiple layers, the arithmetic processing device is preferably provided with three operation result data storage means, each capable of storing intermediate operation result data, unit operation result data, and cumulative operation result data. The arithmetic processing device then preferably selects any one of these operation result data storage means as the intermediate operation result data storage means and uses the remaining two as the first storage means and the second storage means.

With this configuration, the final operation result data of the current layer is stored in one of the three operation result data storage means. In the processing of the next layer, the operation result data storage means that holds that final operation result data can therefore be used as-is as the intermediate operation result data storage means, so the processing proceeds efficiently even when the fully connected layer processing spans multiple layers. In addition, because the three operation result data storage means are rotated through the roles of intermediate operation result data storage means, first storage means, and second storage means, no separate circuit is needed to perform the fully connected layer processing over multiple layers. The fully connected layer processing can thus be performed over multiple layers without increasing the circuit scale of the arithmetic processing device.
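The rotation of the three storage means can be sketched as follows. This is only an illustrative model under stated assumptions: the store names "A", "B", and "C", the function `rotate_roles`, and the rule that the two remaining stores keep their relative order are hypothetical and are not taken from the patent text.

```python
def rotate_roles(stores, final_store):
    """Assign roles for the next fully connected layer.

    The store that holds the current layer's final result is reused as the
    intermediate-result store; the other two become the first and second
    storage means (their order here is an arbitrary assumption).
    """
    remaining = [name for name in stores if name != final_store]
    return {
        "intermediate": final_store,
        "first": remaining[0],
        "second": remaining[1],
    }


stores = ["A", "B", "C"]  # hypothetical names for the three storage means

# Suppose the current layer read from "A" and its final result ended up in "B".
next_roles = rotate_roles(stores, final_store="B")
```

Because the final result is never copied, only the role assignment changes between layers, which is the point the paragraph above makes about avoiding extra circuitry.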

The arithmetic processing device is also preferably configured to store the plurality of final operation result data obtained from the plurality of intermediate operation result data together in a storage area of the same size as the pixel size processed in one operation cycle of the convolution operation means. The fully connected layer processing includes a first-stage process that obtains the final operation result data and a second-stage process that performs a product-sum operation on this final operation result data while varying the weighting coefficients. If the final operation result data obtained by the first-stage process is stored to match the pixel size processed in one operation cycle of the convolution operation means, the product-sum operation of the second-stage process can also be performed using the convolution operation means. There is therefore no need to provide a separate circuit for the second-stage process of the fully connected layer, and the overall circuit scale of the arithmetic processing device can be kept small.

FIG. 1: A diagram conceptually showing a configuration example of a convolutional neural network
FIG. 2: A diagram visually showing the flow of arithmetic processing in the intermediate layer (Part 1)
FIG. 3: A diagram visually showing the flow of arithmetic processing in the intermediate layer (Part 2)
FIG. 4: A diagram showing general arithmetic expressions and functions used for feature extraction processing
FIG. 5: A block diagram schematically showing a configuration example of the arithmetic processing device according to the first embodiment
FIG. 6: A block diagram schematically showing a configuration example of the convolution operation processing unit
FIG. 7: A block diagram schematically showing a configuration example of the arithmetic unit
FIG. 8: A diagram showing an example of search window settings
FIG. 9: A diagram visually showing the flow of arithmetic processing in the fully connected layer
FIG. 10: A diagram showing an example of storing the result data of arithmetic processing in the fully connected layer
FIG. 11: A block diagram schematically showing a configuration example of the arithmetic processing device according to the second embodiment
FIG. 12: A block diagram schematically showing a configuration example of the arithmetic processing device according to the third embodiment
FIG. 13: A block diagram schematically showing a configuration example of the arithmetic processing device according to the fourth embodiment

Hereinafter, a plurality of embodiments of the arithmetic processing device will be described with reference to the drawings. In each embodiment, substantially identical elements are denoted by the same reference numerals, and their description is omitted.
(Neural network)
FIG. 1 conceptually shows the configuration of a neural network, in this case a convolutional neural network, applied to the arithmetic processing devices 100 and 200 described in detail later. The convolutional neural network N is applied to image recognition technology that recognizes a predetermined shape or pattern from image data D1, which is the input data, and has an intermediate layer Na and a fully connected layer Nb. The intermediate layer Na has a configuration in which a plurality of feature extraction processing layers Na1, Na2, ... are hierarchically connected. Each feature extraction processing layer Na1, Na2, ... includes a convolution layer C and a pooling layer P.

Next, the flow of processing in the intermediate layer Na will be described. As illustrated in FIG. 2, in the first feature extraction processing layer Na1, the arithmetic processing device scans the input image data D1 in units of a predetermined size, for example by raster scanning. It then extracts the features contained in the input image by applying a well-known feature extraction process to the scanned data. The first feature extraction processing layer Na1 extracts relatively simple, isolated features, such as a linear feature extending in the horizontal direction or a linear feature extending in an oblique direction.

In the second feature extraction processing layer Na2, the arithmetic processing device scans the input data supplied from the preceding feature extraction processing layer Na1 in units of a predetermined size, for example by raster scanning. It then extracts the features contained in the input image by applying a well-known feature extraction process to the scanned data. The second feature extraction processing layer Na2 extracts higher-dimensional composite features by integrating the plurality of features extracted in the first feature extraction processing layer Na1 while taking into account their spatial positional relationships and the like.

In the third feature extraction processing layer Na3, the arithmetic processing device scans the input data supplied from the preceding feature extraction processing layer Na2 in units of a predetermined size, for example by raster scanning. It then extracts the features contained in the input image by applying a well-known feature extraction process to the scanned data. The third feature extraction processing layer Na3 extracts still higher-dimensional composite features by integrating the plurality of features extracted in the second feature extraction processing layer Na2 while taking into account their spatial positional relationships and the like. By repeating feature extraction through the plurality of feature extraction processing layers in this way, the arithmetic processing device performs image recognition of the detection target object contained in the image data D1.

By repeating the processing of the plurality of feature extraction processing layers Na1, Na2, Na3, ... in the intermediate layer Na, the arithmetic processing device extracts the various features contained in the input image data D1 at a high dimension. The arithmetic processing device then outputs the result obtained by the processing of the intermediate layer Na to the fully connected layer Nb as intermediate operation result data.

The fully connected layer Nb combines the plurality of intermediate operation result data obtained from the intermediate layer Na and outputs the final operation result data. Specifically, the fully connected layer Nb combines the plurality of intermediate operation result data obtained from the intermediate layer Na and then performs a product-sum operation on the combined result while varying the weighting coefficients, thereby outputting the final operation result data, that is, image data in which the detection target contained in the input image data D1 has been recognized. Portions where the value of the product-sum result is large are recognized as part or all of the detection target.

In each of the embodiments described later, the arithmetic processing device builds the circuit that realizes the fully connected layer Nb from part of the circuit that realizes the intermediate layer Na. That is, in the arithmetic processing device according to the present invention, the circuit that realizes the intermediate layer and the circuit that realizes the fully connected layer are shared.

Next, the flow of the feature extraction process performed by the arithmetic processing device will be described. As illustrated in FIG. 3, the arithmetic processing device scans the input data Dn supplied from the preceding feature extraction processing layer in units of a predetermined size, in this case the 3 × 3 pixels indicated by hatching in the figure. The pixel size is not limited to 3 × 3 pixels and can be changed as appropriate.

The arithmetic processing device then performs a well-known convolution operation on each block of scanned data. It applies a well-known activation process to the data after the convolution operation and takes the result as the output of the convolution layer C. Next, the arithmetic processing device applies a well-known pooling process to the output data Cn of the convolution layer C in units of a predetermined size, in this case 2 × 2 pixels, and takes the result as the output of the pooling layer P. The arithmetic processing device then outputs the output data Pn of the pooling layer P to the feature extraction processing layer of the next layer. The pixel size is not limited to 2 × 2 pixels and can be changed as appropriate.

FIG. 4 shows general examples of the convolution function used for convolution operation processing, the functions used for activation processing, and the functions used for pooling processing. The convolution function Yij accumulates the values obtained by multiplying the output Xij of the immediately preceding layer by the weighting coefficients Wp,q obtained by learning. "N" indicates the pixel size processed in one cycle of convolution operation processing. For example, when the pixel size of one operation cycle is 3 × 3 pixels, the value of N is 2. The convolution function Yij may also add a predetermined bias value to the accumulated value. Various functions can be adopted as the convolution function, provided that they are capable of a product-sum operation that can also serve the fully connected processing. For the activation processing, the well-known logistic sigmoid function, the ReLU function (Rectified Linear Units), or the like is used. For the pooling processing, the well-known max pooling function, which outputs the maximum value of the input data, the well-known average pooling function, which outputs the average value of the input data, or the like is used.
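As a minimal sketch of the three kinds of functions summarized for FIG. 4, the following Python code assumes the common form in which Yij is the sum over p and q from 0 to N of Wp,q multiplied by X(i+p)(j+q), plus an optional bias, with N = 2 for a 3 × 3 window. The exact expressions of FIG. 4 are not reproduced in the text, so the index convention, the function names, and the sample data are illustrative assumptions.

```python
def convolve(x, w, bias=0.0):
    """Product-sum over each window: y[i][j] = sum_p sum_q w[p][q] * x[i+p][j+q] + bias."""
    n = len(w) - 1  # N = 2 when the kernel is 3 x 3
    rows = len(x) - n
    cols = len(x[0]) - n
    return [
        [
            sum(w[p][q] * x[i + p][j + q]
                for p in range(n + 1) for q in range(n + 1)) + bias
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def relu(y):
    """ReLU activation: negative values become zero."""
    return [[max(0.0, v) for v in row] for row in y]


def max_pool(y, size=2):
    """Max pooling over non-overlapping size x size blocks."""
    return [
        [
            max(y[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(y[0]) - size + 1, size)
        ]
        for i in range(0, len(y) - size + 1, size)
    ]


# Sample 6 x 6 input and a kernel that simply picks the centre pixel.
x = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
w = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

c_out = relu(convolve(x, w))  # 4 x 4 output of the convolution layer
p_out = max_pool(c_out)       # 2 x 2 output of the pooling layer
```

The same `convolve` product-sum is the operation that the fully connected processing reuses, which is why the text requires the convolution function to be capable of a general product-sum operation.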

(First embodiment)
The arithmetic processing device 100 illustrated in FIG. 5 is a device that executes operations using a convolutional neural network, and includes a convolution operation processing unit 101, an activation processing unit 102, a pooling processing unit 103, a data read processing unit 104, an addition processing unit 105, a switching circuit 106, a storage unit 107, a storage unit 108, a storage unit 109, a selection circuit 110, and the like. The storage units 107, 108, and 109 are configured by a storage medium such as a semiconductor memory. The components other than the storage units 107, 108, and 109 may be realized virtually by software or may be configured by hardware.

In this case, the arithmetic processing device 100 can be switched between a state in which it functions as a circuit that realizes the intermediate layer Na and a state in which it functions as a circuit that realizes the fully connected layer Nb. The activation processing unit 102 and the pooling processing unit 103 operate when the arithmetic processing device 100 functions as a circuit that realizes the intermediate layer Na, and do not operate when it functions as a circuit that realizes the fully connected layer Nb.

The convolution operation processing unit 101 is an example of the convolution operation means and performs well-known convolution operation processing on the input data. The convolution operation processing unit 101 is composed of a plurality of systolic arrays and, in this case, performs arithmetic processing on 3 × 3 pixels of data in one operation cycle.

A configuration example of the convolution operation processing unit 101 will now be described in more detail. As illustrated in FIG. 6, the convolution operation processing unit 101 has a configuration in which a plurality of arithmetic units 120 of identical construction are connected to one another. The arithmetic units 120 are arranged in a plurality of stages, in this case three stages, from the input side, where data is supplied from the preceding layer, toward the output side, where the operation result data is output. The convolution operation processing unit 101 includes three operation lines La1 to La3 in parallel, each consisting of three stages of arithmetic units 120.

In each of the operation lines La1 to La3, the convolution operation processing unit 101 sequentially transfers the operation result of the arithmetic unit 120 at each stage to the arithmetic unit on the output side every operation cycle, so that the convolution operations are performed in parallel. The convolution operation processing unit 101 then adds the operation result data obtained from the operation lines La1 to La3 with an adder 121 and outputs the sum as convolution operation result data.
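The arithmetic carried out by the three operation lines and the adder 121 can be sketched functionally as below. This is not a cycle-accurate model of the systolic array: flip-flop timing is ignored, and the mapping of one window row to one operation line is an assumption made for illustration. The function names are hypothetical.

```python
def operation_line(pixels, weights):
    """One operation line: three cascaded stages, each adding pixel * weight
    to the partial sum handed over from the previous stage."""
    partial = 0.0
    for pixel, weight in zip(pixels, weights):
        partial = partial + pixel * weight  # multiply, then add to the running sum
    return partial


def convolution_unit(window, kernel):
    """Three parallel operation lines whose results are summed by a final adder."""
    line_results = [operation_line(row, w_row) for row, w_row in zip(window, kernel)]
    return sum(line_results)  # role of adder 121


window = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]
ones = [[1.0] * 3 for _ in range(3)]
diagonal = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

total_all = convolution_unit(window, ones)       # sum of all nine pixels
total_diag = convolution_unit(window, diagonal)  # 1 + 5 + 9
```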

The input data supplied from the preceding layer is fed to the arithmetic unit 120 at each stage with its timing adjusted by a flip-flop circuit 130. A weighting coefficient is supplied to the arithmetic unit 120 at each stage from a weighting coefficient storage unit 140. The flip-flop circuits 130 and the weighting coefficient storage units 140 are likewise arranged in a plurality of stages from the input side toward the output side, corresponding to the arithmetic units 120 at each stage.

As illustrated in FIG. 7, the arithmetic unit 120 includes a multiplier 120a that multiplies the input data by the weighting coefficient and an adder 120b that adds the operation result of the multiplier 120a. The arithmetic unit 120 outputs the operation result data of the adder 120b via a flip-flop circuit 120c. The arithmetic unit 120 is not limited to this configuration.

The activation processing unit 102 is an example of the activation means and performs well-known activation processing on the data output by the convolution operation processing unit 101. The activation processing by the activation processing unit 102 may use either the logistic sigmoid function or the ReLU function, and may also use other nonlinear functions. The pooling processing unit 103 is an example of the pooling means and performs well-known pooling processing on the data output by the activation processing unit 102.

When the arithmetic processing device 100 functions as a circuit that realizes the intermediate layer Na, it extracts the features contained in the input data at a high dimension by repeating the series of feature extraction processes performed by the convolution operation processing unit 101, the activation processing unit 102, and the pooling processing unit 103. The arithmetic processing device 100 then outputs the final operation result data obtained by functioning as the intermediate layer Na as intermediate operation result data and stores that intermediate operation result data in the storage unit 107. In this case, the storage unit 107 functions as an example of the intermediate operation result data storage means. Once the intermediate operation result data has been stored in the storage unit 107, the arithmetic processing device 100 is switched so that it functions as a circuit that realizes the fully connected layer Nb.

The data read processing unit 104 functions as an example of the unit image data reading means. The data read processing unit 104 sets a rectangular search window W and divides the search window W into a plurality of rectangular regions, in this case four. As illustrated in FIG. 8, the data read processing unit 104 sets a search window W of 5 × 5 pixels, which is larger than the pixel size processed in one operation cycle of the convolution operation processing unit 101, in this case 3 × 3 pixels. The data read processing unit 104 then divides the search window W into four regions w1 to w4 in units of the pixel size processed in one operation cycle of the convolution operation processing unit 101, in this case 3 × 3 pixels. In this case, parts of the regions w2 to w4 protrude beyond the search window W. Processing for the pixels in the portions of the regions w2 to w4 that protrude beyond the search window W is therefore invalidated.
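The division of the 5 × 5 search window into four 3 × 3 regions can be sketched as follows. The layout of FIG. 8 is not reproduced in the text, so this sketch assumes that w1 sits at the top-left corner of the window and that the regions tile the window without overlap; treating invalidated pixels as zero is also an assumption, as are the function names.

```python
def split_search_window(window_size=5, region_size=3):
    """Return (top, left, valid_mask) for each region tiling the search window.

    valid_mask[r][c] is False for pixels that fall outside the window and
    whose processing would therefore be invalidated.
    """
    regions = []
    for top in range(0, window_size, region_size):
        for left in range(0, window_size, region_size):
            mask = [
                [(top + r) < window_size and (left + c) < window_size
                 for c in range(region_size)]
                for r in range(region_size)
            ]
            regions.append((top, left, mask))
    return regions


def read_unit_image(data, origin_row, origin_col, region):
    """Read one region as unit image data, replacing invalidated pixels with 0.0."""
    top, left, mask = region
    return [
        [data[origin_row + top + r][origin_col + left + c] if mask[r][c] else 0.0
         for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]


regions = split_search_window()
valid_counts = [sum(sum(row) for row in mask) for _, _, mask in regions]

data = [[1.0] * 5 for _ in range(5)]  # sample 5 x 5 block of intermediate data
w4_unit = read_unit_image(data, 0, 0, regions[3])
```

Under these assumptions the four regions contain 9, 6, 6, and 4 valid pixels, which together cover all 25 pixels of the search window exactly once.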

The data read processing unit 104 moves the search window W over the intermediate operation result data stored in the storage unit 107 and sequentially reads the image data contained in the same one of the regions w1 to w4 as unit image data. In the first scanning pass, the data read processing unit 104 moves the search window W over the intermediate operation result data, reads the image data contained in the region w1 as unit image data, and outputs it to the convolution operation processing unit 101. In the second scanning pass, the data read processing unit 104 moves the search window W over the intermediate operation result data, reads the image data contained in the region w2 as unit image data, and outputs it to the convolution operation processing unit 101. Likewise, in the third scanning pass it reads the image data contained in the region w3, and in the fourth scanning pass the image data contained in the region w4, as unit image data and outputs them to the convolution operation processing unit 101. In this way, the data read processing unit 104 performs a scanning pass for each of the regions w1 to w4 contained in the search window W.

演算処理装置１００が全結合層Ｎｂを実現する回路として機能する場合には、畳み込み演算処理部１０１は、単位演算結果データ生成手段の一例として機能する。即ち、畳み込み演算処理部１０１は、畳み込み演算処理に用いられる畳み込み関数を用いて、データ読み出し処理部１０４が読み出す単位画像データに対して演算処理を行うことにより単位演算結果データを生成する。畳み込み演算処理部１０１は、データ読み出し処理部１０４が順次読み出す全ての単位画像データについて、それぞれ単位演算結果データを生成する。そして、畳み込み演算処理部１０１は、単位演算結果データを生成するごとに、その単位演算結果データを加算処理部１０５および切替回路１０６に出力する。 When the arithmetic processing device 100 functions as a circuit that realizes the fully connected layer Nb, the convolution operation processing unit 101 functions as an example of a unit calculation result data generating means. Specifically, the convolution operation processing unit 101 generates unit calculation result data by applying the convolution function used for convolution processing to the unit image data read by the data read processing unit 104. The convolution operation processing unit 101 generates unit calculation result data for every item of unit image data sequentially read by the data read processing unit 104. Each time it generates unit calculation result data, the convolution operation processing unit 101 outputs that data to the addition processing unit 105 and the switching circuit 106.

加算処理部１０５は、累積演算結果データ生成手段の一例として機能する。即ち、加算処理部１０５は、詳しくは後述するように、畳み込み演算処理部１０１が順次生成する単位演算結果データを累積することにより累積演算結果データを生成する。 The addition processing unit 105 functions as an example of a cumulative calculation result data generating means. Specifically, as described in detail later, the addition processing unit 105 generates cumulative calculation result data by accumulating the unit calculation result data sequentially generated by the convolution operation processing unit 101.

切替回路１０６は、演算結果データ格納手段の一例として機能する。畳み込み演算処理部１０１が生成する単位演算結果データおよび加算処理部１０５が生成する累積演算結果データを記憶部１０８および記憶部１０９に交互に切り替えながら格納する。即ち、切替回路１０６は、例えば、１回目の演算サイクルにおいて、単位演算結果データを記憶部１０８に格納し、累積演算結果データを記憶部１０９に格納した場合には、２回目の演算サイクルにおいては、単位演算結果データを記憶部１０９に格納し、累積演算結果データを記憶部１０８に格納する。そして、切替回路１０６は、３回目の演算サイクルにおいては、再び、単位演算結果データを記憶部１０８に、累積演算結果データを記憶部１０９に格納し、以降、１演算サイクルごとに、単位演算結果データを格納する記憶部と累積演算結果データを格納する記憶部とを順次切り替えていく。なお、ここでは、便宜的に、記憶部１０８を第１記憶手段の一例として定義し、記憶部１０９を第２記憶手段の一例として定義する。但し、記憶部１０８を第２記憶手段の一例として定義し、記憶部１０９を第１記憶手段の一例として定義してもよい。 The switching circuit 106 functions as an example of a calculation result data storing means. It stores the unit calculation result data generated by the convolution operation processing unit 101 and the cumulative calculation result data generated by the addition processing unit 105 in the storage unit 108 and the storage unit 109, alternating between the two. For example, if the switching circuit 106 stores the unit calculation result data in the storage unit 108 and the cumulative calculation result data in the storage unit 109 in the first operation cycle, then in the second operation cycle it stores the unit calculation result data in the storage unit 109 and the cumulative calculation result data in the storage unit 108. In the third operation cycle, the switching circuit 106 again stores the unit calculation result data in the storage unit 108 and the cumulative calculation result data in the storage unit 109, and thereafter swaps, every operation cycle, the storage unit that holds the unit calculation result data and the storage unit that holds the cumulative calculation result data. Here, for convenience, the storage unit 108 is defined as an example of a first storage means and the storage unit 109 as an example of a second storage means. However, the storage unit 108 may instead be defined as an example of the second storage means and the storage unit 109 as an example of the first storage means.

選択回路１１０は、演算結果データ選択手段の一例として機能する。即ち、選択回路１１０は、畳み込み演算処理部１０１からの単位演算結果データの出力が続いている場合には、その時点において記憶部１０８または記憶部１０９に格納されている単位演算結果データを選択して加算処理部１０５に与える。加算処理部１０５は、選択回路１１０から与えられる単位演算結果データに畳み込み演算処理部１０１から与えられる単位演算結果データを加算することにより、単位演算結果データを順次累積して累積演算結果データを生成する。一方、選択回路１１０は、畳み込み演算処理部１０１からの単位演算結果データの出力が途絶えると、その時点において記憶部１０８または記憶部１０９に格納されている累積演算結果データを選択して最終演算結果データとして出力する。 The selection circuit 110 functions as an example of a calculation result data selecting means. Specifically, while the convolution operation processing unit 101 continues to output unit calculation result data, the selection circuit 110 selects the unit calculation result data stored at that time in the storage unit 108 or the storage unit 109 and supplies it to the addition processing unit 105. The addition processing unit 105 adds the unit calculation result data supplied from the convolution operation processing unit 101 to the unit calculation result data supplied from the selection circuit 110, thereby sequentially accumulating the unit calculation result data to generate cumulative calculation result data. When the output of unit calculation result data from the convolution operation processing unit 101 stops, the selection circuit 110 selects the cumulative calculation result data stored at that time in the storage unit 108 or the storage unit 109 and outputs it as final calculation result data.
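The interplay of the switching circuit 106, the addition processing unit 105, and the selection circuit 110 behaves like a two-buffer (ping-pong) accumulator. The sketch below is a simplified software model under stated assumptions: each unit result is a single number, the dictionary keys "108" and "109" are stand-ins for the two storage units, and the function name `accumulate_ping_pong` is hypothetical.

```python
def accumulate_ping_pong(unit_results):
    """Accumulate unit results while alternating between two buffers.

    Each new unit result is added to the value read from the buffer
    written in the previous cycle, and the sum is written to the other
    buffer. Returns (final_value, name_of_buffer_holding_it).
    """
    buffers = {"108": 0, "109": 0}      # stand-ins for storage units 108/109
    write_to, read_from = "108", "109"
    last_written = None
    for i, unit in enumerate(unit_results):
        previous = buffers[read_from] if i > 0 else 0   # nothing to add in cycle 1
        buffers[write_to] = previous + unit
        last_written = write_to
        write_to, read_from = read_from, write_to       # swap roles every cycle
    if last_written is None:
        return 0, None
    return buffers[last_written], last_written


total, holder = accumulate_ping_pong([1, 2, 3, 4])
```

With four unit results the running total ends up in the buffer standing in for the storage unit 109, and with three it ends up in the one standing in for the storage unit 108, so the selection step only needs to know which buffer was written last.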

次に、演算処理装置１００が全結合層Ｎｂを実現する回路として機能する場合における動作例について説明する。即ち、演算処理装置１００は、中間演算結果データに対する全結合処理を、探索窓Ｗの左上領域ｗ１、右上領域ｗ２、左下領域ｗ３、右下領域ｗ４に分けて行う。 Next, an operation example in which the arithmetic processing device 100 functions as a circuit that realizes the fully connected layer Nb will be described. The arithmetic processing device 100 performs the fully connected processing on the intermediate calculation result data separately for the upper left region w1, the upper right region w2, the lower left region w3, and the lower right region w4 of the search window W.

まず、左上領域ｗ１についての全結合処理では、演算処理装置１００は、中間演算結果データに対して探索窓Ｗを左上端から右下端に向けて走査させながら、その走査に伴い移動する左上領域ｗ１に含まれる９画素の画像データを記憶部１０７から順次読み込み、順次読み込んだ画像データに対してそれぞれ演算処理を行っていく。そして、演算処理装置１００は、左上領域ｗ１について得られた演算結果データを、この場合、記憶部１０８に格納する。 First, in the fully connected processing for the upper left region w1, the arithmetic processing device 100 scans the search window W over the intermediate calculation result data from the upper left corner toward the lower right corner, sequentially reads from the storage unit 107 the nine pixels of image data contained in the upper left region w1 as it moves with the scan, and performs arithmetic processing on each set of image data read. The arithmetic processing device 100 then stores the calculation result data obtained for the upper left region w1 in, in this case, the storage unit 108.

次に、右上領域ｗ２についての全結合処理では、演算処理装置１００は、中間演算結果データに対して探索窓Ｗを左上端から右下端に向けて走査させながら、その走査に伴い移動する右上領域ｗ２に含まれる９画素の画像データを記憶部１０７から順次読み込み、読み込んだ画像データに対してそれぞれ演算処理を行っていく。なお、右上領域ｗ２のうち探索窓Ｗからはみ出した部分に含まれる画素に対する演算処理の結果は無効化される。 Next, in the fully connected processing for the upper right region w2, the arithmetic processing device 100 scans the search window W over the intermediate calculation result data from the upper left corner toward the lower right corner, sequentially reads from the storage unit 107 the nine pixels of image data contained in the upper right region w2 as it moves with the scan, and performs arithmetic processing on each set of image data read. The results of the arithmetic processing for pixels in the part of the upper right region w2 that protrudes from the search window W are invalidated.

そして、演算処理装置１００は、右上領域ｗ２について得られた演算結果データを、この場合、記憶部１０９に格納する。このとき、演算処理装置１００は、記憶部１０８に格納されているデータを読み出して加算処理部１０５に入力することにより、左上領域ｗ１の演算結果データと右上領域ｗ２の演算結果データを累積した累積演算結果データを得る。そして、演算処理装置１００は、得られた累積演算結果データを記憶部１０９に格納する。 The arithmetic processing device 100 then stores the calculation result data obtained for the upper right region w2 in, in this case, the storage unit 109. At this time, the arithmetic processing device 100 reads the data stored in the storage unit 108 and inputs it to the addition processing unit 105, thereby obtaining cumulative calculation result data in which the calculation result data of the upper left region w1 and the calculation result data of the upper right region w2 are accumulated. The arithmetic processing device 100 then stores the obtained cumulative calculation result data in the storage unit 109.

次に、左下領域ｗ３についての全結合処理では、演算処理装置１００は、中間演算結果データに対して探索窓Ｗを左上端から右下端に向けて走査させながら、その走査に伴い移動する左下領域ｗ３に含まれる９画素の画像データを記憶部１０７から順次読み込み、読み込んだ画像データに対してそれぞれ演算処理を行っていく。なお、左下領域ｗ３のうち探索窓Ｗからはみ出した部分に含まれる画素に対する演算処理の結果は無効化される。 Next, in the fully connected processing for the lower left region w3, the arithmetic processing device 100 scans the search window W over the intermediate calculation result data from the upper left corner toward the lower right corner, sequentially reads from the storage unit 107 the nine pixels of image data contained in the lower left region w3 as it moves with the scan, and performs arithmetic processing on each set of image data read. The results of the arithmetic processing for pixels in the part of the lower left region w3 that protrudes from the search window W are invalidated.

そして、演算処理装置１００は、左下領域ｗ３について得られた演算結果データを、この場合、記憶部１０８に格納する。このとき、演算処理装置１００は、記憶部１０９に格納されているデータを読み出して加算処理部１０５に入力することにより、左上領域ｗ１の演算結果データと右上領域ｗ２の演算結果データと左下領域ｗ３の演算結果データを累積した累積演算結果データを得る。そして、演算処理装置１００は、得られた累積演算結果データを記憶部１０８に格納する。 The arithmetic processing device 100 then stores the calculation result data obtained for the lower left region w3 in, in this case, the storage unit 108. At this time, the arithmetic processing device 100 reads the data stored in the storage unit 109 and inputs it to the addition processing unit 105, thereby obtaining cumulative calculation result data in which the calculation result data of the upper left region w1, the upper right region w2, and the lower left region w3 are accumulated. The arithmetic processing device 100 then stores the obtained cumulative calculation result data in the storage unit 108.

次に、右下領域ｗ４についての全結合処理では、演算処理装置１００は、中間演算結果データに対して探索窓Ｗを左上端から右下端に向けて走査させながら、その走査に伴い移動する右下領域ｗ４に含まれる９画素の画像データを記憶部１０７から順次読み込み、読み込んだ画像データに対してそれぞれ演算処理を行っていく。なお、右下領域ｗ４のうち探索窓Ｗからはみ出した部分に含まれる画素に対する演算処理の結果は無効化される。 Next, in the fully connected processing for the lower right region w4, the arithmetic processing device 100 scans the search window W over the intermediate calculation result data from the upper left corner toward the lower right corner, sequentially reads from the storage unit 107 the nine pixels of image data contained in the lower right region w4 as it moves with the scan, and performs arithmetic processing on each set of image data read. The results of the arithmetic processing for pixels in the part of the lower right region w4 that protrudes from the search window W are invalidated.

そして、演算処理装置１００は、右下領域ｗ４について得られた演算結果データを、この場合、記憶部１０９に格納する。このとき、演算処理装置１００は、記憶部１０８に格納されているデータを読み出して加算処理部１０５に入力することにより、左上領域ｗ１の演算結果データと右上領域ｗ２の演算結果データと左下領域ｗ３の演算結果データと右下領域ｗ４の演算結果データを累積した累積演算結果データを得る。そして、演算処理装置１００は、得られた累積演算結果データを記憶部１０９に格納する。これにより、探索窓Ｗ全体についての全結合処理の演算結果データが得られる。 The arithmetic processing device 100 then stores the calculation result data obtained for the lower right region w4 in, in this case, the storage unit 109. At this time, the arithmetic processing device 100 reads the data stored in the storage unit 108 and inputs it to the addition processing unit 105, thereby obtaining cumulative calculation result data in which the calculation result data of the upper left region w1, the upper right region w2, the lower left region w3, and the lower right region w4 are accumulated. The arithmetic processing device 100 then stores the obtained cumulative calculation result data in the storage unit 109. This yields the calculation result data of the fully connected processing for the entire search window W.
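The four-step accumulation above can be checked numerically for a single position of the search window W. The sketch assumes the per-region arithmetic is a plain weighted sum (product-sum) of pixels and weights; the helper names `region_partial_sum` and `fully_connected_by_regions` are hypothetical, and activation or bias terms are left out.

```python
def region_partial_sum(pixels, weights, row0, col0, tile=3):
    """Weighted sum over one tile x tile region of the search window.

    Pixels of the region that fall outside the window are skipped,
    which models the invalidation of protruding pixels.
    """
    size = len(pixels)
    total = 0
    for r in range(tile):
        for c in range(tile):
            rr, cc = row0 + r, col0 + c
            if rr < size and cc < size:
                total += pixels[rr][cc] * weights[rr][cc]
    return total


def fully_connected_by_regions(pixels, weights, tile=3):
    """Accumulate the partial sums of regions w1, w2, w3, w4 in that order."""
    size = len(pixels)
    return sum(region_partial_sum(pixels, weights, row0, col0, tile)
               for row0 in range(0, size, tile)
               for col0 in range(0, size, tile))


pixels = [[r * 5 + c for c in range(5)] for r in range(5)]          # 0..24
weights = [[(r + 1) * (c + 2) for c in range(5)] for r in range(5)]  # arbitrary
direct = sum(pixels[r][c] * weights[r][c] for r in range(5) for c in range(5))
by_regions = fully_connected_by_regions(pixels, weights)
```

Because the valid pixels of the four regions cover the 5 × 5 window exactly once, the region-wise total equals the direct 25-term weighted sum.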

演算処理装置１００は、左上領域ｗ１についての全結合処理、右上領域ｗ２についての全結合処理、左下領域ｗ３についての全結合処理、右下領域ｗ４についての全結合処理において、それぞれ演算に用いる重み係数を異ならせる。そのため、このように、左上領域ｗ１の全結合処理を全て完了してから、次の右上領域ｗ２の全結合処理に進み、右上領域ｗ２の全結合処理を全て完了してから、次の左下領域ｗ３の全結合処理に進む、という処理態様を採用する演算処理装置によれば、演算対象となる領域が変わるときにのみ重み係数を変更すればよい。なお、分割領域ｗ１〜ｗ４の全てについて演算処理を行ってから探索窓Ｗを移動させる構成では、探索窓Ｗを移動させるたびに分割領域の数に応じた重み係数の変更処理が必要となり、その処理負荷が増大する。 The arithmetic processing device 100 uses different weighting coefficients for the fully connected processing of the upper left region w1, the upper right region w2, the lower left region w3, and the lower right region w4. An arithmetic processing device that completes all of the fully connected processing for the upper left region w1 before moving on to the upper right region w2, and completes all of the processing for the upper right region w2 before moving on to the lower left region w3, therefore needs to change the weighting coefficients only when the region being processed changes. By contrast, in a configuration that processes all of the divided regions w1 to w4 before moving the search window W, the weighting coefficients must be changed as many times as there are divided regions every time the search window W moves, which increases the processing load.
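The difference between the two processing orders can be illustrated by counting weight changes. The sketch assumes one weight set per divided region that is reused at every window position, as the paragraph states; the function name `weight_loads` and the counting model are illustrative only.

```python
def weight_loads(num_positions, num_regions, region_major=True):
    """Count how often the weight set must be (re)loaded.

    region_major=True : finish every window position for one region
                        before moving to the next region.
    region_major=False: process all regions at one window position
                        before moving the window.
    """
    if region_major:
        order = [region
                 for region in range(num_regions)
                 for _ in range(num_positions)]
    else:
        order = [region
                 for _ in range(num_positions)
                 for region in range(num_regions)]
    loads, current = 0, None
    for region in order:
        if region != current:    # a load is needed only when the region changes
            loads += 1
            current = region
    return loads


region_first = weight_loads(num_positions=4, num_regions=4, region_major=True)
window_first = weight_loads(num_positions=4, num_regions=4, region_major=False)
```

For four window positions and four regions, the region-first order needs one load per region, while the window-first order needs one load per region at every window position.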

以上の全結合処理を視覚的に示すと、図９（ａ）に例示するように、まず左上領域ｗ１についての全結合処理では、探索窓Ｗの移動に伴い左上領域ｗ１から読み出される９画素ずつの画像データに対してそれぞれ畳み込み演算処理部１０１による演算処理が行われる。そして、その演算結果データがメモリ３００に順次格納されていく。 To show the above fully connected processing visually: as illustrated in FIG. 9(a), in the fully connected processing for the upper left region w1, the convolution operation processing unit 101 performs arithmetic processing on each nine-pixel set of image data read from the upper left region w1 as the search window W moves. The resulting calculation result data is stored sequentially in the memory 300.

続く右上領域ｗ２についての全結合処理では、図９（ｂ）に例示するように、探索窓Ｗの移動に伴い右上領域ｗ２から読み出される９画素ずつの画像データに対してそれぞれ畳み込み演算処理部１０１による演算処理が行われる。そして、その演算結果データがメモリ３００に加算されていく。 In the subsequent fully connected processing for the upper right region w2, as illustrated in FIG. 9(b), the convolution operation processing unit 101 performs arithmetic processing on each nine-pixel set of image data read from the upper right region w2 as the search window W moves. The resulting calculation result data is added to the data in the memory 300.

続く左下領域ｗ３についての全結合処理では、図９（ｃ）に例示するように、探索窓Ｗの移動に伴い左下領域ｗ３から読み出される９画素ずつの画像データに対してそれぞれ畳み込み演算処理部１０１による演算処理が行われる。そして、その演算結果データがメモリ３００に加算されていく。以降、右下領域ｗ４についても同様な演算処理が行われ、その演算結果が加算されていく。 In the subsequent fully connected processing for the lower left region w3, as illustrated in FIG. 9(c), the convolution operation processing unit 101 performs arithmetic processing on each nine-pixel set of image data read from the lower left region w3 as the search window W moves. The resulting calculation result data is added to the data in the memory 300. The same arithmetic processing is then performed for the lower right region w4, and its results are added as well.

このようにして演算処理装置１００は、１つの中間演算結果データについての全結合処理の結果データをメモリ３００に格納する。ところで、中間層Ｎａからは、特徴マップとして複数の中間演算結果データが出力される。そのため、演算処理装置１００は、複数の中間演算結果データに対してそれぞれ同様の全結合処理を行い、その結果データをメモリ３００に格納していく。図１０には、特徴マップとして９つの中間演算結果データが出力された場合において、これら複数の中間演算結果データに対して全結合処理を実施し、その結果データを格納した状態を示している。 In this way, the arithmetic processing unit 100 stores the result data of the total joining process for one intermediate operation result data in the memory 300. By the way, from the intermediate layer Na, a plurality of intermediate calculation result data is output as a feature map. Therefore, the arithmetic processing unit 100 performs the same all-join processing on each of the plurality of intermediate calculation result data, and stores the result data in the memory 300. FIG. 10 shows a state in which, when nine intermediate calculation result data are output as a feature map, a full join process is performed on the plurality of intermediate calculation result data, and the result data is stored.

メモリ３００は、記憶部１０８，１０９を概念的に含む記憶媒体であり、最終演算結果データ格納手段の一例として機能する。なお、メモリ３００は、記憶部１０８，１０９とは別個の記憶媒体として構成してもよい。そして、演算処理装置１００によれば、メモリ３００に対する最終演算結果データの格納態様にも工夫が施されている。即ち、演算処理装置１００は、複数、この場合、９つの中間演算結果データから得られる複数の最終演算結果データＭ１〜Ｍ９を、畳み込み演算処理部１０１の１演算サイクルにより処置される画素サイズと同じサイズである所定の格納領域Ｒにまとめて格納するようになっている。この場合、畳み込み演算処理部１０１の１演算サイクルにより処理される画素サイズは３×３画素であるから、演算処理装置１００は、これと同じサイズの３×３画素の格納領域Ｒを設定する。 The memory 300 is a storage medium that conceptually includes the storage units 108 and 109, and functions as an example of a final calculation result data storing means. The memory 300 may instead be configured as a storage medium separate from the storage units 108 and 109. The arithmetic processing device 100 also arranges the final calculation result data in the memory 300 in a particular way. Specifically, the arithmetic processing device 100 stores the plurality of final calculation result data M1 to M9, obtained from a plurality of (in this case, nine) intermediate calculation result data, together in a predetermined storage region R whose size equals the pixel size processed in one operation cycle of the convolution operation processing unit 101. Since that pixel size is 3 × 3 pixels in this case, the arithmetic processing device 100 sets a storage region R of the same 3 × 3 pixel size.

上述したように演算処理装置１００は、各分割領域ｗ１〜ｗ４についての全結合処理において、探索窓Ｗを左上端から右下端に向けて走査させていく。このとき、演算処理装置１００は、１走査サイクルごとに探索窓Ｗを１画素ずつずらしてく。そして、演算処理装置１００は、１つ目の中間演算結果データに対する全結合処理の結果データを次のように格納する。即ち、図１０に例示するように、演算処理装置１００は、分割領域ｗ１についての１走査サイクル目の最終演算結果データＭ１［１］をメモリ３００のアドレス（１，１）に格納し、分割領域ｗ１についての２走査サイクル目の最終演算結果データＭ１［２］をメモリ３００のアドレス（４，１）に格納し、分割領域ｗ１についての３走査サイクル目の最終演算結果データＭ１［３］をメモリ３００のアドレス（１，４）に格納し、分割領域ｗ１についての４走査サイクル目の最終演算結果データＭ１［４］をメモリ３００のアドレス（４，４）に格納する。 As described above, in the fully connected processing for each of the divided regions w1 to w4, the arithmetic processing device 100 scans the search window W from the upper left corner toward the lower right corner, shifting the search window W by one pixel per scanning cycle. The arithmetic processing device 100 stores the result data of the fully connected processing for the first intermediate calculation result data as follows. As illustrated in FIG. 10, it stores the final calculation result data M1[1] of the first scanning cycle for the divided region w1 at address (1, 1) of the memory 300, the final calculation result data M1[2] of the second scanning cycle at address (4, 1), the final calculation result data M1[3] of the third scanning cycle at address (1, 4), and the final calculation result data M1[4] of the fourth scanning cycle at address (4, 4).

また、演算処理装置１００は、２つ目の中間演算結果データに対する全結合処理の結果データを次のように格納する。即ち、演算処理装置１００は、分割領域ｗ１についての１走査サイクル目の最終演算結果データＭ２［１］をメモリ３００のアドレス（２，１）に格納し、分割領域ｗ１についての２走査サイクル目の最終演算結果データＭ２［２］をメモリ３００のアドレス（５，１）に格納し、分割領域ｗ１についての３走査サイクル目の最終演算結果データＭ２［３］をメモリ３００のアドレス（２，４）に格納し、分割領域ｗ１についての４走査サイクル目の最終演算結果データＭ２［４］をメモリ３００のアドレス（５，４）に格納する。 In addition, the arithmetic processing unit 100 stores the result data of the full join process for the second intermediate calculation result data as follows. That is, the arithmetic processing unit 100 stores the final calculation result data M2 [1] in the first scanning cycle for the divided area w1 at the address (2, 1) of the memory 300, and the second scanning cycle for the divided area w1. The final calculation result data M2 [2] is stored in the address (5, 1) of the memory 300, and the final calculation result data M2 [3] in the third scanning cycle for the divided area w1 is stored in the address (2, 4) of the memory 300. And the final calculation result data M2 [4] of the fourth scanning cycle for the divided region w1 is stored at the address (5, 4) of the memory 300.

以降、演算処理装置１００は、３つ目以降の中間演算結果データに対する全結合処理の結果データＭ３，Ｍ４，・・・を同様に格納していく。即ち、演算処理装置１００は、この場合、メモリ３００のＸ方向およびＹ方向にそれぞれ３アドレス分のオフセットをとりながら各走査サイクルの最終演算結果データを格納していく。なお、図１０では、左右方向をメモリ３００のＸ方向、上下方向をメモリ３００のＹ方向と定義している。 Thereafter, the arithmetic processing unit 100 similarly stores the result data M3, M4,... Of the total combination processing for the third and subsequent intermediate calculation result data. That is, in this case, the arithmetic processing unit 100 stores the final calculation result data of each scanning cycle while offsetting three addresses in the X direction and the Y direction of the memory 300, respectively. In FIG. 10, the horizontal direction is defined as the X direction of the memory 300, and the vertical direction is defined as the Y direction of the memory 300.
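The addressing pattern above can be written as a small mapping from (feature map index, scanning cycle) to a memory address. The text gives addresses only for M1 and M2 and says later maps are stored "similarly", so the row-major placement of M3 to M9 inside each 3 × 3 region R, the two scanning cycles per row of regions, and the function name `storage_address` are assumptions made for this sketch.

```python
def storage_address(map_index, scan_cycle, tile=3, cycles_per_row=2):
    """Return the 1-based (X, Y) address for final result M<map_index>[<scan_cycle>].

    The scanning cycle selects which tile x tile storage region R is used
    (offset by `tile` addresses in X and Y); the map index selects the
    cell inside that region, assumed here to be filled row by row.
    """
    k = map_index - 1     # 0-based position of the feature map inside R
    n = scan_cycle - 1    # 0-based scanning cycle, i.e. which region R
    x = (n % cycles_per_row) * tile + (k % tile) + 1
    y = (n // cycles_per_row) * tile + (k // tile) + 1
    return (x, y)


m1 = [storage_address(1, n) for n in range(1, 5)]
m2 = [storage_address(2, n) for n in range(1, 5)]
```

Under these assumptions the mapping reproduces the addresses stated for M1[1] to M1[4] and M2[1] to M2[4], and results from the same scanning cycle of all nine maps land in one 3 × 3 region.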

演算処理装置１００は、このように格納領域Ｒごとにまとめた最終演算結果データに対して、全結合層の後段の処理を実行する。即ち、演算処理装置１００は、格納領域Ｒごとにまとめられた最終演算結果データに対して、重み係数を異ならせながら積和演算を行う。このとき、最終演算結果データは、畳み込み演算処理部１０１の１演算サイクルにより処理される画素サイズと同じサイズの格納領域Ｒにまとめられている。そのため、演算処理装置１００は、畳み込み演算処理部１０１を利用して各格納領域Ｒごとに積和演算を行うことができるようになっている。 The arithmetic processing device 100 executes the latter-stage processing of the fully connected layer on the final calculation result data grouped by storage region R in this way. Specifically, the arithmetic processing device 100 performs a product-sum operation on the final calculation result data grouped in each storage region R while varying the weighting coefficients. Because the final calculation result data is grouped into storage regions R of the same size as the pixel size processed in one operation cycle of the convolution operation processing unit 101, the arithmetic processing device 100 can use the convolution operation processing unit 101 to perform the product-sum operation for each storage region R.
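A per-region product-sum of the kind described can be sketched as follows. The memory is modeled as a list of rows with 0-based indices, and the function name `region_product_sum` and the sample weights are illustrative assumptions; the point is only that each storage region R is a 3 × 3 operand of the same shape a 3 × 3 convolution unit already consumes.

```python
def region_product_sum(memory, weights, x0, y0, tile=3):
    """Product-sum of one tile x tile storage region R with a weight set.

    memory[y][x] holds final calculation result data; (x0, y0) is the
    0-based upper left corner of the region R.
    """
    return sum(memory[y0 + dy][x0 + dx] * weights[dy][dx]
               for dy in range(tile)
               for dx in range(tile))


# 6 x 6 memory holding four regions R (one per scanning cycle).
memory = [[y * 6 + x for x in range(6)] for y in range(6)]
weights = [[1, 0, 0],
           [0, 1, 0],
           [0, 0, 1]]   # hypothetical weight set picking the diagonal
outputs = [region_product_sum(memory, weights, x0, y0)
           for y0 in (0, 3) for x0 in (0, 3)]
```

Each element of `outputs` corresponds to one storage region R, so a different weight set can be supplied per output as the paragraph describes.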

演算処理装置１００によれば、中間層Ｎａの処理の後に行う全結合層Ｎｂの処理を以下のように行う。即ち、演算処理装置１００は、畳み込み演算処理部１０１、活性化処理部１０２、プーリング処理部１０３による一連の特徴量抽出処理により得られる中間演算結果データ、つまり、中間層Ｎａの処理により得られる中間演算結果データに対して、複数の領域ｗ１〜ｗ４に分割された探索窓Ｗを走査しながら、同一の領域ｗ１〜ｗ４に含まれる画像データを単位画像データとして順次読み出す。そして、演算処理装置１００は、順次読み出す単位画像データに対して畳み込み演算処理部１０１による演算処理を実行することにより単位演算結果データを順次生成する。また、演算処理装置１００は、順次生成する単位演算結果データを累積して累積演算結果データを生成する。そして、演算処理装置１００は、単位演算結果データおよび累積演算結果データを記憶部１０８および記憶部１０９に交互に切り替えながら格納する処理を繰り返す。そして、演算処理装置１００は、最終的に記憶部１０８または記憶部１０９に記憶されている累積演算結果データを選択し、その累積演算結果データを最終演算結果データ、つまり、全結合層Ｎｂの前段の処理による最終的な演算結果データとして出力する。 The arithmetic processing device 100 performs the processing of the fully connected layer Nb, which follows the processing of the intermediate layer Na, as follows. The arithmetic processing device 100 scans the search window W, divided into the plurality of regions w1 to w4, over the intermediate calculation result data obtained by the series of feature extraction processes of the convolution operation processing unit 101, the activation processing unit 102, and the pooling processing unit 103, that is, the intermediate calculation result data obtained by the processing of the intermediate layer Na, and sequentially reads the image data contained in the same one of the regions w1 to w4 as unit image data. The arithmetic processing device 100 then sequentially generates unit calculation result data by having the convolution operation processing unit 101 process the unit image data as it is read, and accumulates the sequentially generated unit calculation result data to generate cumulative calculation result data. The arithmetic processing device 100 repeats the process of storing the unit calculation result data and the cumulative calculation result data in the storage unit 108 and the storage unit 109 while alternating between them. Finally, the arithmetic processing device 100 selects the cumulative calculation result data stored in the storage unit 108 or the storage unit 109 and outputs it as the final calculation result data, that is, the final result of the former-stage processing of the fully connected layer Nb.

即ち、演算処理装置１００によれば、中間層Ｎａを実現する回路の一部である畳み込み演算処理部１０１を利用して全結合層Ｎｂの処理を行うようにした。従って、中間層Ｎａを実現する回路と別個に全結合層Ｎｂを実現する回路を設ける必要がなく、ニューラルネットワークによる演算処理を実現する演算処理装置１００の全体の回路規模を小さくすることができる。 In other words, the arithmetic processing device 100 performs the processing of the fully connected layer Nb using the convolution operation processing unit 101, which is part of the circuit that realizes the intermediate layer Na. There is therefore no need to provide a circuit for the fully connected layer Nb separately from the circuit for the intermediate layer Na, and the overall circuit scale of the arithmetic processing device 100 that realizes neural network arithmetic processing can be reduced.

また、演算処理装置１００は、複数の中間演算結果データから得られる複数の最終演算結果データを畳み込み演算処理部１０１の１演算サイクルにより処理される画素サイズと同じサイズの格納領域Ｒにまとめて格納する。このように、全結合層Ｎｂの前段の処理により得られる最終演算結果データを畳み込み演算処理部１０１の１演算サイクルにより処理される画素サイズに合わせて格納しておけば、後段の処理における積和演算を畳み込み演算処理部１０１により行うことができる。そのため、全結合層Ｎｂの後段の処理を実現するための回路を別個に設ける必要がなく、演算処理装置１００の全体の回路規模を抑えることができる。 In addition, the arithmetic processing device 100 stores the plurality of final calculation result data obtained from the plurality of intermediate calculation result data together in a storage region R of the same size as the pixel size processed in one operation cycle of the convolution operation processing unit 101. Storing the final calculation result data from the former-stage processing of the fully connected layer Nb to match that pixel size allows the product-sum operation in the latter-stage processing to be performed by the convolution operation processing unit 101. There is therefore no need to provide a separate circuit for the latter-stage processing of the fully connected layer Nb, and the overall circuit scale of the arithmetic processing device 100 can be kept small.

（第２実施形態） (Second Embodiment)
図１１に例示する演算処理装置２００は、畳み込みニューラルネットワークによる演算を実行する装置であり、畳み込み演算処理部２０１、活性化処理部２０２、プーリング処理部２０３、データ読み出し処理部２０４、加算処理部２０５、切替回路２０６、記憶部２０７、記憶部２０８、記憶部２０９、第１選択回路２１０、第２選択回路２１１などを備える。記憶部２０７、記憶部２０８、記憶部２０９は、例えば半導体メモリなどといった記憶媒体で構成される。また、記憶部２０７，２０８，２０９以外の構成要素は、ソフトウェアにより仮想的に実現してもよいし、ハードウェアにより構成してもよい。 The arithmetic processing device 200 illustrated in FIG. 11 is a device that executes operations of a convolutional neural network, and includes a convolution operation processing unit 201, an activation processing unit 202, a pooling processing unit 203, a data read processing unit 204, an addition processing unit 205, a switching circuit 206, a storage unit 207, a storage unit 208, a storage unit 209, a first selection circuit 210, a second selection circuit 211, and the like. The storage units 207, 208, and 209 are each configured by a storage medium such as a semiconductor memory. The components other than the storage units 207, 208, and 209 may be realized virtually by software or may be configured by hardware.

この場合、演算処理装置２００は、中間層Ｎａを実現する回路として機能する状態と全結合層Ｎｂを実現する回路として機能する状態とに切り替えられるようになっている。また、畳み込み演算処理部２０１、活性化処理部２０２、プーリング処理部２０３、データ読み出し処理部２０４、加算処理部２０５、切替回路２０６は、それぞれ、畳み込み演算処理部１０１、活性化処理部１０２、プーリング処理部１０３、データ読み出し処理部１０４、加算処理部１０５、切替回路１０６に相当するものである。以下、異なる部分を説明する。 In this case, the arithmetic processing device 200 can be switched between a state in which it functions as a circuit that realizes the intermediate layer Na and a state in which it functions as a circuit that realizes the fully connected layer Nb. The convolution operation processing unit 201, the activation processing unit 202, the pooling processing unit 203, the data read processing unit 204, the addition processing unit 205, and the switching circuit 206 correspond to the convolution operation processing unit 101, the activation processing unit 102, the pooling processing unit 103, the data read processing unit 104, the addition processing unit 105, and the switching circuit 106, respectively. The differences are described below.

記憶部２０７、記憶部２０８、記憶部２０９は、何れも演算結果データ格納手段の一例として機能するものであり、中間演算結果データ、単位演算結果データ、累積演算結果データを格納することが可能な記憶媒体である。第１選択回路２１０は、第１選択手段の一例として機能するものであり、記憶部２０７、記憶部２０８、記憶部２０９のうち何れか１つを中間演算結果データ格納手段として選択する。第２選択回路２１１は、第２選択手段の一例として機能するものであり、記憶部２０７、記憶部２０８、記憶部２０９のうち残りの２つを第１記憶手段および第２記憶手段として選択する。そして、第２選択回路２１１は、第１記憶手段および第２記憶手段として選択した２つの記憶部を用いて演算結果データ選択手段として動作する。 The storage units 207, 208, and 209 each function as an example of a calculation result data storing means, and each is a storage medium capable of storing intermediate calculation result data, unit calculation result data, and cumulative calculation result data. The first selection circuit 210 functions as an example of a first selecting means and selects any one of the storage units 207, 208, and 209 as an intermediate calculation result data storing means. The second selection circuit 211 functions as an example of a second selecting means and selects the remaining two of the storage units 207, 208, and 209 as the first storage means and the second storage means. The second selection circuit 211 then operates as a calculation result data selecting means using the two storage units selected as the first storage means and the second storage means.

次に、演算処理装置２００が全結合層Ｎｂを実現する回路として機能する場合における動作例について説明する。この場合、演算処理装置２００は、全結合層の前段の処理を多階層にわたって行う。即ち、演算処理装置２００は、全結合層の前段の処理を多層にわたって複数回繰り返すことにより、より高次元の全結合処理を行う。 Next, an operation example in which the arithmetic processing device 200 functions as a circuit that realizes the fully connected layer Nb will be described. In this case, the arithmetic processing device 200 performs the former-stage processing of the fully connected layer over multiple layers. That is, by repeating the former-stage processing of the fully connected layer multiple times across layers, the arithmetic processing device 200 performs higher-dimensional fully connected processing.

演算処理装置２００は、全結合層Ｎｂの前段の処理を開始すると、第１層目では、中間層Ｎａから得られた中間演算結果データを記憶部２０７に格納する。即ち、演算処理装置２００は、記憶部２０７を中間演算結果データ格納手段として選択する。そして、演算処理装置２００は、第１層目で得られた最終演算結果データを記憶部２０８または記憶部２０９に格納する。最終演算結果データを記憶部２０８および記憶部２０９の何れに格納するのかは、適宜変更して設定することができる。例えば、中間演算結果データに対する走査回数、つまり探索窓Ｗの分割数が奇数である場合には最終演算結果データを記憶部２０８に格納し、偶数である場合には最終演算結果データを記憶部２０９に格納するように構成してもよい。また、奇数である場合には最終演算結果データを記憶部２０９に格納し、偶数である場合には最終演算結果データを記憶部２０８に格納するように構成してもよい。 When the arithmetic processing device 200 starts the former-stage processing of the fully connected layer Nb, in the first layer it stores the intermediate calculation result data obtained from the intermediate layer Na in the storage unit 207. That is, the arithmetic processing device 200 selects the storage unit 207 as the intermediate calculation result data storing means. The arithmetic processing device 200 then stores the final calculation result data obtained in the first layer in the storage unit 208 or the storage unit 209. Which of the two receives the final calculation result data can be set as appropriate. For example, the device may be configured to store the final calculation result data in the storage unit 208 when the number of scanning passes over the intermediate calculation result data, that is, the number of divisions of the search window W, is odd, and in the storage unit 209 when it is even. Alternatively, it may be configured to store the final calculation result data in the storage unit 209 when the number is odd and in the storage unit 208 when it is even.

第１層目の処理により、記憶部２０８または記憶部２０９の何れかに最終演算結果データが格納される。そのため、演算処理装置２００は、第２層目の処理においては、その最終演算結果データを格納する記憶部を中間演算結果データ格納手段としてそのまま用いる。そして、演算処理装置２００は、中間演算結果データ格納手段としてそのまま用いられる記憶部以外の記憶部を第１記憶手段および第２記憶手段として選択する。そして、第２層目の処理により、第１記憶手段または第２記憶手段として選択されている記憶部の何れかに最終演算結果データが格納される。そのため、演算処理装置２００は、第３層目の処理においては、その最終演算結果データを格納する記憶部を中間演算結果データ格納手段としてそのまま用い、残りの２つの記憶部を第１記憶手段および第２記憶手段として用いる。このように、演算処理装置２００は、３つの記憶部２０７，２０８，２０９を各層ごとにローテーションさせながら全結合層Ｎｂの前段の処理を進める。 The processing of the first layer leaves the final calculation result data in either the storage unit 208 or the storage unit 209. In the processing of the second layer, the arithmetic processing device 200 therefore uses the storage unit holding that final calculation result data, as it is, as the intermediate calculation result data storing means, and selects the other two storage units as the first storage means and the second storage means. The processing of the second layer then leaves the final calculation result data in one of the storage units selected as the first storage means or the second storage means. In the processing of the third layer, the arithmetic processing device 200 accordingly uses the storage unit holding that final calculation result data, as it is, as the intermediate calculation result data storing means, and uses the remaining two storage units as the first storage means and the second storage means. In this way, the arithmetic processing device 200 advances the former-stage processing of the fully connected layer Nb while rotating the three storage units 207, 208, and 209 from layer to layer.
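The rotation of the three storage units across layers can be modeled as a small schedule. This is a sketch under assumptions: the two working buffers are written alternately starting with the first one listed, every layer uses the same number of scanning passes, and the function name `rotate_buffers` and the string labels are hypothetical.

```python
def rotate_buffers(num_layers, num_scans, units=("207", "208", "209")):
    """Return, per layer, (input_unit, result_unit).

    The input unit holds the intermediate calculation result data for
    the layer; the other two units alternate as ping-pong buffers, and
    the one written by the last scanning pass holds the final result,
    which becomes the input unit of the next layer.
    """
    input_unit = units[0]
    schedule = []
    for _ in range(num_layers):
        work = [u for u in units if u != input_unit]   # first/second storage means
        result_unit = work[(num_scans - 1) % 2]        # buffer written by the last pass
        schedule.append((input_unit, result_unit))
        input_unit = result_unit                       # reused as-is by the next layer
    return schedule


schedule = rotate_buffers(num_layers=3, num_scans=4)
```

With four scanning passes per layer, the first layer reads from the unit labeled 207 and leaves its result in the unit labeled 209, which matches the "even number of divisions, store in 209" option mentioned earlier; no data is copied between layers.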

The arithmetic processing device 200 includes three storage units 207, 208, and 209, each capable of storing intermediate calculation result data, unit calculation result data, and cumulative calculation result data. The arithmetic processing device 200 selects any one of the storage units 207, 208, and 209 as the intermediate calculation result data storage means, and selects and uses the remaining two as the first storage means and the second storage means.

With this configuration, the final calculation result data of the current layer is stored in one of the storage units 207, 208, and 209. Therefore, in the processing of the next layer, the storage unit holding that final calculation result data can be used, as it is, as the intermediate calculation result data storage means. Accordingly, processing proceeds efficiently even when the fully connected layer is processed over multiple layers. Furthermore, because the three storage units 207, 208, and 209 are rotated among the roles of intermediate calculation result data storage means, first storage means, and second storage means, no separate circuit is needed to process the fully connected layer over multiple layers. The fully connected layer can therefore be processed over multiple layers without increasing the circuit scale of the arithmetic processing device 200.
Note that, like the arithmetic processing device 100, the arithmetic processing device 200 stores a plurality of final calculation result data obtained from a plurality of intermediate calculation result data together in the memory 300, in a storage area of the same size as the pixel size processed in one calculation cycle of the convolution calculation processing unit 201.
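
One way to picture the grouping of final calculation result data into storage areas of a fixed size is the following Python sketch. The text does not state the pixel size processed per calculation cycle, so the 5 x 5 block, the zero padding of a partially filled block, and the function name `pack_results` are assumptions made purely for illustration.

```python
def pack_results(final_results, block_h=5, block_w=5, fill=0):
    """Group final results into blocks of block_h x block_w values.

    Assumption: each block stands for one storage area whose size equals the
    pixel size handled in one calculation cycle; unused slots are padded
    with `fill`.
    """
    size = block_h * block_w
    blocks = []
    for start in range(0, len(final_results), size):
        chunk = list(final_results[start:start + size])
        chunk += [fill] * (size - len(chunk))  # pad the last, partial block
        # Reshape the flat chunk into block_h rows of block_w values.
        blocks.append([chunk[r * block_w:(r + 1) * block_w]
                       for r in range(block_h)])
    return blocks


blocks = pack_results(list(range(30)))
```

With these example values, 30 final results occupy two blocks, the second of which is only partly filled.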

(Third embodiment)
As illustrated in FIG. 12, the arithmetic processing device 100 may be configured with the activation processing unit 102 and the pooling processing unit 103 placed between the convolution calculation processing unit 101 and the switching circuit 106. With this configuration, the arrangement consisting of the addition processing unit 105, the switching circuit 106, the storage units 108 and 109, and the selection circuit 110 can also be used for processing in the intermediate layer Na.

(Fourth embodiment)
As illustrated in FIG. 13, the arithmetic processing device 200 may be configured with the activation processing unit 202 and the pooling processing unit 203 placed between the convolution calculation processing unit 201 and the switching circuit 206. With this configuration, the arrangement consisting of the addition processing unit 205, the switching circuit 206, the storage units 207, 208, and 209, and the selection circuit 211 can also be used for processing in the intermediate layer Na.

(Other embodiments)
The present invention is not limited to the embodiments described above and can be applied to various embodiments without departing from the gist of the invention.

In the drawings, 100 and 200 denote arithmetic processing devices; 101 and 201 denote convolution calculation processing units (convolution operation means, unit calculation result data generation means); 102 and 202 denote activation processing units (activation means); 103 and 203 denote pooling processing units (pooling means); 104 and 204 denote data reading processing units (unit image data reading means); 105 and 205 denote addition processing units (cumulative calculation result data generation means); 106 and 206 denote switching circuits (calculation result data storage means); 107, 207, 208, and 209 denote storage units (intermediate calculation result data storage means); 108, 109, 207, 208, and 209 denote storage units (first storage means); 108, 109, 207, 208, and 209 denote storage units (second storage means); 110 and 211 denote selection circuits (calculation result data selection means); 207, 208, and 209 denote storage units (calculation result data storage means); 210 denotes a first selection circuit (first selection means); 211 denotes a second selection circuit (second selection means); and 300 denotes a memory (final calculation result data storage means).

Claims

An arithmetic processing device (100, 200) that includes convolution operation means (101, 201), activation means (102, 202), and pooling means (103, 203) for extracting a feature amount contained in an input image, and that executes operations by a convolutional neural network, the arithmetic processing device comprising:
intermediate calculation result data storage means (107, 207, 208, 209) for storing intermediate calculation result data obtained by processing by the convolution operation means, the activation means, and the pooling means;
unit image data reading means (104, 204) for setting a search window divided into a plurality of areas and, while moving the search window over the intermediate calculation result data stored in the intermediate calculation result data storage means, sequentially reading the image data included in the same area as unit image data;
unit calculation result data generation means (101, 201) for generating unit calculation result data by executing calculation processing by the convolution operation means on the unit image data sequentially read by the unit image data reading means;
cumulative calculation result data generation means (105, 205) for generating cumulative calculation result data by accumulating the unit calculation result data sequentially generated by the unit calculation result data generation means;
calculation result data storage means (106, 206) for storing the unit calculation result data generated by the unit calculation result data generation means and the cumulative calculation result data generated by the cumulative calculation result data generation means in first storage means (108, 109, 207, 208, 209) and second storage means (108, 109, 207, 208, 209) while alternately switching between them; and
calculation result data selection means (110, 211) for selecting the unit calculation result data stored in the first storage means or the second storage means and supplying it to the cumulative calculation result data generation means, and for selecting the cumulative calculation result data stored in the first storage means or the second storage means and outputting it as final calculation result data.

The arithmetic processing device according to claim 1, further comprising:
three calculation result data storage means (207, 208, 209) capable of storing the intermediate calculation result data, the unit calculation result data, and the cumulative calculation result data;
first selection means (210) for selecting any one of the calculation result data storage means as the intermediate calculation result data storage means; and
second selection means (211) for selecting the remaining two of the calculation result data storage means as the first storage means and the second storage means,
wherein the second selection means operates as the calculation result data selection means, using the two calculation result data storage means selected as the first storage means and the second storage means.

The arithmetic processing device according to claim 1 or 2, further comprising final calculation result data storage means (300) for storing the final calculation result data output by the calculation result data selection means,
wherein a plurality of the final calculation result data obtained from a plurality of the intermediate calculation result data are stored together in a storage area having the same size as the pixel size processed in one calculation cycle of the convolution operation means.