JP7279921B2

JP7279921B2 - Neural network circuit device, neural network processing method, and neural network execution program

Info

Publication number: JP7279921B2
Application number: JP2019012505A
Authority: JP
Inventors: Hiroki Nakahara (中原 啓貴)
Original assignee: Tokyo Artisan Intelligence
Current assignee: Tokyo Artisan Intelligence
Priority date: 2019-01-28
Filing date: 2019-01-28
Publication date: 2023-05-23
Anticipated expiration: 2039-01-28
Also published as: JP2020119462A; WO2020158760A1

Description

The present invention relates to a neural network circuit device, a neural network processing method, and a neural network execution program.

In recent years, convolutional neural networks (CNNs), in which adjacent layers are not fully connected, and recurrent neural networks (bidirectional propagation) have emerged as new approaches that are attracting attention for image recognition in robotics, ADAS (advanced driver assistance systems), drones and the like, and for automatic translation.
A neural network has an "input layer," a "hidden layer," and an "output layer," and in each layer a plurality of "nodes" are connected by "edges." The hidden layer can consist of multiple layers, and a network with a particularly deep hidden layer is called a deep neural network (DNN). In a convolutional neural network (CNN), the hidden layer has a multilayer stacked structure that contains not only fully connected layers but also convolution layers and pooling layers, alternately repeated for about 20 to 30 layers. A CNN consists of operations such as convolution, pooling, and full connection, together with feature maps.

A neural network is normally constructed virtually on a computer, and the arithmetic processing of each layer is executed in software. In a CNN, however, many layers (about 20 to 30) that each perform such complex processing are stacked, so the amount of computation is enormous. For this reason, the arithmetic processing of a neural network is often executed on an arithmetic device that uses a GPU (Graphics Processing Unit) rather than on an ordinary computer with a general-purpose CPU.

Patent Document 1 describes an arithmetic processing device comprising: memory control means that allocates a partial area of the memory of the arithmetic processing device to each of a plurality of processing nodes; designating means that sequentially designates, from among the plurality of processing nodes, the processing node that is to execute arithmetic processing; and execution control means that causes the processing nodes to execute arithmetic processing in the designated order. In this device, the memory control means writes the operation result of each processing node obtained by the execution control means into the allocated partial area while circulating the write destination within that partial area in units of a memory region whose size corresponds to the data amount of the operation result, so that the partial area is used as a ring buffer.

Patent Document 2 describes an image processing device comprising: a plurality of convolution processing means that perform convolution processing on a received image; a plurality of rectification processing means that perform rectification processing on the results of the convolution processing means; a plurality of normalization processing means that perform normalization processing on the results of the rectification processing means; a plurality of feature extraction means that extract feature quantities of the image by subsampling the results of the normalization processing means; and identification means that identifies the image on the basis of the extracted feature quantities. In this device, the normalization processing means normalizes the difference between the results of two rectification processing means by dividing it by a divisor based on those results.

Existing CNNs are built from short-precision (multi-bit) multiply-accumulate circuits and therefore require a large number of multiplier circuits, which has the drawback of large circuit area and power consumption. Circuits have therefore been proposed that implement a CNN with binarized precision, that is, using only +1 and -1 (or 0 and 1) (see, for example, Non-Patent Documents 1 to 4).
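How binarization removes multipliers can be illustrated with a short sketch. It assumes the common encoding in which bit 1 stands for +1 and bit 0 for -1; under that assumption the product of two ±1 values reduces to XNOR and the accumulation to a population count. The function names `encode` and `binary_dot` are hypothetical and are not taken from this document or from the cited circuits.

```python
def encode(values):
    """Pack a list of +1/-1 values into an int; bit i is 1 when values[i] == +1."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
    return bits


def binary_dot(x_bits, w_bits, n):
    """Dot product of two length-n +1/-1 vectors stored as bit masks."""
    mask = (1 << n) - 1
    agree = ~(x_bits ^ w_bits) & mask   # XNOR: 1 where the signs match
    matches = bin(agree).count("1")     # population count
    return 2 * matches - n              # matches minus mismatches


x = [1, -1, 1, 1]
w = [1, 1, -1, 1]
# The bitwise result agrees with the ordinary multiply-accumulate.
assert binary_dot(encode(x), encode(w), len(x)) == sum(a * b for a, b in zip(x, w))
```

The sketch only shows why no multiplier is needed; it says nothing about recognition accuracy, which is the concern raised in the next paragraph.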

In the techniques of Non-Patent Documents 1 to 4, reducing the precision to binary also reduces the recognition accuracy of the CNN. A batch normalization circuit is needed to avoid this and maintain the accuracy of the binarized CNN.

Patent Document 1: Japanese Patent No. 5184824
Patent Document 2: Japanese Patent No. 5772442

Non-Patent Document 1: M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, Y. Bengio, "Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1," Computer Research Repository (CoRR), "Algorithm for binarized NNs," [online], March 2016, [retrieved October 5, 2018], <URL: http://arxiv.org/pdf/1602.02830v3.pdf>
Non-Patent Document 2: Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi, "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks," Computer Vision and Pattern Recognition, "Algorithm for binarized NNs," [online], March 2016, [retrieved October 5, 2018], <URL: https://arxiv.org/pdf/1603.05279v4>
Non-Patent Document 3: Hiroki Nakahara, Haruyoshi Yonekawa, Tsutomu Sasao, Hisashi Iwamoto and Masato Motomura, "A Memory-Based Realization of a Binarized Deep Convolutional Neural Network," Proc. of the 2016 International Conference on Field-Programmable Technology (FPT), Xi'an, China, Dec 2016 (To Appear).
Non-Patent Document 4: Eriko Nurvitadhi, David Sheffield, Jaewoong Sim, Asit Mishra, Ganesh Venkatesh, Debbie Marr, "Accelerating Binarized Neural Networks: Comparison of FPGA, CPU, GPU, and ASIC," Proc. of the 2016 International Conference on Field-Programmable Technology (FPT), Xi'an, China, Dec 2016 (To Appear).

To recognize an object and its position, a CNN must perform convolution operations between the observed image and weight matrices an enormous number of times across many layers. As a result, the required computing power, the computation time, and the memory capacity for storing the weight matrices (parameters) all become enormous. Because a large number of multiply-accumulate operations must be performed and their parameters held, a CNN is also difficult to implement in embedded devices.

The present invention has been made in view of these circumstances, and its object is to provide a neural network circuit device, a neural network processing method, and a neural network execution program that can shorten convolution computation time and reduce memory without lowering recognition accuracy.

To solve the above problem, a neural network circuit device according to the present invention includes at least an input layer, one or more intermediate layers, and an output layer, wherein the intermediate layer comprises a convolution circuit that performs a convolution operation and a noise convolution circuit that performs a noise convolution operation. The noise convolution circuit comprises an arithmetic circuit that receives an input value X to be convolved and noise and adds the noise to the input value X, and a multiplier circuit that receives a weight W and multiplies the noise convolution value of the arithmetic circuit by the weight W.
When the noise is a noise $n_c$ that satisfies
expected value: $E(n_c) = E\left(\sum_{N_c} \varepsilon_i W_i'\right) = 0$, and
variance: $E(n_c^2) = E\left(\left(\sum_{N_c} \varepsilon_i W_i'\right)^2\right) = 2\sigma^2\delta'$,
the output y' of the convolution operation by the convolution circuit and the output y' of the noise convolution operation by the noise convolution circuit are expressed by the following formula and are statistically equivalent.
[Formula not reproduced in the source text.]
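The data path described above (add noise to the input, then multiply by the weight) and the zero-mean condition on the noise term can be sketched as follows. This is a minimal illustration under stated assumptions, not the circuit of the invention: the noise is drawn from a zero-mean Gaussian, which is an assumed noise model, the weights `w` stand in for the primed weights W_i', and all function names are hypothetical. The sketch does not attempt to reproduce the variance expression or the omitted output formula.

```python
import random


def noise_conv(x, noise, w):
    """Add the noise to each input value, then multiply by the weight and accumulate."""
    return sum(wi * (xi + ei) for xi, ei, wi in zip(x, noise, w))


def noise_term(noise, w):
    """n_c = sum_i eps_i * W_i, the part of the output caused by the noise alone."""
    return sum(ei * wi for ei, wi in zip(noise, w))


def mean_noise_term(w, sigma, trials, seed=0):
    """Sample mean of n_c for zero-mean Gaussian noise (an assumed noise model)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += noise_term([rng.gauss(0.0, sigma) for _ in w], w)
    return total / trials


x = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 1.0]
eps = [0.5, -0.5, 0.25]
y_noisy = noise_conv(x, eps, w)
y_clean = noise_conv(x, [0.0] * len(x), w)
# The noisy output equals the noise-free output plus the noise term n_c.
assert abs(y_noisy - (y_clean + noise_term(eps, w))) < 1e-12
# For zero-mean noise the sample mean of n_c approaches 0 as trials grow.
print(mean_noise_term(w, sigma=0.1, trials=20000))
```

Because the operation is linear in the input, the noise contributes only the additive term n_c, which is why conditions on its expected value and variance are enough to characterize its effect on the output.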

Further, a neural network circuit device according to the present invention includes at least an input layer, one or more intermediate layers, and an output layer, wherein the intermediate layer has n layers (n being any natural number) and comprises a convolution circuit that performs a convolution operation in k layers (k being a natural number satisfying n−k) and a noise convolution circuit that performs a noise convolution operation in (n−k) layers. The noise convolution circuit comprises an arithmetic circuit that receives an input value X to be convolved and noise and adds the noise to the input value X, and a multiplier circuit that receives a weight W and multiplies the noise convolution value of the arithmetic circuit by the weight W.
When the noise is a noise $n_c$ that satisfies
expected value: $E(n_c) = E\left(\sum_{N_c} \varepsilon_i W_i'\right) = 0$, and
variance: $E(n_c^2) = E\left(\left(\sum_{N_c} \varepsilon_i W_i'\right)^2\right) = 2\sigma^2\delta'$,
the output y' of the convolution operation by the convolution circuit and the output y' of the noise convolution operation by the noise convolution circuit are expressed by the following formula and are statistically equivalent.
[Formula not reproduced in the source text.]
Other means are described in the description of embodiments.

According to the present invention, it is possible to provide a neural network circuit device, a neural network processing method, and a neural network execution program that can shorten convolution computation time and reduce memory without lowering recognition accuracy.

FIG. 1 is a diagram showing the configuration of the noise convolution circuit included in the neural network circuit according to an embodiment of the present invention.
FIG. 2 is a block diagram of the noise convolution circuit included in the neural network circuit according to the embodiment.
FIG. 3 is a schematic configuration diagram of a CNN for explaining the basic principle of the present invention, in which (a) is a module configuration diagram of the CNN and (b), (c), and (d) are histograms statistically showing its parameters (weights) after learning.
FIG. 4 is a diagram showing noise histograms for explaining the basic principle of the present invention, in which (a) statistically shows the noise and (b) is a histogram obtained by statistically analyzing the noise of (a).
FIGS. 5 to 13 are diagrams each showing the configuration of the noise convolution circuit included in the neural network circuit according to the embodiment.
FIG. 14 is a diagram comparing the noise convolution circuit of the embodiment with a convolution circuit of a comparative example, in which (a) is a schematic configuration diagram of the comparative convolution circuit and (b) is a schematic configuration diagram of the noise convolution circuit.
FIG. 15 is a diagram comparing the operation and effects of the noise convolution circuit of the embodiment and the comparative convolution circuit, in which (a) explains the input image and convolution in the comparative circuit when no noise is added and (b) explains the input image and 1×1 convolution in the noise convolution circuit when noise is added.
FIG. 16 is a diagram explaining the features of the noise convolution circuit of the embodiment, in which (a) is a histogram of the pixel values of the input image and their frequency distribution and (b) is a histogram of the pixel values of the output of the first convolution layer of the CNN and their frequency distribution.
FIG. 17 is a diagram explaining the recognition accuracy obtained by training the image recognition tasks CIFAR-10 and CIFAR-100 with VGG16 models of the noise convolution CNN and the existing convolution CNN, as applied to input images of the neural network circuit according to the embodiment.
FIG. 18 is a block diagram showing the configuration of the noise convolution CNN included in the neural network circuit according to the embodiment.
FIG. 19 is a table of experimental results relating the number k of convolution CNN layers in the noise convolution CNN to recognition accuracy (%), ratio, and the weights of all layers (MB).
FIG. 20 is a table showing the training time and inference time of the noise convolution CNN.
FIG. 21 is a table showing the result of implementing the noise convolution circuit on an FPGA and comparing its convolution operation with that of the existing convolution CNN.
FIG. 22 is a diagram explaining an implementation example of the noise convolution CNN.
FIG. 23 is a module configuration diagram of a CNN.
FIG. 24 is a diagram explaining an example of the structure of a deep neural network (DNN).
FIG. 25 is a diagram showing an example of the configuration of a neural network circuit of a comparative example.
FIGS. 26 to 30 are diagrams each showing the configuration of a convolution circuit of a comparative example.

A deep neural network according to a mode for carrying out the present invention (hereinafter, "the present embodiment") is described below with reference to the drawings.
(Background)
[CNN]
FIG. 23 is a module configuration diagram of a CNN, taking the architecture of a deep CNN for image recognition as an example.
The CNN shown in FIG. 23 is a stack of an input layer INPUT, convolutional layers C1 to C5 that convert an input volume into an output volume, fully connected layers A1 and A2, and an output layer OUTPUT.
The input layer INPUT receives the detection target image data (here, a front image of a car), for example raster-scanned image data, which serves as the input data for the CNN operation.
Each of the layers C1 to C5 and A1 to A2 of the CNN has neurons arranged three-dimensionally in width, height, and depth.

The neurons inside the convolutional layers C1 to C5 and the fully connected layer A1 are connected only to the nodes of a small region of the preceding layer, called the receptive field.
The CNN shown in FIG. 23 includes seven layers with weights: the convolutional layers C1 to C5 and the fully connected layers A1 and A2. The fully connected layers A1 and A2 include dropout.
The neurons of the convolutional layers C1 to C5 and the fully connected layer A1 are connected to the receptive field of the preceding layer, whereas the neurons of the fully connected layer A2 are connected to all neurons of the preceding layer.

A batch normalization layer is placed after each of the convolutional layers C1 to C5. Each batch normalization layer is followed by a pooling layer that performs pooling (for example, max pooling).

<Processing in the convolutional layers>
In the CNN shown in FIG. 23, the hidden layers consist of the convolutional layers C1 to C5 and the pooling layers. The convolutional layers C1 to C5 generate feature maps by filtering neighboring nodes of the preceding layer.
The input to each of the convolutional layers C1 to C5 takes the form of N images (N channels), each S×S pixels in height and width. In the CNN, the number of channels of the first layer (convolutional layer C1) is N=1 when the input image is grayscale and N=3 (the three RGB channels) when it is color.
In the convolutional layers C1 to C5, a filter (hereinafter, "kernel") is convolved with the input. Specifically, the S×S-pixel input of each channel k (k=1 to N) is convolved with a two-dimensional filter of size L×L, and the results are summed over all channels k=1 to N. The result takes the form of a one-channel image u_ij.
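The per-channel filtering and summation over channels described above can be written out directly. The following is a minimal sketch assuming a stride of 1, no padding, and the sliding-window (cross-correlation) form usual in CNNs; none of these choices is stated in this passage, and the function name `conv_layer` is hypothetical.

```python
def conv_layer(x, h):
    """x: N channels of SxS pixels; h: N channels of LxL weights.

    Returns the one-channel result u, where u[i][j] is the sum over all
    channels k and filter offsets (p, q) of x[k][i+p][j+q] * h[k][p][q].
    """
    n, s, l = len(x), len(x[0]), len(h[0])
    out = s - l + 1  # output width/height for stride 1 without padding
    u = [[0.0] * out for _ in range(out)]
    for i in range(out):
        for j in range(out):
            acc = 0.0
            for k in range(n):          # sum over channels
                for p in range(l):      # filter rows
                    for q in range(l):  # filter columns
                        acc += x[k][i + p][j + q] * h[k][p][q]
            u[i][j] = acc
    return u


# Two 3x3 input channels and a 2x2 all-ones filter per channel.
x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
h = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
u = conv_layer(x, h)
assert u == [[16.0, 20.0], [28.0, 32.0]]
```

Each output pixel costs N·L·L multiply-accumulate operations, which is the source of the large operation counts discussed in the background.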

<Processing in the pooling layers>
The pooling layers are paired with the convolutional layers C1 to C5. The output of each convolutional layer is the input to its pooling layer.
In the CNN shown in FIG. 23, each pooling layer further reduces the feature map output from its convolutional layer to generate a new feature map.
The purpose of a pooling layer is to discard part of the information about where in the image the filter response was strong, thereby making the response invariant to small changes in the features. Either average pooling or max pooling may be used in the CNN. Max pooling takes the maximum value within a fixed region of the image, which absorbs small displacements of the image and provides invariance to shifts in image position.
The size of the local receptive field of a pooling layer is set independently of that of the convolutional layers C1 to C5 (the filter size).
Unlike the convolutional layers C1 to C5, a pooling layer has no weights that change through learning, and no activation function is applied.
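Max pooling as described above can be sketched in a few lines. The 2×2 window and stride of 2 are assumptions chosen for illustration (the text does not give the pooling size), and `max_pool` is a hypothetical name.

```python
def max_pool(fmap, size=2, stride=2):
    """Reduce a 2-D feature map by taking the maximum of each size x size window."""
    rows = (len(fmap) - size) // stride + 1
    cols = (len(fmap[0]) - size) // stride + 1
    return [
        [
            max(
                fmap[r * stride + a][c * stride + b]
                for a in range(size)
                for b in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


fmap = [
    [1, 2, 0, 0],
    [3, 4, 0, 1],
    [0, 0, 5, 6],
    [0, 1, 7, 8],
]
pooled = max_pool(fmap)
assert pooled == [[4, 1], [1, 8]]

# Moving the strongest response within its window leaves the output unchanged.
shifted = [row[:] for row in fmap]
shifted[1][1], shifted[0][0] = 1, 4
assert max_pool(shifted) == pooled
```

The function has no trainable parameters, consistent with the remark that pooling layers have no weights that change through learning.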

<Rectified Linear Unit>
The CNN shown in FIG. 23 uses a piecewise-linear activation function called the Rectified Linear Unit as the nonlinear mapping function for the convolutional layers C1 to C5 and the fully connected layers A1 and A2.
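For reference, the Rectified Linear Unit in its standard form passes positive values through unchanged and maps everything else to zero. A minimal sketch follows; the helper `relu_map`, which applies it element-wise to a 2-D feature map, is a hypothetical name.

```python
def relu(x):
    """Rectified Linear Unit: max(0, x)."""
    return x if x > 0 else 0.0


def relu_map(fmap):
    """Apply ReLU element-wise to a 2-D feature map."""
    return [[relu(v) for v in row] for row in fmap]


assert relu_map([[-1.5, 0.0], [2.0, -0.1]]) == [[0.0, 0.0], [2.0, 0.0]]
```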

The CNN shown in FIG. 23 is now described in detail.
FIG. 23 shows an example of a CNN in which the first to fourth convolutional layers C1 to C4 each have three features and the fifth convolutional layer C5 has one feature. The image data 1001 is the input data for the CNN operation, for example raster-scanned image data. The reference image region 1002 is the region of the image required for the convolution operation of a convolution filter (feature extraction filter), described later.
The first to fifth convolutional layers C1 to C5 generate feature maps. A feature map is an image data plane holding the result obtained by scanning the data of the preceding layer with a predetermined feature extraction filter (the accumulated sum of convolution operations followed by nonlinear processing). Because a feature map is a detection result for raster-scanned image data, the detection result is also represented as a plane.

The first convolutional layer C1 filters a 56×56×3 input image (AGE image) with 64 kernels of size 5×5×3 at a stride of 2 pixels. The stride is the distance between the receptive field centers of adjacent neurons in a kernel map. The stride is set to 1 pixel in all convolutional layers.
In the first convolutional layer C1, the input image data is scanned in units of a predetermined size, for example by raster scanning. A plurality of feature quantities contained in the input image are extracted by applying the weight matrix convolution and pooling to the scanned data. The first convolutional layer C1 extracts relatively simple, individual feature quantities, such as a linear feature extending horizontally or a linear feature extending diagonally. It also generates a plurality of feature maps, one for each of the features contained in the input image.
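The effect of the stride on the size of the resulting feature map can be checked with the usual output-size relation. The sketch below assumes no padding, which the text does not specify, and evaluates both stride values mentioned for the 56×56 input and 5×5 kernel; `conv_output_size` is a hypothetical helper name.

```python
def conv_output_size(input_size, kernel_size, stride):
    """Number of kernel positions along one axis, assuming no padding."""
    return (input_size - kernel_size) // stride + 1


for stride in (2, 1):
    side = conv_output_size(56, 5, stride)
    print(f"stride {stride}: {side} x {side} positions per kernel")

assert conv_output_size(56, 5, 2) == 26
assert conv_output_size(56, 5, 1) == 52
```

Under these assumptions, doubling the stride roughly quarters the number of kernel positions, and with it the number of multiply-accumulate operations per kernel.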

The input to the second convolutional layer C2 is the batch-normalized and max-pooled output of the first convolutional layer C1. The second convolutional layer C2 filters its input with 128 kernels of size 3×3×64.
In the second convolutional layer C2, the data received from the first convolutional layer C1 is scanned in units of a predetermined size, for example by raster scanning. A plurality of feature quantities contained in the input image are extracted by applying the weight matrix convolution and pooling to the scanned data. The second convolutional layer C2 extracts higher-order composite feature quantities by integrating the feature quantities extracted in the first convolutional layer C1 while taking into account their spatial relationships and the like. It also generates a plurality of feature maps, one for each of the features contained in the input image.

The third convolutional layer C3 has 128 kernels of size 3×3×64 and is connected to the (batch-normalized and max-pooled) output of the second convolutional layer C2. The third convolutional layer C3 extracts higher-order composite feature quantities by integrating the feature quantities extracted in the second convolutional layer C2 while taking into account their spatial relationships and the like. It also generates a plurality of feature maps, one for each of the features contained in the input image.

The fourth convolutional layer C4 has 128 kernels of size 3×3×128.
The fifth convolutional layer C5 has 128 kernels of size 3×3×128.
In this way, the CNN shown in FIG. 23 extracts the various feature quantities contained in the input image data D1 at progressively higher levels by repeating the processing of the convolutional layers C1 to C5. The results of that processing are output to the fully connected layer A1 as intermediate operation result data.
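To give a sense of the parameter memory involved, the kernel shapes listed for C1 to C5 can be tallied as follows. This is simple arithmetic on the figures given in the text; it counts kernel weights only (no biases, batch normalization parameters, or fully connected layers), and the 4 bytes per weight is an assumption for illustration.

```python
# (layer, number of kernels, kernel height, kernel width, kernel depth) as listed in the text
KERNELS = [
    ("C1", 64, 5, 5, 3),
    ("C2", 128, 3, 3, 64),
    ("C3", 128, 3, 3, 64),
    ("C4", 128, 3, 3, 128),
    ("C5", 128, 3, 3, 128),
]

BYTES_PER_WEIGHT = 4  # assumption: 32-bit weights


def weight_count(num, height, width, depth):
    """Number of weights in one convolutional layer."""
    return num * height * width * depth


def total_weights(kernels):
    """Total number of kernel weights over all listed layers."""
    return sum(weight_count(*shape) for _, *shape in kernels)


for name, *shape in KERNELS:
    print(name, weight_count(*shape))

total = total_weights(KERNELS)
print("total weights:", total, "bytes:", total * BYTES_PER_WEIGHT)
assert total == 447168
```

With these shapes, C4 and C5 together account for about two thirds of the convolution weights.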

In the CNN operation, a feature map is thus generated by repeating the product-sum operation while scanning a plurality of filter kernels pixel by pixel and then applying a non-linear transformation to the final product-sum result. In other words, a feature map is obtained by cumulatively adding the outputs of the convolution filters and finally applying a non-linear transformation.

In the CNN shown in FIG. 23, the convolutional layers C1 to C5 extract local features of the image, and the pooling layers aggregate those local features. Reducing the image while preserving the features of the input image greatly compresses the amount of information the image carries.

The fully connected layer A1 is placed after the pooling layer of the fifth convolutional layer C5.
The fully connected layers A1 and A2, in which all nodes of adjacent layers are connected, each have 1024 neurons.
The fully connected layers A1 and A2 combine the plurality of intermediate operation result data obtained from the fifth convolutional layer C5 and output the final operation result data. Specifically, they combine the intermediate operation result data and then perform product-sum operations on the combined result with differing weight coefficients, thereby outputting the final operation result data, that is, image data in which the detection target contained in the input image data has been recognized. At this time, a portion where the product-sum result is large is recognized as part or all of the detection target.

The output layer OUTPUT is the final layer and is fully connected to the fully connected layer A2. It contains as many final nodes as the preset number of classification classes. When the goal is to classify the input image, the output layer OUTPUT has one node per class and uses the softmax function as its activation function.
As shown in FIG. 23, when detection target image data (a front view of a CAR) is input, the image recognition result for the detection target objects "CAR", "TRUCK", "VAN", ..., "BICYCLE" contained in the image data is output as the final processing result.
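
As a rough illustration of the softmax output layer described above, the sketch below converts one score per class into probabilities and picks the most likely class. The score values are hypothetical, and the class list is cut down to the four names the text spells out; neither is taken from FIG. 23.

```python
import math


def softmax(scores):
    """Convert raw output-node scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# One output node per classification class (shortened list, for illustration).
CLASSES = ["CAR", "TRUCK", "VAN", "BICYCLE"]

# Hypothetical scores arriving from the fully connected layer A2.
scores = [4.0, 1.5, 1.0, -2.0]

probs = softmax(scores)
best = CLASSES[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```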

[DNN]
FIG. 24 is a diagram illustrating an example of the structure of a deep neural network (DNN).
As shown in FIG. 24, the deep neural network (DNN) 1 comprises an input layer 11, hidden layers 12 (an arbitrary number of intermediate layers), and an output layer 13.
The input layer 11 has a plurality of (here, eight) input nodes (neurons). There are a plurality of hidden layers 12 (here, three: hidden layer 1, hidden layer 2, and hidden layer 3). In practice, the number n of hidden layers 12 reaches, for example, 20 to 100. The output layer 13 has as many output nodes (neurons) as there are classes to be identified (here, four). The numbers of layers and nodes (neurons) are examples only.
In the deep neural network 1, all nodes of the input layer 11 are connected to all nodes of the hidden layer 12, and all nodes of the hidden layer 12 are connected to all nodes of the output layer 13.

The input layer 11, the hidden layers 12, and the output layer 13 each contain an arbitrary number of nodes (see the circles in FIG. 24). A node is a function that receives inputs and outputs a value. In addition to the input nodes, the input layer 11 has a bias node that holds an independent value. The network is built by stacking layers that each contain a plurality of nodes. In propagation, each node multiplies the inputs it receives by weights, transforms the result with an activation function, and outputs it to the next layer. Activation functions include non-linear functions such as the sigmoid function and the tanh (hyperbolic tangent) function, as well as ReLU (rectified linear unit). Increasing the number of nodes increases the number of variables handled, so that values and boundaries can be determined with many factors taken into account. Increasing the number of layers makes it possible to express combinations of linear boundaries and hence complex boundaries. Learning computes an error and adjusts the weights of each layer based on it. Learning amounts to solving an optimization problem that minimizes the error, and the error backpropagation method is generally used to solve it. The sum-of-squares error is generally used as the error, and a regularization term is added to the error to improve generalization.
The error backpropagation method propagates the error backward from the output layer 13 and adjusts the weights of each layer.
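
The learning procedure described above can be sketched for the simplest possible case: a single linear node trained by gradient descent on a sum-of-squares error with an L2 regularization term. This is a one-layer special case rather than full multi-layer backpropagation, and the learning rate, the regularization strength, the training samples, and the names `loss` and `train_step` are all assumptions made for illustration.

```python
def loss(w, b, samples, lam=0.01):
    """Sum-of-squares error plus an L2 regularization term on the weight."""
    squared = sum(((w * x + b) - t) ** 2 for x, t in samples)
    return 0.5 * squared + 0.5 * lam * w * w


def train_step(w, b, samples, lr=0.1, lam=0.01):
    """One gradient-descent update for the node y = w * x + b."""
    grad_w = lam * w  # gradient of the regularization term
    grad_b = 0.0
    for x, t in samples:
        err = (w * x + b) - t  # error at the output
        grad_w += err * x      # error propagated back to the weight
        grad_b += err          # error propagated back to the bias
    return w - lr * grad_w, b - lr * grad_b


# Hypothetical training data generated from t = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

w, b = 0.0, 0.0
history = [loss(w, b, samples)]
for _ in range(200):
    w, b = train_step(w, b, samples)
    history.append(loss(w, b, samples))

print(round(w, 3), round(b, 3), round(history[-1], 5))
```

Because of the regularization term, the learned weight settles slightly below 2 instead of reaching it exactly.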

Extending the configuration of the deep neural network 1 in FIG. 24 into two dimensions yields a CNN suitable for image processing. Adding feedback to the deep neural network 1 yields an RNN (recurrent neural network), in which signals propagate in both directions.

As indicated by the thick dashed triangle in FIG. 24, the deep neural network 1 is composed of circuits 2 that implement a multi-layer neural network (hereinafter referred to as neural network circuits).
The present technology targets the neural network circuit 2. Where and how many neural network circuits 2 are applied is not limited. For example, when the number n of hidden layers 12 is 20 to 30, the circuit may be applied at any position in these layers, and any node may serve as its input or output node. Furthermore, the technology is not limited to the deep neural network 1 and may be applied to any neural network. However, the node outputs of the input layer 11 and the output layer 13 require multi-bit outputs rather than binarized outputs, so the neural network circuit 2 is not applied there. Even so, leaving multiplier circuits in the circuits that form the nodes of the output layer 13 poses no problem in terms of area.
It is assumed that a trained network is evaluated on input data. The weights w_i have therefore already been obtained as the result of learning.

<Neural network circuit>
FIG. 25 is a diagram showing an example of the configuration of a neural network circuit of a comparative example. FIG. 25 shows the configuration of a single neuron.
The neural network circuit 20 of the comparative example can be used as the neural network circuit 2 that forms the deep neural network 1 of FIG. 24. In the figures that follow, a multi-bit value is drawn as a thick solid arrow with a bundle mark, and a binary value is drawn as a thin solid arrow.

As shown in FIG. 25, the neural network circuit 20 comprises: an input section 21 that receives the input values (discrimination data) X1 to Xn (multi-bit) at its input nodes together with the weights W1 to Wn (multi-bit); a bias W0 input section 22 that receives the bias W0 (multi-bit); a plurality of multiplication circuits 23 that receive the input values X1 to Xn and the weights W1 to Wn and multiply each input value by its weight; a summation circuit 24 that sums the products and the bias W0; and an activation function circuit 25 that transforms the summed signal Y with an activation function f_act(Y).

The input values (discrimination data) X1 to Xn (multi-bit) are the inputs to the neuron; in the second and subsequent layers they are the output values of the neurons in the preceding layer.
The multiplication circuits 23 output the products of the weights W1 to Wn (multi-bit) obtained by learning and the output values of the preceding-layer neurons. The weights W1 to Wn are determined for each object to be detected using a commonly known learning algorithm such as backpropagation.
The summation circuit 24 cumulatively adds the products from the multiplication circuits 23.
The activation function circuit 25 applies a non-linear transformation such as a sigmoid or tanh function to the cumulative sum Y from the summation circuit 24 and outputs the result as the detection result Z.
With this configuration, the neural network circuit 20 receives the input values X1 to Xn (multi-bit), multiplies them by the weights W1 to Wn, sums the products together with the bias W0 to obtain the signal Y, and passes Y through the activation function circuit 25, thereby performing processing modeled on a human neuron.
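
The data path of the comparative neuron can be mirrored in a few lines of software. The sketch below is a floating-point model of the multiply, sum, and activate sequence, not a description of the multi-bit hardware; the function names and the sample values are assumptions.

```python
import math


def sigmoid(y):
    """One of the non-linear activation functions named in the text."""
    return 1.0 / (1.0 + math.exp(-y))


def neuron(xs, ws, w0, activation=sigmoid):
    """Software model of one neuron: Z = f_act(sum(Xi * Wi) + W0)."""
    if len(xs) != len(ws):
        raise ValueError("each input value needs exactly one weight")
    products = [x * w for x, w in zip(xs, ws)]  # multiplication circuits 23
    y = sum(products) + w0                      # summation circuit 24 (with bias W0)
    return activation(y)                        # activation function circuit 25


# Hypothetical inputs and weights chosen so that the weighted sum Y is 0.
z = neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.0], w0=0.0)
print(z)
```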

[Convolution operation circuit of the comparative example]
Next, the convolution operation circuit of the comparative example will be described.
FIGS. 26 to 30 are diagrams showing the configuration of the convolution operation circuit 30 of the comparative example. The convolution operation circuit 30 forms, for example, the convolution operation units of the convolutional layers C1 to C5 in FIG. 23. It is provided in the neural network circuit 20 shown in FIG. 25, and components identical to those in FIG. 25 are given the same reference numerals.
The convolution operation circuit 30 of the comparative example convolves an input image (image data) with a weight function when performing object detection (dilated convolution).

As shown in FIGS. 26 to 30, the convolution operation circuit 30 of the comparative example comprises: a buffer memory 31 into which the input values X00 to X44 are expanded; a weight memory 32 that stores the weights W1 to Wn (multi-bit); a plurality of multiplication circuits 23 that multiply the values X under the 3×3 kernel 33 by the weights W1 to Wn read from the weight memory 32; a summation circuit 24a that sums the products of the convolution; an adder circuit 24b that adds the bias W0 to the sum; and an activation function circuit 25 that transforms the cumulative sum Y from the adder circuit 24b with the activation function f_act(Y).
The summation circuit 24a and the adder circuit 24b together form the summation circuit 24 of FIG. 25. The activation function circuit 25 implements a non-linear function such as a sigmoid or tanh function, or ReLU.

As shown in FIGS. 26 to 30, the convolution operation circuit 30 of the comparative example loads an image (each matrix element corresponds to one pixel of the image) into the input feature map expanded in the buffer memory 31. For this input image, the kernel values (in this example, the 3×3 kernel 33 with K = 3) and the matrix values are multiplied element by element, and the products are summed. Repeating this operation for each element while sliding the kernel completes the convolution over the whole image. The steps are described concretely below.

First, as shown in FIG. 26, the convolution operation circuit 30 of the comparative example places the 3×3 kernel 33 (see the hatched region in the figure) at the upper left, over X00, X01, X02, X10, X11, X12, X20, X21, and X22. The multiplication circuits 23 perform the convolution by multiplying the values under the 3×3 kernel 33 (X00, X01, X02, X10, X11, X12, X20, X21, X22) element by element by the weights W1 to Wn read from the weight memory 32. The summation circuit 24a sums these products, and the adder circuit 24b adds the bias W0 to the sum. The activation function circuit 25 then transforms the cumulative sum Y from the adder circuit 24b with the activation function f_act(Y).

Next, as shown in FIG. 27, the convolution operation circuit 30 of the comparative example shifts the 3×3 kernel 33 one position to the right. The multiplication circuits 23 multiply the values under the 3×3 kernel 33 (X01, X02, X03, X11, X12, X13, X21, X22, X23) element by element by the weights W1 to Wn read from the weight memory 32. The summation circuit 24a sums these products. The adder circuit 24b adds the bias W0 to the sum, and the activation function circuit 25 transforms the cumulative sum Y from the adder circuit 24b with the activation function f_act(Y).

Next, as shown in FIG. 28, the convolution operation circuit 30 of the comparative example shifts the 3×3 kernel 33 one more position to the right, which brings it to the upper-right end. The same operation is performed on the input image: the values under the 3×3 kernel 33 (X02, X03, X04, X12, X13, X14, X22, X23, X24) and the matrix values are multiplied element by element, and the products are summed. The adder circuit 24b adds the bias W0 to the sum, and the activation function circuit 25 transforms the cumulative sum Y from the adder circuit 24b with the activation function f_act(Y).

Next, as shown in FIG. 29, the convolution operation circuit 30 of the comparative example moves the 3×3 kernel 33 down one row, starting again at the left end. For this input image, the values under the 3×3 kernel 33 (X10, X11, X12, X20, X21, X22, X30, X31, X32) and the matrix values are multiplied element by element, and the products are summed.

In the same way, the 3×3 kernel 33 is shifted step by step toward the lower right, and the convolution operation circuit 30 of the comparative example performs the same operation on the input image at each position. As shown in FIG. 30, the lower-right corner is where the sliding of the 3×3 kernel 33 ends. There, the convolution operation circuit 30 multiplies the values under the 3×3 kernel 33 (X22, X23, X24, X32, X33, X34, X42, X43, X44) and the matrix values element by element and sums the products. The adder circuit 24b adds the bias W0 to the sum, and the activation function circuit 25 transforms the cumulative sum Y from the adder circuit 24b with the activation function f_act(Y).
After reaching the lower right, the kernel is returned to X00 (see FIG. 26) and the calculation is performed again.

In this way, the convolution operation circuit 30 of the comparative example multiplies, element by element, the values under the 3×3 kernel 33 and the matrix values for the input image in the buffer memory 31 and sums the products. Performing this operation for each element while sliding the 3×3 kernel 33 from the upper left to the lower right completes the convolution over the whole image.
Zero padding may or may not be applied.
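
The sliding-window procedure walked through in FIGS. 26 to 30 can be sketched as follows for a 5×5 input (X00 to X44) and a 3×3 kernel, without zero padding. This is a plain software model under assumed values: the all-ones kernel, the pixel values, and the choice of ReLU are illustrative only.

```python
def relu(y):
    """ReLU, one of the activation functions named in the text."""
    return max(0.0, y)


def conv2d_valid(image, kernel, bias=0.0, activation=relu):
    """Slide a K x K kernel from upper left to lower right (no zero padding)."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    feature_map = []
    for r in range(rows - k + 1):          # move down one row at a time
        out_row = []
        for c in range(cols - k + 1):      # move right one column at a time
            total = 0.0
            for i in range(k):
                for j in range(k):
                    # element-wise multiply, then accumulate
                    total += image[r + i][c + j] * kernel[i][j]
            out_row.append(activation(total + bias))  # add bias W0, then f_act
        feature_map.append(out_row)
    return feature_map


# Hypothetical 5x5 input where pixel (r, c) has the value 5*r + c.
image = [[float(5 * r + c) for c in range(5)] for r in range(5)]
kernel = [[1.0] * 3 for _ in range(3)]  # hypothetical all-ones weights

fmap = conv2d_valid(image, kernel)
for row in fmap:
    print(row)
```

A 5×5 input with a 3×3 kernel yields a 3×3 feature map, matching the nine kernel positions between the upper-left placement of FIG. 26 and the lower-right placement of FIG. 30.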

In a CNN, the convolution of the observed image with the weight matrix described above is executed roughly 10 million to 1 billion times. The required computing power, computation time, and memory capacity for storing the weight matrices are therefore enormous. Because such a large number of product-sum operations and their parameters must be held, implementation on embedded devices is difficult.

(Embodiment)
FIG. 1 is a diagram showing the configuration of the noise convolution arithmetic circuit 100 included in a neural network circuit according to an embodiment of the present invention. The term "noise convolution" was coined by the inventor on the basis of his discovery. The noise convolution arithmetic circuit 100 takes the place of the convolution operation circuit 30 of the comparative example shown in FIGS. 26 to 30.
As shown in FIG. 1, the noise convolution arithmetic circuit 100 comprises: a noise generation circuit 110 that generates noise n_c; an adder circuit 120 (arithmetic circuit, arithmetic means) that receives an input value X for a 1×1 convolution and adds the noise n_c generated by the noise generation circuit 110 to it; and a multiplication circuit 130 (multiplication means) that receives a weight W and multiplies the noise-added output of the adder circuit 120 by the weight W.

The noise generation circuit 110 consists of a random number generator.
The noise n_c is the value output by the random number generator.
The random number generator may use any method, for example XOR-shift, an LFSR (linear feedback shift register), or a linear congruential generator, provided that the random numbers are controllable (reproducible).
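
As one example of a reproducible generator of the kind listed above, the sketch below implements a 32-bit XOR-shift generator with the widely used shift constants 13, 17, and 5. The text does not specify which generator, seed, or output scaling the embodiment uses; the class name `XorShift32` and the mapping of outputs into the range [-0.5, 0.5) are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


class XorShift32:
    """32-bit XOR-shift pseudo-random generator; same seed, same sequence."""

    def __init__(self, seed):
        if seed & MASK32 == 0:
            raise ValueError("seed must be non-zero modulo 2**32")
        self.state = seed & MASK32

    def next_u32(self):
        x = self.state
        x ^= (x << 13) & MASK32  # keep the state within 32 bits
        x ^= x >> 17
        x ^= (x << 5) & MASK32
        self.state = x
        return x

    def next_noise(self):
        """Scale the 32-bit output to [-0.5, 0.5) (scaling is an assumption)."""
        return self.next_u32() / 2**32 - 0.5


gen_a = XorShift32(seed=1)
gen_b = XorShift32(seed=1)
seq_a = [gen_a.next_noise() for _ in range(5)]
seq_b = [gen_b.next_noise() for _ in range(5)]
print(seq_a == seq_b)  # reproducibility check
```

Because the sequence is fully determined by the seed, the noise used during learning can be regenerated at inference time instead of being stored.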

In this embodiment, pseudo-random numbers are used. Pseudo-random numbers can easily be produced with a learning tool. True random numbers could also be used, but learning the 1×1 convolution is then expected to become difficult. That is, observation of the weight parameters after learning shows that the distribution of weights that gives sufficient accuracy nearly matches the distribution produced by random number generation. The weights that do not match are corrected by the 1×1 convolution. With true random numbers, learning this 1×1 convolution becomes difficult.

The noise n_c is applied by addition in the adder circuit 120. The reason for applying the noise by addition is as follows. Applying the noise by multiplication is conceivable, but multiplication makes learning impossible. Deep learning generally uses the error backpropagation method for training. Backpropagation uses the derivative of the error for updates; when the noise is multiplied, a derivative term remains and the noise ends up being updated. As a result, the assumed noise distribution breaks down and learning never progresses. The noise convolution arithmetic circuit 100 therefore applies the noise by addition. The inventor was the first to find that the noise should be applied by addition.

FIG. 2 is a block diagram of the noise convolution arithmetic circuit 100 included in the neural network circuit 50 according to the embodiment of the present invention.
As shown in FIG. 2, the neural network circuit 50 comprises the noise convolution arithmetic circuit 100, which executes the noise convolution operation; a CPU 60 that controls the entire device; an off-chip memory 70 that stores recognition data; and a system bus 80 that connects these parts.

The noise convolution arithmetic circuit 100 comprises: a buffer memory 101 into which the input values X00 to X44 are expanded; a noise generation circuit 110 that generates noise n_c; an adder circuit 120 that adds the noise n_c from the noise generation circuit 110 to the value X under the 1×1 kernel 111; a weight memory 131 that stores the weight W; a multiplication circuit 130 that multiplies the value X under the 1×1 kernel 111, with the noise n_c added, by the weight W read from the weight memory 131; an adder circuit 140 that adds the bias W0 to the noise-convolution product; and an activation function circuit 150 that transforms the cumulative sum Y from the adder circuit 140 with the activation function f_act(Y).
The activation function circuit 150 implements a non-linear function such as a sigmoid or tanh function, or ReLU.
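
The data path just listed (add noise, multiply by the weight, add the bias, apply the activation) can be modeled in software roughly as follows. This is a sketch under stated assumptions, not the circuit itself: it treats the "cumulative" sum as running over the input channels of a single pixel, uses Python's seeded `random.Random` as a stand-in for the noise generation circuit 110, and draws noise uniformly from [-0.5, 0.5]. The function name and all numeric values are hypothetical.

```python
import random


def relu(y):
    """ReLU, one of the activation functions named in the text."""
    return max(0.0, y)


def noise_conv_pixel(xs, noises, ws, w0, activation=relu):
    """1x1 noise convolution at one pixel: f_act(sum((x + n_c) * W) + W0)."""
    y = w0                            # bias W0 (adder circuit 140)
    for x, n, w in zip(xs, noises, ws):
        noisy = x + n                 # adder circuit 120: add noise n_c
        y += noisy * w                # multiplication circuit 130, accumulated
    return activation(y)              # activation function circuit 150


# Seeded generator: the same seed reproduces the same noise sequence.
rng = random.Random(1234)
channels = 4
noises = [rng.uniform(-0.5, 0.5) for _ in range(channels)]

xs = [0.2, -0.1, 0.4, 0.0]   # hypothetical input values, one per channel
ws = [1.0, 0.5, -0.5, 2.0]   # hypothetical learned weights
z = noise_conv_pixel(xs, noises, ws, w0=0.1)
print(z)
```

Only one weight per channel is read from memory at each pixel, whereas the 3×3 kernel of the comparative example needs nine.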

The noise convolution arithmetic circuit 100 is implemented on an existing FPGA (field-programmable gate array) or the like, with the CPU 60, the off-chip memory 70, and the system bus 80 attached externally.

The neural network circuit 50 stores recognition data in the off-chip memory 70. The CPU 60 issues a DMA transfer command to a DMA (direct memory access) controller (not shown), and the recognition data is sent from the off-chip memory 70 to the buffer memory 101 by DMA transfer. The CPU 60 then starts the noise convolution arithmetic circuit 100, which performs the noise convolution. When the convolution is finished, the recognition data is transferred from the buffer memory 101 back to the off-chip memory 70, and the CPU 60 performs post-processing (for example, object detection for general object recognition, region estimation for semantic segmentation, or skeleton estimation for pose estimation).

The operation and effects of the noise convolution arithmetic circuit 100 configured as described above are explained below.
(Explanation of the principle of the present invention)
First, the basic principle of the present invention will be explained.
The present invention rests on the discovery that the parameters (weights) of a neural network after learning can have the same statistical properties as noise.

FIG. 3 is a schematic configuration diagram of a CNN for explaining the basic principle of the present invention. FIG. 3(a) is a module configuration diagram of the CNN, and FIGS. 3(b) to 3(d) are histograms that show its parameters (weights) after learning statistically. The blocks in FIG. 3(a) represent convolutional layers (including pooling layers), and the number attached to each convolutional layer indicates its size. Here k is the kernel size and s is the stride, that is, the step by which the kernel is slid.
Statistically analyzing the learned parameters (weights) of the first convolutional layer C1, marked a in FIG. 3(a), yields the histogram shown in FIG. 3(b).
Statistically analyzing the learned parameters (weights) of the second convolutional layer C2, marked b in FIG. 3(a), yields the histogram shown in FIG. 3(c).
Statistically analyzing the learned parameters (weights) of the fifth convolutional layer C5, marked c in FIG. 3(a), yields the histogram shown in FIG. 3(d).

FIG. 4 shows noise and its histogram for explaining the basic principle of the present invention. FIG. 4(a) is a diagram showing the noise statistically, and FIG. 4(b) is a histogram obtained by statistically analyzing the noise in FIG. 4(a). The vertical axis of FIG. 4(a) is the amplitude (magnitude) of the generated noise, and the horizontal axis is time (the elapsed time of noise generation).
As shown in FIG. 4(a), the noise fluctuates with a random amplitude (between 0.5 and -0.5) around zero (no noise).
Accumulating the noise shown in FIG. 4(a) by frequency of occurrence over the course of noise generation yields the histogram shown in FIG. 4(b).
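
The step from a noise sequence to its histogram can be sketched as below. The text gives only the amplitude range (0.5 to -0.5); the zero-mean Gaussian shape, its standard deviation of 0.15, the sample count, and the number of bins are assumptions chosen for illustration and are not taken from FIG. 4.

```python
import random


def histogram(samples, lo, hi, bins):
    """Count how many samples fall into each equal-width bin over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for s in samples:
        if lo <= s <= hi:
            # min() keeps a sample at the upper edge inside the last bin
            index = min(bins - 1, int((s - lo) / width))
            counts[index] += 1
    return counts


rng = random.Random(0)  # seeded so the sequence is reproducible

# Assumed noise model: zero-mean Gaussian, clipped to the range [-0.5, 0.5].
noise = [max(-0.5, min(0.5, rng.gauss(0.0, 0.15))) for _ in range(10000)]

counts = histogram(noise, -0.5, 0.5, 10)
for i, count in enumerate(counts):
    print(f"bin {i}: {count}")
```

The same `histogram` function could be applied to a list of learned weights to compare the two distributions bin by bin.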

The inventor discovered that the noise histogram shown in FIG. 4(b) nearly matches the histograms of the weight parameters of the neural network after learning (see FIGS. 3(b) to 3(d)). In other words, the inventor discovered that the learned parameters (weights) of a neural network have the same statistical properties as noise (with the exception of the parameters of the first layer of the neural network).
Based on this discovery, the following findings were obtained regarding the conditions under which the learned parameters (weights) of a neural network have the same statistical properties as noise.

[Derivation of the weight v of the noise convolution layer]
First, the derivation of the weight v of the noise convolution layer will be described.
The weight v of the noise convolution layer is the weight of the existing convolution itself, with the noise excluded. In FIG. 1, it corresponds to W (the weight). Equation (1) below shows that once the noise sequence is known, the weight can be derived from it. In practice, however, it is difficult to generate random numbers intentionally, so, as described in this embodiment, weights suited to pseudo-random numbers (noise) that can be generated in hardware are obtained by learning.
The operation of the noise convolution layer can be expressed by the following equation (1).

When the number N of noise filters equals the number p of input channels, the weight v can be expressed by the following equation (2); when the number N of noise filters is smaller than the number p of input channels, the weight v can be expressed by the following equation (3).

つまり、入力ｘ、ノイズフィルタＮ、出力ｙが与えられると重みｖを導出可能であるということがわかる。 In other words, given the input x, the noise filter N, and the output y, the weight v can be derived.

［雑音畳込み層と既存畳込み層との等価性］
次に、雑音畳込み層と既存畳込み層との等価性について述べる。
<前提>
・統計的に解析することが可能である。
・入力データが、
Ｅ（ｘ_ｉ）＝０、Ｅ（ｘ_ｉ ^２）＝σ^２
ただし、
ｘ_ｉ：畳込みが行われる入力テンソル内の各画素
Ｅ（ｘ_ｉ）：期待値
σ^２：分散
を満たすとき [Equivalence between noise convolution layer and existing convolution layer]
Next, the equivalence between the noise convolution layer and the existing convolution layer is described.
<Assumptions>
・The layers can be analyzed statistically.
・The input data satisfies
E(x_i) = 0, E(x_i^2) = σ^2
where
x_i: each pixel in the input tensor to be convolved
E(x_i): expected value
σ^2: variance

<畳込み演算出力>
畳込み演算による出力ｙは、式（４）で示される。 <Convolution operation output>
The output y from the convolution operation is given by Equation (4).

<雑音（ノイズ）ｎ_ｃ>
期待値：Ｅ（ｎ_ｃ）＝Ｅ（Σ_Ｎｃε_ｉＷ_ｉ’）＝０、かつ、
分散：Ｅ（ｎ_ｃ ^２）＝Ｅ（Σ_Ｎｃε_ｉＷ_ｉ’）^２＝２σ^２δ’
となる雑音ｎ_ｃを用いる。 <noise n _c >
A noise n_c is used that satisfies
Expected value: E(n_c) = E(Σ_Nc ε_i W_i') = 0, and
Variance: E(n_c^2) = E(Σ_Nc ε_i W_i')^2 = 2σ^2 δ'.

上記雑音（ノイズ）ｎ_ｃを用いると、式（５）に示すように雑音畳込み演算による出力ｙ’と既存畳込み演算による出力ｙ’とは、統計的に等価となる（詳細説明は後記）。
比較例の畳込み演算回路３０による畳込み演算の出力ｙ’と、雑音畳込み演算回路１００による雑音畳込み演算による出力ｙ’とが、統計的に等価であることを、式（５）中の枠囲みで示している。 When the above noise (noise) n _c is used, the output y' by the noise convolution operation and the output y' by the existing convolution operation are statistically equivalent as shown in Equation (5) (details will be described later. ).
The boxed portion of equation (5) shows that the output y' of the convolution operation by the convolution arithmetic circuit 30 of the comparative example and the output y' of the noise convolution operation by the noise convolution arithmetic circuit 100 are statistically equivalent.

<雑音畳込み層と既存畳込み層の統計的に等価の式の導出の詳細>
次に、雑音畳込み層と既存畳込み層とが統計的に等価であることを示す途中式の導出について説明する。
確率分布の観点から雑音畳込み層と既存畳込み層の関係について考察する。畳込み層において、畳込みを行う入力の中心ピクセルｘ_ｃ、近傍ピクセルｘ_ｉ、相関関数、中心ピクセルρと近傍ピクセルの差ε＝ｘ_ｉ－ｘ_ｃ、近傍ピクセルの集合Ｎｃを用いる。ここで、前提条件としてＥ（ｘ_ｉ）＝０，Ｅ（ｘ_ｉ ^２）＝σ^２を仮定する。
まず、今後計算に使用するため、Ｅ（ε_ｉ），Ｅ（ε_ｉ ^２），Ｅ（ε_ｉε_ｊ）を次式（６）（７）（８）に従って求める。 <Details of derivation of statistically equivalent formulas for noise convolutional layers and existing convolutional layers>
Next, the derivation of the intermediate formulas showing that the noise convolution layer and the existing convolution layer are statistically equivalent is described.
We consider the relationship between the noise convolution layer and the existing convolution layer from the viewpoint of probability distributions. For a convolution layer, we use the center pixel x_c of the input to be convolved, the neighboring pixels x_i, the correlation function ρ, the difference ε_i = x_i − x_c between a neighboring pixel and the center pixel, and the set Nc of neighboring pixels. As a precondition, E(x_i) = 0 and E(x_i^2) = σ^2 are assumed.
First, E(ε_i), E(ε_i^2), and E(ε_i ε_j) are obtained according to the following equations (6), (7), and (8) for use in later calculations.
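Equations (6) to (8) themselves are not reproduced in this text. As a sketch only, the three moments can be expanded directly from the precondition E(x_i) = 0, E(x_i^2) = σ^2 (taken to hold for the center pixel x_c as well). The normalized correlation ρ_ij = E(x_i x_j)/σ^2 used below is an assumed notation for the correlation function and may differ from the form used in the original equations.

```latex
\begin{align*}
E(\varepsilon_i) &= E(x_i) - E(x_c) = 0 \\
E(\varepsilon_i^2) &= E(x_i^2) - 2E(x_i x_c) + E(x_c^2)
                    = 2\sigma^2 (1 - \rho_{ic}) \\
E(\varepsilon_i \varepsilon_j) &= E(x_i x_j) - E(x_i x_c) - E(x_j x_c) + E(x_c^2)
                    = \sigma^2 (1 + \rho_{ij} - \rho_{ic} - \rho_{jc})
\end{align*}
```

Under this assumption the difference ε_i is zero-mean, and its variance depends only on σ^2 and on how strongly the neighboring pixel is correlated with the center pixel.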

ここで、前記式（４）に示す畳込み演算は、式（９）のように式変形できる。 Here, the convolution operation shown in Equation (4) can be transformed into Equation (9).

ここで、Σ_ｉ∈Ｎｃε_ｉｗ_ｉ’に関して、Ｅ（Σ_ｉ∈Ｎｃε_ｉｗ_ｉ’），Ｅ（Σ_ｉ∈Ｎｃε_ｉｗ_ｉ’）^２を次式（１０）（１１）に従って求める。 Here, for Σ_{i∈Nc} ε_i w_i', the quantities E(Σ_{i∈Nc} ε_i w_i') and E(Σ_{i∈Nc} ε_i w_i')^2 are obtained according to the following equations (10) and (11).

以上より、下記式（１２）となる雑音（ノイズ）ｎ_ｃを用いると、雑音畳込み層は畳込み層（既存畳込み層）と等価であると考察される。 From the above, it is considered that the noise convolutional layer is equivalent to the convolutional layer (existing convolutional layer) when the noise n _c given by the following equation (12) is used.

Ｅ（ｎ_ｃ）＝Ｅ（Σ_ｉ∈Ｎｃε_ｉｗ_ｉ’）＝０
Ｅ（ｎ_ｃ ^２）＝Ｅ（Σ_ｉ∈Ｎｃε_ｉｗ_ｉ’）^２＝２σ^２δ’ …（１２）
E(n_c) = E(Σ_{i∈Nc} ε_i w_i') = 0
E(n_c^2) = E(Σ_{i∈Nc} ε_i w_i')^2 = 2σ^2 δ' …(12)

式（６）および式（７）と、式（１２）とを対比してわかるように、既存畳込みとノイズを加えた畳込みの「計算結果の平均と分散が一致する」ことを利用して、大量のメモリアクセスと積和演算を、軽量な回路で実現できるノイズ生成回路で生成した信号を乗せることに置き換えたこと、が本発明の特徴である。なお、計算結果の平均と分散が一致することであって、全く同じ計算結果になるということではない。 As can be seen by comparing equations (6) and (7) with equation (12), the present invention exploits the fact that the mean and variance of the results of the existing convolution coincide with those of the convolution with added noise. A feature of the present invention is that a large number of memory accesses and multiply-accumulate operations are replaced by superimposing a signal generated by a noise generation circuit, which can be realized as a lightweight circuit. Note that it is the mean and variance of the results that coincide; the individual results are not identical.

［雑音畳込み演算回路１００の動作］
次に、図１および図２に示す雑音畳込み演算回路１００の動作について説明する。
図５～図１３は、雑音畳込み演算回路１００の構成を示す図である。図１および図２と同一構成部分には同一符号を付している。また、図５～図１３を参照して、雑音畳込み演算回路１００の動作を説明する。
なお、図５～図１３に示す雑音畳込み演算回路１００の動作は、前記図２６～図３０に示す比較例の畳込み演算回路３０の動作に対応する。 [Operation of noise convolution arithmetic circuit 100]
Next, the operation of noise convolution arithmetic circuit 100 shown in FIGS. 1 and 2 will be described.
FIGS. 5 to 13 are diagrams showing the configuration of the noise convolution arithmetic circuit 100. The same components as those in FIGS. 1 and 2 are denoted by the same reference numerals. The operation of the noise convolution arithmetic circuit 100 is described with reference to FIGS. 5 to 13.
The operation of the noise convolution arithmetic circuit 100 shown in FIGS. 5 to 13 corresponds to the operation of the convolution arithmetic circuit 30 of the comparative example shown in FIGS. 26 to 30.

まず、図５に示すように、雑音畳込み演算回路１００は、１×１カーネル１１１（図５網掛け参照）を、左上のＸ００に移動させる。図５の符号ｄに示すように、加算回路１２０は、１×１カーネル１１１の値Ｘ００に、雑音生成回路１１０からの雑音ｎ_ｃを加える。乗算回路１３０は、雑音ｎ_ｃを加えた１×１カーネル１１１の値Ｘ００に、重みメモリ１３１から読み出した重みＷを乗算する畳込み演算を行う。加算回路１４０は、雑音畳込み乗算値にバイアスＷ０を加算する。そして、活性化関数回路１５０は、加算回路１４０からの累積加算結果Ｙを活性化関数ｆact(Y)で変換する。 First, as shown in FIG. 5, the noise convolution arithmetic circuit 100 moves the 1×1 kernel 111 (see hatching in FIG. 5) to X00 at the upper left. As indicated by symbol d in FIG. 5, the adder circuit 120 adds the noise n_c from the noise generation circuit 110 to the value X00 of the 1×1 kernel 111. The multiplier circuit 130 performs the convolution operation by multiplying the value X00 of the 1×1 kernel 111, to which the noise n_c has been added, by the weight W read from the weight memory 131. The adder circuit 140 adds the bias W0 to the noise convolution product. The activation function circuit 150 then converts the cumulative sum Y from the adder circuit 140 with the activation function f_act(Y).

次いで、図６に示すように、雑音畳込み演算回路１００は、１×１カーネル１１１を、左から右に１つずらし、Ｘ０１に移動させる。以下同様の雑音畳込み演算を行う。 Next, as shown in FIG. 6, the noise convolution arithmetic circuit 100 shifts the 1×1 kernel 111 from left to right by one to X01. Similar noise convolution calculations are performed thereafter.

以下同様して、図７～図９に示すように、１×１カーネル１１１を、右端までずらし、Ｘ０４に移動させ、同様の雑音畳込み演算を行う。
さらに、図１０に示すように、左下かつ右端に１×１カーネル１１１をスライドさせ、Ｘ１０に移動させる。同様の雑音畳込み演算を行う。さらに、図１１に示すように、１×１カーネル１１１を右にスライドさせ、Ｘ１１に移動させる。同様の雑音畳込み演算を行う。 Similarly, as shown in FIGS. 7 to 9, the 1×1 kernel 111 is shifted to the right end, moved to X04, and similar noise convolution calculations are performed.
Further, as shown in FIG. 10, the 1×1 kernel 111 is moved down to X10, at the left end of the next row, and a similar noise convolution operation is performed. Then, as shown in FIG. 11, the 1×1 kernel 111 is slid to the right to X11, and a similar noise convolution operation is performed.

図１２に示すように、１×１カーネル１１１をスライドさせ、右下のＸ４４で同様の雑音畳込み演算を行う。
１×１カーネル１１１を右下に移動後、図１３に示すように、１×１カーネル１１１をＸ００に戻し、再度計算を行う。図１３は、再度計算後の１×１カーネル１１１の位置であり、前記図５の１×１カーネル１１１の位置に戻る。 As shown in FIG. 12, the 1×1 kernel 111 is slid, and the same noise convolution operation is performed at the lower right X44.
After the 1×1 kernel 111 has been moved to the lower right, it is returned to X00 as shown in FIG. 13, and the calculation is performed again. FIG. 13 shows the position of the 1×1 kernel 111 for this repeated calculation, which is the same as its position in FIG. 5.

このように、雑音畳込み演算回路１００は、３×３重みの代わりに雑音を乗せて１×１畳込み演算を実行する。雑音畳込み演算回路１００は、雑音により大部分の畳込みを代用できるので１×１畳込みで十分である。この場合も左上から右下に向かって畳込み演算を行う。 Thus, the noise convolution operation circuit 100 performs a 1×1 convolution operation with noise added instead of 3×3 weights. Since the noise convolution arithmetic circuit 100 can substitute most convolutions for noise, 1×1 convolution is sufficient. In this case also, the convolution operation is performed from the upper left to the lower right.
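The scan described above can be sketched in software as follows. This is a minimal single-channel illustration, not the circuit itself: the Gaussian noise source, the ReLU activation, the 5×5 input values, and the names `noise_conv_1x1` and `relu` are all assumptions introduced here, since the text does not fix the noise distribution or the activation function.

```python
import random


def relu(y):
    """Illustrative activation f_act; the source does not name a specific one."""
    return max(0.0, y)


def noise_conv_1x1(x, weight, bias, noise_std, seed=0):
    """Scan a 2-D input from upper left to lower right and compute
    f_act((X + n_c) * W + W0) at each position of the 1x1 kernel."""
    rng = random.Random(seed)  # software stand-in for noise generation circuit 110
    out = []
    for row in x:                             # top row first
        out_row = []
        for value in row:                     # left to right within a row
            n_c = rng.gauss(0.0, noise_std)   # zero-mean noise sample (assumed Gaussian)
            noisy = value + n_c               # adder circuit 120
            product = noisy * weight          # multiplier circuit 130
            y = product + bias                # adder circuit 140 (bias W0)
            out_row.append(relu(y))           # activation function circuit 150
        out.append(out_row)
    return out


# Hypothetical 5x5 input corresponding to positions X00 .. X44.
x = [[float(5 * r + c) for c in range(5)] for r in range(5)]
y = noise_conv_1x1(x, weight=0.5, bias=-1.0, noise_std=0.1)
```

With `noise_std=0.0` the function reduces to an ordinary 1×1 convolution followed by the activation, which makes the role of the added noise easy to isolate.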

［雑音畳込み演算回路１００と比較例の畳込み演算回路３０との比較］
図１４は、雑音畳込み演算回路１００と比較例の畳込み演算回路３０とを比較して示す図であり、図１４（ａ）は比較例の畳込み演算回路３０の概略構成図、図１４（ｂ）は雑音畳込み演算回路１００の概略構成図である。
図１４（ａ）に示すように、比較例の畳込み演算回路３０は、３×３カーネル３３の値Ｘ（入力Ｘ）に、重みメモリ３２から読み出したパラメータ（重み）Ｗをそれぞれ乗算する積和演算回路（Matrix MAC）３５と、を備える。
比較例の畳込み演算回路３０は、全パラメータ（重み）を畳込み（通常は３×３畳込み）演算を行う。
図１４（ａ）の符号ｅに示すように、ＣＮＮでは、上記畳込み演算が約1千万回～１０億回行われる。計算量・メモリ量が爆発的に増加するため、軽量化したい切実な要望がある。 [Comparison between the noise convolution circuit 100 and the convolution circuit 30 of the comparative example]
FIG. 14 is a diagram comparing the noise convolution arithmetic circuit 100 with the convolution arithmetic circuit 30 of the comparative example. FIG. 14(a) is a schematic configuration diagram of the convolution arithmetic circuit 30 of the comparative example, and FIG. 14(b) is a schematic configuration diagram of the noise convolution arithmetic circuit 100.
As shown in FIG. 14(a), the convolution arithmetic circuit 30 of the comparative example includes a multiply-accumulate circuit (Matrix MAC) 35 that multiplies each value X (input X) of the 3×3 kernel 33 by the corresponding parameter (weight) W read from the weight memory 32.
The convolution arithmetic circuit 30 of the comparative example performs convolution (usually 3×3 convolution) with all parameters (weights).
As indicated by symbol e in FIG. 14(a), a CNN performs this convolution operation roughly 10 million to 1 billion times. Because the amount of computation and memory grows explosively, there is a pressing demand to make the network lighter.

図１４（ｂ）に示すように、雑音畳込み演算回路１００は、雑音（ノイズ）ｎ_ｃを生成する雑音生成回路１１０と、１×１カーネル１１１の値Ｘに、雑音生成回路１１０からの雑音（ノイズ）ｎ_ｃを加算する加算回路１２０と、雑音（ノイズ）ｎ_ｃを加えた１×１カーネル１１１の値Ｘに、パラメータ（重み）Ｗを乗算する乗算回路１３０と、を備える。
雑音畳込み演算回路１００は、１×１畳込み演算を実行する。つまり、比較例の畳込み演算回路３０は、全パラメータ（重み）を３×３畳込みを行うのに対し、雑音畳込み演算回路１００は、１×１畳込みのみを実行する。 As shown in FIG. 14(b), the noise convolution arithmetic circuit 100 includes a noise generation circuit 110 that generates noise n_c, an adder circuit 120 that adds the noise n_c from the noise generation circuit 110 to the value X of the 1×1 kernel 111, and a multiplier circuit 130 that multiplies the value X of the 1×1 kernel 111, to which the noise n_c has been added, by the parameter (weight) W.
The noise convolution arithmetic circuit 100 performs a 1×1 convolution operation. That is, whereas the convolution arithmetic circuit 30 of the comparative example performs 3×3 convolution with all parameters (weights), the noise convolution arithmetic circuit 100 performs only 1×1 convolution.

図１４（ｂ）の符号ｆに示すように、雑音を乗せると１×１畳込みで等価になる（式（５）参照）。 As indicated by symbol f in FIG. 14(b), when noise is added, it becomes equivalent to 1×1 convolution (see equation (5)).

このように、雑音畳込み演算回路１００は、３×３重みの代わりに雑音を乗せて１×１畳込みを行う。これにより、重みを最大で３×３分の１に削減し、計算量も３×３分の１に削減する。これにより、計算量（約９０％の計算量）とメモリ量の削減ができる（詳細後記）。また、計算量とメモリ量を削減したにもかかわらず、認識精度の低下はないことが確認できた（詳細後記）。 In this way, the noise convolution arithmetic circuit 100 performs 1×1 convolution with noise added in place of the 3×3 weights. This reduces the number of weights to as little as 1/(3×3) = 1/9 and likewise reduces the amount of computation to 1/9. As a result, the amount of computation (a reduction of approximately 90%) and the amount of memory can be reduced (details are given later). It was also confirmed that recognition accuracy does not degrade despite the reduction in computation and memory (details are given later).
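The one-ninth figure follows from counting weights per layer. The sketch below assumes a standard convolution layer whose weight count is input channels × output channels × kernel height × kernel width; the 64-channel sizes and the helper name `conv_weight_count` are illustrative choices, not values taken from the comparison above.

```python
def conv_weight_count(in_channels, out_channels, kernel_size):
    """Number of weights in a square-kernel convolution layer (bias excluded)."""
    return in_channels * out_channels * kernel_size * kernel_size


# Illustrative layer size (assumption): 64 input and 64 output channels.
c_in, c_out = 64, 64
weights_3x3 = conv_weight_count(c_in, c_out, 3)   # existing 3x3 convolution
weights_1x1 = conv_weight_count(c_in, c_out, 1)   # noise convolution (1x1)

ratio = weights_1x1 / weights_3x3                 # 1/9 regardless of channel counts
reduction_percent = 100.0 * (1.0 - ratio)         # about 88.9, i.e. roughly 90%
print(weights_3x3, weights_1x1, round(reduction_percent, 1))
```

Because the channel counts cancel in the ratio, the reduction is the same for any layer size; only the kernel area (9 versus 1) matters.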

［雑音畳込み演算回路１００と比較例の畳込み演算回路３０との作用効果比較］
図１５は、雑音畳込み演算回路１００と比較例の畳込み演算回路３０との作用効果を比較して示す図である。図１５（ａ）は比較例の畳込み演算回路３０の雑音を加えなかった場合の入力画像と畳込みを説明する図であり、図１５（ｂ）は雑音畳込み演算回路１００の雑音を加える場合の入力画像と１×１畳込みを説明する図である。
図１５（ａ）の符号ｇに示すように、比較例の畳込み演算回路３０では、３×３などの大きなカーネルを用いて雑音に近い重みを畳込む。 [Comparison of effects between the noise convolution circuit 100 and the convolution circuit 30 of the comparative example]
FIG. 15 is a diagram comparing the effects of the noise convolution arithmetic circuit 100 and the convolution arithmetic circuit 30 of the comparative example. FIG. 15(a) illustrates the input image and convolution for the convolution arithmetic circuit 30 of the comparative example, where no noise is added, and FIG. 15(b) illustrates the input image and 1×1 convolution for the noise convolution arithmetic circuit 100, where noise is added.
As indicated by symbol g in FIG. 15(a), the convolution arithmetic circuit 30 of the comparative example convolves noise-like weights using a large kernel such as 3×3.

図１５（ｂ）に示すように、雑音畳込み演算回路１００は、加算回路１２０により１×１カーネルの値Ｘに、雑音生成回路１１０（図示省略）からの雑音を直接乗せる。図１５（ｂ）の符号ｈに示すように、入力画像に雑音が加えられている。雑音畳込み演算回路１００は、雑音を入力画像に直接乗せて、１×１畳込みで補正する。 As shown in FIG. 15B, the noise convolution operation circuit 100 directly adds noise from the noise generation circuit 110 (not shown) to the value X of the 1×1 kernel by the adder circuit 120 . As indicated by symbol h in FIG. 15(b), noise is added to the input image. The noise convolution arithmetic circuit 100 applies noise directly to the input image and corrects it by 1×1 convolution.

雑音を加えなくてもよい条件は、雑音の代わりになる３×３などの大きなカーネルを用いた既存畳込みと組み合わせる場合になる。 A condition where noise does not need to be added is when combining with existing convolution using a large kernel such as 3×3 instead of noise.

［雑音畳込み演算回路１００の特徴］
雑音畳込み演算回路１００の特徴について述べる。
雑音畳込み層と比較例の畳込み層（既存畳込み層）とが等価（統計的に等価）になる前提条件は、前述したように、入力データが、
Ｅ（ｘ_ｉ）＝０、Ｅ（ｘ_ｉ ^２）＝σ^２
ただし、
ｘ_ｉ：畳込みが行われる入力テンソル内の各画素
Ｅ（ｘ_ｉ）：期待値
σ^２：分散
を満たすときである。 [Features of noise convolution arithmetic circuit 100]
Features of the noise convolution arithmetic circuit 100 will be described.
The precondition for the noise convolution layer and the convolution layer of the comparative example (existing convolution layer) to be statistically equivalent is, as described above, that the input data satisfies
E(x_i) = 0, E(x_i^2) = σ^2
where
x_i: each pixel in the input tensor to be convolved
E(x_i): expected value
σ^2: variance

入力データが、
Ｅ（ｘ_ｉ）＝０、Ｅ（ｘ_ｉ ^２）＝σ^２
を満たすには、ＣＮＮにおいて少なくとも１回は畳込みが実行された後の畳込み層であることを確認した。 It was confirmed that, for the input data to satisfy
E(x_i) = 0, E(x_i^2) = σ^2,
the layer must be a convolution layer that comes after convolution has been performed at least once in the CNN.

図１６は、雑音畳込み演算回路１００の特徴を説明する図であり、図１６（ａ）は入力画像の各画素の値とその度数分布を示すヒストグラム、図１６（ｂ）はＣＮＮの１層目の畳込み層の出力の各画素の値とその度数分布を示すヒストグラムである。
図１６（ａ）に示すように、入力画像では、画像の各画素の値の分布が偏る。このため、上記畳込み層が等価となる前提条件：Ｅ（ｘ_ｉ）＝０、Ｅ（ｘ_ｉ ^２）＝σ^２を満たさず、入力画像については雑音畳込みは適用できない。 FIG. 16 is a diagram explaining the features of the noise convolution arithmetic circuit 100. FIG. 16(a) is a histogram showing the value of each pixel of the input image and its frequency distribution, and FIG. 16(b) is a histogram showing the value of each pixel of the output of the first convolution layer of the CNN and its frequency distribution.
As shown in FIG. 16(a), the distribution of pixel values in the input image is biased. The precondition for the convolution layers to be equivalent, E(x_i) = 0, E(x_i^2) = σ^2, is therefore not satisfied, and noise convolution cannot be applied to the input image.

図１６（ｂ）に示すように、ＣＮＮの１層目の畳込み層の出力は、「雑音のヒストグラム」となる。このため、上記前提条件：Ｅ（ｘ_ｉ）＝０、Ｅ（ｘ_ｉ ^２）＝σ^２を満たすので、雑音畳込みを適用できる。
つまり、ＣＮＮの１層以上は、比較例の畳込み層（既存畳込み層）が必要である。 As shown in FIG. 16(b), the output of the first convolutional layer of the CNN is a "noise histogram". Therefore, since the above preconditions: E(x _i )=0, E(x _i ² )=σ ² are satisfied, noise convolution can be applied.
That is, at least one layer of the CNN must be a convolution layer of the comparative example (existing convolution layer).
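The precondition E(x_i) = 0 can be checked numerically on a sample of values before a layer is replaced by a noise convolution layer. The sketch below is only one possible check: the tolerance `tol`, the helper names, and the sample values are hypothetical and do not come from the embodiment.

```python
import math


def first_two_moments(values):
    """Return the sample estimates of E(x) and E(x^2)."""
    n = len(values)
    mean = sum(values) / n
    second = sum(v * v for v in values) / n
    return mean, second


def is_approximately_zero_mean(values, tol=0.1):
    """True when |E(x)| is small relative to sqrt(E(x^2)).

    The relative tolerance is an assumption chosen for illustration.
    """
    mean, second = first_two_moments(values)
    return abs(mean) <= tol * math.sqrt(second)


raw_pixels = [0.0, 128.0, 255.0, 200.0]   # biased, like raw image pixel values
centered = [-1.0, 1.0, -2.0, 2.0]         # symmetric about zero

print(is_approximately_zero_mean(raw_pixels))  # biased input fails the check
print(is_approximately_zero_mean(centered))    # zero-mean input passes
```

A check of this kind would be applied to the input of each candidate layer; raw image pixels fail it, which matches the observation that the first layer must remain an existing convolution layer.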

［入力画像に適用した場合の雑音畳込みＣＮＮと既存畳込みＣＮＮの比較］
雑音畳込み演算回路１００を入力画像に適用した場合のＣＮＮ（以下、入力画像に適用した場合の雑音畳込みＣＮＮという）と比較例の畳込み演算回路３０を用いたＣＮＮ（以下、既存畳込みＣＮＮという）の比較について述べる。 [Comparison of noise convolution CNN and existing convolution CNN when applied to input image]
A comparison is described between a CNN in which the noise convolution arithmetic circuit 100 is applied to the input image (hereinafter, the noise convolution CNN applied to the input image) and a CNN using the convolution arithmetic circuit 30 of the comparative example (hereinafter, the existing convolution CNN).

図１７は、入力画像に適用した場合の雑音畳込みＣＮＮと既存畳込みＣＮＮのVGG16モデルを用いて画像認識タスクCIFAR-10およびCIFAR-100を学習させた結果の認識精度を説明する図である。
入力画像に適用した場合の雑音畳込みＣＮＮと既存畳込みＣＮＮの効果を確認するため、VGG16（隠れ層が１６層）ベンチマークＮＮを実装し、学習が成功するか確認した。VGG16は、良く使われているベンチマークで再現性があるものである。 FIG. 17 is a diagram for explaining the recognition accuracy of the result of learning the image recognition tasks CIFAR-10 and CIFAR-100 using the VGG16 model of the noise convolution CNN and the existing convolution CNN when applied to the input image. .
In order to confirm the effects of noise convolution CNN and existing convolution CNN when applied to input images, we implemented a VGG16 (16 hidden layers) benchmark NN and confirmed whether learning was successful. VGG16 is a popular and reproducible benchmark.

図１７に示すように、入力画像に適用した場合の雑音畳込みＣＮＮは、既存畳込みＣＮＮに大きく劣った認識精度を示している。すなわち、雑音畳込みＣＮＮは、上記前提条件：Ｅ（ｘ_ｉ）＝０、Ｅ（ｘ_ｉ ^２）＝σ^２を満たさない、入力画像に適用した場合には、既存畳込みＣＮＮに比べ大きく劣った認識精度となることが検証できた。 As shown in FIG. 17, the noise convolution CNN applied to the input image shows recognition accuracy far inferior to that of the existing convolution CNN. In other words, it was verified that when the noise convolution CNN is applied to an input image, which does not satisfy the precondition E(x_i) = 0, E(x_i^2) = σ^2, its recognition accuracy is far inferior to that of the existing convolution CNN.

［雑音畳込みＣＮＮの構成］
上記検証結果をもとに、雑音畳込みＣＮＮを構成する場合、ＣＮＮの１層以上は、比較例の畳込み層（既存畳込み層）を備える必要がある。 [Configuration of noise convolution CNN]
Based on the above verification results, when constructing a noise convolutional CNN, one or more layers of the CNN must be provided with the convolutional layer of the comparative example (existing convolutional layer).

図１８は、雑音畳込みＣＮＮの構成を示すブロック図である。畳込み層が１３層の場合を例にとる。なお、畳込み層に組み合わされる全結合層は、３層であるとする。図１８中のｋは畳込み層の数である（図１８中のｋは、カーネルではない）。
雑音畳込みＣＮＮは、既存畳込み層と雑音畳込み層とから構成される。既存畳込み層は、例えば３×３畳込み層である。雑音畳込み層は、雑音を乗せた１×１畳込み層である。
図１８に示すように、雑音畳込みＣＮＮは、ｋ層である３×３畳込み層と、１３－ｋ層である雑音畳込み層と、３層の全結合層と、から構成される。
ｋの値は、認識精度と重みの数のトレードオフを考えて選択される。以下、ｋの値を変えた場合の実験結果について説明する。 FIG. 18 is a block diagram showing the configuration of the noise convolution CNN. Take the case of 13 convolution layers as an example. It is assumed that the number of fully connected layers combined with the convolutional layer is three. k in FIG. 18 is the number of convolutional layers (k in FIG. 18 is not the kernel).
A noise convolutional CNN consists of an existing convolutional layer and a noise convolutional layer. The existing convolutional layer is, for example, a 3x3 convolutional layer. A noise convolutional layer is a 1×1 convolutional layer with noise on it.
As shown in FIG. 18, the noise convolutional CNN is composed of k layers of 3×3 convolution layers, 13-k noise convolution layers, and three fully connected layers.
The value of k is chosen considering the trade-off between recognition accuracy and number of weights. Experimental results when the value of k is changed will be described below.
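The layer arrangement of FIG. 18 can be written down as a simple plan. The sketch below only enumerates layer types in order; the function name `build_layer_plan` and the string labels are hypothetical, and pooling or activation layers are omitted because the text above does not place them.

```python
def build_layer_plan(k, total_conv=13, num_fc=3):
    """Return the layer sequence: k existing 3x3 convolution layers,
    (total_conv - k) noise convolution layers, then the fully connected layers."""
    if not 0 <= k <= total_conv:
        raise ValueError("k must be between 0 and total_conv")
    return (
        ["conv3x3"] * k
        + ["noise_conv1x1"] * (total_conv - k)
        + ["fc"] * num_fc
    )


plan = build_layer_plan(k=6)
print(plan.count("conv3x3"), plan.count("noise_conv1x1"), plan.count("fc"))
```

Here k = 0 corresponds to the "noise convolution CNN only" comparison case and k = 13 to the "existing convolution CNN only" case; values in between give the mixed configuration.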

［雑音畳込みＣＮＮ層数と認識率］
図１９は、畳込みＣＮＮ層数ｋと、認識精度(%)、ratioおよび全層の重み(MB)との実験結果を表にして示す図である。全層の重み(MB)は、メモリ量に対応し、パラメータ（全層の重み）が大きい程、使用するメモリ量は多くなる。
図１９は、雑音畳込みＣＮＮのみ（図１９実線囲み参照）、「ＣＮＮ１層＋雑音畳込みＣＮＮ１２層（図１９一点鎖線囲み参照）」、「ＣＮＮｋ層＋雑音畳込みＣＮＮ（１３－ｋ）層（図１９二点鎖線囲み参照）」、「既存畳込みＣＮＮのみ（図１９破線囲み参照）」、それぞれの認識精度(%)、ratioおよび全層の重み(MB)を表している。 [Number of noise convolution CNN layers and recognition rate]
FIG. 19 is a table showing experimental results of the number of convolutional CNN layers k, recognition accuracy (%), ratio, and weight of all layers (MB). The weight of all layers (MB) corresponds to the amount of memory, and the larger the parameter (weight of all layers), the larger the amount of memory used.
FIG. 19 shows only noise convolution CNN (see FIG. 19 surrounded by solid line), “CNN 1 layer + noise convolution CNN 12 layer (see FIG. 19 surrounded by dashed line)”, “CNN k layer + noise convolution CNN (13-k) layer (see double-dot chain line in FIG. 19)”, “existing convolutional CNN only (see dashed line in FIG. 19)”, recognition accuracy (%), ratio, and weight of all layers (MB).

図１９に示すように、認識精度(%)は、「ＣＮＮｋ層＋雑音畳込みＣＮＮ（１３－ｋ）層」の６３．８～７１．５％と、「既存畳込みＣＮＮのみ」の７０．１％とが高いことが分かる。一方、全層の重み(MB)は、「雑音畳込みＣＮＮのみ」の８．８、「ＣＮＮ１層＋雑音畳込みＣＮＮ１２層」の８．８、「ＣＮＮｋ層＋雑音畳込みＣＮＮ（１３－ｋ）層」の畳込みＣＮＮ層数ｋ＝２～７の１５．０以下の場合に小さいことが分かる。認識精度(%)と全層の重み(MB)を双方考慮すると、「ＣＮＮｋ層＋雑音畳込みＣＮＮ（１３－ｋ）層」の畳込みＣＮＮ層数ｋ＝４～６の場合に、認識精度(%)が向上し、全層の重み(MB)が小さい。 As shown in FIG. 19, recognition accuracy (%) is high for "CNN k layer + noise convolution CNN (13-k) layer" at 63.8 to 71.5% and for "existing convolution CNN only" at 70.1%. The weight of all layers (MB), on the other hand, is small for "noise convolution CNN only" at 8.8, for "CNN 1 layer + noise convolution CNN 12 layer" at 8.8, and for "CNN k layer + noise convolution CNN (13-k) layer" with k = 2 to 7, where it is 15.0 or less. Considering both recognition accuracy (%) and the weight of all layers (MB), "CNN k layer + noise convolution CNN (13-k) layer" with k = 4 to 6 gives improved recognition accuracy (%) together with a small weight of all layers (MB).

特に、「ＣＮＮｋ層＋雑音畳込みＣＮＮ（１３－ｋ）層」の畳込みＣＮＮ層数ｋ＝６の場合、認識精度(%)は７０．５％と高く、かつ、全層の重み(MB)は１２．９と小さい。「ＣＮＮｋ層＋雑音畳込みＣＮＮ（１３－ｋ）層」の畳込みＣＮＮ層数ｋ＝６の場合と、「既存畳込みＣＮＮのみ」とを比較すると、認識精度(%)はほぼ同じ（７０．５％と７０．１％）でありながら、全層の重み(MB)は、格段に削減（１２．９と６１．１）される。すなわち、ｋ＝６のとき、認識精度が向上し、パラメータ（全層の重み）を約８０％削減することが確認できた。 In particular, when the number of convolutional CNN layers k = 6 in the “CNNk layer + noise convolutional CNN (13-k) layer”, the recognition accuracy (%) is as high as 70.5%, and the weight of all layers (MB ) is as small as 12.9. Comparing the case of “CNN k layer + noise convolution CNN (13-k) layer” with the number of convolutional CNN layers k = 6 and “existing convolutional CNN only”, the recognition accuracy (%) is almost the same (70 .5% and 70.1%), while the total layer weight (MB) is significantly reduced (12.9 and 61.1). That is, when k=6, it was confirmed that the recognition accuracy was improved and the parameters (weights of all layers) were reduced by about 80%.
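The "about 80%" figure can be checked from the two weight totals quoted for k = 6 and for the existing convolution CNN. The helper name `percent_reduction` is introduced here for illustration.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


existing_mb = 61.1   # weight of all layers, existing convolution CNN only
hybrid_mb = 12.9     # weight of all layers, k = 6 configuration

reduction = percent_reduction(existing_mb, hybrid_mb)
print(round(reduction, 1))   # a little under 80
```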

［雑音畳込みＣＮＮの学習時間と推論時間］
図２０は、雑音畳込みＣＮＮの学習時間と推論時間を表にして示す図である。比較のため、「ＣＮＮｋ層＋雑音畳込みＣＮＮ（１３－ｋ）層」と「雑音畳込みＣＮＮ」のみと「既存畳込みＣＮＮ」のみとの、１ｅｐｏｃｈ毎の学習時間(s)と推論時間(s)を示している。ＧＰＵを使用し、ｂａｔｃｈｓｉｚｅ：１０、学習のデータ数：５００００、推論のデータ数：１００００とした。 [Learning time and inference time of noise convolution CNN]
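FIG. 20 is a table showing the training time and inference time of the noise convolution CNN. For comparison, it gives the training time (s) and inference time (s) per epoch for "CNN k layer + noise convolution CNN (13-k) layer", for "noise convolution CNN" only, and for "existing convolution CNN" only. A GPU was used, with a batch size of 10, 50,000 training samples, and 10,000 inference samples.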

図２０に示すように、雑音畳込みＣＮＮの構成において、「ＣＮＮｋ層＋雑音畳込みＣＮＮ（１３－ｋ）層」(図１９参照)の構成を採ると、「雑音畳込みＣＮＮ」のみ場合と「既存畳込みＣＮＮ」のみ場合と比較して、１ｅｐｏｃｈ毎の学習時間(s)と推論時間(s)がいずれも短縮された。特に、「ＣＮＮｋ層＋雑音畳込みＣＮＮ（１３－ｋ）層」と「既存畳込みＣＮＮ」のみとを比較すると、学習時間(s)が７０．５から３５．５に、推論時間(s)が４．５０から２．８１にいずれも約半分に短縮された。このように、雑音畳込みＣＮＮの構成を「ＣＮＮｋ層＋雑音畳込みＣＮＮ（１３－ｋ）層」とすることで、学習時間および推論時間が削減されることが検証できた。 As shown in FIG. 20, when the noise convolution CNN adopts the "CNN k layer + noise convolution CNN (13-k) layer" configuration (see FIG. 19), both the training time (s) and the inference time (s) per epoch are shorter than with "noise convolution CNN" only and with "existing convolution CNN" only. In particular, comparing "CNN k layer + noise convolution CNN (13-k) layer" with "existing convolution CNN" only, the training time (s) fell from 70.5 to 35.5 and the inference time (s) fell from 4.50 to 2.81, each to roughly half. It was thus verified that configuring the noise convolution CNN as "CNN k layer + noise convolution CNN (13-k) layer" reduces both training time and inference time.

［雑音畳込みＣＮＮの実装結果］
図２１は、本実施形態の雑音畳込み演算回路１００をFPGA(ZCU102ボード)上に実装し、畳込み演算（１層，５６×５６サイズ６４チャネル）を既存畳込みＣＮＮと比較した結果を表にして示す図である。
ＦＰＧＡ実装の比較は、本実施形態（「ＣＮＮｋ層＋雑音畳込みＣＮＮ（１３－ｋ）層」）、比較例の既存畳込みＣＮＮである。ＦＰＧＡ実装のハードウェアの量は、ＦＦ(flip-flop)数、ＬＵＴ数、18Kb BRAM 数、およびＤＳＰ（digital signal processor） 48E 数で示される。 [Results of implementation of noise convolution CNN]
FIG. 21 is a table showing the results of implementing the noise convolution arithmetic circuit 100 of this embodiment on an FPGA (ZCU102 board) and comparing a convolution operation (one layer, 56×56 size, 64 channels) against the existing convolution CNN.
The FPGA implementations compared are this embodiment ("CNN k layer + noise convolution CNN (13-k) layer") and the existing convolution CNN of the comparative example. The amount of hardware in each FPGA implementation is given as the number of FFs (flip-flops), LUTs, 18Kb BRAMs, and DSP (digital signal processor) 48E blocks.

図２１の表の用語は下記の通りである。
ＬＵＴ数は、FPGAのLUT（Look-Up Table）消費量であり、面積を意味する。
ＦＦ数は、論理ゲート数であり、面積を意味する。
BRAM 18Kb数は、FPGAの内部メモリブロックの消費量であり、面積を意味する。
DSP Block数は、FPGAの内部積和演算ブロックの消費量であり、面積を意味する。
レイテンシ[Cycles]は、データ転送待ち時間／外部にメモリを付けた場合の転送速度である。
動作周波数[MHz]は、演算部の動作処理時間(処理速度)である。
この表において、特に注目すべき事項は下記の通りである。 The terms in the table of FIG. 21 are as follows.
The number of LUTs is the LUT (Look-Up Table) consumption of the FPGA and means the area.
The number of FFs is the number of logic gates and means the area.
The BRAM 18Kb number is the consumption of the internal memory block of the FPGA and means the area.
The number of DSP Blocks is the amount consumed by the internal sum-of-products operation block of the FPGA, and means the area.
Latency [Cycles] is data transfer wait time/transfer speed when an external memory is attached.
The operating frequency [MHz] is the operating processing time (processing speed) of the arithmetic unit.
Of particular note in this table are the following:

<メモリ量>
本実施形態の雑音畳込みＣＮＮは、表の既存畳込みＣＮＮと比較して、メモリ量（BRAM18K）を、１７３７から１９３に約１／９に低減することができた。 <memory amount>
The noise convolution CNN of this embodiment was able to reduce the amount of memory (BRAM18K) from 1737 to 193, about 1/9 compared to the existing convolution CNN shown in the table.

<演算部の面積>
演算部の面積は、表のLUT数、FF数、DSP48E数で表される。LUT数は、２６０７５から７７８４に、FF数は、９４４６から６４６７に、DSP48E数は、５３から４３にそれぞれ削減される。LUT数、FF数、DSP48E数の削減は、いずれも演算部の面積の削減に寄与し、高速化を達成することができる。すなわち、演算部の面積削減は、チップ面積削減に直結して、外付けメモリが不要となることから、レイテンシ[Cycles]が既存畳込みＣＮＮの１８８９７９２７から雑音畳込みＣＮＮの２８５３１２７に約１／６．６に低減することができるとともに、動作周波数[MHz]を７７．６から１１８．７に約１．５倍に高速化して動作させことができ、高速化を達成することができる。また、雑音畳込みＣＮＮは、上記したメモリ量の削減に加えて、外付けメモリが不要となることから、メモリコントローラが単純になることなどの効果がある。チップ面積は価格に比例するので、価格も２桁程度安くなることが期待できる。 <Area of calculation part>
The area of the arithmetic unit is represented by the number of LUTs, FFs, and DSP48Es in the table. The number of LUTs is reduced from 26075 to 7784, the number of FFs from 9446 to 6467, and the number of DSP48Es from 53 to 43. Each of these reductions contributes to a smaller arithmetic unit and to higher speed. Specifically, the smaller arithmetic unit translates directly into a smaller chip area and makes external memory unnecessary, so the latency [Cycles] is reduced from 18897927 for the existing convolution CNN to 2853127 for the noise convolution CNN, about 1/6.6, and the operating frequency [MHz] is raised from 77.6 to 118.7, about 1.5 times. In addition to the memory reduction described above, the noise convolution CNN no longer needs external memory, which brings further benefits such as a simpler memory controller. Because price is proportional to chip area, the price can also be expected to fall by about two orders of magnitude.
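The ratios quoted for FIG. 21 can be recomputed from the before and after values given in the text. The dictionary layout and variable names below are introduced for illustration; only the numbers come from the description.

```python
# (existing convolution CNN, noise convolution CNN), as quoted in the description.
fpga_results = {
    "BRAM18K": (1737, 193),
    "LUT": (26075, 7784),
    "FF": (9446, 6467),
    "DSP48E": (53, 43),
    "latency_cycles": (18897927, 2853127),
}

# Reduction factor for each resource: how many times smaller the new value is.
reduction_factor = {
    name: before / after for name, (before, after) in fpga_results.items()
}

# Operating frequency goes up rather than down, so it is handled separately.
frequency_speedup = 118.7 / 77.6

for name, factor in reduction_factor.items():
    print(f"{name}: {factor:.2f}x smaller")
print(f"operating frequency: {frequency_speedup:.2f}x higher")
```

The BRAM count drops by a factor of 9, in line with the reduction from nine weights to one per kernel, while the logic resources (LUT, FF, DSP48E) shrink by smaller factors.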

<性能等価>
本実施形態の雑音畳込みＣＮＮは、前記図１９の雑音畳込みＣＮＮ層数と認識率の表に示すように、「ＣＮＮｋ層＋雑音畳込みＣＮＮ（１３－ｋ）層」構成を採ることで、既存畳込みＣＮＮと同等の認識率を達成することができる。 <performance equivalent>
By adopting the "CNN k layer + noise convolution CNN (13-k) layer" configuration, the noise convolution CNN of this embodiment can achieve a recognition rate equivalent to that of the existing convolution CNN, as shown in the table of FIG. 19 (number of noise convolution CNN layers and recognition rate).

本実施形態によれば、既存畳込みＣＮＮと比較して、重みを最大で３×３分の１に削減し、計算量も３×３分の１に削減しつつ、図１９に示したように認識精度はほぼ等価なＣＮＮを構成できることが判明した。ディープラーニングを用いたＡＤＡＳ（Advanced Driver Assistance System：先進運転支援システム）カメラ画像認識用のエッジ組み込み装置ハードウェア方式として実用化が期待される。特にＡＤＡＳでは、車載する上で高信頼性と低発熱が要求される。本実施形態に係る雑音畳込み演算回路１００は、外付けメモリが不要であるので、メモリを冷却する冷却ファンや冷却フィンも不要である。ＡＤＡＳカメラに搭載して好適である。 According to this embodiment, it was found that a CNN can be constructed whose recognition accuracy is nearly equivalent, as shown in FIG. 19, while the number of weights is reduced to as little as 1/(3×3) = 1/9 and the amount of computation is likewise reduced to 1/9 compared with the existing convolution CNN. Practical use is expected as a hardware scheme for edge embedded devices for ADAS (Advanced Driver Assistance System) camera image recognition using deep learning. ADAS in particular demands high reliability and low heat generation for in-vehicle use. Because the noise convolution arithmetic circuit 100 according to this embodiment needs no external memory, it also needs no cooling fan or cooling fins for the memory. It is well suited to mounting in an ADAS camera.

［実装例］
図２２は、本発明の実施形態に係る雑音畳込みＣＮＮの実装例を説明する図である。
<STEP1>
まず、与えられたデータセット（今回はImageNet、画像認識タスク用にデータ）を既存のディープニューラルネットワーク用のフレームワークソフトウェアであるChainer （登録商標）を用いてＧＰＵ（Graphics Processing Unit）を有するコンピュータ２０１上で学習を行った。学習は、ＧＰＵ上で実行する。このコンピュータ２０１は、ＡＲＭプロセッサなどのＣＰＵ（Central Processing Unit）と、メモリと、ハードディスクなどの記憶手段（記憶部）と、ネットワークインタフェースを含むＩ／Ｏポートとを有する。このコンピュータは、ＣＰＵ２０１が、メモリ上に読み込んだプログラム（雑音畳込み演算の実行プログラム）を実行することにより、後記する各処理部により構成される制御部（制御手段）を動作させる。 [Example of implementation]
FIG. 22 is a diagram illustrating an implementation example of the noise convolution CNN according to the embodiment of the present invention.
<STEP1>
First, a given data set (ImageNet in this case, data for image recognition tasks) was used for training on a computer 201 equipped with a GPU (Graphics Processing Unit), using Chainer (registered trademark), an existing framework for deep neural networks. Training is executed on the GPU. The computer 201 has a CPU (Central Processing Unit) such as an ARM processor, a memory, storage means (storage unit) such as a hard disk, and I/O ports including a network interface. In the computer 201, the CPU executes a program (an execution program for the noise convolution operation) loaded into the memory, thereby operating a control unit (control means) made up of the processing units described later.

<STEP2>
次に、自動生成ツールを用いて、本実施形態の雑音畳込み演算回路１００と等価なＣ++コードを自動生成し、Ｃ++コード２０２を得た。 <STEP2>
Next, an automatic generation tool was used to automatically generate a C++ code equivalent to the noise convolution arithmetic circuit 100 of this embodiment, and a C++ code 202 was obtained.

<STEP3>
次に、FPGA ベンダの高位合成ツール(Xilinx 社SDSoC) （登録商標）を用いて、ＦＰＧＡ（field-programmable gate array）合成用にＨＤＬ（hardware description language）２０３を生成した。例えば、高位合成ツール(Xilinx 社SDSoC)では、実現したい論理回路をハードウェア記述言語（Verilog HDL/VHDL）を用いて記述し、提供されたＣＡＤツールでビットストリームに合成する。そして、FPGAにこのビットストリームを送信するとFPGAに回路が実現する。 <STEP3>
Next, an FPGA vendor's high-level synthesis tool (Xilinx SDSoC) (registered trademark) was used to generate HDL (hardware description language) 203 for FPGA (field-programmable gate array) synthesis. For example, in a high-level synthesis tool (Xilinx SDSoC), a logic circuit to be implemented is described using a hardware description language (Verilog HDL/VHDL) and synthesized into a bitstream using a provided CAD tool. Then, when this bitstream is sent to the FPGA, the circuit is realized in the FPGA.

<STEP4>
次に、従来のＦＰＧＡ合成ツールVivado （登録商標）を用いて、ＦＰＧＡ上に実現（ＦＰＧＡ合成２０４）して画像認識タスクの検証を行った。 <STEP4>
Next, using the conventional FPGA synthesis tool Vivado (registered trademark), the image recognition task was verified by implementing it on the FPGA (FPGA synthesis 204).

<STEP5>
検証後、基板２０５を完成させた。基板２０５には、雑音畳込み演算回路１００がハードウェア化されて実装されている。 <STEP5>
After verification, the board 205 was completed. The noise convolution arithmetic circuit 100 is implemented in hardware on the board 205.

以上説明したように、本実施形態に係るニューラルネットワーク回路装置５０は、中間層が、雑音畳込み演算を実行する雑音畳込み演算回路１００を備え、雑音畳込み演算回路１００は、雑音（ノイズ）ｎ_ｃを生成する雑音生成回路１１０と、１×１畳込みを実行するための入力値Ｘを受け取り、入力値Ｘに雑音生成回路１１０で生成された雑音ｎ_ｃを加算する加算回路１２０と、重みＷを受け取り、雑音を乗せた加算回路１２０の雑音畳込み演算値と重みＷを乗算する乗算回路１３０と、を備える。 As described above, in the neural network circuit device 50 according to this embodiment, the intermediate layer includes the noise convolution arithmetic circuit 100 that performs the noise convolution operation. The noise convolution arithmetic circuit 100 includes: the noise generation circuit 110 that generates noise n_c; the adder circuit 120 that receives an input value X for performing 1×1 convolution and adds the noise n_c generated by the noise generation circuit 110 to the input value X; and the multiplier circuit 130 that receives the weight W and multiplies the noise-added value from the adder circuit 120 by the weight W.

When noise n_c satisfying
expected value: $E(n_c) = E(\sum_{Nc} \varepsilon_i W_i') = 0$, and
variance: $E(n_c^2) = E(\sum_{Nc} \varepsilon_i W_i')^2 = 2\sigma^2\delta'$
is used, the output y' of the convolution operation by the convolution arithmetic circuit 30 of the comparative example and the output y' of the noise convolution operation by the noise convolution arithmetic circuit 100 are expressed by equation (5) above and are statistically equivalent.
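The zero-mean property of a sum of the form Σ ε_i W_i' can be checked numerically. The sketch below assumes that the ε_i are independent, zero-mean Gaussian values and that the W_i' are fixed; the number of terms, the standard deviation, and the weights are hypothetical. Under these assumptions the variance equals Var(ε) · Σ W_i'^2. How this relates to 2σ²δ' depends on the definitions of σ and δ' given earlier in the description, which the sketch does not assume.

```python
import numpy as np

rng = np.random.default_rng(0)

num_terms = 9                          # hypothetical number of terms in the sum
eps_std = 0.5                          # hypothetical standard deviation of each epsilon_i
w_prime = rng.normal(size=num_terms)   # hypothetical fixed weights W_i'

# Draw many realisations of n_c = sum_i epsilon_i * W_i'.
eps = rng.normal(0.0, eps_std, size=(200_000, num_terms))
n_c = eps @ w_prime

empirical_mean = n_c.mean()
empirical_var = n_c.var()
# For independent zero-mean epsilon_i: Var(n_c) = eps_std**2 * sum_i W_i'**2.
predicted_var = eps_std**2 * np.sum(w_prime**2)
```

With this many samples, `empirical_mean` should lie close to zero and `empirical_var` close to `predicted_var`.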

Further, in the neural network processing method according to the present embodiment, the intermediate layer has a noise convolution operation step of executing a noise convolution operation. The noise convolution operation step executes: an operation step of receiving an input value X to be convolved and noise, and adding the noise to the input value X; and a multiplication step of receiving a weight W and multiplying the noise convolution operation value obtained in the operation step by the weight W.
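The two steps above can be sketched in NumPy as follows. This is an illustrative sketch, not the patented circuit: the function name, the array shapes, the choice of one noise value per input element, and the ReLU activation are all assumptions. The bias and activation mirror the summation circuit and activation function circuit mentioned in the claims.

```python
import numpy as np


def noise_conv1x1(x, weights, bias, noise, activation=lambda u: np.maximum(u, 0.0)):
    """Noise convolution sketch: add noise to X, then apply a 1x1 convolution.

    x:       input feature maps, shape (C_in, H, W)
    weights: 1x1 kernel, shape (C_out, C_in)
    bias:    shape (C_out,)
    noise:   same shape as x (assumed: one noise value per input element)
    """
    noisy = x + noise  # operation step: X + n_c
    # Multiplication step plus channel sum: a 1x1 convolution is a per-pixel
    # matrix product over the channel axis.
    u = np.einsum("oc,chw->ohw", weights, noisy) + bias[:, None, None]
    return activation(u)
```

Because the kernel is 1×1, no neighbouring pixels are read; in this sketch the spatial mixing that a 3×3 kernel would provide is replaced by the added noise term.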

As a result, part of the enormous number of convolution calculations required in a CNN is replaced by adding weighted random numbers (whose statistical behavior is that of noise). This shortens the convolution computation time and reduces memory, because a random number generator takes the place of the weight matrix.

In this way, the noise convolution arithmetic circuit 100 performs a 1×1 convolution with noise applied in place of the 3×3 weights (see FIGS. 14 and 15). This reduces the number of weights to as little as 1/(3×3) of the original, and the amount of computation is likewise reduced to 1/(3×3). As a result, both the amount of computation and the amount of memory can be reduced. Moreover, despite these reductions, recognition accuracy does not decrease (see the table in FIG. 19). Because the amount of computation and memory can be reduced, the circuit can be applied to embedded devices.
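The 1/(3×3) figure follows from counting weights and multiply-accumulate operations for a single layer. The sketch below does that count for a hypothetical layer size; it assumes stride 1, an output map the same size as the input, and no bias terms.

```python
def conv_cost(c_in: int, c_out: int, height: int, width: int, kernel: int) -> tuple[int, int]:
    """Return (weight count, multiply-accumulate count) for one convolution layer.

    Assumes stride 1 and an output map with the same height and width as the input.
    """
    weights = c_in * c_out * kernel * kernel
    macs = weights * height * width
    return weights, macs


# Hypothetical layer size, chosen only for illustration.
w3, m3 = conv_cost(c_in=64, c_out=64, height=32, width=32, kernel=3)
w1, m1 = conv_cost(c_in=64, c_out=64, height=32, width=32, kernel=1)
print(w3 // w1, m3 // m1)  # both ratios are 9
```

The ratio is independent of the channel counts and the feature-map size, since both cancel out.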

In this embodiment, both the number of convolution operations and the number of parameters (the amount of memory for storing the weight matrix) were reduced. In preliminary experiments, recognition accuracy equivalent to that of the conventional convolution operation was achieved. In addition, the noise convolution arithmetic circuit 100 reduced the operation time by about 40% compared with the convolution arithmetic circuit 30 of the comparative example. A distinctive feature is that noise, a component normally unwelcome in circuits, is put to use in the convolution operation.

In this embodiment, the configuration and effects of the noise convolution arithmetic circuit 100 were clarified through statistical analysis. As described above, the theoretical analysis shows that the present invention holds only when the assumption that the input to the noise convolution layer is statistically unbiased is satisfied (see FIG. 16). Therefore, in the neural network circuit device 50 according to the present embodiment, at least the first layer of the intermediate layers is an ordinary (existing) convolution layer. That is, the neural network circuit device 50 according to the present embodiment requires a combination of ordinary convolution layers and noise convolution layers.

From this viewpoint, in the neural network circuit device 50 of the present embodiment, the intermediate layer has n layers and includes the convolution arithmetic circuit 30 of the comparative example, which executes the convolution operation in k layers, and the noise convolution arithmetic circuit 100, which executes the noise convolution operation in (n − k) layers.

Further, in the neural network processing method according to the present embodiment, the intermediate layer has n layers (n is any natural number) and includes a convolution operation step of executing a convolution operation in k layers (k is a natural number satisfying n − k) and a noise convolution operation step of executing a noise convolution operation in (n − k) layers. The noise convolution operation step executes: an operation step of receiving an input value X to be convolved and noise, and adding the noise to the input value X; and a multiplication step of receiving a weight W and multiplying the noise convolution operation value obtained in the operation step by the weight W.

As a result, as shown in the table of FIG. 19, in a 13-layer CNN (see FIG. 18) in which k layers are executed by the existing CNN (the convolution arithmetic circuit 30 of the comparative example shown in FIGS. 20 to 26) and the remaining layers by the noise convolution arithmetic circuit 100 (see FIGS. 2 and 5 to 13), for k = 6 or more the weight memory (circuit scale) is reduced without lowering recognition accuracy compared with the case where all layers are the existing CNN. In particular, for k = 6 the computation time is about 1/2, as shown in the table of FIG. 19.
As described above, it was confirmed that the circuit size and computation time can be reduced with almost no loss of recognition accuracy.
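The split into k conventional layers and (n − k) noise convolution layers can be written down as a simple layer plan. The sketch below is illustrative only: the layer labels, the placement of the conventional layers first, and the cost model (every layer has identical channel counts and feature-map size, with a 3×3 layer costing 9 units and a 1×1 layer 1 unit) are assumptions that the actual 13-layer network of FIG. 18 need not satisfy.

```python
def layer_plan(n: int, k: int) -> list[str]:
    """Return a per-layer list: k conventional layers, then n - k noise layers."""
    if not 1 <= k <= n:
        raise ValueError("expected 1 <= k <= n")
    return ["conv3x3"] * k + ["noise_conv1x1"] * (n - k)


def relative_cost(n: int, k: int) -> float:
    """Cost relative to n conventional 3x3 layers under the uniform-layer assumption."""
    return (9 * k + 1 * (n - k)) / (9 * n)


plan = layer_plan(n=13, k=6)
ratio = relative_cost(n=13, k=6)  # 61 / 117 under the stated assumptions
```

Under this simplified model the ratio for n = 13 and k = 6 is 61/117; measured figures such as those in FIG. 19 depend on the real layer sizes and on the hardware.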

The present invention is not limited to the embodiments described above and includes other modifications and applications that do not depart from the gist of the present invention as set forth in the claims.
For example, a LUT (Look-Up Table) may be used in place of logic gates as the multiplier circuit. The LUT is a basic building block of an FPGA, so it maps well onto FPGA synthesis and is easy to implement on an FPGA.
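As a loose software analogy for replacing multiplier logic with a lookup table, the sketch below precomputes every product of two small unsigned operands and then multiplies by indexing. The 4-bit operand width and the function names are assumptions for illustration; an FPGA LUT is a small truth table configured by the synthesis tool, not a Python list.

```python
def build_product_lut(bits: int) -> list[list[int]]:
    """Precompute every product of two unsigned `bits`-wide operands."""
    size = 1 << bits
    return [[a * b for b in range(size)] for a in range(size)]


LUT4 = build_product_lut(4)  # 16 x 16 entries for hypothetical 4-bit operands


def lut_multiply(a: int, b: int) -> int:
    """Multiply two 4-bit unsigned values by table lookup instead of arithmetic."""
    return LUT4[a][b]
```

The table grows as 2^(2·bits) entries, so this approach is practical only for narrow operands.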

Furthermore, the number k of convolution layers shown in FIGS. 18 and 19, the method and order of combining the existing CNN with the noise convolution CNN, and the parameters associated with the combination may be chosen freely.

The embodiments above have been described in detail to explain the present invention clearly, and the invention is not necessarily limited to configurations having all of the described elements. Part of the configuration of one embodiment may be replaced with the configuration of another embodiment, and the configuration of another embodiment may be added to the configuration of one embodiment. The embodiments can also be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

Of the processes described in the above embodiments, all or part of those described as being performed automatically may be performed manually, and all or part of those described as being performed manually may be performed automatically by known methods. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings may be changed freely unless otherwise specified.
Each component of each illustrated device is a functional concept and need not be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of a device may be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions.

Some or all of the above configurations, functions, processing units, processing means, and the like may be realized in hardware, for example by designing them as integrated circuits. The above configurations, functions, and the like may also be realized in software, with a processor interpreting and executing programs that implement the respective functions. Information such as programs, tables, and files that implement each function can be held in a memory, in a recording device such as a hard disk or SSD (Solid State Drive), or on a recording medium such as an IC (Integrated Circuit) card, an SD (Secure Digital) card, or an optical disc.
In the above embodiments, the device is called a neural network circuit device, but this is for convenience of explanation, and the name may instead be deep neural network circuit, neural network device, perceptron, or the like. Likewise, although the method and program are called a neural network processing method, they may instead be called a neural network operation method, a neural network program, or the like.

1 Deep neural network
2 Neural network circuit
11 Input layer
12 Hidden layer (intermediate layer)
13 Output layer
30 Convolution arithmetic circuit of the comparative example (convolution arithmetic circuit means)
50 Neural network circuit
60 CPU
70 Off-chip memory
80 System bus
100 Noise convolution arithmetic circuit (noise convolution arithmetic circuit means)
110 Noise generation circuit
120 Adder circuit (arithmetic circuit, arithmetic means)
130 Multiplier circuit (multiplication means)
131 Weight memory
140 Adder circuit
150 Activation function circuit
n_c Noise
X Input value
W Weight

Claims

1. A neural network circuit device comprising at least an input layer, one or more intermediate layers, and an output layer, wherein
the intermediate layer includes a convolution arithmetic circuit that performs a convolution operation, and
a noise convolution arithmetic circuit that performs a noise convolution operation,
the noise convolution arithmetic circuit includes
an arithmetic circuit that receives an input value X to be convolved and noise, and adds the noise to the input value X, and
a multiplier circuit that receives a weight W and multiplies the noise convolution operation value of the arithmetic circuit by the weight W, and
when noise n_c satisfying
expected value: $E(n_c) = E(\sum_{Nc} \varepsilon_i W_i') = 0$, and
variance: $E(n_c^2) = E(\sum_{Nc} \varepsilon_i W_i')^2 = 2\sigma^2\delta'$
is used,
the output y' of the convolution operation by the convolution arithmetic circuit and
the output y' of the noise convolution operation by the noise convolution arithmetic circuit are represented by the following formula and are statistically equivalent.

2. A neural network circuit device comprising at least an input layer, one or more intermediate layers, and an output layer, wherein
the intermediate layer has n layers (n is any natural number) and includes
a convolution arithmetic circuit that performs a convolution operation in k layers (k is a natural number satisfying n − k), and
a noise convolution arithmetic circuit that performs a noise convolution operation in (n − k) layers,
the noise convolution arithmetic circuit includes
an arithmetic circuit that receives an input value X to be convolved and noise, and adds the noise to the input value X, and
a multiplier circuit that receives a weight W and multiplies the noise convolution operation value of the arithmetic circuit by the weight W, and
when noise n_c satisfying
expected value: $E(n_c) = E(\sum_{Nc} \varepsilon_i W_i') = 0$, and
variance: $E(n_c^2) = E(\sum_{Nc} \varepsilon_i W_i')^2 = 2\sigma^2\delta'$
is used,
the output y' of the convolution operation by the convolution arithmetic circuit and
the output y' of the noise convolution operation by the noise convolution arithmetic circuit are represented by the following formula and are statistically equivalent.

3. The neural network circuit device according to claim 1 or claim 2, wherein the noise convolution arithmetic circuit uses noise that makes the mean and variance of the result of the convolution operation match the mean and variance of the result of the noise convolution operation, respectively.

4. The neural network circuit device according to claim 1, wherein the weights used in the noise convolution arithmetic circuit are derived from the noise according to the following equation.

5. The neural network circuit device according to claim 1, wherein the arithmetic circuit is an adder circuit that adds the noise to the input value X.

6. The neural network circuit device according to claim 1, wherein the noise convolution arithmetic circuit further comprises a noise generation circuit that generates the noise.

7. The neural network circuit device according to claim 6 , wherein the noise is a pseudo-random number generated by a random number generator including XOR-Shift, LFSR (linear feedback shift register), and linear congruential method.

8. The neural network circuit device according to claim 1, wherein the noise convolution arithmetic circuit executes a 1×1 convolution operation using a 1×1 kernel.

9. The neural network circuit device according to claim 1 or claim 2, wherein the noise convolution arithmetic circuit comprises a summation circuit that sums the convolution operation values and a bias W0, and an activation function circuit that converts the summed signal Y by an activation function f(u).

10. A neural network processing method for a network including at least an input layer, one or more intermediate layers, and an output layer, wherein
the intermediate layer includes a convolution operation step of performing a convolution operation, and
a noise convolution operation step of performing a noise convolution operation,
the noise convolution operation step executes
an operation step of receiving an input value X to be convolved and noise, and adding the noise to the input value X, and
a multiplication step of receiving a weight W and multiplying the noise convolution operation value of the operation step by the weight W, and
when noise n_c satisfying
expected value: $E(n_c) = E(\sum_{Nc} \varepsilon_i W_i') = 0$, and
variance: $E(n_c^2) = E(\sum_{Nc} \varepsilon_i W_i')^2 = 2\sigma^2\delta'$
is used,
the output y' of the convolution operation in the convolution operation step and
the output y' of the noise convolution operation in the noise convolution operation step are represented by the following formula and are statistically equivalent.

11. A neural network processing method for a network including at least an input layer, one or more intermediate layers, and an output layer, wherein
the intermediate layer has n layers (n is any natural number) and includes
a convolution operation step of performing a convolution operation in k layers (k is a natural number satisfying n − k), and
a noise convolution operation step of performing a noise convolution operation in (n − k) layers,
the noise convolution operation step executes
an operation step of receiving an input value X to be convolved and noise, and adding the noise to the input value X, and
a multiplication step of receiving a weight W and multiplying the noise convolution operation value of the operation step by the weight W, and
when noise n_c satisfying
expected value: $E(n_c) = E(\sum_{Nc} \varepsilon_i W_i') = 0$, and
variance: $E(n_c^2) = E(\sum_{Nc} \varepsilon_i W_i')^2 = 2\sigma^2\delta'$
is used,
the output y' of the convolution operation in the convolution operation step and
the output y' of the noise convolution operation in the noise convolution operation step are represented by the following formula and are statistically equivalent.

12. A neural network execution program that causes a computer, serving as a neural network circuit device including at least an input layer, one or more intermediate layers, and an output layer, in which
the intermediate layer includes convolution arithmetic circuit means for performing a convolution operation, and
noise convolution arithmetic circuit means for performing a noise convolution operation,
to function, as the noise convolution arithmetic circuit means, as
arithmetic circuit means for receiving an input value X to be convolved and noise, and adding the noise to the input value X, and
multiplication circuit means for receiving a weight W and multiplying the noise convolution operation value of the arithmetic circuit means by the weight W,
wherein, when noise n_c satisfying
expected value: $E(n_c) = E(\sum_{Nc} \varepsilon_i W_i') = 0$, and
variance: $E(n_c^2) = E(\sum_{Nc} \varepsilon_i W_i')^2 = 2\sigma^2\delta'$
is used,
the output y' of the convolution operation in the convolution operation step and
the output y' of the noise convolution operation in the noise convolution operation step are represented by the following formula and are statistically equivalent.

13. A neural network execution program that causes a computer, serving as a neural network circuit device including at least an input layer, one or more intermediate layers, and an output layer, in which
the intermediate layer has n layers (n is any natural number) and includes
convolution arithmetic circuit means for performing a convolution operation in k layers (k is a natural number satisfying n − k), and
noise convolution arithmetic circuit means for performing a noise convolution operation in (n − k) layers,
to function, as the noise convolution arithmetic circuit means, as
arithmetic circuit means for receiving an input value X to be convolved and noise, and adding the noise to the input value X, and
multiplication circuit means for receiving a weight W and multiplying the noise convolution operation value of the arithmetic circuit means by the weight W,
wherein, when noise n_c satisfying
expected value: $E(n_c) = E(\sum_{Nc} \varepsilon_i W_i') = 0$, and
variance: $E(n_c^2) = E(\sum_{Nc} \varepsilon_i W_i')^2 = 2\sigma^2\delta'$
is used,
the output y' of the convolution operation in the convolution operation step and
the output y' of the noise convolution operation in the noise convolution operation step are represented by the following formula and are statistically equivalent.