JP2020021208A - Neural network processor, neural network processing method, and program

Info

Publication number: JP2020021208A
Application number: JP2018143466A
Authority: JP
Inventors: Masato Matsumoto; Yasushi Ishio
Original assignee: A Sum Tech LLC; MegaChips Corp
Current assignee: A Sum Tech LLC; MegaChips Corp
Priority date: 2018-07-31
Filing date: 2018-07-31
Publication date: 2020-02-06
Anticipated expiration: 2038-07-31
Also published as: JP7033507B2; WO2020026475A1

Abstract

PROBLEM TO BE SOLVED: To realize a neural network processor that allows a high-performance compact model to be mounted on a low-specification device, such as an embedded device or a mobile device, without requiring re-learning. SOLUTION: The neural network processor shares the portions of convolutional-layer processing and fully-connected-layer processing in which similar operations are performed, and executes both kinds of layer processing by combining processing in a norm mode with processing in an inner product operation mode. The processor can thereby execute multi-valued neural network processing at high speed while suppressing an increase in hardware scale. SELECTED DRAWING: Figure 1

Description

The present invention relates to neural network technology.

In recent years, various techniques have been developed that use a CNN (Convolutional Neural Network), one type of neural network technology (see, for example, Patent Document 1). Among CNNs, techniques using a DCNN (Deep Convolutional Neural Network), which has many intermediate layers, have attracted particular attention because they have produced results in a wide range of fields.

Patent Document 1: JP 2015-197702 A

A DCNN achieves high recognition performance in various tasks such as general object recognition and semantic segmentation. On the other hand, a DCNN requires a very large amount of computation and a very large number of parameters, so executing its processing takes an enormous amount of processing time and memory.

In a DCNN, recognition accuracy also tends to improve as the layers are made deeper, and this increases the model size in addition to the identification time (processing time). To use a DCNN on a low-spec device such as an embedded device or a mobile device, speeding up the identification calculation and compressing the model size are therefore major challenges.

In other words, it is difficult to mount a learned model, obtained by learning on a large-scale system, as-is on a low-spec device (for example, an edge terminal) such as an embedded device or a mobile device; a compact model must be built for the low-spec device.

To mount a learned model obtained by learning on a large-scale system on a low-spec device (for example, an edge terminal) such as an embedded device or a mobile device, a compact version of the learned model must be built for the low-spec device, and the compact model must then be trained again using the learning data that was used for the learned model (this training is referred to as "re-learning").

That is, mounting a learned model obtained by learning on a large-scale system on a low-spec device (for example, an edge terminal) such as an embedded device or a mobile device has the problem that re-learning is required.

In view of the above problem, an object of the present invention is to realize a neural network processor, a neural network data processing method, and a program that allow a high-performance compact model to be mounted on a low-spec device (for example, an edge terminal) such as an embedded device or a mobile device without requiring re-learning.

To solve the above problem, a first invention is a neural network processor for executing multi-valued neural network processing that includes convolutional-layer processing and fully-connected-layer processing, the processor including a control unit, a quantization processing unit, and an inner product processing unit.

The control unit sets a scaling coefficient vector, which is real-number vector data, and sets a multi-valued basis matrix whose elements are multi-valued data.

The quantization processing unit executes quantization processing on the feature map input to the convolutional layer and on the feature vector input to the fully connected layer. The quantization processing unit also sets an offset value such that the minimum value of the feature map and the minimum value of the feature vector become smaller than a predetermined value, and executes the quantization processing using a quantization width obtained from the maximum and minimum values of the feature map and the feature vector.

The inner product processing unit has (1) a norm mode, in which the norm of the feature map or the feature vector is calculated, and (2) an inner product operation mode, in which inner product operation processing is executed using the multi-valued basis matrix and the quantized feature map or feature vector. The inner product processing unit executes the convolutional-layer processing and the fully-connected-layer processing by combining processing in the norm mode with processing in the inner product operation mode.

This neural network processor shares the portions of the convolutional-layer processing and the fully-connected-layer processing in which similar operations are performed, and executes both kinds of layer processing by combining processing in the two modes ((1) the norm mode and (2) the inner product operation mode). The processor can therefore execute neural network processing at high speed while suppressing an increase in hardware scale.

A second invention is the first invention, wherein the inner product processing unit includes a microcode acquisition unit that acquires norm mode microcode and inner product operation mode microcode, and an arithmetic operation processing unit that executes arithmetic operation processing based on the microcode.
(1) When the norm mode is set,
the microcode acquisition unit acquires the norm mode microcode, and the arithmetic operation processing unit executes arithmetic operation processing based on the norm mode microcode.
(2) When the inner product operation mode is set,
the microcode acquisition unit acquires the inner product operation mode microcode, and the arithmetic operation processing unit executes arithmetic operation processing based on the inner product operation mode microcode.

This neural network processor shares the portions of the convolutional-layer processing and the fully-connected-layer processing in which similar operations are performed, and executes the processing of the two modes ((1) the norm mode and (2) the inner product operation mode) using the microcode corresponding to each mode. The portions that differ between the convolutional-layer processing and the fully-connected-layer processing are realized by combining the processing of the two modes in an appropriate order. The processor can therefore execute neural network processing at high speed while suppressing an increase in hardware scale.

A third invention is the first or second invention, wherein, when executing the convolutional-layer processing, the inner product processing unit
(1) repeats the norm mode processing as many times as there are feature maps in the convolutional layer being processed, and
(2) each time the norm mode processing is executed for a feature map, repeats the inner product operation mode processing as many times as there are outputs of the convolutional layer being processed.

The neural network processor can thus execute the convolutional-layer processing by combining the processing of the two modes.

A fourth invention is any one of the first to third inventions, wherein, when executing the fully-connected-layer processing, the inner product processing unit
(1) executes the norm mode processing once for the fully connected layer being processed, and
(2) repeats the inner product operation mode processing as many times as there are outputs of the fully connected layer being processed.

The neural network processor can thus execute the fully-connected-layer processing by combining the processing of the two modes.

A fifth invention is a neural network processing method for executing multi-valued neural network processing that includes convolutional-layer processing and fully-connected-layer processing, the method including a control step, a quantization processing step, and an inner product processing step.

The control step sets a scaling coefficient vector, which is real-number vector data, and sets a multi-valued basis matrix whose elements are multi-valued data.

The quantization processing step executes quantization processing on the feature map input to the convolutional layer and on the feature vector input to the fully connected layer. The quantization processing step also sets an offset value such that the minimum value of the feature map and the minimum value of the feature vector become smaller than a predetermined value, and executes the quantization processing using a quantization width obtained from the maximum and minimum values of the feature map and the feature vector.

The inner product processing step has (1) a norm mode, in which the norm of the feature map or the feature vector is calculated, and (2) an inner product operation mode, in which inner product operation processing is executed using the multi-valued basis matrix and the quantized feature map or feature vector. The inner product processing step executes the convolutional-layer processing and the fully-connected-layer processing by combining processing in the norm mode with processing in the inner product operation mode.

This makes it possible to realize a neural network processing method that has the same effects as the first invention.

A sixth invention is a program for causing a computer to execute the neural network processing method of the fifth invention.

This makes it possible to realize a program that causes a computer to execute a neural network processing method having the same effects as the first invention.

According to the present invention, it is possible to realize a neural network processor, a neural network processing method, and a program that allow a high-performance compact model to be mounted on a low-spec device (for example, an edge terminal) such as an embedded device or a mobile device without requiring re-learning.

FIG. 1 is a schematic configuration diagram of the binarized neural network processor 100 according to the first embodiment.
FIG. 2 is a schematic configuration diagram of the inner product processing unit 3 according to the first embodiment.
FIG. 3 is a diagram for explaining processing in the Offset mode.
FIG. 4 is a diagram for explaining processing in the Norm mode.
FIG. 5 is a diagram for explaining processing in the DP mode (inner product operation processing mode).
FIG. 6 is a diagram showing a CPU bus configuration.

[First Embodiment]
The first embodiment will be described below with reference to the drawings.

<1.1: Configuration of Binarized Neural Network Processor>
FIG. 1 is a schematic configuration diagram of the binarized neural network processor 100 according to the first embodiment.

FIG. 2 is a schematic configuration diagram of the inner product processing unit 3 according to the first embodiment.

As shown in FIG. 1, the binarized neural network processor 100 includes an input/output interface IF1, a control unit CPU1, an arithmetic processing unit PL1, and a bus B1. The input/output interface IF1, the control unit CPU1, and the arithmetic processing unit PL1 are connected by the bus B1, as shown in FIG. 1, and can exchange the necessary data, commands, and the like via the bus B1. Some or all of these functional units may instead be connected directly, as necessary, rather than through the bus.

The input/output interface IF1 receives data Din to be processed from the outside, and outputs data containing the result of processing by the binarized neural network processor to the outside as data Dout.

The control unit CPU1 performs overall control of the binarized neural network processor 100, control of each functional unit, and the processing required for binarized neural network processing. The control unit CPU1 is realized by a CPU (Central Processing Unit) or a CPU core.

The control unit CPU1, for example, acquires (sets) a scaling coefficient vector v_c and a binary basis matrix M that approximate the parameters (weight data) of a model learned on a large-scale system, and stores the acquired (set) scaling coefficient vector v_c and binary basis matrix M in the area CV and the area BinMtx0/1 of the internal RAM R1, respectively.

The scaling coefficient vector v_c and the binary basis matrix M may instead be input to the binarized neural network processor 100 from the outside via the input/output interface IF1.

As shown in FIG. 1, the arithmetic processing unit PL1 includes a DMA control unit 1, a quantization processing unit 2, an inner product processing unit 3, and an internal RAM R1.

The DMA control unit 1 performs DMA (Direct Memory Access) transfer processing.

The quantization processing unit 2 performs quantization processing on the feature map data that is input to a convolutional layer of a DCNN (Deep Convolutional Neural Network). The quantization processing unit 2 also performs quantization processing on the input data of a fully connected layer of the DCNN.

As shown in FIG. 2, the inner product processing unit 3 includes an AND processing unit 31, a selector 32, a count processing unit 33, a microcode acquisition unit 34, and an ALU 35 (ALU: Arithmetic Logic Unit).

The AND processing unit 31 receives data D2 (for example, the integer-part data of the weight vector, obtained from the area BinMtx0/1 of the internal RAM R1) and data D3 (for example, data obtained from the area BinInT of the internal RAM R1 that has been bit-decomposed and quantized), executes AND processing on the data D2 and the data D3, and outputs data containing the result to the selector 32 as data D4.

The selector 32 receives the data D2, the data D4, and a signal dp_mode that indicates the mode. Based on the signal dp_mode, the selector 32 selects either the data D2 or the data D4 and outputs the selected data to the count processing unit 33 as data D5.

The count processing unit 33 receives the data D5 output from the selector 32 and executes count processing on the data D5. The count processing unit 33 then outputs data containing the result to the ALU 35 as data D6.

The microcode acquisition unit 34 acquires microcode μCode (for example, microcode corresponding to the mode) and outputs the acquired microcode μCode to the ALU 35. The modes that can be set are, for example, (1) the Offset mode, (2) the Norm mode, and (3) the DP mode.
The "Offset mode" is a mode for executing processing that obtains the offset value, which is set such that the minimum value of the feature map and the minimum value of the feature vector become smaller than a predetermined value when quantization processing is executed on the feature map input to the convolutional layer and on the feature vector input to the fully connected layer.
The "Norm mode" is a mode for executing processing that calculates the norm of the feature map or the feature vector.
The "DP mode" is a mode for executing inner product operation processing using the multi-valued basis matrix and the quantized feature map or feature vector.

The ALU 35 receives data D1 (for example, the real-part data of the weight vector (the scale coefficient vector), obtained from the area CV of the internal RAM R1), the data D6 output from the count processing unit 33, and the microcode μCode output from the microcode acquisition unit 34. The ALU 35 performs an arithmetic operation based on the microcode μCode and outputs data containing the result of the operation as data Do.

The internal RAM R1 is a RAM (Random Access Memory) that stores and holds the data required to execute binarized neural network processing.

<1.2: Operation of Binarized Neural Network Processor>
The operation of the binarized neural network processor 100 configured as described above will be described below.

In general, a CNN includes an input layer, convolutional layers, and fully connected layers. For example, image data is input as input data Din to the input/output interface IF1 of the binarized neural network processor 100, image recognition processing is executed by the CNN, and the image recognition result is output to the outside as output data Dout.

In a CNN, in the processing of a convolutional layer or a fully connected layer, weighting operations are executed on the input data, and the result is processed by an activation function (for example, a ramp function (ReLU: Rectified Linear Unit), a sigmoid function, or a Softmax function) to obtain the output of the convolutional layer or the fully connected layer.

As disclosed in Prior Art Document A below, a Binarized-DCNN (DCNN: Deep Convolutional Neural Network; hereinafter "BNN") introduces a quantization sub-layer and a binary decomposition of the connection coefficients, and replaces inner products between real numbers with inner products between binary values. This speeds up the identification calculation and compresses the model size for an existing network model without re-learning. Operations between the binary values of a BNN can be performed at high speed using logical operations such as XOR and AND together with a bit count.
(Prior Art Document A):
Ryuji Kamiya et al., "High-speed Classification Calculation and Model Compression Using Binarized-DCNN," IEICE Technical Report 116(366), 47-52, 2016-12-15, The Institute of Electronics, Information and Communication Engineers.
Based on the disclosure of Prior Art Document A, the basic expression for the BNN identification calculation can be derived as (Equation 1) below.
(Equation 1):
$$y_{ijn} = c_n^{T} M_n^{T} B_{ij} r_{ij} + \min(x)\,\mathrm{Offset}$$
$y_{ijn}$: output of the n-th feature map (output value at coordinates (i, j) of the feature map)
$c_n^{T}$: transpose of the scale coefficient vector $c_n$ of the n-th feature map
$M_n^{T}$: transpose of the binary basis matrix of the n-th feature map
$B_{ij} r_{ij}$: binary feature map (the binary feature map after quantization)
$\min(x)$: the minimum of the element values of the n-th feature map
$\mathrm{Offset}$: the result obtained in the Offset mode
Since $M_n^{T} \in \{-1, 1\}$ and $B_{ij} r_{ij} \in \{-1, 1\}$ are binary, the product can be calculated with a logical operation and a bit count using (Equation 2) below.
(Equation 2):
$$M_n^{T} B_{ij} r_{ij} = 2 \times \mathrm{BITCNT}(\mathrm{AND}(M_n^{T}, B_{ij} r_{ij})) - \mathrm{Norm}(z)$$
$$z = B_{ij} r_{ij}$$
$\mathrm{Norm}(z)$: function that returns the norm of z
$\mathrm{BITCNT}(x)$: function that counts the number of bits equal to "1" in the binary code x
The binarized neural network processor 100 shares the portions of the convolutional-layer processing and the fully-connected-layer processing in which similar operations are performed, and thereby achieves high-speed processing while suppressing an increase in hardware scale.
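The identity in (Equation 2) can be checked with a small sketch. The following Python fragment is an illustration only, under stated assumptions that are not taken from the text: each weight bit encodes +1 (bit 1) or -1 (bit 0), each feature bit is 0 or 1, and Norm(z) is taken to be the number of 1 bits in z. The function names `popcount`, `signed_dot`, and `reference_dot` are hypothetical.

```python
def popcount(bits: int) -> int:
    """BITCNT: number of bits equal to 1 in a non-negative integer."""
    return bin(bits).count("1")


def signed_dot(weight_bits: int, feature_bits: int) -> int:
    """Inner product via AND and bit count, following the form of (Equation 2).

    Assumption of this sketch: weight bit 1 means +1, weight bit 0 means -1,
    feature bits are 0/1, and Norm(z) is the popcount of the feature bits.
    """
    return 2 * popcount(weight_bits & feature_bits) - popcount(feature_bits)


def reference_dot(weight_bits: int, feature_bits: int, width: int) -> int:
    """Element-by-element reference computation used to check signed_dot."""
    total = 0
    for k in range(width):
        w = 1 if (weight_bits >> k) & 1 else -1   # decode weight bit to +1 / -1
        z = (feature_bits >> k) & 1               # feature bit stays 0 / 1
        total += w * z
    return total


if __name__ == "__main__":
    print(signed_dot(0b1011, 0b0110))        # AND/popcount form
    print(reference_dot(0b1011, 0b0110, 4))  # same value, computed directly
```

Under these assumptions the two functions agree for every pair of bit patterns, because each +1/-1 weight equals twice its bit minus one.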

In the following, the operation of the binarized neural network processor 100 is described separately for "convolutional-layer processing" and "fully-connected-layer processing".

FIG. 3 is a diagram for explaining processing in the Offset mode.

FIG. 4 is a diagram for explaining processing in the Norm mode.

FIG. 5 is a diagram for explaining processing in the DP mode (inner product operation processing mode).

The binarized neural network processor 100 executes its processing using three modes: (1) the Offset mode, (2) the Norm mode, and (3) the DP mode.

(1.2.1: Convolutional-layer processing)
First, the convolutional-layer processing will be described.

The quantization processing unit 2 of the binarized neural network processor 100 obtains the quantization width Δd between the maximum value and the minimum value of the m-th (m: natural number) feature map $z^{l}_{ijm}$ in the l-th layer (l: natural number) as
$$\Delta d = \frac{\max(z^{l}_{ijm}) - \min(z^{l}_{ijm})}{2^{Q} - 1}$$
$\max(x)$: function that returns the maximum value of x
$\min(x)$: function that returns the minimum value of x
$Q$: number of quantization bits

The quantization processing unit 2 then shifts the values so that the minimum value of the feature map becomes 0. That is, the quantization processing unit 2 executes processing corresponding to
$$z'^{\,l}_{ijm} = \frac{z^{l}_{ijm} - \min(z^{l}_{ijm})}{Q}$$
and rounds the value obtained by this expression to the nearest integer (rounding half up) to quantize it. The quantization processing unit 2 further binarizes the value obtained by this rounding quantization to obtain the binary code $z^{l\,(b)}_{ijm} \in \{0, 1\}$.
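The quantization steps described above can be sketched as follows. This is a minimal illustration, not the processor's implementation. It assumes that the shifted values are divided by the quantization width Δd (a common choice for uniform quantization, and an assumption of this sketch), that rounding is half up, and that "binarization" means decomposing each integer code into Q bit planes. The names `quantize` and `bit_planes` are hypothetical.

```python
def quantize(values, q_bits):
    """Shift so the minimum becomes 0, scale by the quantization width, round.

    Returns (integer codes, minimum value, quantization width).
    Assumption of this sketch: scaling divides by the width delta.
    """
    lo, hi = min(values), max(values)
    delta = (hi - lo) / (2 ** q_bits - 1)      # quantization width
    if delta == 0:                             # constant input: every code is 0
        return [0] * len(values), lo, delta
    # int(x + 0.5) rounds half up because the shifted values are non-negative.
    codes = [int((v - lo) / delta + 0.5) for v in values]
    return codes, lo, delta


def bit_planes(codes, q_bits):
    """Decompose integer codes into q_bits binary planes, least significant first."""
    return [[(c >> b) & 1 for c in codes] for b in range(q_bits)]


if __name__ == "__main__":
    codes, lo, delta = quantize([-1.0, 0.0, 0.5, 2.0], q_bits=2)
    print(codes, lo, delta)
    print(bit_planes(codes, 2))
```

The minimum value returned by `quantize` is kept because the offset term in (Equation 1) needs min(x) later.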

The binary code $z^{l\,(b)}_{ijm} \in \{0, 1\}$ obtained as described above (the quantized feature map $B_{ij} r_{ij}$) is stored and held in the area BinInT of the internal RAM.

The following holds for the convolutional-layer processing.
(1) The quantized feature map $B_{ij} r_{ij}$ changes (is replaced) for each feature map.
(2) The second term on the right side of (Equation 1), that is, the value of $\min(x)\,\mathrm{Offset}$, is constant regardless of the feature map.

In view of the above, the binarized neural network processor 100 executes the convolutional-layer processing by processing corresponding to the following pseudocode.
≪Pseudocode for convolutional-layer processing≫
For (number of outputs)
    Operate_offset();   // offset restoration processing
For (number of feature maps)
    Operate_Norm();     // norm calculation: processing corresponding to the second term on the right side of (Equation 2)
    For (number of outputs)
        Operate_dp();   // inner product calculation
The binarized neural network processor 100
(1) executes the offset restoration processing above as Offset mode processing,
(2) executes the norm calculation processing above as Norm mode processing, and
(3) executes the inner product calculation processing above as DP mode (inner product operation processing mode) processing.
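The loop structure of the pseudocode can be made concrete with a short sketch that records which mode runs in which order. It is an illustration under an assumption: the nesting (one Norm mode pass per feature map, followed by one DP mode pass per output) is read from the description that the inner product operation mode is repeated for the number of outputs each time the norm mode is executed. The function name `convolution_layer_schedule` and the tuple format are hypothetical.

```python
def convolution_layer_schedule(num_feature_maps, num_outputs):
    """Return the ordered list of mode invocations for one convolutional layer."""
    trace = []
    # Offset mode: once per output (offset restoration).
    for n in range(num_outputs):
        trace.append(("offset", n))
    # Norm mode: once per feature map; DP mode: once per output for each map.
    for m in range(num_feature_maps):
        trace.append(("norm", m))
        for n in range(num_outputs):
            trace.append(("dp", m, n))
    return trace


if __name__ == "__main__":
    for step in convolution_layer_schedule(num_feature_maps=2, num_outputs=3):
        print(step)
```

With 2 feature maps and 3 outputs, the schedule contains 3 Offset steps, 2 Norm steps, and 6 DP steps.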

This is described below.

(1.2.1.1: Offset mode processing (convolutional-layer processing))
Processing in the Offset mode will now be described.

As shown in FIG. 3, the data D2 is input to the selector 32.

The mode signal dp_mode, with its signal value set to "0", is input to the selector 32. Based on this mode signal dp_mode, the selector 32 selects the data D2 and outputs it to the count processing unit 33 as data D5.

In the Offset mode, the count processing unit 33 outputs the input data D5 unchanged to the ALU 35 as data D6.
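The data path from the AND processing unit 31 through the selector 32 to the count processing unit 33 can be modeled in a few lines. This is a sketch under assumptions: data words are modeled as Python integers, a dp_mode value other than 0 is assumed to select the data D4, and the count processing outside the Offset mode is assumed to be a count of 1 bits. The function names are hypothetical.

```python
def and_unit(d2: int, d3: int) -> int:
    """AND processing unit 31: bitwise AND of D2 and D3, giving D4."""
    return d2 & d3


def selector(d2: int, d4: int, dp_mode: int) -> int:
    """Selector 32: dp_mode 0 selects D2; any other value selects D4 (assumed)."""
    return d2 if dp_mode == 0 else d4


def count_unit(d5: int, passthrough: bool) -> int:
    """Count processing unit 33: pass D5 through, or count its 1 bits (assumed)."""
    return d5 if passthrough else bin(d5).count("1")


def front_end(d2: int, d3: int, dp_mode: int, passthrough: bool) -> int:
    """Compute D6, the value handed to the ALU 35."""
    d4 = and_unit(d2, d3)
    d5 = selector(d2, d4, dp_mode)
    return count_unit(d5, passthrough)


if __name__ == "__main__":
    # Offset mode: D2 is selected and passed through unchanged.
    print(bin(front_end(0b1011, 0b0110, dp_mode=0, passthrough=True)))
    # AND-and-count path: number of positions where both D2 and D3 are 1.
    print(front_end(0b1011, 0b0110, dp_mode=1, passthrough=False))
```

Keeping the three units as separate functions mirrors the way a single data path is reused across modes by changing only the selector signal and the count behavior.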

マイクロコード取得部３４は、Ｏｆｆｓｅｔモード用のマイクロコードμＣｏｄｅ（Ｏｆｆｓｅｔ＿ｍｏｄｅ）を取得し、ＡＬＵ３５に出力する。なお、Ｏｆｆｓｅｔモード用のマイクロコードμＣｏｄｅ（Ｏｆｆｓｅｔ＿ｍｏｄｅ）は、例えば、以下の処理をＡＬＵ３５に実行させるコードである。
（１）ｍｉｎ（ｘ）のロード（読み出し）
（２）データＤ１（＝ｃ_ｎ ^Ｔ）と、データＤ６（＝Ｍ_ｎ ^Ｔ）と、ｍｉｎ（ｘ）との乗算処理
なお、ｍｉｎ（ｘ）は、量子化処理が実行されるときに取得した値を、例えば、内部ＲＡＭＲ１に記憶保持しておき、マイクロコード取得部３４が、ｍｉｎ（ｘ）のデータを、内部ＲＡＭＲ１から読み出すようにしてもよい。 The microcode acquisition unit 34 acquires the microcode μCode (Offset_mode) for the Offset mode and outputs it to the ALU 35. The microcode μCode (Offset_mode) for the Offset mode is, for example, code that causes the ALU 35 to execute the following processing.
(1) Loading (reading) of min(x)
(2) Multiplication of data D1 (= c_n^T), data D6 (= M_n^T), and min(x)
Note that the value of min(x) obtained when the quantization process is executed may, for example, be stored and held in the internal RAM R1, and the microcode acquisition unit 34 may read the min(x) data from the internal RAM R1.

ＡＬＵ３５は、図３に示すように、データＤ１（＝ｃ_ｎ ^Ｔ）とデータＤ６（＝Ｍ_ｎ ^Ｔ）とを入力する。なお、データＤ１（＝ｃ_ｎ ^Ｔ）は、内部ＲＡＭの領域ＣＶに記憶保持されているスケール係数ベクトルのデータｃ_ｎ ^Ｔである。 The ALU 35 receives data D1 (= c _n ^T ) and data D6 (= M _n ^T ), as shown in FIG. The data D1 (= c _n ^T ) is the scale coefficient vector data c _n ^T stored and held in the area CV of the internal RAM.

また、ＡＬＵ３５は、マイクロコード取得部３４から出力されるＯｆｆｓｅｔモード用のマイクロコードμＣｏｄｅ（Ｏｆｆｓｅｔ＿ｍｏｄｅ）を入力し、当該Ｏｆｆｓｅｔモード用のマイクロコードμＣｏｄｅ（Ｏｆｆｓｅｔ＿ｍｏｄｅ）に従って演算を行う。 Also, the ALU 35 receives the microcode μCode (Offset_mode) for the Offset mode output from the microcode acquisition unit 34 and performs an operation according to the microcode μCode (Offset_mode) for the Offset mode.

つまり、ＡＬＵ３５は、
（１）ｍｉｎ（ｘ）のロード（読み出し）
（２）データＤ１（＝ｃ_ｎ ^Ｔ）と、データＤ６（＝Ｍ_ｎ ^Ｔ）と、ｍｉｎ（ｘ）との乗算処理
を実行することで、出力データＤｏ（＝ｍｉｎ（ｘ）Ｏｆｆｓｅｔ）を取得する。 In other words, the ALU 35 obtains the output data Do (= min(x)Offset) by executing
(1) loading (reading) of min(x), and
(2) multiplication of data D1 (= c_n^T), data D6 (= M_n^T), and min(x).

以上のように処理することで、上記（数式１）の右辺の第２項、すなわち、ｍｉｎ（ｘ）Ｏｆｆｓｅｔの値（オフセット値）を取得することができる。 By performing the processing as described above, the second term on the right side of (Equation 1), that is, the value (offset value) of min (x) Offset can be obtained.

畳み込み層の処理では、上記処理（オフセット復元処理）が、畳み込み層の出力数分、実行される。 In the processing of the convolutional layer, the above processing (offset restoration processing) is executed for the number of outputs of the convolutional layer.
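The Offset mode calculation can be illustrated with the following Python sketch. It rests on an assumption that the description does not state explicitly: the product of c_n^T and M_n^T is reduced to a single value by summing over the elements of each basis row (that is, Offset is read as c_n^T M_n^T applied to an all-ones vector). The function name `offset_term` and the list-based data layout are hypothetical.

```python
from typing import Sequence


def offset_term(c: Sequence[float], m_rows: Sequence[Sequence[int]], min_x: float) -> float:
    """Compute min(x) * Offset for one output.

    c      : scale coefficients (one per basis row), i.e. c_n^T
    m_rows : rows of the basis matrix M_n^T with elements in {-1, +1}
    min_x  : minimum value obtained during the quantization process

    Assumption: Offset = sum_k c[k] * (sum of the elements of row k).
    """
    if len(c) != len(m_rows):
        raise ValueError("one scale coefficient is required per basis row")
    # Weight the element sum of each basis row by its scale coefficient.
    offset = sum(ck * sum(row) for ck, row in zip(c, m_rows))
    # Multiply by the loaded min(x), as in step (2) of the microcode.
    return min_x * offset
```

Because this value does not depend on the quantized feature map, it needs to be calculated only once per output, which matches the loop count given above.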

（１．２．１．２：Ｎｏｒｍモードの処理（畳み込み層の処理））
Ｎｏｒｍモードの処理について、説明する。 (1.2.1.2: Norm mode processing (convolutional layer processing))
The processing in the Norm mode will be described.

図４に示すように、データＤ３（＝Ｂ_ｉｊｒ_ｉｊ）が、ＡＮＤ処理部３１に入力される。なお、データＤ３は、量子化処理後の特徴マップＢ_ｉｊｒ_ｉｊであり、内部ＲＡＭの領域ＢｉｎＩｎＴに記憶保持されている。 As shown in FIG. 4, data D3 (= B _ij r _ij ) is input to the AND processing unit 31. The data D3 is a feature map B _ij r _ij after the quantization process, and is stored and held in the area BinInT of the internal RAM.

ＡＮＤ処理部３１は、Ｎｏｒｍモードでは、入力データＤ３を、そのまま、データＤ４として、セレクタ３２に出力する。 In the Norm mode, the AND processing unit 31 outputs the input data D3 to the selector 32 as it is as data D4.

セレクタ３２では、信号値が「１」に設定されたモード信号ｄｐ＿ｍｏｄｅが入力されており、セレクタ３２は、当該モード信号ｄｐ＿ｍｏｄｅに基づいて、データＤ４を選択し、データＤ５としてカウント処理部３３に出力する。 The mode signal dp_mode, whose signal value is set to “1”, is input to the selector 32. Based on this mode signal dp_mode, the selector 32 selects the data D4 and outputs it to the count processing unit 33 as data D5.

カウント処理部３３は、入力データＤ５に対してカウント処理（ＢＩＴＣＮＴ関数による処理）を実行し、処理結果をデータＤ６（＝ＢＩＴＣＮＴ（Ｂ_ｉｊｒ_ｉｊ））としてＡＬＵ３５に出力する。 The count processing unit 33 executes count processing (processing by the BITCNT function) on the input data D5 and outputs the result to the ALU 35 as data D6 (= BITCNT(B_ij r_ij)).

マイクロコード取得部３４は、Ｎｏｒｍモード用のマイクロコードμＣｏｄｅ（Ｎｏｒｍ＿ｍｏｄｅ）を取得し、ＡＬＵ３５に出力する。なお、Ｎｏｒｍモード用のマイクロコードμＣｏｄｅ（Ｎｏｒｍ＿ｍｏｄｅ）は、カウント処理部３３から入力されたデータをそのまま出力させる処理をＡＬＵ３５に実行させるコードである。 The microcode acquisition unit 34 acquires the microcode μCode (Norm_mode) for the Norm mode, and outputs it to the ALU 35. Note that the Norm mode microcode μCode (Norm_mode) is a code that causes the ALU 35 to execute a process of directly outputting the data input from the count processing unit 33.

ＡＬＵ３５は、図４に示すように、カウント処理部３３から出力されるデータＤ６（＝ＢＩＴＣＮＴ（Ｂ_ｉｊｒ_ｉｊ））を入力する。 The ALU 35 receives the data D6 (= BITCNT (B _ij r _ij )) output from the count processing unit 33, as shown in FIG.

また、ＡＬＵ３５は、マイクロコード取得部３４から出力されるＮｏｒｍモード用のマイクロコードμＣｏｄｅ（Ｎｏｒｍ＿ｍｏｄｅ）を入力し、当該Ｎｏｒｍモード用のマイクロコードμＣｏｄｅ（Ｎｏｒｍ＿ｍｏｄｅ）に従って演算を行う。 Further, the ALU 35 receives the Norm mode microcode μCode (Norm_mode) output from the microcode acquisition unit 34 and performs an operation according to the Norm mode microcode μCode (Norm_mode).

つまり、ＡＬＵ３５は、カウント処理部３３から入力されたデータをそのまま出力させる処理を行い、データＤｏ（＝ＢＩＴＣＮＴ（Ｂ_ｉｊｒ_ｉｊ））を出力する。なお、ＢＩＴＣＮＴ（Ｂ_ｉｊｒ_ｉｊ）は、量子化処理後の特徴マップＢ_ｉｊｒ_ｉｊのノルムに相当する。 That is, the ALU 35 performs a process of directly outputting the data input from the count processing unit 33, and outputs data Do (= BITCNT (B _ij r _ij )). BITCNT (B _ij r _ij ) corresponds to the norm of the feature map B _ij r _ij after the quantization processing.

以上のように処理することで、上記（数式２）の右辺の第２項、すなわち、Ｎｏｒｍ（ｚ）（ｚ＝Ｂ_ｉｊｒ_ｉｊ）の値（ノルム）を取得することができる。 By performing the processing as described above, the second term on the right side of (Equation 2), that is, the value (norm) of Norm (z) (z = B _ij r _ij ) can be obtained.

畳み込み層の処理では、上記処理（ノルム算出処理）が、処理対象となっている畳み込み層の特徴マップ数分、実行される。 In the processing of the convolutional layer, the above processing (norm calculation processing) is executed for the number of feature maps of the convolutional layer to be processed.
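The norm calculation amounts to counting the bits set to 1 in the quantized feature map. A minimal Python sketch follows, assuming the {0, 1} elements are packed into non-negative integers; the packing into words and the function names `bitcnt` and `norm_of_packed` are hypothetical.

```python
from typing import Iterable


def bitcnt(word: int) -> int:
    """Count the bits set to 1 in a non-negative integer (the BITCNT function)."""
    if word < 0:
        raise ValueError("word must be non-negative")
    return bin(word).count("1")


def norm_of_packed(words: Iterable[int]) -> int:
    """Norm of a {0, 1} feature map stored as a sequence of packed words."""
    return sum(bitcnt(w) for w in words)
```

For example, `norm_of_packed([0b1111, 0b0001, 0])` evaluates to 5.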

（１．２．１．３：ＤＰモードの処理（畳み込み層の処理））
ＤＰモードの処理について、説明する。 (1.2.1.3: DP mode processing (convolutional layer processing))
The processing in the DP mode will be described.

図５に示すように、データＤ２（＝Ｍ_ｎ ^Ｔ）およびデータＤ３（＝Ｂ_ｉｊｒ_ｉｊ）が、ＡＮＤ処理部３１に入力される。 As shown in FIG. 5, data D2 (= M _n ^T ) and data D3 (= B _ij r _ij ) are input to the AND processing unit 31.

データＤ２は、内部ＲＡＭの領域ＢｉｎＭｔｘ０／１に記憶保持されている二値基底行列のデータＭ_ｎ ^Ｔである。 The data D2 is data M _n ^T of a binary basis matrix stored and held in the area BinMtx0 / 1 of the internal RAM.

データＤ３は、量子化処理後の特徴マップＢ_ｉｊｒ_ｉｊであり、内部ＲＡＭの領域ＢｉｎＩｎＴに記憶保持されている。 The data D3 is a feature map B _ij r _ij after the quantization processing, and is stored and held in the area BinInT of the internal RAM.

ＡＮＤ処理部３１は、データＤ２およびデータＤ３に対してＡＮＤ処理を実行し、処理結果を含むデータをデータＤ４（＝ＡＮＤ（Ｍ_ｎ ^Ｔ，Ｂ_ｉｊｒ_ｉｊ））として、セレクタ３２に出力する。なお、ＡＮＤ処理は、要素の値が「−１」である場合、当該「−１」を「０」に置換して論理積をとる処理である。 The AND processing unit 31 performs an AND process on the data D2 and the data D3, and outputs the data including the processing result to the selector 32 as data D4 (= AND (M _n ^T , B _ij r _ij )). Note that the AND process is a process in which, when the value of an element is “−1”, the “−1” is replaced with “0” to obtain a logical product.

カウント処理部３３は、入力データＤ５に対してカウント処理（ＢＩＴＣＮＴ関数による処理）を実行し、処理結果をデータＤ６（＝ＢＩＴＣＮＴ（ＡＮＤ（Ｍ_ｎ ^Ｔ，Ｂ_ｉｊｒ_ｉｊ）））としてＡＬＵ３５に出力する。 The count processing unit 33 executes count processing (processing by the BITCNT function) on the input data D5 and outputs the result to the ALU 35 as data D6 (= BITCNT(AND(M_n^T, B_ij r_ij))).

マイクロコード取得部３４は、ＤＰモード用のマイクロコードμＣｏｄｅ（ＤＰ＿ｍｏｄｅ）を取得し、ＡＬＵ３５に出力する。なお、ＤＰモード用のマイクロコードμＣｏｄｅ（ＤＰ＿ｍｏｄｅ）は、例えば、以下の処理をＡＬＵ３５に実行させるコードである。
（１）Ｄ６×２の処理（１ビット左にシフトさせる処理）
（２）上記（１）の結果からノルムを減算する処理
（３）上記（２）の結果に、データＤ１（＝ｃ_ｎ ^Ｔ）を乗算する処理
ＡＬＵ３５は、図５に示すように、データＤ１（＝ｃ_ｎ ^Ｔ）とデータＤ６（＝ＢＩＴＣＮＴ（ＡＮＤ（Ｍ_ｎ ^Ｔ，Ｂ_ｉｊｒ_ｉｊ）））とを入力する。なお、データＤ１（＝ｃ_ｎ ^Ｔ）は、内部ＲＡＭの領域ＣＶに記憶保持されているスケール係数ベクトルのデータｃ_ｎ ^Ｔである。 The microcode acquisition unit 34 acquires the microcode μCode (DP_mode) for the DP mode and outputs it to the ALU 35. The microcode μCode (DP_mode) for the DP mode is, for example, code that causes the ALU 35 to execute the following processing.
(1) D6 × 2 processing (processing to shift one bit to the left)
(2) Processing of subtracting the norm from the result of (1)
(3) Processing of multiplying the result of (2) by data D1 (= c_n^T)
As shown in FIG. 5, the ALU 35 receives data D1 (= c_n^T) and data D6 (= BITCNT(AND(M_n^T, B_ij r_ij))). The data D1 (= c_n^T) is the scale coefficient vector data c_n^T stored and held in the area CV of the internal RAM.

また、ＡＬＵ３５は、マイクロコード取得部３４から出力されるＤＰモード用のマイクロコードμＣｏｄｅ（ＤＰ＿ｍｏｄｅ）を入力し、当該ＤＰモード用のマイクロコードμＣｏｄｅ（ＤＰ＿ｍｏｄｅ）に従って演算を行う。 Further, the ALU 35 receives the microcode μCode (DP_mode) for the DP mode output from the microcode acquisition unit 34 and performs an operation according to the microcode μCode (DP_mode) for the DP mode.

つまり、ＡＬＵ３５は、
（１）Ｄ６×２の処理（１ビット左にシフトさせる処理）
（２）上記（１）の結果からノルムを減算する処理（２×Ｄ６―Ｎｏｒｍ（ｚ））
（３）上記（２）の結果に、データＤ１（＝ｃ_ｎ ^Ｔ）を乗算する処理
を実行することで、出力データＤｏ（＝ｃ_ｎ ^ＴＭ_ｎ ^ＴＢ_ｉｊｒ_ｉｊ）を取得する。 In other words, the ALU 35 obtains the output data Do (= c_n^T M_n^T B_ij r_ij) by executing
(1) D6 × 2 processing (processing to shift one bit to the left),
(2) processing of subtracting the norm from the result of (1) (2 × D6 - Norm(z)), and
(3) processing of multiplying the result of (2) by data D1 (= c_n^T).

つまり、上記により、下記に相当する処理が実行される。
Ｄｏ＝ｃ_ｎ ^ＴＭ_ｎ ^ＴＢ_ｉｊｒ_ｉｊ
Ｍ_ｎ ^ＴＢ_ｉｊｒ_ｉｊ＝２×ＢＩＴＣＮＴ（ＡＮＤ（Ｍ_ｎ ^Ｔ，Ｂ_ｉｊｒ_ｉｊ））−Ｎｏｒｍ（ｚ）
以上のように処理することで、上記（数式１）の右辺の第１項、すなわち、ｃ_ｎ ^ＴＭ_ｎ ^ＴＢ_ｉｊｒ_ｉｊの値を取得することができる。 That is, the above executes processing corresponding to the following.
Do = c_n^T M_n^T B_ij r_ij
M_n^T B_ij r_ij = 2 × BITCNT(AND(M_n^T, B_ij r_ij)) - Norm(z)
By performing the processing as described above, the first term on the right side of (Equation 1), that is, the value of c_n^T M_n^T B_ij r_ij, can be obtained.
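The identity used in the DP mode can be checked with a short Python sketch. For a basis row with elements in {-1, +1} and a quantized input with elements in {0, 1}, let p be the number of positions where the basis element is +1 and the input bit is 1. The positions where the input bit is 1 and the basis element is -1 then number Norm(z) - p, so the inner product is p - (Norm(z) - p) = 2p - Norm(z). The sketch below handles a single basis row with a scalar coefficient for simplicity; the function names and the integer packing are hypothetical.

```python
from typing import Sequence


def pack_bits(bits: Sequence[int]) -> int:
    """Pack a {0, 1} sequence into an integer (element i becomes bit i)."""
    word = 0
    for i, b in enumerate(bits):
        if b not in (0, 1):
            raise ValueError("elements must be 0 or 1")
        word |= b << i
    return word


def pack_basis(values: Sequence[int]) -> int:
    """Pack a {-1, +1} basis row, replacing -1 with 0 as in the AND process."""
    if any(v not in (-1, 1) for v in values):
        raise ValueError("elements must be -1 or +1")
    return pack_bits([1 if v == 1 else 0 for v in values])


def dp_mode(m_word: int, b_word: int, norm_z: int, c: float) -> float:
    """Follow the three DP mode microcode steps for one basis row."""
    d6 = bin(m_word & b_word).count("1")  # D6 = BITCNT(AND(M, B))
    doubled = d6 << 1                     # (1) D6 x 2 by a one-bit left shift
    inner = doubled - norm_z              # (2) subtract the norm
    return c * inner                      # (3) multiply by the scale coefficient
```

For the basis row [1, -1, -1, 1, 1] and the input bits [1, 1, 0, 0, 1], the norm is 3, the AND result has 2 bits set, and 2 × 2 - 3 = 1 equals the directly computed inner product.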

畳み込み層の処理では、上記処理（内積演算処理）が、処理対象となっている畳み込み層の特徴マップごとに、当該畳み込み層の出力数分、実行される。上記の処理結果は、例えば、内部ＲＡＭＲ１の所定の領域に記憶保持される、あるいは、制御部ＣＰＵ１へ出力され、制御部ＣＰＵ１が当該処理結果を用いて所定の処理を実行する。 In the processing of the convolutional layer, the above-described processing (inner product calculation processing) is executed for each feature map of the convolutional layer to be processed by the number of outputs of the convolutional layer. The above processing result is, for example, stored and held in a predetermined area of the internal RAM R1, or output to the control unit CPU1, and the control unit CPU1 executes a predetermined process using the processing result.

以上のように処理することで、二値化ニューラルネットワーク用プロセッサ１００では、畳み込み層の処理を実行することができる。すなわち、二値化ニューラルネットワーク用プロセッサ１００では、上記の３つのモードによる処理により、（数式１）のｙ_ｉｊｎを取得するために必要なデータを取得することができ、その結果、畳み込み層の処理を実行することができる。 By performing the processing as described above, the binarized neural network processor 100 can execute the processing of the convolutional layer. That is, through the processing in the above three modes, the binarized neural network processor 100 can obtain the data required to obtain y_ijn of (Equation 1) and, as a result, can execute the processing of the convolutional layer.

（１．２．２：全結合層の処理）
次に、全結合層の処理について、説明する。 (1.2.2: Fully connected layer processing)
Next, the processing of the fully connected layer will be described.

二値化ニューラルネットワーク用プロセッサ１００の量子化処理部２は、ｌ番目の全結合層への入力ベクトルｚ^ｌ _ｉにおける最大値−最小値間の量子化幅Δｄを、
Δｄ＝｛ｍａｘ（ｚ^ｌ _ｉ）−ｍｉｎ（ｚ^ｌ _ｉ）｝／（２^Ｑ−１）
ｍａｘ（ｘ）：ｘの最大値を取得する関数
ｍｉｎ（ｘ）：ｘの最小値を取得する関数
Ｑ：量子化ビット数
として取得する。 The quantization processing unit 2 of the binarized neural network processor 100 obtains the quantization width Δd between the maximum value and the minimum value of the input vector z^l_i to the l-th fully connected layer as
Δd = {max(z^l_i) - min(z^l_i)} / (2^Q - 1)
max(x): function that returns the maximum value of x
min(x): function that returns the minimum value of x
Q: number of quantization bits

そして、量子化処理部２は、全結合層への入力ベクトルの最小値が０となるように値をシフトさせる。つまり、量子化処理部２は、
ｚ^ｌ _ｉ’＝｛ｚ^ｌ _ｉ−ｍｉｎ（ｚ^ｌ _ｉ）｝／Ｑ
に相当する処理を実行し、さらに、上記数式により取得された値を四捨五入して整数値に丸め量子化する。さらに、量子化処理部２は、丸め量子化により取得された値に対して、二値化処理をすることで、バイナリコードｚ^ｌ _ｉ ^（ｂ）∈｛０，１｝を取得する。 Then, the quantization processing unit 2 shifts the values so that the minimum value of the input vector to the fully connected layer becomes 0. That is, the quantization processing unit 2 executes processing corresponding to
z^l_i' = {z^l_i - min(z^l_i)} / Q
and further rounds the value obtained by the above formula to the nearest integer (rounding half up) to perform rounding quantization. The quantization processing unit 2 then binarizes the values obtained by the rounding quantization to obtain the binary code z^l_i^(b) ∈ {0, 1}.
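The quantization steps can be sketched in Python as follows. One point is an assumption of this sketch rather than something the text states: the shift formula writes the divisor as Q, whereas the sketch divides by the quantization width Δd so that the rounded codes span 0 to 2^Q - 1. The function names `quantize` and `to_bits` and the least-significant-bit-first order of the binary code are also hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize(z: Sequence[float], q_bits: int) -> Tuple[float, List[int]]:
    """Return (delta_d, integer codes) for an input vector z.

    Assumption: the shifted values are divided by delta_d (not by Q).
    """
    lo, hi = min(z), max(z)
    delta_d = (hi - lo) / ((1 << q_bits) - 1)  # quantization width
    if delta_d == 0:
        # All elements are equal; every code is 0 after the shift.
        return delta_d, [0] * len(z)
    # Shift so the minimum becomes 0, scale, then round half up.
    # int(x + 0.5) rounds half up because the shifted values are non-negative.
    codes = [int((v - lo) / delta_d + 0.5) for v in z]
    return delta_d, codes


def to_bits(code: int, q_bits: int) -> List[int]:
    """Binarize one integer code into q_bits elements of {0, 1} (LSB first)."""
    return [(code >> k) & 1 for k in range(q_bits)]
```

For z = [-1.0, 0.0, 0.5, 2.0] and Q = 2, the quantization width is 1.0 and the codes are [0, 1, 2, 3].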

上記のようにして取得されたバイナリコードｚ^ｌ _ｉ ^（ｂ）∈｛０，１｝（量子化処理後の特徴ベクトルＢ_ｉｊｒ_ｉｊ）は、内部ＲＡＭの領域ＢｉｎＩｎＴに記憶保持される。 The binary code z^l_i^(b) ∈ {0, 1} obtained as described above (the feature vector B_ij r_ij after the quantization process) is stored and held in the area BinInT of the internal RAM.

全結合層の処理において、以下のことが成り立つ。
（１）量子化処理後の特徴ベクトルは１つだけである。 In the processing of the fully connected layer, the following holds.
(1) There is only one feature vector after the quantization processing.

二値化ニューラルネットワーク用プロセッサ１００では、上記を考慮して、全結合層の処理を以下の疑似コードに相当する処理により実行する。
≪全結合層の処理の擬似コード≫
Operate_Norm(); // ノルムの計算（数式２）の右辺の第２項に相当する処理
For (出力数)
Operate_offset(); // オフセット復元処理
Operate_dp(); // 内積計算
二値化ニューラルネットワーク用プロセッサ１００は、
（１）上記のノルム計算の処理をＮｏｒｍモードの処理で実行し、
（２）上記のオフセット復元処理をＯｆｆｓｅｔモードの処理で実行し、
（３）上記の内積計算の処理をＤＰモード（内積演算処理モード）の処理で実行する。 In consideration of the above, the binarized neural network processor 100 executes the processing of the fully connected layer by processing corresponding to the following pseudocode.
≪Pseudocode for fully connected layer processing≫
Operate_Norm(); // Norm calculation: processing corresponding to the second term on the right side of (Equation 2)
For (number of outputs)
    Operate_offset(); // Offset restoration processing
    Operate_dp(); // Inner product calculation
The binarized neural network processor 100 executes
(1) the above norm calculation processing as Norm mode processing,
(2) the above offset restoration processing as Offset mode processing, and
(3) the above inner product calculation processing as DP mode (inner product operation processing mode) processing.
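Because there is only one quantized feature vector, the mode sequence for a fully connected layer is shorter than for a convolutional layer. The Python sketch below lists the invocations in order. The function name `fc_layer_schedule` and the mode labels are hypothetical, and placing the Offset and DP mode invocations in the same loop is an assumption; the description only fixes that each runs once per output.

```python
from typing import List, Tuple


def fc_layer_schedule(num_outputs: int) -> List[Tuple[str, int]]:
    """Return (mode, output_index) in invocation order; -1 means not applicable."""
    # The norm of the single feature vector is calculated once.
    schedule: List[Tuple[str, int]] = [("norm", -1)]
    for n in range(num_outputs):
        schedule.append(("offset", n))  # offset restoration for output n
        schedule.append(("dp", n))      # inner product calculation for output n
    return schedule
```

With N outputs, this gives one Norm mode invocation, N Offset mode invocations, and N DP mode invocations.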

（１．２．２．１：Ｎｏｒｍモードの処理（全結合層の処理））
Ｎｏｒｍモードの処理について、説明する。 (1.2.2.1: Norm mode processing (fully connected layer processing))
The processing in the Norm mode will be described.

図４に示すように、データＤ３（＝Ｂ_ｉｊｒ_ｉｊ）が、ＡＮＤ処理部３１に入力される。なお、データＤ３は、量子化処理後の特徴ベクトルＢ_ｉｊｒ_ｉｊであり、内部ＲＡＭの領域ＢｉｎＩｎＴに記憶保持されている。 As shown in FIG. 4, data D3 (= B _ij r _ij ) is input to the AND processing unit 31. Note that the data D3 is the feature vector B _ij r _ij after the quantization process, and is stored and held in the area BinInT of the internal RAM.

つまり、ＡＬＵ３５は、カウント処理部３３から入力されたデータをそのまま出力させる処理を行い、データＤｏ（＝ＢＩＴＣＮＴ（Ｂ_ｉｊｒ_ｉｊ））を出力する。なお、ＢＩＴＣＮＴ（Ｂ_ｉｊｒ_ｉｊ）は、量子化処理後の特徴ベクトルＢ_ｉｊｒ_ｉｊのノルムに相当する。 That is, the ALU 35 performs a process of directly outputting the data input from the count processing unit 33, and outputs data Do (= BITCNT (B _ij r _ij )). BITCNT (B _ij r _ij ) corresponds to the norm of the feature vector B _ij r _ij after the quantization processing.

全結合層の処理では、上記処理（ノルム算出処理）が、処理対象となっている全結合層につき１回実行される。 In the processing of the fully connected layer, the above processing (norm calculation processing) is executed once for the fully connected layer being processed.

（１．２．２．２：Ｏｆｆｓｅｔモードの処理（全結合層の処理））
Ｏｆｆｓｅｔモードの処理について、説明する。 (1.2.2.2: Offset mode processing (fully connected layer processing))
The processing in the Offset mode will be described.

全結合層の処理では、上記処理（オフセット復元処理）が、全結合層の出力数分、実行される。 In the process of the fully connected layer, the above process (offset restoration process) is executed for the number of outputs of the fully connected layer.

（１．２．２．３：ＤＰモードの処理（全結合層の処理））
ＤＰモードの処理について、説明する。 (1.2.2.3: DP mode processing (fully connected layer processing))
The processing in the DP mode will be described.

全結合層の処理では、上記処理（内積演算処理）が、全結合層の出力数分、実行される。上記の処理結果は、例えば、内部ＲＡＭＲ１の所定の領域に記憶保持される、あるいは、制御部ＣＰＵ１へ出力され、制御部ＣＰＵ１が当該処理結果を用いて所定の処理を実行する。 In the processing of the fully connected layer, the above processing (inner product calculation processing) is executed for the number of outputs of the fully connected layer. The above processing result is, for example, stored and held in a predetermined area of the internal RAM R1, or output to the control unit CPU1, and the control unit CPU1 executes a predetermined process using the processing result.

以上のように処理することで、二値化ニューラルネットワーク用プロセッサ１００では、全結合層の処理を実行することができる。すなわち、二値化ニューラルネットワーク用プロセッサ１００では、上記の３つのモードによる処理により、（数式１）のｙ_ｉｊｎを取得するために必要なデータを取得することができ、その結果、全結合層の処理を実行することができる。 By performing the processing as described above, the binarized neural network processor 100 can execute the processing of the fully connected layer. That is, through the processing in the above three modes, the binarized neural network processor 100 can obtain the data required to obtain y_ijn of (Equation 1) and, as a result, can execute the processing of the fully connected layer.

以上のように、二値化ニューラルネットワーク用プロセッサ１００では、畳み込み層の処理と、全結合層の処理において、同様の処理が実行される部分を共通化し、３つのモード（（１）Ｏｆｆｓｅｔモード、（２）Ｎｏｒｍモード、（３）ＤＰモード）の処理を、各モードに対応するマイクロコードにより処理することで実行する。そして、二値化ニューラルネットワーク用プロセッサ１００では、畳み込み層の処理と全結合層の処理とにおいて、相違する部分の処理を、上記の３つのモードの処理を適切な順序で組み合わせることで実現する。したがって、二値化ニューラルネットワーク用プロセッサ１００では、ハードウェア規模の増大を抑制しつつ、ＢＮＮの処理を高速に実行することができる。 As described above, in the binarized neural network processor 100, in the processing of the convolutional layer and the processing of the fully connected layer, the part where the same processing is executed is shared, and the three modes ((1) Offset mode, The processing of (2) Norm mode and (3) DP mode) is executed by processing using microcode corresponding to each mode. In the binarized neural network processor 100, the processing of the different parts in the processing of the convolutional layer and the processing of the fully connected layer is realized by combining the above three modes of processing in an appropriate order. Therefore, the binary neural network processor 100 can execute BNN processing at high speed while suppressing an increase in hardware scale.

［他の実施形態］
上記実施形態では、二値化ニューラルネットワーク用プロセッサ１００が二値化データをjsよりする場合について、説明したが、本発明はこれに限定されることなく、本発明の手法を多値化データに適用し、多値化ニューラルネットワーク用プロセッサを実現するようにしてもよい。 [Other embodiments]
In the above embodiment, the case where the binarized neural network processor 100 processes binarized data has been described. However, the present invention is not limited to this, and the method of the present invention may be applied to multi-valued data to realize a multi-valued neural network processor.

また、上記実施形態では、二値化ニューラルネットワーク用プロセッサ１００が、３つのモード（（１）Ｏｆｆｓｅｔモード、（２）Ｎｏｒｍモード、（３）ＤＰモード）により処理を実行する場合について説明したが、これに限定されることはない。例えば、二値化ニューラルネットワーク用プロセッサ１００は、（１）Ｎｏｒｍモード、（２）ＤＰモードにより処理を実行するようにし、このＤＰモードの処理に、上記実施形態で説明したＯｆｆｓｅｔモードの処理を含めるようにしてもよい。また、二値化ニューラルネットワーク用プロセッサ１００は、予め、Ｏｆｆｓｅｔモードで得られる値を演算により取得し、取得した値を保持しておき、ＤＰモード実行時にその値を使用して処理を実行するようにしてもよい。これにより、二値化ニューラルネットワーク用プロセッサ１００において、Ｎｏｒｍモードの処理とＤＰモードの処理とをＣＰＵの制御を介在せずに連続して実行することができる。 Also, in the above embodiment, the case where the binarized neural network processor 100 executes processing in three modes ((1) Offset mode, (2) Norm mode, and (3) DP mode) has been described. It is not limited to this. For example, the binarized neural network processor 100 executes processing in (1) Norm mode and (2) DP mode, and the processing in the DP mode includes the processing in the Offset mode described in the above embodiment. You may do so. In addition, the binarized neural network processor 100 obtains a value obtained in the Offset mode by calculation in advance, holds the obtained value, and executes processing using the value when executing the DP mode. It may be. Thereby, in the binarized neural network processor 100, the processing in the Norm mode and the processing in the DP mode can be executed continuously without intervention of the CPU.

上記実施形態では、内積処理部がＢＮＮの処理の一部を実行する場合について説明したが、これに限定されることはなく、例えば、演算処理部ＰＬ１の内積処理部３において、活性化関数の処理（例えば、ＲｅＬＵ関数の処理）を実行するようにしてもよい。また、活性化関数の処理（例えば、ＲｅＬＵ関数の処理）は、内積処理部３および制御部ＣＰＵ１で実行されるものであってもよい。 In the above embodiment, the case where the inner product processing unit executes a part of the process of the BNN has been described. However, the present invention is not limited to this. For example, in the inner product processing unit 3 of the arithmetic processing unit PL1, the activation function Processing (for example, processing of a ReLU function) may be executed. The processing of the activation function (for example, the processing of the ReLU function) may be executed by the inner product processing unit 3 and the control unit CPU1.

上記実施形態では、内部ＲＡＭの個数については特に限定せず説明したが、内部ＲＡＭは、複数個のＲＡＭにより構成されるものであってもよいし、また、二値化ニューラルネットワーク用プロセッサの外部に設けたＲＡＭ（例えば、ＤＲＡＭ）等を用いて、上記実施形態の処理を実行するようにしてもよい。 In the above embodiment, the number of internal RAMs was not particularly limited. The internal RAM may be composed of a plurality of RAMs, and the processing of the above embodiment may also be executed using a RAM (for example, a DRAM) or the like provided outside the binarized neural network processor.

上記実施形態において、スカラー、ベクトル、行列で表現したデータについては、一例であり、上記に限定されるものではない。ＢＮＮの処理に応じて、スカラー、ベクトル、テンソルのデータとして、二値化ニューラルネットワーク用プロセッサ１００が、上記と同様の処理を実行してもよい。 In the above embodiment, the data represented by the scalar, the vector, and the matrix are examples, and are not limited to the above. In accordance with the processing of the BNN, the binarized neural network processor 100 may execute the same processing as described above as scalar, vector, and tensor data.

上記実施形態で説明した二値化ニューラルネットワーク用プロセッサ１００の各ブロック（各機能部）は、ＬＳＩなどの半導体装置により個別に１チップ化されても良いし、一部又は全部を含むように１チップ化されても良い。また、上記実施形態で説明した二値化ニューラルネットワーク用プロセッサ１００の各ブロック（各機能部）は、複数のＬＳＩなどの半導体装置により実現されるものであってもよい。 Each block (each functional unit) of the binarized neural network processor 100 described in the above embodiment may be individually formed into a single chip by a semiconductor device such as an LSI, or may be configured to include a part or the entirety. It may be made into a chip. Each block (each functional unit) of the binary neural network processor 100 described in the above embodiment may be realized by a semiconductor device such as a plurality of LSIs.

なお、ここでは、ＬＳＩとしたが、集積度の違いにより、ＩＣ、システムＬＳＩ、スーパーＬＳＩ、ウルトラＬＳＩと呼称されることもある。 Here, the LSI is used, but depending on the degree of integration, it may be called an IC, a system LSI, a super LSI, or an ultra LSI.

また、集積回路化の手法はＬＳＩに限るものではなく、専用回路又は汎用プロセサで実現してもよい。ＬＳＩ製造後に、プログラムすることが可能なＦＰＧＡ（ＦｉｅｌｄＰｒｏｇｒａｍｍａｂｌｅＧａｔｅＡｒｒａｙ）や、ＬＳＩ内部の回路セルの接続や設定を再構成可能なリコンフィギュラブル・プロセッサーを利用しても良い。 Further, the method of circuit integration is not limited to LSI, and may be realized by a dedicated circuit or a general-purpose processor. An FPGA (Field Programmable Gate Array) that can be programmed after the LSI is manufactured, or a reconfigurable processor that can reconfigure connection and setting of circuit cells inside the LSI may be used.

また、上記各実施形態の各機能ブロックの処理の一部または全部は、プログラムにより実現されるものであってもよい。そして、上記各実施形態の各機能ブロックの処理の一部または全部は、コンピュータにおいて、中央演算装置（ＣＰＵ）により行われる。また、それぞれの処理を行うためのプログラムは、ハードディスク、ＲＯＭなどの記憶装置に格納されており、ＲＯＭにおいて、あるいはＲＡＭに読み出されて実行される。 Further, part or all of the processing of each functional block in each of the above embodiments may be realized by a program. In that case, part or all of the processing of each functional block in each of the above embodiments is performed by a central processing unit (CPU) in a computer. The programs for performing the respective processes are stored in a storage device such as a hard disk or a ROM, and are executed in the ROM or after being read out to the RAM.

また、上記実施形態の各処理をハードウェアにより実現してもよいし、ソフトウェア（ＯＳ（オペレーティングシステム）、ミドルウェア、あるいは、所定のライブラリとともに実現される場合を含む。）により実現してもよい。さらに、ソフトウェアおよびハードウェアの混在処理により実現しても良い。 Further, each process of the above embodiment may be realized by hardware, or may be realized by software (including a case where it is realized together with an OS (Operating System), middleware, or a predetermined library). Further, it may be realized by mixed processing of software and hardware.

例えば、上記実施形態（変形例を含む）の各機能部を、ソフトウェアにより実現する場合、図６に示したハードウェア構成（例えば、ＣＰＵ、ＲＯＭ、ＲＡＭ、入力部、出力部等をバスＢｕｓにより接続したハードウェア構成）を用いて、各機能部をソフトウェア処理により実現するようにしてもよい。 For example, when each functional unit of the above-described embodiment (including modified examples) is implemented by software, the hardware configuration (for example, a CPU, a ROM, a RAM, an input unit, and an output unit) illustrated in FIG. Each functional unit may be realized by software processing using a connected hardware configuration).

また、上記実施形態における処理方法の実行順序は、必ずしも、上記実施形態の記載に制限されるものではなく、発明の要旨を逸脱しない範囲で、実行順序を入れ替えることができるものである。 Further, the execution order of the processing method in the above embodiment is not necessarily limited to the description of the above embodiment, and the execution order can be changed without departing from the gist of the invention.

前述した方法をコンピュータに実行させるコンピュータプログラム及びそのプログラムを記録したコンピュータ読み取り可能な記録媒体は、本発明の範囲に含まれる。ここで、コンピュータ読み取り可能な記録媒体としては、例えば、フレキシブルディスク、ハードディスク、ＣＤ−ＲＯＭ、ＭＯ、ＤＶＤ、ＤＶＤ−ＲＯＭ、ＤＶＤ−ＲＡＭ、大容量ＤＶＤ、次世代ＤＶＤ、半導体メモリを挙げることができる。 A computer program that causes a computer to execute the above-described method and a computer-readable recording medium on which the program is recorded are included in the scope of the present invention. Here, examples of the computer-readable recording medium include a flexible disk, a hard disk, a CD-ROM, an MO, a DVD, a DVD-ROM, a DVD-RAM, a large-capacity DVD, a next-generation DVD, and a semiconductor memory.

上記コンピュータプログラムは、上記記録媒体に記録されたものに限られず、電気通信回線、無線又は有線通信回線、インターネットを代表とするネットワーク等を経由して伝送されるものであってもよい。 The computer program is not limited to one recorded on the recording medium, and may be transmitted via a telecommunication line, a wireless or wired communication line, a network represented by the Internet, or the like.

また、文言「部」は、「サーキトリー（ｃｉｒｃｕｉｔｒｙ）」を含む概念であってもよい。サーキトリーは、ハードウェア、ソフトウェア、あるいは、ハードウェアおよびソフトウェアの混在により、その全部または一部が、実現されるものであってもよい。 In addition, the term “unit” may be a concept that includes “circuitry”. The circuitry may be realized in whole or in part by hardware, software, or a mixture of hardware and software.

なお、本発明の具体的な構成は、前述の実施形態に限られるものではなく、発明の要旨を逸脱しない範囲で種々の変更および修正が可能である。 The specific configuration of the present invention is not limited to the above embodiment, and various changes and modifications can be made without departing from the spirit of the invention.

１００二値化ニューラルネットワーク用プロセッサ
ＰＬ１演算処理部
１ＤＭＡ制御部
２量子化処理部
Ｒ１内部ＲＡＭ
３内積処理部
３４マイクロコード取得部
３５ＡＬＵ
100 Binarized neural network processor
PL1 Arithmetic processing unit
1 DMA control unit
2 Quantization processing unit
R1 Internal RAM
3 Inner product processing unit
34 Microcode acquisition unit
35 ALU

Claims

A neural network processor for executing a multivalued neural network process including a convolutional layer process and a fully connected layer process,
A control unit that sets a scaling coefficient vector that is real number vector data, and sets a multi-value basis matrix having multi-value data as an element;
A quantization processing unit that performs a quantization process on the feature map input to the convolutional layer and the feature vector input to the fully connected layer, the minimum value of the feature map and the minimum value of the feature vector Is set to an offset value smaller than a predetermined value, and the quantization process is performed using a quantization width obtained based on the maximum value and the minimum value of the feature map and the feature vector. The quantization processing unit;
An inner product processing unit that executes (1) a norm mode for calculating a norm of the feature map and the feature vector, and (2) an inner product operation mode for performing inner product calculation processing using the multi-valued basis matrix and the feature map or the feature vector after the quantization process, and that executes the processing of the convolutional layer and the processing of the fully connected layer by combining the processing of the norm mode and the processing of the inner product operation mode;
A neural network processor comprising:

The inner product processing unit,
A microcode acquisition unit for acquiring the norm mode microcode and the inner product operation mode microcode,
An arithmetic operation processing unit that performs arithmetic operation processing based on microcode;
With
(1) When set to norm mode,
The microcode acquisition unit acquires the norm mode microcode,
The arithmetic processing unit executes the arithmetic processing based on the norm mode microcode,
(2) When the inner product calculation mode is set,
The microcode acquisition unit acquires the inner product operation mode microcode,
The arithmetic operation processing unit executes the arithmetic operation processing based on the inner product operation mode microcode,
The neural network processor according to claim 1.

The inner product processing unit,
When performing the processing of the convolution layer,
(1) The norm mode processing is repeatedly executed for the number of feature maps of the convolutional layer to be processed,
(2) The process of the inner product calculation mode is repeatedly executed by the number of outputs of the convolutional layer to be processed each time the process of the norm mode is performed for each feature map.
The neural network processor according to claim 1.

The inner product processing unit,
When performing the processing of the fully connected layer,
(1) The process in the norm mode is executed once for the fully connected layer to be processed,
(2) The process of the inner product calculation mode is repeatedly executed for the number of outputs of the fully connected layer to be processed.
The neural network processor according to claim 1.

A neural network processing method for performing a multi-valued neural network process including a convolutional layer process and a fully connected layer process,
A control step of setting a scaling coefficient vector which is real vector data, and setting a multi-value basis matrix having multi-value data as elements;
A quantization processing step of performing a quantization process on the feature map input to the convolutional layer and the feature vector input to the fully connected layer, the minimum value of the feature map and the minimum value of the feature vector Is set to an offset value smaller than a predetermined value, and the quantization process is performed using a quantization width obtained based on the maximum value and the minimum value of the feature map and the feature vector. Said quantization processing step;
An inner product processing step of executing (1) a norm mode for calculating a norm of the feature map and the feature vector, and (2) an inner product operation mode for performing inner product calculation processing using the multi-valued basis matrix and the feature map or the feature vector after the quantization process, and of executing the processing of the convolutional layer and the processing of the fully connected layer by combining the processing of the norm mode and the processing of the inner product operation mode;
A neural network processing method comprising:

A program for causing a computer to execute the neural network processing method according to claim 5.