JP2021022087A

JP2021022087A - Processor for neural network, processing method for neural network, and program

Info

Publication number: JP2021022087A
Application number: JP2019137468A
Authority: JP
Inventors: Masato Matsumoto (松本 真人)
Original assignee: MegaChips Corp
Current assignee: MegaChips Corp
Priority date: 2019-07-26
Filing date: 2019-07-26
Publication date: 2021-02-18
Anticipated expiration: 2039-07-26
Also published as: JP7253468B2; WO2021019815A1

Abstract

To provide a neural network processor that executes high-performance neural network processing while suppressing an increase in hardware scale. SOLUTION: A neural network processor 100 writes the data for processing the feature extraction layer, which is fixed once the neural network model has been decided, into a ROM, which requires only a small hardware scale, and holds the data for processing the determination layer, which is likely to be modified, in a rewritable random access memory. The neural network processor 100 executes neural network processing in this state. High-performance neural network processing can thus be executed while suppressing an increase in hardware scale. SELECTED DRAWING: Figure 1

Description

The present invention relates to neural network technology.

In recent years, various techniques using a CNN (Convolutional Neural Network), one kind of neural network technology, have been developed (see, for example, Patent Document 1). Among CNNs, techniques using a DCNN (Deep Convolutional Neural Network), which has many intermediate layers, have attracted particular attention because they have achieved results in a wide range of fields.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2015-197702

DCNNs achieve high recognition performance in various tasks such as general object recognition and semantic segmentation. On the other hand, a DCNN requires a very large number of parameters to execute its processing, so the hardware scale becomes large when a DCNN is implemented in hardware. In particular, the hardware needed to implement the memory that stores the parameters (for example, weight coefficient data) becomes large, which makes it difficult to reduce the cost of implementing a DCNN in hardware.

In view of the above problem, an object of the present invention is to realize a neural network processor, a neural network data processing method, and a program that execute high-performance neural network processing while suppressing an increase in hardware scale.

To solve the above problem, a first invention is a neural network processor for executing neural network processing that includes feature extraction layer processing and determination layer processing, the processor comprising a read-only memory, a random access memory, a control unit, and an inner product processing unit.

The read-only memory stores data for the feature extraction layer processing.

The random access memory allows data to be both read and written, and stores data for the determination layer processing.

The control unit controls the read-only memory and the random access memory.

The inner product processing unit reads the data for the feature extraction layer processing from the read-only memory as first data and executes the feature extraction layer processing using the first data. It also reads the data for the determination layer processing from the random access memory as second data and executes the determination layer processing using the second data.

In this neural network processor, the data for the feature extraction layer processing (for example, weight coefficient data), which is fixed once the neural network model has been decided, is written to a read-only memory (ROM), which requires only a small hardware scale when implemented in hardware. The data for the determination layer processing (for example, weight coefficient data), which may be changed, is held in a rewritable random access memory (RAM). The processor executes neural network processing in this state. In other words, because the feature extraction layer data that never changes is held in a read-only memory with a small hardware scale, while the determination layer data that may change is held in a random access memory, the processor can execute high-performance neural network processing while suppressing an increase in hardware scale.
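The split between fixed and rewritable weights can be illustrated in software. The following is only a minimal sketch, not the patented hardware: a read-only mapping stands in for the ROM and an ordinary dictionary stands in for the RAM. The layer names and weight values are hypothetical.

```python
from types import MappingProxyType

# Hypothetical feature extraction weights, fixed once the model is decided.
# MappingProxyType gives a read-only view, standing in for the ROM.
FEATURE_ROM = MappingProxyType({"conv1": (1, -1, 2), "conv2": (0, 3, -2)})

# Hypothetical determination layer weights, held in a rewritable store (the RAM).
decision_ram = {"fc": [1, 0, -1]}


def inner_product(xs, ws):
    """Weighted sum used by both the feature extraction and determination layers."""
    return sum(x * w for x, w in zip(xs, ws))


def update_decision_weights(name, new_weights):
    """Rewriting determination layer weights is allowed because they live in RAM."""
    decision_ram[name] = list(new_weights)


def rom_is_writable():
    """Try to overwrite a ROM entry; the read-only view rejects the write."""
    try:
        FEATURE_ROM["conv1"] = (0, 0, 0)
    except TypeError:
        return False
    return True
```

In this sketch, retraining only the determination layer corresponds to calling `update_decision_weights`, while the feature extraction weights stay untouched.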

A second invention is the first invention in which the read-only memory stores the data for the feature extraction layer processing so that it can output header data, which includes data for specifying the read time of the data for the feature extraction layer processing, followed by the data for the feature extraction layer processing in the order in which it is needed when that processing is executed.

When the control unit outputs a read command signal to the read-only memory, the read-only memory outputs the data for the feature extraction layer processing in the order in which it is needed when that processing is executed.

In this neural network processor, the weight coefficient data, which is fixed by the network configuration of the feature extraction layer, is stored in the read-only memory so that it is read out in the order in which it is needed during prediction processing (that is, in processing order). Therefore, at prediction time, simply inputting a read command signal (for example, a single trigger signal) to the read-only memory allows the inner product processing unit to obtain the weight coefficient data for the feature extraction layer processing in the required order. Unlike the prior art, there is no need for complicated address specification for the ROM.
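The address-free, trigger-driven readout can be modeled with a generator. This is a toy sketch under stated assumptions: the class name `SequentialRom`, its `trigger` method, and the header contents are hypothetical and only illustrate that a single call yields everything in processing order.

```python
class SequentialRom:
    """Toy model of a ROM that streams its contents in a fixed, pre-arranged order."""

    def __init__(self, header, layer_weights):
        # Contents are frozen at construction time, like a mask ROM.
        self._header = tuple(header)
        self._layers = tuple(tuple(w) for w in layer_weights)

    def trigger(self):
        """Model a single read command: yield the header, then each layer in order."""
        yield self._header
        for weights in self._layers:
            yield weights


# One trigger, no per-word addresses: the consumer just pulls the next item.
rom = SequentialRom(header=(2, 3, 2), layer_weights=[(1, 2, 3), (4, 5)])
stream = rom.trigger()
header = next(stream)
layers = list(stream)
```

The consumer never computes an address; the storage order alone determines what arrives next.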

A third invention is the second invention in which the control unit reads the header data stored in the read-only memory, specifies, based on the header data, the time at which output of the data for the feature extraction layer processing from the read-only memory will be completed, and performs control so that the data for the determination layer processing is already held in the random access memory before the specified time.

In this neural network processor, the time required to read the data for the feature extraction layer processing (for example, weight coefficient data) can be specified from the header data stored in the read-only memory. Based on that time, the data for the determination layer processing (for example, weight coefficient data) can be transferred in advance, for example from a predetermined ROM to the random access memory. As a result, as soon as the feature extraction layer processing is completed, the data for the determination layer processing can be read from the random access memory, so the determination layer processing can start immediately. The neural network processing is thereby sped up (the time until the determination layer processing completes is shortened).
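The timing reasoning above can be sketched as simple arithmetic. The cycle model below (one word per cycle by default) and both function names are assumptions made for illustration; the source does not specify how the read time is derived from the header.

```python
def feature_read_cycles(words_per_layer, cycles_per_word=1):
    """Cycles needed to stream all feature extraction weights out of the ROM."""
    return sum(words_per_layer) * cycles_per_word


def latest_transfer_start(words_per_layer, decision_words,
                          cycles_per_word=1, transfer_cycles_per_word=1):
    """Latest cycle at which the ROM-to-RAM transfer of determination layer
    weights can start and still finish before the feature stream ends.

    A result of 0 means the transfer must start immediately; if the transfer
    is longer than the stream, it cannot fully overlap with it.
    """
    done = feature_read_cycles(words_per_layer, cycles_per_word)
    transfer = decision_words * transfer_cycles_per_word
    return max(0, done - transfer)
```

For instance, with hypothetical layer sizes of 100, 200, and 50 words and 120 words of determination layer weights, the transfer would need to begin no later than cycle 230 of a 350-cycle stream.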

A fourth invention is any one of the first to third inventions, further comprising a decompression unit that executes decompression processing on compressed data.

The read-only memory stores the data for the feature extraction layer processing as compressed data.

The decompression unit executes decompression processing on the compressed data stored in the read-only memory.

The inner product processing unit executes the feature extraction layer processing using the data decompressed by the decompression unit.

In this neural network processor, the feature extraction layer data is stored in the read-only memory as compressed data, which further reduces the memory capacity required of the read-only memory. The hardware scale of the read-only memory can therefore be made even smaller, and as a result the hardware scale of the neural network processor as a whole can be reduced further.
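The effect of storing compressed weights can be tried out in a few lines. The source does not name a compression scheme, so this sketch swaps in zlib (DEFLATE) purely as a stand-in, and assumes 8-bit signed weights; both choices are assumptions.

```python
import zlib
from array import array


def compress_weights(weights):
    """Pack 8-bit signed weights and compress them (zlib is an assumed stand-in)."""
    return zlib.compress(array("b", weights).tobytes())


def decompress_weights(blob):
    """Role of the decompression unit: recover the original weight list."""
    out = array("b")
    out.frombytes(zlib.decompress(blob))
    return list(out)
```

Weight sets with many repeated values, such as heavily quantized or sparse weights, are where a scheme like this would be expected to save the most ROM capacity.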

A fifth invention is a neural network processing method executed using a neural network processor for executing neural network processing that includes feature extraction layer processing and determination layer processing, the processor comprising:
a read-only memory that stores data for the feature extraction layer processing;
a random access memory that allows data to be read and written and stores data for the determination layer processing; and
a control unit that controls the read-only memory and the random access memory.
The neural network processing method comprises a first step and a second step.

In the first step, the data for the feature extraction layer processing is read from the read-only memory as first data, and the feature extraction layer processing is executed using the first data.

In the second step, the data for the determination layer processing is read from the random access memory as second data, and the determination layer processing is executed using the second data.

This realizes a neural network processing method that has the same effect as the first invention.

A sixth invention is a program for causing a computer to execute the neural network processing method of the fifth invention.

This realizes a program for causing a computer to execute a neural network processing method that has the same effect as the first invention.

According to the present invention, it is possible to realize a neural network processor, a neural network data processing method, and a program that execute high-performance neural network processing while suppressing an increase in hardware scale.

A schematic configuration diagram of the neural network processor 100 according to the first embodiment.
A timing chart of processing executed by the neural network processor 100.
A flowchart of processing executed by the neural network processor 100.
A schematic configuration diagram of a neural network processor 100A according to a first modification of the first embodiment.
A diagram showing a CPU bus configuration.

[First Embodiment]
The first embodiment is described below with reference to the drawings.

<1.1: Configuration of the neural network processor>
FIG. 1 is a schematic configuration diagram of the neural network processor 100 according to the first embodiment.

As shown in FIG. 1, the neural network processor 100 includes a first interface unit IF1, a control unit 1, a first ROM (Read Only Memory) 2, a second ROM 3, a quantization processing unit 4, a first RAM (Random Access Memory) 5, a second RAM 6, an inner product processing unit 7, and a bus B1. The functional units of the neural network processor 100 are connected by the bus B1 and can exchange the necessary data, commands, and the like via the bus B1. Some or all of these functional units may be connected directly, as needed, instead of through the bus.

The first interface unit IF1 receives data Din to be processed from the outside and outputs data including the processing result of the neural network processor 100 to the outside as data Dout. The first interface unit IF1 writes the input data Din to the first RAM 5 via the bus B1, and reads the data including the processing result of the neural network processor 100 from the first RAM 5 via the bus B1.

The control unit 1 performs overall control of the neural network processor 100, control of each functional unit, and the processing required for neural network processing. The control unit 1 is realized by a CPU (Central Processing Unit) or a CPU core.

The first ROM 2 is a read-only memory that stores the data used for the feature extraction layer processing (for example, parameter data and neural network weight coefficient data). As shown in FIG. 1, the first ROM 2 is connected to the bus B1 and also to the inner product processing unit 7. The first ROM 2 may instead be connected to the inner product processing unit 7 via the bus B1. Upon receiving the trigger signal Sig_trig from the control unit 1, the first ROM 2, for example, outputs the header data to the control unit 1 via the bus B1 and continuously outputs the header data and the data used for the feature extraction layer processing to the inner product processing unit 7.

The second ROM 3 is a read-only memory connected to the bus B1. The second ROM 3 stores the data used for the determination layer processing (for example, parameter data and neural network weight coefficient data). In accordance with a command from the control unit 1, the second ROM 3 transfers the data used for the determination layer processing to the second RAM 6 via the bus B1.

The quantization processing unit 4 is connected to the first RAM 5 so that data can be exchanged between them. The quantization processing unit 4 performs quantization processing on the feature map data that is input to the feature extraction layer (for example, a convolutional layer of a DCNN (Deep Convolutional Neural Network)). The quantization processing unit 4 also performs quantization processing on the input data of the determination layer (for example, a fully connected layer of the DCNN). The quantization processing unit 4 reads the data to be quantized from a predetermined area of the first RAM 5 and writes the quantized data to a predetermined area of the first RAM 5. The quantization processing unit 4 may be connected directly to the first RAM 5, as shown in FIG. 1, or may be connected via a bus (for example, the bus B1).

The first RAM 5 is a RAM (Random Access Memory) that stores the data required to execute neural network processing. As shown in FIG. 1, the first RAM 5 is connected to the bus B1 and also to the quantization processing unit 4.

The second RAM 6 is a random access memory. As shown in FIG. 1, the second RAM 6 is connected to the bus B1 and also to the inner product processing unit 7. The second RAM 6 stores the data used for the determination layer processing (for example, parameter data and neural network weight coefficient data) transferred from the second ROM 3. In accordance with a command from the control unit 1, the second RAM 6 outputs the stored determination layer data to the inner product processing unit 7 at a predetermined timing.

As shown in FIG. 1, the inner product processing unit 7 includes a second interface unit 71 and an inner product calculation processing unit 72. The inner product processing unit 7 is connected to the first ROM 2, the first RAM 5, the second RAM 6, and the bus B1. The inner product processing unit 7 receives a control signal Ctr1 from the control unit 1, for example via the bus B1, and executes predetermined processing according to the received control signal Ctr1.

The second interface unit 71 is an interface to the first ROM 2, the first RAM 5, and the second RAM 6.
(1) When the feature extraction layer processing is executed, the second interface unit 71 reads the data for the feature extraction layer processing (for example, parameters and weight coefficients) D_wij_feature from the first ROM 2 and outputs the read data to the inner product calculation processing unit 72 as data D_wij. The second interface unit 71 also reads the quantized data D_Qin from a predetermined area of the first RAM 5 and outputs it to the inner product calculation processing unit 72 as data D_Qin.
(2) When the determination layer processing is executed, the second interface unit 71 reads the data for the determination layer processing (for example, parameters and weight coefficients) D_wij_deci from the second RAM 6 and outputs the read data to the inner product calculation processing unit 72 as data D_wij. The second interface unit 71 also reads the quantized data D_Qin from a predetermined area of the first RAM 5 and outputs it to the inner product calculation processing unit 72 as data D_Qin.

The inner product calculation processing unit 72 receives the data D_Qin and the data D_wij output from the second interface unit 71. It executes inner product calculation processing using the data D_Qin and the data D_wij and obtains the result as data Do. The inner product calculation processing unit 72 then outputs the data Do to the first RAM 5.

<1.2: Operation of the neural network processor>
The operation of the neural network processor 100 configured as described above is described below.

FIG. 2 is a timing chart of processing executed by the neural network processor 100.

FIG. 3 is a flowchart of processing executed by the neural network processor 100.

The operation of the neural network processor 100 is described below with reference to FIGS. 1 to 3.

In general, a CNN includes an input layer, convolutional layers, and fully connected layers. For example, image data is input as the input data Din to the first interface unit IF1 of the neural network processor 100, image recognition processing by the CNN is executed, and the image recognition result is output to the outside as the output data Dout.

In a CNN, the processing of a convolutional layer or a fully connected layer applies a weighting operation to the input data and then applies an activation function (for example, a ramp function (ReLU: Rectified Linear Unit), a sigmoid function, or a Softmax function) to the result, which yields the output of the convolutional layer or the fully connected layer.
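The weighting-then-activation pattern described above can be written out directly. This is a generic sketch of the named activation functions, not code from the source; the helper `layer_output` and its default choice of ReLU are assumptions for illustration.

```python
import math


def relu(x):
    """Ramp function: passes positive values, clamps negatives to zero."""
    return max(0.0, x)


def sigmoid(x):
    """Squashes any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Turns a list of scores into probabilities; subtracting the maximum
    before exponentiating avoids overflow."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def layer_output(inputs, weights, activation=relu):
    """One output value: weighted sum of the inputs followed by the activation."""
    return activation(sum(x * w for x, w in zip(inputs, weights)))
```

ReLU and sigmoid act on one value at a time, whereas Softmax needs the whole output vector, which is why it takes a list here.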

For convenience of explanation, an example in which the neural network processor 100 executes CNN processing is described below. The description assumes that the CNN has N convolutional layers (N: a natural number).

(Step S1):
In step S1, the neural network processor 100 executes input processing for the data to be processed (for example, the image data to be processed when image recognition is performed). Specifically, the first interface unit IF1 acquires the data Din to be processed from the outside. The data Din is then transferred from the first interface unit IF1 to the first RAM 5 via the bus B1 and stored in a predetermined area of the first RAM 5.

The neural network processor 100 also executes quantization processing. Specifically, the quantization processing unit 4 reads the data Din from the predetermined area of the first RAM 5, executes quantization processing on the read data, and obtains the quantized data as data D_Qin. The quantization processing unit 4 then writes the quantized data D_Qin to a predetermined area of the first RAM 5.

In addition to quantizing the data, the quantization processing may include, for example, dynamic range adjustment, binarization processing, and offset adjustment processing.
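The source does not specify the quantization scheme, so the following is only one possible sketch: symmetric uniform quantization with an optional offset, dynamic range scaling based on the peak magnitude, and a separate binarization helper. The function names, the 8-bit default, and the +1/-1 binary encoding are all assumptions.

```python
def quantize(values, bits=8, offset=0.0):
    """Offset adjustment, dynamic range scaling, then rounding to signed integers.

    Returns the quantized integers and the scale needed to map them back.
    """
    shifted = [v - offset for v in values]           # offset adjustment
    peak = max(abs(v) for v in shifted) or 1.0       # dynamic range (avoid zero)
    qmax = 2 ** (bits - 1) - 1                       # e.g. 127 for 8 bits
    quantized = [round(v * qmax / peak) for v in shifted]
    return quantized, peak / qmax


def binarize(values, threshold=0.0):
    """Binarization: map each value to +1 or -1 around a threshold."""
    return [1 if v >= threshold else -1 for v in values]
```

Multiplying a quantized value by the returned scale (and adding the offset back) gives an approximation of the original value.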

(Step S2, Step S3):
In step S2, the control unit 1 transmits the trigger signal Sig_trig to the first ROM 2 via the bus B1.

Upon receiving the trigger signal Sig_trig from the control unit 1, the first ROM 2 reads the header data D_head that it stores and outputs the header data D_head, via the bus B1, to the control unit 1 and to the second interface unit 71 of the inner product processing unit 7 (step S3).

The header data D_head includes, for example, the number of layers constituting the feature extraction layer, the data length of each layer, and the number of data items.
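To make the role of the header concrete, the sketch below packs and parses a header carrying the fields listed above. The binary layout (a 16-bit layer count followed by a 16-bit data length and a 32-bit data count per layer, little-endian) is entirely hypothetical; the source does not define a format.

```python
import struct


def pack_header(layers):
    """layers: list of (data_length_bits, data_count) pairs, one per layer.
    Hypothetical layout: layer count, then one 6-byte record per layer."""
    blob = struct.pack("<H", len(layers))
    for data_length_bits, data_count in layers:
        blob += struct.pack("<HI", data_length_bits, data_count)
    return blob


def parse_header(blob):
    """Recover the per-layer (data_length_bits, data_count) pairs."""
    (n_layers,) = struct.unpack_from("<H", blob, 0)
    return [struct.unpack_from("<HI", blob, 2 + 6 * i) for i in range(n_layers)]


def total_bits(layers):
    """Total size of the weight stream that follows the header."""
    return sum(length * count for length, count in layers)
```

A control unit could use something like `total_bits` together with the ROM's output rate to estimate when the feature extraction layer data finishes streaming.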

(Steps S41 to S45):
In steps S41 to S45, the per-layer processing of the feature extraction layer is repeated once for each layer of the feature extraction layer.

After outputting the header data D_head, the first ROM 2 reads the feature extraction layer data in order, from the data of the first layer to the data of the N-th layer, and outputs the read data to the inner product processing unit 7 as data D_wij_feature (weight coefficient data acquisition processing, step S42).

Specifically, (1) the weight coefficient data of the first layer of the feature extraction layer, (2) the weight coefficient data of the second layer of the feature extraction layer, ..., (N) the weight coefficient data of the N-th layer of the feature extraction layer are read from the first ROM 2 in this order and output to the inner product processing unit 7 as data D_wij_feature.
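The per-layer loop can be sketched as consuming a stream of weights in layer order. This is a deliberate simplification: each layer is modeled as a set of weight rows applied to the whole input vector followed by ReLU, and feeding each layer's output into the next is an assumption based on ordinary CNN practice. The function name and data shapes are hypothetical.

```python
def run_feature_layers(weight_stream, x):
    """Consume weights layer by layer, in the order they arrive.

    weight_stream: iterable yielding, per layer, a list of weight rows.
    x: input vector for the first layer.
    """
    for weight_rows in weight_stream:
        # One inner product per output value, then a ReLU activation.
        x = [max(0, sum(a * w for a, w in zip(x, row))) for row in weight_rows]
    return x


# Two hypothetical layers arriving in order: a 2x2 identity, then a 1x2 sum.
stream = iter([[(1, 0), (0, 1)], [(1, 1)]])
result = run_feature_layers(stream, [2, 3])
```

Because the loop only ever asks for the next item, it works unchanged with any source that delivers weights in processing order.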

ステップＳ４３では、内積演算処理が実行される。具体的には、以下のようにして、内積演算処理が実行される。 In step S43, the inner product calculation process is executed. Specifically, the inner product calculation process is executed as follows.

内積処理部７の第２インターフェース部７１は、第１ＲＯＭ２から出力される特徴抽出層の第１層の重み係数データを、データＤ＿ｗｉｊ＿ｆｅａｔｕｒｅとして、取得し、当該データをデータＤ＿ｗｉｊとして、内積演算処理部７２に出力する。 The second interface unit 71 of the inner product processing unit 7 acquires the weighting coefficient data of the first layer of the feature extraction layer output from the first ROM 2 as data D_wij_feature, and uses the data as data D_wij to obtain the inner product calculation processing unit 72. Output to.

また、内積処理部７の第２インターフェース部７１は、特徴抽出層の第１層に入力するデータの量子化後のデータを、第１ＲＡＭ５から読み出し、データＤ＿Ｑｉｎとして取得する。そして、第２インターフェース部７１は、取得したデータＤ＿Ｑｉｎを内積演算処理部７２に出力する。 Further, the second interface unit 71 of the inner product processing unit 7 reads the quantized data of the data input to the first layer of the feature extraction layer from the first RAM 5 and acquires it as data D_Qin. Then, the second interface unit 71 outputs the acquired data D_Qin to the inner product calculation processing unit 72.

The inner product calculation processing unit 72 executes the inner product calculation process using the data D_Qin and the data D_wij. For example, when the data D_Qin is n-dimensional vector data (n: natural number), the data D_wij (weight coefficient data) is n-dimensional vector data, and the two are expressed as
D_Qin = [x_1 x_2 x_3 ... x_n]
D_wij = [w_1 w_2 w_3 ... w_n]
the inner product calculation processing unit 72 acquires the inner product calculation result by executing a process of calculating the inner product of the two n-dimensional vectors, that is, a process corresponding to the following mathematical expression.

$$\sum_{k=1}^{n} x_k w_k = x_1 w_1 + x_2 w_2 + \cdots + x_n w_n$$

When performing matrix operation processing, the inner product calculation processing unit 72 acquires the matrix operation result by repeatedly executing the above inner product calculation process. For example, to obtain the product of two matrices A and B, the inner product calculation processing unit 72 executes the inner product calculation process, in the same manner as above, on the row vector consisting of the elements of the i-th row of matrix A and the column vector consisting of the elements of the j-th column of matrix B. This yields the value (data) of the element in row i, column j of the matrix obtained as the product of A and B.
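The inner product and its repeated use for a matrix product can be illustrated with a minimal Python sketch. The function names and the sample values are hypothetical and are not part of the embodiment; the sketch only mirrors the arithmetic described above, not the hardware of the inner product calculation processing unit 72.

```python
def inner_product(x, w):
    """Return sum_k x[k] * w[k] for two vectors of the same dimension n."""
    if len(x) != len(w):
        raise ValueError("both vectors must have the same dimension n")
    return sum(xk * wk for xk, wk in zip(x, w))


def matmul_by_inner_products(a, b):
    """Multiply A (m x n) by B (n x p) by repeating the inner product.

    Element (i, j) of the result is the inner product of row i of A
    and column j of B.
    """
    columns_of_b = list(zip(*b))  # column j of B as a tuple
    return [[inner_product(row, col) for col in columns_of_b] for row in a]


d_qin = [1, 2, 3]  # quantized input data (illustrative values)
d_wij = [4, 5, 6]  # weight coefficient data (illustrative values)
result = inner_product(d_qin, d_wij)  # 1*4 + 2*5 + 3*6 = 32
product = matmul_by_inner_products([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```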

このように処理することで、内積演算処理部７２では、任意の行列演算の処理結果を取得することができる。 By processing in this way, the inner product calculation processing unit 72 can acquire the processing result of an arbitrary matrix operation.

As described above, the inner product calculation processing unit 72 executes the processing of the first layer of the feature extraction layer (for example, CNN convolution processing, that is, weighted addition of the data) and acquires data corresponding to the output of each synapse of the first layer of the feature extraction layer.

Note that the weight coefficient data is input to the inner product calculation processing unit 72 as data D_wij in an order that matches the processing order of the first layer of the feature extraction layer. Therefore, by repeatedly executing the above inner product calculation process on the data D_wij and the corresponding quantized data (the data to be processed, read from the first RAM 5) in the order in which they are input, the inner product calculation processing unit 72 can acquire data corresponding to the output of each synapse of the first layer of the feature extraction layer.

When the inner product calculation processing unit 72 has acquired, by the above processing, the data corresponding to the output of each synapse of the first layer of the feature extraction layer (referred to as "data D0"), it applies an activation function to that data and acquires data Do. That is, the inner product calculation processing unit 72 acquires the data Do by executing a process corresponding to the following:
Do = f_act(D0)
f_act(): activation function (for example, the ReLU function, the Softmax function, or the sigmoid function)
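As a hedged illustration of Do = f_act(D0), the three activation functions named above can be written in plain Python as follows. The function names and sample values are assumptions made for this sketch; the embodiment does not specify how f_act is implemented.

```python
import math


def relu(d0):
    """ReLU: negative values become 0, other values pass through."""
    return [max(0.0, v) for v in d0]


def sigmoid(d0):
    """Sigmoid: map each value into the open interval (0, 1)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in d0]


def softmax(d0):
    """Softmax: positive outputs that sum to 1 (max subtracted for stability)."""
    peak = max(d0)
    exps = [math.exp(v - peak) for v in d0]
    total = sum(exps)
    return [e / total for e in exps]


d0 = [-1.0, 0.0, 2.0]  # illustrative synapse outputs (data D0)
do_relu = relu(d0)     # data Do when f_act is ReLU
```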

ステップＳ４４において、内積演算処理部７２は、上記により取得したデータＤｏを第１ＲＡＭ５に出力し、当該データＤｏを第１ＲＡＭ５の所定の領域に書き込む。 In step S44, the inner product calculation processing unit 72 outputs the data Do acquired as described above to the first RAM 5, and writes the data Do to a predetermined area of the first RAM 5.

As described above, the neural network processor 100 executes the processing of the first layer of the feature extraction layer. In the timing chart of FIG. 2, between time t11 and time t12, the data for processing the first layer of the feature extraction layer (weight coefficient data) D_wij_feature (denoted as data Df_1st_L in FIG. 2) is read from the first ROM 2 in processing order, and the inner product calculation process is executed using the read data (weight coefficient data) D_wij_feature and the quantized data D_Qin to be processed. Then, as described above, the neural network processor 100 executes the process corresponding to the activation function and acquires the data Do.

Then, the neural network processor 100 executes the processing of the second layer, the third layer, ..., and the Nth layer of the feature extraction layer in the same manner as the processing of the first layer described above (steps S41 to S45; processing between times t12 and t2 in FIG. 2).

そして、上記処理により取得したデータＤｏ（特徴抽出層の処理により取得したデータ）が、第１ＲＡＭ５に出力される（書き込まれる）。 Then, the data Do (data acquired by the processing of the feature extraction layer) acquired by the above processing is output (written) to the first RAM 5.
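The layer-by-layer flow of the feature extraction layer can be sketched as a loop. This is a simplified model under stated assumptions: each layer is treated as a list of weight rows (one inner product per output), ReLU stands in for f_act, and a local variable stands in for the first RAM 5. The names are hypothetical, and CNN-specific details such as convolution windows are not modelled.

```python
def relu(values):
    """Stand-in activation function f_act (an assumption for this sketch)."""
    return [max(0, v) for v in values]


def run_feature_extraction(d_qin, layer_weights, f_act=relu):
    """Process layers 1..N in the order their weights are stored."""
    data = d_qin  # input of the first layer, as read from the RAM
    for weights in layer_weights:
        # One inner product per weight row gives the synapse outputs D0.
        d0 = [sum(x * w for x, w in zip(data, row)) for row in weights]
        # Do = f_act(D0); the result becomes the input of the next layer.
        data = f_act(d0)
    return data


layers = [[[1, 0], [0, -1]], [[1, 1]]]        # two illustrative layers
d_o = run_feature_extraction([2, 3], layers)  # layer 1 -> [2, 0], layer 2 -> [2]
```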

(Step S51):
In step S51, a process of identifying the time at which reading of the feature extraction layer data ends is executed. Specifically, the control unit 1 reads the header data D_header from the first ROM 2 (times t1 to t11 in FIG. 2) and, from the data included in the header data D_header, calculates the time T1 required to transfer the data for processing the feature extraction layer (weight coefficient data) from the first ROM 2 to the inner product processing unit 7.

For example, the header data D_header acquired from the first ROM 2 includes:
(1) the number of layers N (N: natural number) included in the feature extraction layer;
(2) the time required for real number operations in the processing of the k-th layer (k: natural number, 1 ≦ k ≦ N) of the feature extraction layer, and the calculation accuracy of the real number operations in the processing of the k-th layer; and
(3) the time required for integer operations in the processing of the k-th layer (k: natural number, 1 ≦ k ≦ N) of the feature extraction layer, and the calculation accuracy of the integer operations in the processing of the k-th layer.
Based on the above data, the control unit 1 calculates the time T1 required to transfer the data for processing the feature extraction layer (weight coefficient data) by, for example, executing processing corresponding to the following mathematical expressions.

L(k).Real_D_time = L(k).Real_D_num × L(k).Real_D_accuracy
L(k).Int_D_time = L(k).Int_D_num × L(k).Int_D_accuracy
L(k).Real_D_time: processing time required for the real number operations of the k-th layer
L(k).Int_D_time: processing time required for the integer operations of the k-th layer
L(k).Real_D_num: number of data items subject to real number operations in the k-th layer
L(k).Real_D_accuracy: calculation accuracy of the real number operations of the k-th layer
L(k).Int_D_num: number of data items subject to integer operations in the k-th layer
L(k).Int_D_accuracy: calculation accuracy of the integer operations of the k-th layer
Then, from the time T1 obtained as above and the time t11 at which transfer of the data for processing the feature extraction layer (weight coefficient data) starts, the control unit 1 calculates the time t2 at which that transfer is completed as
t2 = t11 + T1
In this way, the control unit 1 identifies the time t2 at which the transfer of the data for processing the feature extraction layer (weight coefficient data) is completed.
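The calculation of step S51 can be sketched as follows. The expression that combines the per-layer times into T1 is not reproduced in this text, so the sketch assumes that T1 is the sum of L(k).Real_D_time and L(k).Int_D_time over all N layers; that summation, the class and function names, the sample values, and the use of abstract time units are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class LayerHeader:
    """Per-layer fields of the header data (field names follow the text)."""
    real_d_num: int
    real_d_accuracy: int
    int_d_num: int
    int_d_accuracy: int

    @property
    def real_d_time(self):
        # L(k).Real_D_time = L(k).Real_D_num x L(k).Real_D_accuracy
        return self.real_d_num * self.real_d_accuracy

    @property
    def int_d_time(self):
        # L(k).Int_D_time = L(k).Int_D_num x L(k).Int_D_accuracy
        return self.int_d_num * self.int_d_accuracy


def transfer_time_t1(layers):
    """ASSUMPTION: T1 is the sum of the real and integer times of all layers."""
    return sum(layer.real_d_time + layer.int_d_time for layer in layers)


def transfer_end_time_t2(t11, layers):
    """t2 = t11 + T1."""
    return t11 + transfer_time_t1(layers)


header = [LayerHeader(10, 4, 100, 1), LayerHeader(0, 4, 200, 2)]
t1 = transfer_time_t1(header)         # (40 + 100) + (0 + 400) = 540
t2 = transfer_end_time_t2(5, header)  # 5 + 540 = 545
```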

(Step S52):
In step S52, the transfer process for the weight coefficient data of the determination layer is executed. Specifically, the control unit 1 acquires the time T2 required to transfer the weight coefficient data D_wij_deci of the determination layer, which is stored and held in the second ROM 3, to the second RAM 6. Then, at a time no later than time t20 (= t2 - T2), the control unit 1 outputs to the second ROM 3 and the second RAM 6 a command to transfer the weight coefficient data D_wij_deci of the determination layer from the second ROM 3 to the second RAM 6, thereby starting the data transfer of the weight coefficient data D_wij_deci at a time no later than time t20.

これにより、判定層の重み係数データＤ＿ｗｉｊ＿ｄｅｃｉの第２ＲＯＭ３から第２ＲＡＭ６への転送処理を、特徴抽出層のデータ読み出し処理が完了する時刻ｔ２までに完了させることができる。 As a result, the transfer process of the weighting coefficient data D_wij_deci of the determination layer from the second ROM 3 to the second RAM 6 can be completed by the time t2 when the data read process of the feature extraction layer is completed.
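The timing condition of step S52 amounts to simple arithmetic: a transfer that takes T2 must start no later than t20 = t2 - T2 in order to finish by t2. A minimal sketch, with hypothetical function names and illustrative time values in abstract units:

```python
def latest_prefetch_start(t2, transfer_time_t2):
    """Return t20 = t2 - T2, the latest start time that still finishes by t2."""
    return t2 - transfer_time_t2


def prefetch_completes_in_time(start, transfer_time_t2, t2):
    """True if a transfer started at `start` is finished by time t2."""
    return start + transfer_time_t2 <= t2


t20 = latest_prefetch_start(545, 120)                    # 545 - 120 = 425
on_time = prefetch_completes_in_time(400, 120, 545)      # starts before t20
too_late = not prefetch_completes_in_time(430, 120, 545) # starts after t20
```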

In this way, the neural network processor 100 performs the processing of steps S51 and S52 in parallel with the processing of the feature extraction layer (steps S41 to S45), so that the processing of the determination layer can be executed immediately after time t2, at which the processing of the feature extraction layer is completed.

(Steps S6, S7):
In steps S6 and S7, the processing of the determination layer is executed (processing at times t2 to t3 in FIG. 2).

By time t2 (the start time of the processing of the determination layer), the processing of the feature extraction layer has been completed, and the transfer of the data for processing the determination layer (weight coefficient data for processing the determination layer) from the second ROM 3 to the second RAM 6 has also been completed. Therefore, in step S6, the second interface unit 71 of the inner product processing unit 7 acquires (reads) the data for processing the determination layer (weight coefficient data for processing the determination layer) D_wij_deci from the second RAM 6. The second interface unit 71 then outputs the acquired data D_wij_deci to the inner product calculation processing unit 72 as data D_wij.

また、内積処理部７の第２インターフェース部７１は、判定層に入力するデータの量子化後のデータを、第１ＲＡＭ５から読み出し、データＤ＿Ｑｉｎとして取得する。そして、第２インターフェース部７１は、取得したデータＤ＿Ｑｉｎを内積演算処理部７２に出力する。 Further, the second interface unit 71 of the inner product processing unit 7 reads the quantized data of the data input to the determination layer from the first RAM 5 and acquires the data as data D_Qin. Then, the second interface unit 71 outputs the acquired data D_Qin to the inner product calculation processing unit 72.

As described above, the inner product calculation processing unit 72 executes the processing of the determination layer (for example, out-mapping processing (OutMapping) for adjusting the number of dimensions of the data, fully connected layer processing (corresponding to vector inner product processing), and Softmax layer processing) and acquires, as data Do, the data corresponding to the output of the determination layer processing. The acquired data Do is output from the inner product calculation processing unit 72 to the first RAM 5 and written to a predetermined area of the first RAM 5.

なお、判定層の処理においても、特徴抽出層の処理と同様に、必要に応じて、活性化関数に相当する処理が実行されてもよい。 In the processing of the determination layer as well, the processing corresponding to the activation function may be executed as necessary, as in the processing of the feature extraction layer.

以上の処理により取得された判定層の処理結果のデータは、例えば、第１インターフェース部ＩＦ１により、第１ＲＡＭ５から読み出され、出力データＤｏｕｔとして、例えば、外部に送信される。 The data of the processing result of the determination layer acquired by the above processing is read from the first RAM 5 by, for example, the first interface unit IF1, and is transmitted to the outside as output data Dout, for example.

As described above, in the neural network processor 100, the data for processing the feature extraction layer (weight coefficient data), which is fixed once the neural network model is decided, is written to a ROM (the first ROM 2 in the present embodiment), which requires only a small hardware scale when implemented in hardware, and the data for processing the determination layer (weight coefficient data), which may be changed, is held in a rewritable RAM (the second RAM 6 in the present embodiment). The neural network processor 100 executes neural network processing in this state. That is, because the neural network processor 100 holds the feature extraction layer data, which never changes, in a ROM with a small hardware scale and holds the determination layer data, which may change, in a RAM, it can execute high-performance neural network processing while suppressing an increase in hardware scale.

Further, in the neural network processor 100, the weight coefficient data determined by the network configuration of the feature extraction layer is stored in a ROM (the first ROM 2 in the present embodiment) so that it is read out in the order required when executing prediction processing (the processing order at prediction time). Therefore, when prediction processing is executed, simply inputting a single trigger signal (the trigger signal Sig_trig output from the control unit 1 to the first ROM 2) to the first ROM 2 allows the inner product processing unit 7 to acquire the weight coefficient data for the feature extraction layer in the order required for the prediction processing. That is, because the weight coefficient data needed for the feature extraction layer can be acquired in processing order with only a single trigger signal input to the first ROM 2, there is no need to perform complicated addressing of the ROM as in the conventional technique.
The above embodiment describes the case where a single trigger signal causes the data required for processing the feature extraction layer to be transferred continuously, in processing order, from the first ROM 2 to the inner product processing unit 7; however, the present invention is not limited to this. For example, in the neural network processor 100, designating one address of the first ROM 2 may cause the data required for processing the feature extraction layer to be transferred thereafter continuously, in processing order, from the first ROM 2 to the inner product processing unit 7.
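The idea of a single trigger that streams the stored data in processing order, with no per-item addressing by the consumer, can be modelled with a Python generator. This is only a software analogy; the class name, the tuple format, and the sample contents are hypothetical and do not describe the actual interface of the first ROM 2.

```python
class SequentialRom:
    """Toy model of a ROM that streams its contents in stored order."""

    def __init__(self, header, layer_weights):
        self._header = header                # stands in for the header data
        self._layer_weights = layer_weights  # layers 1..N in processing order

    def trigger(self):
        """Model a single trigger: yield the header, then every layer in order."""
        yield ("header", self._header)
        for layer_index, weights in enumerate(self._layer_weights, start=1):
            yield (layer_index, weights)


rom = SequentialRom(header={"N": 2}, layer_weights=[[1, 2], [3, 4]])
stream = list(rom.trigger())  # the consumer never supplies an address
```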

Further, as described above, the neural network processor 100 can identify, from the header data stored in the first ROM 2, the time needed to read the data for processing the feature extraction layer (weight coefficient data). Therefore, based on the identified read time, the neural network processor 100 can transfer the data for processing the determination layer (weight coefficient data) from the second ROM 3 to the second RAM 6 in advance. As a result, as soon as the processing of the feature extraction layer is completed, the data for processing the determination layer (weight coefficient data) can be acquired (read) from the second RAM 6. That is, as soon as the processing of the feature extraction layer is completed, the neural network processor 100 is ready to execute the processing of the determination layer. Consequently, the neural network processor 100 can speed up neural network processing (it can shorten the time until the processing of the determination layer is completed).

The above describes the case where the data for processing the determination layer (weight coefficient data) is stored and held in the second ROM 3; however, the present invention is not limited to this, and the data for processing the determination layer (weight coefficient data) may be input to the neural network processor 100 from the outside. In this case, the neural network processor 100 executes processing as follows. The data for processing the determination layer (weight coefficient data) is input from the outside to the first interface unit IF1 as data Din. Then, in the same manner as described above, the neural network processor 100 transfers the data for processing the determination layer (weight coefficient data) from the first interface unit IF1 to the second RAM 6 in advance, based on the identified time needed to read the data for processing the feature extraction layer (weight coefficient data). As a result, even when the data for processing the determination layer (weight coefficient data) is input from the outside, the data can be acquired (read) from the second RAM 6 as soon as the processing of the feature extraction layer is completed.

That is, even when the data for processing the determination layer (weight coefficient data) is input from the outside, the neural network processor 100 is ready to execute the processing of the determination layer as soon as the processing of the feature extraction layer is completed. Consequently, the neural network processor 100 can speed up neural network processing (it can shorten the time until the processing of the determination layer is completed).

Further, since the second RAM 6 only needs to store and hold the data necessary for the determination process (processing of the determination layer), the memory capacity of the second RAM 6 can be reduced.
Further, when the neural network processor 100
(1) executes the first round of feature extraction layer processing (for example, processing of the first to Nth layers),
(2) executes the first round of determination layer processing,
(3) executes the second round of feature extraction layer processing (for example, processing of the first to Nth layers), and
(4) executes the second round of determination layer processing,
the data necessary for executing the first round of determination layer processing is stored and held in the second RAM 6 before process (2), and the data necessary for executing the second round of determination layer processing is stored and held in the second RAM 6 before process (4).

In this case, in the second RAM 6, it suffices to update only the data that, among the data necessary for the second round of determination layer processing (the next determination layer processing), differs from the data necessary for the first round. As a result, the neural network processor 100 can reduce the memory capacity of the second RAM 6 and can also reduce the amount of data transferred to the second RAM 6 (the amount of data to be updated).
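Updating only the data that differs between consecutive rounds can be sketched as follows. Modelling the second RAM 6 as a dictionary from address to value is an assumption made for illustration, as are the function name and the sample contents.

```python
def update_changed_entries(ram, next_weights):
    """Write only the entries whose value differs; return how many were written."""
    written = 0
    for address, value in next_weights.items():
        if ram.get(address) != value:
            ram[address] = value
            written += 1
    return written


ram = {0: 1, 1: 2, 2: 3}           # data held for the first round
second_round = {0: 1, 1: 5, 2: 3}  # data needed for the second round
written = update_changed_entries(ram, second_round)  # only address 1 changes
```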

≪First Modification≫
Next, a first modification of the first embodiment will be described. Parts that are the same as in the above embodiment are given the same reference numerals, and detailed description thereof is omitted.

図４は、第１実施形態の第１変形例に係るニューラルネットワーク用プロセッサ１００Ａの概略構成図である。 FIG. 4 is a schematic configuration diagram of the neural network processor 100A according to the first modification of the first embodiment.

The neural network processor 100A of this modification has a configuration in which a decompression unit 8 is added to the neural network processor 100 of the first embodiment. In all other respects, the neural network processor 100A of this modification is the same as the neural network processor 100 of the first embodiment.

本変形例のニューラルネットワーク用プロセッサ１００Ａでは、第１ＲＯＭ２に圧縮されたデータが書き込まれている。 In the neural network processor 100A of this modification, the compressed data is written in the first ROM 2.

The decompression unit 8 reads the compressed data D1 stored and held in the first ROM 2, executes decompression processing on the compressed data D1, and acquires the decompressed data D_wij_feature. That is, by executing the decompression processing, the decompression unit 8 acquires the same data D_wij_feature that is output from the first ROM 2 in the first embodiment. The decompression unit 8 then outputs the decompressed data D_wij_feature to the inner product processing unit 7.
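The role of the decompression unit 8 can be illustrated with a round trip through a general-purpose compressor. The text does not specify the compression scheme, so the use of zlib here is purely an assumption, as are the signed 8-bit weight encoding, the function names, and the sample weights.

```python
import zlib


def compress_weights(weights):
    """Pack signed 8-bit weights into bytes and compress them (ROM contents D1)."""
    raw = bytes(w & 0xFF for w in weights)
    return zlib.compress(raw)


def decompress_weights(d1):
    """Recover the signed 8-bit weights (D_wij_feature) from compressed data D1."""
    raw = zlib.decompress(d1)
    return [b - 256 if b > 127 else b for b in raw]


weights = [1, -1, 0, 127, -128] * 50  # illustrative, highly repetitive weights
d1 = compress_weights(weights)
restored = decompress_weights(d1)     # identical to the original weights
```

With repetitive data such as this, the compressed form is much smaller than the 250 original bytes, which corresponds to the smaller ROM capacity described for this modification.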

ニューラルネットワーク用プロセッサ１００Ａでは、第１実施形態の第１ＲＯＭから出力されるデータＤ＿ｗｉｊ＿ｆｅａｔｕｒｅの代わりに、伸張部８から出力されるデータＤ＿ｗｉｊ＿ｆｅａｔｕｒｅを用いて、内積処理部７での処理が実行される。内積処理部７での処理は、第１実施形態と同様である。 In the neural network processor 100A, the processing in the inner product processing unit 7 is executed by using the data D_wij_feature output from the decompression unit 8 instead of the data D_wij_feature output from the first ROM of the first embodiment. The processing in the inner product processing unit 7 is the same as that in the first embodiment.

In the neural network processor 100A of this modification, the data of the feature extraction layer is stored and held in the first ROM 2 as compressed data, so the memory capacity required of the first ROM 2 is smaller than in the first embodiment. Therefore, in the neural network processor 100A of this modification, the hardware scale of the first ROM 2 can be reduced further. As a result, the neural network processor 100A of this modification can have a smaller hardware scale than the neural network processor 100 of the first embodiment.

[Other Embodiments]
Some or all of the functional units of the neural network processors 100 and 100A may be realized by microcode, or by predetermined hardware together with microcode.

Further, the neural network processors 100 and 100A may realize a neural network based on a Binarized-DCNN (DCNN: Deep Convolutional Neural Network) (hereinafter referred to as "BNN"), as disclosed in Prior Art Document A below. In this case, the weight coefficient data of the feature extraction layer realized by the BNN may be stored in a ROM (for example, the first ROM 2) as uncompressed data or as compressed data.
(Prior Art Document A):
Ryuji Kamiya et al., "Speeding up identification calculation and model compression by Binarized-DCNN," IEICE Technical Report 116(366), 47-52, 2016-12-15, The Institute of Electronics, Information and Communication Engineers.
Further, the neural network processors 100 and 100A may implement a neural network in which the binary basis matrix obtained by the vector decomposition of the BNN is replaced with a multi-valued basis matrix (referred to as a "multi-valued neural network"). In this case, the weight coefficient data of the feature extraction layer realized by the multi-valued neural network may be stored in a ROM (for example, the first ROM 2) as uncompressed data or as compressed data.

上記実施形態（変形例を含む）では、ニューラルネットワーク用プロセッサ１００、１００Ａが、ＲＡＭを第１ＲＡＭ５、第２ＲＡＭ６を有する構成である場合について、説明したが、これに限定されることはない。ニューラルネットワーク用プロセッサ１００、１００Ａにおいて、例えば、第１ＲＡＭ５、第２ＲＡＭ６を１つのＲＡＭにより実現するようにしてもよい。 In the above embodiment (including a modification), the case where the neural network processors 100 and 100A have a configuration in which the RAMs include the first RAM 5 and the second RAM 6 has been described, but the present invention is not limited thereto. In the neural network processors 100 and 100A, for example, the first RAM 5 and the second RAM 6 may be realized by one RAM.

Further, the various data transfers and signal transmission and reception in the above embodiment (including the modification) are not limited to the forms described above. In each functional unit, data transfer and signal transmission and reception may be executed over directly connected signal lines, or may be executed via a bus.

Further, in the above embodiment (including the modification), the number of ROMs and RAMs has been described as one example, but the present invention is not limited to this, and the number and arrangement of ROMs and RAMs may differ from the above. The ROMs and RAMs may also be provided outside the neural network processors 100 and 100A.

Each block (each functional unit) of the neural network processors 100 and 100A described in the above embodiment may be individually made into one chip by a semiconductor device such as an LSI, or some or all of the blocks may be integrated into one chip. Each block (each functional unit) of the neural network processor 100 described in the above embodiment may also be realized by a plurality of semiconductor devices such as LSIs.

Although the term LSI is used here, the device may also be called an IC, a system LSI, a super LSI, or an ultra LSI, depending on the degree of integration.

The method of circuit integration is not limited to LSI; a dedicated circuit or a general-purpose processor may be used instead. An FPGA (Field Programmable Gate Array), which can be programmed after the LSI is manufactured, or a reconfigurable processor, in which the connections and settings of circuit cells inside the LSI can be reconfigured, may also be used.

Some or all of the processing of each functional block in the above embodiments may be implemented by a program. In that case, some or all of the processing of each functional block is performed by a central processing unit (CPU) in a computer. The program for each process is stored in a storage device such as a hard disk or a ROM, and is executed either directly from the ROM or after being read out to a RAM.

Each process of the above embodiment may be implemented in hardware, or in software (including cases where it is implemented together with an OS (operating system), middleware, or a predetermined library). It may also be implemented by a mixture of software and hardware processing.

For example, when the functional units of the above embodiment (including the modifications) are implemented in software, they may be implemented by software processing on the hardware configuration shown in FIG. 5 (for example, a hardware configuration in which a CPU, a GPU, a ROM, a RAM, an input unit, an output unit, and the like are connected by a bus Bus).
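A software realization of this kind might look like the following minimal Python sketch. It is illustrative only and is not taken from the embodiment: the names `ROM_FEATURE_WEIGHTS`, `ram_determination_weights`, `inner_product`, `relu`, and `run_network` are hypothetical, the weight values are made up, and the feature extraction layer is reduced to a single inner product with an assumed activation. An immutable tuple stands in for the ROM and a mutable list stands in for the RAM.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for the ROM: an immutable tuple holding the
# feature extraction layer weights, fixed once the model is decided.
ROM_FEATURE_WEIGHTS: Tuple[Tuple[float, ...], ...] = (
    (1.0, 0.0, -1.0),
    (0.5, 0.5, 0.5),
)

# Hypothetical stand-in for the RAM: a mutable list holding the
# determination layer weights, which may be rewritten later.
ram_determination_weights: List[List[float]] = [
    [1.0, -1.0],
    [0.0, 2.0],
]


def inner_product(weights: Sequence[Sequence[float]],
                  x: Sequence[float]) -> List[float]:
    """Compute one inner product per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(values: Sequence[float]) -> List[float]:
    """Assumed activation; chosen here only for illustration."""
    return [max(0.0, v) for v in values]


def run_network(x: Sequence[float]) -> List[float]:
    # First step: feature extraction using data read from the "ROM".
    features = relu(inner_product(ROM_FEATURE_WEIGHTS, x))
    # Second step: determination layer using data read from the "RAM".
    return inner_product(ram_determination_weights, features)
```

Rewriting an entry of `ram_determination_weights` changes the determination layer without touching the data modeled as ROM, which mirrors the split between fixed and rewritable data described in this document.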

The execution order of the processing method in the above embodiment is not necessarily limited to the order described; the order may be changed without departing from the gist of the invention.

A computer program that causes a computer to execute the method described above, and a computer-readable recording medium on which that program is recorded, fall within the scope of the present invention. Examples of the computer-readable recording medium include a flexible disk, a hard disk, a CD-ROM, an MO, a DVD, a DVD-ROM, a DVD-RAM, a large-capacity DVD, a next-generation DVD, and a semiconductor memory.

The computer program is not limited to one recorded on the recording medium, and may be transmitted via a telecommunication line, a wireless or wired communication line, a network typified by the Internet, or the like.

The term "unit" may be a concept that includes "circuitry". The circuitry may be implemented, in whole or in part, by hardware, software, or a mixture of hardware and software.

The functions of the elements disclosed herein may be implemented using circuitry or processing circuitry that includes general-purpose processors, dedicated processors, integrated circuits, ASICs (application-specific integrated circuits), conventional circuitry, and/or combinations thereof, configured or programmed to perform the disclosed functions. A processor is regarded as processing circuitry or circuitry because it includes transistors and other circuitry. In the present disclosure, circuitry, units, or means are hardware that performs the recited functions, or hardware that is programmed to perform them. The hardware may be any hardware disclosed herein or any other known hardware that is programmed or configured to perform the recited functions. When the hardware is a processor, which may be regarded as a type of circuitry, the circuitry, means, or units are a combination of hardware and software, the software being used to configure the hardware and/or the processor.

The specific configuration of the present invention is not limited to the above-described embodiment, and various changes and modifications can be made without departing from the gist of the invention.

100, 100A Neural network processor
1 Control unit
2 First ROM
3 Second ROM
5 First RAM
6 Second RAM
7 Inner product processing unit
8 Decompression unit

Claims

A neural network processor for executing neural network processing that includes processing of a feature extraction layer and processing of a determination layer, the processor comprising:
a read-only memory that stores and holds data for the processing of the feature extraction layer;
a random access memory that allows data to be read and written and that stores and holds data for the processing of the determination layer;
a control unit that controls the read-only memory and the random access memory; and
an inner product processing unit that reads the data for the processing of the feature extraction layer from the read-only memory as first data, executes the processing of the feature extraction layer using the first data, reads the data for the processing of the determination layer from the random access memory as second data, and executes the processing of the determination layer using the second data.

The neural network processor according to claim 1, wherein
the read-only memory stores and holds
header data including data for identifying a read time of the data for the processing of the feature extraction layer, and
the data for the processing of the feature extraction layer, arranged so that it can be output in the order required when the processing of the feature extraction layer is executed, and
when the control unit outputs a read command signal to the read-only memory, the read-only memory outputs the data for the processing of the feature extraction layer in the order required when the processing of the feature extraction layer is executed.

The neural network processor according to claim 2, wherein
the control unit
reads the header data stored in the read-only memory and, based on the header data, identifies the time at which output of the data for the processing of the feature extraction layer from the read-only memory is completed, and
performs control so that the data for the processing of the determination layer is stored and held in the random access memory at a time before the identified time.
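
The header-based timing control recited in the preceding claim can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the claims do not define the header fields, so `RomHeader`, `word_count`, `words_per_cycle`, and the cycle-based time model are hypothetical. The sketch derives the cycle at which ROM output completes and the latest cycle at which the RAM load may start so that it finishes strictly before that cycle.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RomHeader:
    """Hypothetical header layout; the claims do not define the fields."""
    word_count: int       # number of feature extraction data words in the ROM
    words_per_cycle: int  # assumed ROM output rate


def rom_output_end_cycle(header: RomHeader, start_cycle: int) -> int:
    """Cycle at which the ROM finishes outputting the feature extraction data."""
    cycles_needed = math.ceil(header.word_count / header.words_per_cycle)
    return start_cycle + cycles_needed


def latest_ram_load_start(header: RomHeader, start_cycle: int,
                          ram_load_cycles: int) -> int:
    """Latest cycle at which the RAM load may begin so that the determination
    layer data is in place strictly before ROM output completes."""
    end_cycle = rom_output_end_cycle(header, start_cycle)
    return end_cycle - ram_load_cycles - 1
```

Under these assumptions, a control unit would schedule the RAM load no later than the value returned by `latest_ram_load_start`, so the determination layer can begin as soon as feature extraction finishes.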

The neural network processor according to any one of claims 1 to 3, further comprising a decompression unit that executes decompression processing on compressed data, wherein
the read-only memory stores and holds the data for the processing of the feature extraction layer as compressed data,
the decompression unit executes the decompression processing on the compressed data stored and held in the read-only memory, and
the inner product processing unit executes the processing of the feature extraction layer using the data decompressed by the decompression unit.
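
The path from compressed ROM data through decompression to the inner product can be illustrated with a minimal Python sketch. The claim does not name a compression scheme, so run-length encoding is used here purely as an assumed stand-in, and `decompress_rle` and `feature_inner_product` are hypothetical names.

```python
from typing import Iterable, Iterator, Sequence, Tuple


def decompress_rle(pairs: Iterable[Tuple[int, int]]) -> Iterator[int]:
    """Expand (value, run_length) pairs into individual weight values.
    Run-length encoding is an assumption, not the claimed scheme."""
    for value, run_length in pairs:
        for _ in range(run_length):
            yield value


def feature_inner_product(compressed_weights: Sequence[Tuple[int, int]],
                          x: Sequence[int]) -> int:
    """Decompress the weights, then take their inner product with the input."""
    weights = list(decompress_rle(compressed_weights))
    if len(weights) != len(x):
        raise ValueError("decompressed weight count must match input length")
    return sum(w * v for w, v in zip(weights, x))
```

Storing the weights in compressed form reduces the ROM capacity needed, at the cost of a decompression step before each inner product.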

A neural network processing method executed using a neural network processor for executing neural network processing that includes processing of a feature extraction layer and processing of a determination layer, the processor comprising
a read-only memory that stores and holds data for the processing of the feature extraction layer,
a random access memory that allows data to be read and written and that stores and holds data for the processing of the determination layer, and
a control unit that controls the read-only memory and the random access memory,
the method comprising:
a first step of reading the data for the processing of the feature extraction layer from the read-only memory as first data and executing the processing of the feature extraction layer using the first data; and
a second step of reading the data for the processing of the determination layer from the random access memory as second data and executing the processing of the determination layer using the second data.

A program for causing a computer to execute the neural network processing method according to claim 5.