JP7179237B1

JP7179237B1 - neural network device

Info

Publication number: JP7179237B1
Application number: JP2022543749A
Authority: JP
Inventors: 督那須; 知嘉子中西
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2022-03-10
Filing date: 2022-03-10
Publication date: 2022-11-28
Anticipated expiration: 2042-03-10
Also published as: WO2023170855A1; JPWO2023170855A1

Abstract

A neural network analysis unit (102) of a neural network device (100) includes a network structure analysis unit (201) that analyzes the operation structure of a neural network, and a neural network division unit (202) that determines, for each operation obtained by dividing the neural network, whether the operation is to be implemented as a circuit or processed in software. The neural network division unit (202) includes a convolutional layer circuitization unit (501) that groups, among the layers of the neural network that perform convolution operations, layers having identical or similar parameters; a circuit scale calculation unit (503) that calculates, for each convolution operation of the grouped layers, the circuit scale that would result from implementing that convolution operation as a circuit; and a circuitization location determination unit (504) that determines which operations to implement as circuits based on the circuit scale calculated by the circuit scale calculation unit (503).

Description

The present disclosure relates to artificial intelligence technology, and more particularly to an apparatus and method for creating a program that processes a neural network.

In fields such as image processing, neural networks can perform processing with very high accuracy and have come into wide use in recent years. Neural networks are also known to involve a large number of operations and therefore to impose a high processing load. To complete processing within a desired time, neural networks are often implemented on dedicated processors such as a GPGPU (General Purpose Graphics Processing Unit) or on hardware such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).

Although a neural network involves a large number of operations, it is composed of combinations of operations such as convolutions, activation functions, and fully connected operations, so its structure is relatively simple. Patent Document 1 below, for example, proposes a technique that exploits this property of neural networks to address the problem of the large amount of computation. In the technique of Patent Document 1, the same arithmetic device is controlled based on single instructions, each corresponding to one layer operation of the multi-layer operations of the neural network, so that different network structures are compiled into operations of the same arithmetic device. This allows the same arithmetic device to implement the logic operations of all layers (see, for example, Patent Document 1 below).

Patent Document 1: JP 2019-139747 A

In the technique of Patent Document 1, single instructions prepared in advance on the hardware are chained together according to the network structure, which uses hardware resources efficiently and achieves high-speed neural network processing. However, because processing is performed using combinations of single instructions prepared in advance on the hardware, there is a concern that processing may not be completed within the desired time.

The present disclosure has been made to solve the above problem, and its object is to enable the design of neural network processing that achieves the desired performance (execution time and accuracy) on hardware with fewer resources.

A neural network device according to the present disclosure includes: a neural network analysis unit that determines, for each operation constituting a neural network, an operation method, namely whether the operation is to be implemented as a circuit or processed in software; and a neural network operation method output unit that creates and outputs circuit information for implementing as circuits the operations determined to be implemented as circuits, and a program for processing in software the operations determined to be processed in software. The neural network analysis unit includes a network structure analysis unit that analyzes the operation structure of the neural network, and a neural network division unit that determines, for each operation obtained by dividing the neural network, whether the operation is to be implemented as a circuit or processed in software. The network structure analysis unit includes an operation structure classification unit that classifies each layer of the neural network according to the type of operation constituting the layer, and a convolutional layer analysis unit that identifies the parameters of the convolution operation for each layer classified by the operation structure classification unit as a layer that performs a convolution operation. The neural network division unit includes a convolutional layer circuitization unit that groups layers having identical or similar parameters based on the parameters identified by the convolutional layer analysis unit.

According to the present disclosure, it is possible to design neural network processing that achieves the desired performance (completion within the required time and accuracy) on hardware with fewer resources.

The objects, features, aspects, and advantages of the present disclosure will become more apparent from the following detailed description and the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of a neural network device according to Embodiment 1.
FIG. 2 is a block diagram showing the configuration of a neural network analysis unit according to Embodiment 1.
FIG. 3 is a block diagram showing the configuration of a neural network operation method output unit according to Embodiment 1.
FIG. 4 is a block diagram showing the configuration of a network structure analysis unit according to Embodiment 1.
FIG. 5 is a block diagram showing the configuration of a neural network division unit according to Embodiment 1.
FIG. 6 is a flowchart showing the processing of a convolutional layer circuitization unit according to Embodiment 1.
FIG. 7 is a flowchart showing the processing of an activation layer circuitization unit according to Embodiment 1.
FIG. 8 is a flowchart showing the processing of a circuit scale calculation unit according to Embodiment 1.
FIG. 9 is a diagram showing an example hardware configuration of the neural network construction unit.
FIG. 10 is a diagram showing an example hardware configuration of the neural network construction unit.
FIG. 11 is a block diagram showing the configuration of a neural network device according to Embodiment 2.

<Embodiment 1>
FIG. 1 is a block diagram showing the configuration of a neural network device 100 according to Embodiment 1. As shown in FIG. 1, the neural network device 100 includes a neural network construction unit 101 having a neural network analysis unit 102, a neural network operation method output unit 103, and a storage unit 104.

The neural network analysis unit 102 reads the network structure data of the neural network stored in the storage unit 104, analyzes the network structure, determines an operation method comprising the program and the circuits that run the neural network, and outputs the determined operation method. In other words, the neural network analysis unit 102 determines, for each operation constituting the neural network, whether the operation is to be implemented as a circuit or processed in software.

Based on the operation method received from the neural network analysis unit 102, the neural network operation method output unit 103 creates and outputs the data of a program that runs on a processor such as a CPU (Central Processing Unit) and circuit information for constructing arithmetic circuits on an FPGA. In other words, the neural network operation method output unit 103 creates and outputs circuit information for implementing as circuits the operations that the neural network analysis unit 102 determined to implement as circuits, and a program for processing in software the operations that the neural network analysis unit 102 determined to process in software.

FIG. 2 is a block diagram showing the configuration of the neural network analysis unit 102. As shown in FIG. 2, the neural network analysis unit 102 has a network structure analysis unit 201 and a neural network division unit 202.

The network structure analysis unit 201 analyzes the operation structure within the network based on the network structure data of the neural network read from the storage unit 104, and outputs the analysis result.

The neural network division unit 202 receives the analysis result of the operation structure within the network from the network structure analysis unit 201 and divides the operations constituting the network structure into operation units of a predetermined scale. For each operation after division, it determines whether the operation is to run on the CPU or on the FPGA (that is, whether it is to be processed in software or implemented as a circuit), and outputs the decision in association with that operation.

FIG. 3 is a block diagram showing the configuration of the neural network operation method output unit 103. As shown in FIG. 3, the neural network operation method output unit 103 has a control program creation unit 301, an arithmetic circuit creation unit 302, and a data acquisition circuit control data generation unit 303.

The control program creation unit 301 receives the operation method that the neural network analysis unit 102 determined for each process of the neural network, and creates and outputs a program for the CPU covering the operations to be processed in software on the CPU. This program also includes a control program that manages the inputs and outputs of the arithmetic circuits running on the FPGA and controls the FPGA so that the neural network as a whole can be processed. For example, the control program inputs data to arithmetic circuit A, receives the result from arithmetic circuit A, and inputs that result to arithmetic circuit B. Part of the arithmetic processing may also be included in the control program so that it is performed on the CPU rather than on the FPGA. In that case, the control program performs processing such as calculating the product of the outputs of arithmetic circuit A and arithmetic circuit B. The program output by the control program creation unit 301 is, for example, the binary data of an executable obtained by compiling code written in a language such as C for a specific CPU.

The arithmetic circuit creation unit 302 receives the operation method determined by the neural network analysis unit 102, which comprises the program and the circuits that run the neural network, and creates and outputs circuit information for constructing the arithmetic circuits that operate on the FPGA.

The data acquisition circuit control data generation unit 303 calculates, based on the circuit information, the parameters to be supplied to a dedicated circuit that the FPGA uses to receive data from the CPU, for example via shared memory. Suppose, for example, that arithmetic circuit A on the FPGA receives data of a fixed size (data width) and computes the average of a fixed number of values contained in it, and that arithmetic circuit A starts computing as soon as data of that fixed size has been stored in the shared memory, without receiving an execution command from the CPU. That is, arithmetic circuit A acquires data autonomously. In this case, the parameters serve to build the value of the "fixed size" into the arithmetic circuit on the FPGA in advance so that this autonomous operation is possible.

FIG. 4 is a block diagram showing the configuration of the network structure analysis unit 201. As shown in FIG. 4, the network structure analysis unit 201 has an operation structure classification unit 401, a convolutional layer analysis unit 402, and an activation layer analysis unit 403.

Based on the network structure data of the neural network read from the storage unit 104, the operation structure classification unit 401 analyzes what kind of operation each layer of the neural network consists of, and outputs the result as per-layer operation information associated with each layer. The operation associated with a layer is, for example, a convolution operation or an operation using an activation function. The operations associated with the layers may also include operations other than convolutions and activation functions. Such other operations may be implemented as circuits or processed on the CPU. For example, the operations constituting a fully connected layer may be implemented as a circuit as they are, or may be regarded as a convolution operation with large parameters and implemented as a circuit in the same way as a convolutional layer.
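The classification described above can be sketched as follows. This is a minimal illustration only: the dictionary layout of the layer data, the operation labels, and the `treat_fc_as_conv` flag are assumptions made for the example and are not specified in the disclosure.

```python
from collections import defaultdict

# Hypothetical network structure data: one dict per layer.
example_layers = [
    {"name": "conv1", "op": "conv"},
    {"name": "relu1", "op": "activation"},
    {"name": "fc1", "op": "fully_connected"},
]

def classify_layers(layers, treat_fc_as_conv=False):
    """Associate each layer with the type of operation it performs.

    When treat_fc_as_conv is set, a fully connected layer is regarded as
    a convolution with large parameters, as the text permits.
    """
    classified = defaultdict(list)
    for layer in layers:
        op = layer["op"]
        if op == "fully_connected" and treat_fc_as_conv:
            op = "conv"
        classified[op].append(layer["name"])
    return dict(classified)
```

Calling `classify_layers(example_layers, treat_fc_as_conv=True)` places `fc1` in the same class as `conv1`, so that it would later be analyzed together with the convolutional layers.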

The convolutional layer analysis unit 402 receives the operation information of each layer in the neural network from the operation structure classification unit 401 and analyzes those layers that perform convolution operations. Specifically, it identifies the parameters of the convolution operation performed in each such layer. These parameters include, for example, the input size, output size, number of kernels, kernel size, and BorderMode.

The activation layer analysis unit 403 receives the operation information of each layer in the neural network from the operation structure classification unit 401 and analyzes those layers that perform processing based on an activation function. Specifically, it identifies which activation function is used in each such layer.

FIG. 5 is a diagram showing the configuration of the neural network division unit 202. As shown in FIG. 5, the neural network division unit 202 has a convolutional layer circuitization unit 501, an activation layer circuitization unit 502, a circuit scale calculation unit 503, and a circuitization location determination unit 504.

The convolutional layer circuitization unit 501 receives the analysis result from the network structure analysis unit 201 and outputs information on implementing the convolution operations as circuits.

The activation layer circuitization unit 502 receives the analysis result from the network structure analysis unit 201 and outputs information on implementing the activation functions as circuits.

The circuit scale calculation unit 503 receives the circuitization information output by the convolutional layer circuitization unit 501 and the activation layer circuitization unit 502, and calculates the circuit scale that each operation would have if implemented as a circuit.

The circuitization location determination unit 504 receives the circuit scale of each operation calculated by the circuit scale calculation unit 503, and separates the operations to be implemented as circuits from those to be processed on a processor such as a CPU without being implemented as circuits.

FIG. 6 is a flowchart showing the processing of the convolutional layer circuitization unit 501. The operation of the convolutional layer circuitization unit 501 is described below with reference to FIG. 6.

The convolutional layer circuitization unit 501 first acquires the information of one convolutional layer from the operation information of each layer in the neural network received from the network structure analysis unit 201 (step S601). It then checks the parameters contained in the acquired information and stores them, associated with that convolutional layer, as convolutional layer information (step S602).

Next, the convolutional layer circuitization unit 501 checks whether the operation information of the layers in the neural network still contains a convolutional layer whose information has not yet been acquired (step S603). If such a layer remains, the process returns to step S601, and the convolutional layer circuitization unit 501 performs steps S601 and S602 on the next convolutional layer.

Once the information of all convolutional layers has been acquired, the convolutional layer circuitization unit 501 extracts, from the convolutional layers corresponding to the stored convolutional layer information, the layers that have identical parameters and groups them together (step S604). In other words, the convolutional layer circuitization unit 501 divides the convolutional layers into groups according to their parameters. The unit of grouping here is the layer, but the multiply-accumulate operations that make up the convolution operation within a layer may be used as a finer unit of grouping.

When the grouping in step S604 is complete, the convolutional layer circuitization unit 501 checks whether the number of groups created is equal to or greater than a predetermined number (step S605). If it is, the convolutional layer circuitization unit 501 further merges groups that have similar parameters (step S606). "Similar" here may be defined as a relationship in which the parameters are close to each other within a certain range, or as an inclusion relationship in which the parameters of one group contain the parameters of the other. Groups whose parameters are in an inclusion relationship are merged because, once the containing operation has been implemented as a circuit, the contained operation can be executed using part of that circuit. When the grouping result is stored, each group is stored together with the number of times the operation belonging to that group is used in the neural network as a whole.
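Steps S604 to S606 can be sketched as below. This is an illustrative sketch under stated assumptions, not the procedure of the disclosure: the `ConvParams` fields are a reduced, hypothetical parameter set, and the `covers` test (every parameter of the containing group is at least as large) is only one possible reading of the inclusion relationship.

```python
from collections import defaultdict
from typing import NamedTuple

class ConvParams(NamedTuple):
    # Hypothetical, reduced parameter set for illustration.
    input_size: int
    kernels: int
    kernel_size: int

def group_identical(layers):
    """Step S604: group layers whose parameters are identical."""
    groups = defaultdict(list)
    for name, params in layers:
        groups[params].append(name)
    return dict(groups)

def covers(a, b):
    """Assumed inclusion test: a circuit sized for `a` can also run `b`
    when every parameter of `a` is at least as large as that of `b`."""
    return all(x >= y for x, y in zip(a, b))

def merge_similar(groups, max_groups):
    """Steps S605-S606: when there are max_groups or more groups, fold
    each group into an already kept group that covers it."""
    if len(groups) < max_groups:
        return groups
    merged = {}
    # Visit larger parameter sets first so they become the representatives.
    for params in sorted(groups, reverse=True):
        host = next((p for p in merged if covers(p, params)), None)
        if host is None:
            merged[params] = list(groups[params])
        else:
            merged[host].extend(groups[params])
    return merged

example = [
    ("conv1", ConvParams(32, 16, 3)),
    ("conv2", ConvParams(32, 16, 3)),
    ("conv3", ConvParams(16, 8, 3)),
    ("conv4", ConvParams(64, 8, 5)),
]
groups = group_identical(example)               # three groups
merged = merge_similar(groups, max_groups=3)    # conv3 folds into conv4's group
```

The length of each group's layer list serves as the usage count that the text says is stored with the grouping result.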

FIG. 7 is a flowchart showing the processing of the activation layer circuitization unit 502. The operation of the activation layer circuitization unit 502 is described below with reference to FIG. 7.

The activation layer circuitization unit 502 first acquires the activation function of each layer from the operation information of each layer in the neural network received from the network structure analysis unit 201, and stores it as activation function information (step S701). It then extracts identical activation functions from the stored activation functions and groups them together (step S702). In other words, the activation layer circuitization unit 502 divides the layers that perform processing based on an activation function into groups, one per activation function. When the grouping result is stored, each group is stored together with the number of times the operation belonging to that group is used in the neural network as a whole.

Next, the activation layer circuitization unit 502 acquires one of the grouped activation functions (step S703) and checks whether that activation function can be linearly approximated (step S704). If it can, the linearly approximated function is stored in association with the activation function (step S705).

The activation layer circuitization unit 502 then checks whether any of the grouped activation functions has not yet been acquired (step S706). If one remains, the process returns to step S703, and the activation layer circuitization unit 502 performs steps S703 to S706 on the next activation function. That is, the activation layer circuitization unit 502 performs steps S703 to S706 on all of the grouped activation functions.
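The flow of steps S702 to S706 can be illustrated with the sketch below. The disclosure does not say how linear approximability is judged, so the criterion used here (a piecewise-linear interpolant over an assumed input range whose sampled error stays within a tolerance) is an assumption, as are the range, the segment count, and all function names.

```python
import math
from collections import Counter

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def piecewise_linear(fn, lo, hi, segments):
    """Build a piecewise-linear interpolant of fn on [lo, hi]."""
    step = (hi - lo) / segments
    xs = [lo + i * step for i in range(segments + 1)]
    ys = [fn(x) for x in xs]

    def approx(x):
        x = min(max(x, lo), hi)                      # clamp outside the range
        i = min(int((x - lo) / step), segments - 1)  # segment index
        t = (x - xs[i]) / step
        return ys[i] + t * (ys[i + 1] - ys[i])

    return approx

def max_error(fn, approx, lo, hi, samples=1000):
    """Largest sampled deviation between fn and its approximation."""
    xs = (lo + (hi - lo) * k / samples for k in range(samples + 1))
    return max(abs(fn(x) - approx(x)) for x in xs)

def approximate_groups(layer_activations, functions, tol=0.01):
    usage = Counter(layer_activations)               # S702: group and count uses
    approximations = {}
    for name in usage:                               # S703 / S706: visit each group
        fn = functions[name]
        approx = piecewise_linear(fn, -8.0, 8.0, 32)
        if max_error(fn, approx, -8.0, 8.0) <= tol:  # S704 (assumed criterion)
            approximations[name] = approx            # S705: keep the approximation
    return usage, approximations
```

With these assumed settings a sigmoid is accepted as approximable, whereas a discontinuous step function is rejected because the interpolant cannot follow the jump.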

For each group in the convolutional layer information grouped by the convolutional layer circuitization unit 501 and in the activation function information grouped by the activation layer circuitization unit 502, the circuit scale calculation unit 503 calculates the scale the group would have if implemented as a circuit. This scale can be calculated, for example, from the number of arithmetic units, such as adders and multipliers, contained in the circuit.

FIG. 8 is a flowchart showing the processing of the circuit scale calculation unit 503. The operation of the circuit scale calculation unit 503 is described below with reference to FIG. 8.

The circuit scale calculation unit 503 first acquires one group of convolutional layers from the grouped convolutional layer information (step S801). Based on the parameters stored in association with the acquired convolutional layers, it determines the arithmetic units needed to implement the operation of those layers as a circuit (step S802). It then calculates, from the types and numbers of the arithmetic units determined, the circuit scale that results when the operation of those convolutional layers is implemented as a circuit (step S803).

The arithmetic unit selected when implementing an operation as a circuit need not be the smallest one capable of executing the operation; a unit with a larger circuit scale may be chosen deliberately. This is because data access is more efficient when the circuit scale is a power of two, so the circuit scale of the arithmetic unit is intentionally enlarged to a power of two.

The circuit scale calculation unit 503 then checks whether the grouped convolutional layer information still contains a group of convolutional layers that has not yet been acquired (step S804). If one remains, the process returns to step S801, and the circuit scale calculation unit 503 performs steps S801 to S803 on the next group. That is, the circuit scale calculation unit 503 performs steps S801 to S803 on all groups of convolutional layers.
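A rough sketch of steps S802 and S803, including the optional enlargement to a power of two, is given below. The sizing rule (one multiplier per kernel weight and an adder tree per kernel), the relative unit costs in `UNIT_COST`, and the choice to pad the operator counts are all assumptions for illustration; the disclosure only states that the scale is derived from the types and numbers of arithmetic units.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n; returns 0 for n <= 0."""
    return 1 << (n - 1).bit_length() if n > 0 else 0

def conv_operator_counts(kernel_size, in_channels, parallel_kernels):
    """S802 (assumed sizing rule): one multiplier per kernel weight and an
    adder tree that sums the products, replicated per parallel kernel."""
    macs = kernel_size * kernel_size * in_channels
    return {
        "multiplier": macs * parallel_kernels,
        "adder": (macs - 1) * parallel_kernels,
    }

# Hypothetical relative cost of each arithmetic unit type.
UNIT_COST = {"multiplier": 4, "adder": 1}

def circuit_scale(counts, pad_to_power_of_two=True):
    """S803: derive the scale from the types and numbers of arithmetic units,
    optionally enlarging each count to a power of two."""
    scale = 0
    for kind, n in counts.items():
        if pad_to_power_of_two:
            n = next_power_of_two(n)
        scale += UNIT_COST[kind] * n
    return scale

counts = conv_operator_counts(kernel_size=3, in_channels=3, parallel_kernels=1)
padded = circuit_scale(counts)                                # 4*32 + 1*32 = 160
unpadded = circuit_scale(counts, pad_to_power_of_two=False)   # 4*27 + 1*26 = 134
```

The padded figure is larger than the minimum, which mirrors the trade-off described in the text: some circuit area is given up in exchange for power-of-two sizes.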

Once all groups of convolutional layers have been acquired, the circuit scale calculation unit 503 acquires one group of activation functions, corresponding to activation layers, from the grouped activation function information (step S805). Next, it determines the arithmetic units needed to implement that activation function as a circuit (step S806). It then calculates, from the types and numbers of those arithmetic units, the circuit scale that results when the activation function is implemented as a circuit (step S807).

Next, the circuit scale calculation unit 503 determines whether the activation function can be linearly approximated (step S808). If it can, the unit identifies the approximation function (step S809), determines the arithmetic units needed to implement the approximation function as a circuit (step S810), calculates the circuit scale from the types and numbers of those arithmetic units (step S811), and stores the calculated circuit scale in association with the activation function.

The circuit scale calculation unit 503 then checks whether the grouped activation function information still contains a group of activation functions that has not yet been acquired (step S812). If one remains, the process returns to step S805, and the circuit scale calculation unit 503 performs steps S805 to S812 on the next group. That is, the circuit scale calculation unit 503 performs steps S805 to S812 on all groups of activation functions.

The circuitization location determination unit 504 receives from the circuit scale calculation unit 503 a list of the circuits required to process the neural network, together with their circuit scale information, and, taking into account the capacity of the target FPGA, determines which of the neural network's processes should be circuitized. Processes that are not selected for circuitization are executed in software on a processor such as a CPU. Two factors are weighed as the selection criteria (that is, the criteria for deciding whether to circuitize each process): whether the execution time of the neural network processing becomes shorter, and whether the accuracy of the neural network becomes higher.

Normally, the execution time of neural network processing is shorter when a process is circuitized than when it is executed in software. Therefore, when execution time is used as the selection criterion, the circuits in the list received from the circuit scale calculation unit 503 whose processing takes longer in software are given higher priority for circuitization. The number of times each circuit is used in the neural network processing is also taken into account. For example, suppose process A takes 5 milliseconds in software and process B takes 30 milliseconds. If process A is used 20 times in the neural network and process B only twice, then over the network as a whole the processing time of A (5 milliseconds × 20 = 100 milliseconds) is longer than that of B (30 milliseconds × 2 = 60 milliseconds), so process A is judged to take priority over process B for circuitization.
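The prioritization in the example above amounts to ranking processes by software execution time multiplied by use count. A minimal sketch, in which the function name and the input layout are assumptions made for illustration:

```python
def rank_for_circuitization(processes):
    """Rank processes by total software time (time per call x number of uses).

    processes: dict mapping a process name to (software_time_ms, use_count).
    Returns a list of (name, total_ms), longest total first.
    """
    totals = {name: time_ms * uses for name, (time_ms, uses) in processes.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Process A: 5 ms, used 20 times. Process B: 30 ms, used twice.
ranking = rank_for_circuitization({"A": (5, 20), "B": (30, 2)})
print(ranking)  # [('A', 100), ('B', 60)]
```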

The accuracy of the neural network, on the other hand, is basically the same whether each process is circuitized or executed in software. However, if an activation function is computed using a linear approximation function, the accuracy may decrease. The decision on whether to circuitize the processing of an activation function must therefore consider both the degree to which circuitization shortens the execution time and the degree to which it lowers the accuracy. As for the accuracy loss caused by linear approximation, the larger the output error introduced by the approximation over the domain of function inputs that can occur within the neural network (that is, the difference between using and not using the linear approximation), the greater the loss. In addition, the more often an activation function is used within the neural network, the greater the accuracy loss when that activation function is linearly approximated. The location at which the activation function is used within the neural network also affects the degree of accuracy loss. For example, an activation function located near the input layer, the first layer of the neural network, acts directly on the input data to the neural network, so the accuracy loss caused by linear approximation is considered large, whereas an activation function in a later layer is used for processing after feature extraction, so the accuracy loss caused by linear approximation is considered small. However, it is not essential to rate activation functions closer to the input layer as suffering a larger accuracy loss from linear approximation. The circuitization location determination unit 504 evaluates the degree of accuracy loss of the neural network by combining these factors. For example, the circuitization location determination unit 504 may calculate the product of the error caused by linearly approximating an activation function and the number of times that activation function is used in the neural network, and use it as the degree of accuracy loss due to the linear approximation of that activation function.
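The product of approximation error and use count mentioned at the end of this paragraph can be sketched as below. The choice of the sigmoid, the piecewise-linear `hard_sigmoid` stand-in for its approximation, the input range, and the use of the maximum absolute error on a sampled grid are all assumptions for illustration; the disclosure does not fix any of them.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def hard_sigmoid(x: float) -> float:
    # Piecewise-linear stand-in for the approximation (an assumption, not from the source).
    return min(1.0, max(0.0, 0.25 * x + 0.5))


def max_abs_error(f, g, lo: float, hi: float, samples: int = 1001) -> float:
    """Largest |f(x) - g(x)| on an evenly spaced grid over [lo, hi]."""
    step = (hi - lo) / (samples - 1)
    return max(abs(f(lo + i * step) - g(lo + i * step)) for i in range(samples))


def degradation_score(f, g, lo: float, hi: float, use_count: int) -> float:
    """Approximation error multiplied by the number of uses in the network."""
    return max_abs_error(f, g, lo, hi) * use_count


# Inputs assumed to lie in [-6, 6]; the activation is assumed to be used 3 times.
error = max_abs_error(sigmoid, hard_sigmoid, -6.0, 6.0)
score = degradation_score(sigmoid, hard_sigmoid, -6.0, 6.0, use_count=3)
```

A position-dependent weight (for example, a larger weight for layers near the input) could be multiplied in as well, although the text notes that such weighting is optional.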

Furthermore, because circuits exceeding the capacity of the FPGA cannot be implemented, the FPGA capacity imposes an upper limit on the processing that can be circuitized. At the same time, the more processes are circuitized, the more the execution time of the neural network processing is shortened. Moreover, the circuit scale of an activation function that includes nonlinear processing is larger than that of an activation function consisting only of linear processing, so circuitizing an activation function that includes nonlinear processing reduces the number of processes that can be circuitized within the FPGA capacity. Circuitizing a linear approximation of the activation function avoids this problem, but the linear approximation lowers the accuracy. In other words, the execution time of the neural network processing (that is, the number of processes circuitized) and the accuracy of the neural network are in a trade-off relationship. The circuitization location determination unit 504 therefore determines the locations to be circuitized by balancing these two conflicting factors. For example, one conceivable method is to use a weighted linear sum of the execution time of the neural network processing and the accuracy of the neural network as an evaluation function, and to determine the locations to be circuitized so that the value of the evaluation function is minimized. In doing so, an upper limit on the execution time of the neural network processing, a lower limit on the accuracy of the neural network, and upper limits on the shared memory capacity and the data transfer bus bandwidth required between the CPU and the FPGA may be added as constraints. Also, because an extreme increase in execution time or an extreme decrease in accuracy is undesirable, a nonlinear function that penalizes increases in execution time and decreases in accuracy may be used as the evaluation function.
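One way to picture the evaluation-function approach is an exhaustive search over subsets of candidate processes, keeping only those that fit the FPGA capacity. The sketch below is a simplified illustration under several assumptions that are not in the disclosure: accuracy enters the sum as an accuracy-loss term (so that a smaller value is better), the candidate fields, weights, and numbers are invented, and brute force is used only because the example is tiny.

```python
from itertools import combinations


def choose_circuitized(candidates, capacity, w_time=1.0, w_acc=1.0):
    """Pick the subset of processes to circuitize that minimizes
    w_time * total_time_ms + w_acc * total_accuracy_loss,
    subject to the total circuit scale not exceeding the FPGA capacity.

    candidates: name -> {"scale", "sw_time_ms", "hw_time_ms", "acc_loss"}
    """
    names = list(candidates)
    best_cost, best_subset = float("inf"), frozenset()
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            if sum(candidates[n]["scale"] for n in subset) > capacity:
                continue  # exceeds FPGA capacity, so not implementable
            total_time = sum(
                c["hw_time_ms"] if n in subset else c["sw_time_ms"]
                for n, c in candidates.items()
            )
            total_loss = sum(candidates[n]["acc_loss"] for n in subset)
            cost = w_time * total_time + w_acc * total_loss
            if cost < best_cost:
                best_cost, best_subset = cost, frozenset(subset)
    return best_cost, best_subset


candidates = {
    "conv":       {"scale": 60, "sw_time_ms": 100, "hw_time_ms": 10, "acc_loss": 0.0},
    "act_approx": {"scale": 10, "sw_time_ms": 20,  "hw_time_ms": 2,  "acc_loss": 0.5},
    "pool":       {"scale": 40, "sw_time_ms": 30,  "hw_time_ms": 5,  "acc_loss": 0.0},
}
cost, chosen = choose_circuitized(candidates, capacity=80, w_time=1.0, w_acc=10.0)
```

With these numbers, circuitizing `conv` and `pool` together would exceed the capacity of 80, so the search settles on `conv` plus the approximated activation. Additional constraints such as a maximum execution time or a minimum accuracy could be added as further `continue` checks, and a nonlinear penalty could replace the linear cost.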

As described above, the neural network device 100 according to Embodiment 1 designs the neural network processing by deciding, for each process of the neural network, whether to circuitize it on an FPGA or to execute it in software on a processor such as a CPU, taking into account both the degree to which circuitization shortens the execution time of the neural network processing and the degree to which it lowers the accuracy. This makes it possible to design a neural network in which the processor and the circuits on the FPGA cooperate, and thus to realize a neural network capable of high-speed processing on hardware with limited resources.

FIGS. 9 and 10 each show an example of the hardware configuration of the neural network construction unit 101. The functions of the components of the neural network construction unit 101 shown in FIG. 1 are realized by, for example, the processing circuit 10 shown in FIG. 9. That is, the neural network construction unit 101 includes a processing circuit 10 for determining, for each operation constituting the neural network, an operation method (whether the operation is to be circuitized or processed in software), and for creating and outputting circuit information for circuitizing the operations determined to be circuitized and a program for software processing of the operations determined to be processed in software. The processing circuit 10 may be dedicated hardware, or may be configured using a processor (also called a central processing unit (CPU), processing device, arithmetic device, microprocessor, microcomputer, or DSP (Digital Signal Processor)) that executes a program stored in a memory.

When the processing circuit 10 is dedicated hardware, the processing circuit 10 corresponds to, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a combination of these. The functions of the components of the neural network construction unit 101 may each be realized by a separate processing circuit, or may be realized collectively by a single processing circuit.

FIG. 10 shows an example of the hardware configuration of the neural network construction unit 101 when the processing circuit 10 is configured using a processor 11 that executes a program. In this case, the functions of the components of the neural network construction unit 101 are realized by software or the like (software, firmware, or a combination of software and firmware). The software or the like is written as a program and stored in the memory 12. The processor 11 realizes the function of each unit by reading and executing the program stored in the memory 12. That is, the neural network construction unit 101 includes the memory 12 for storing a program that, when executed by the processor 11, results in the execution of a process of determining, for each operation constituting the neural network, an operation method (whether the operation is to be circuitized or processed in software), and a process of creating and outputting circuit information for circuitizing the operations determined to be circuitized and a program for software processing of the operations determined to be processed in software. In other words, this program can be said to cause a computer to execute the procedures and methods of operation of the components of the neural network construction unit 101.

Here, the memory 12 may be, for example, a nonvolatile or volatile semiconductor memory such as a RAM (Random Access Memory), ROM (Read Only Memory), flash memory, EPROM (Erasable Programmable Read Only Memory), or EEPROM (Electrically Erasable Programmable Read Only Memory); an HDD (Hard Disk Drive), magnetic disk, flexible disk, optical disc, compact disc, MiniDisc, or DVD (Digital Versatile Disc) and its drive device; or any storage medium that comes into use in the future.

The above has described configurations in which the functions of the components of the neural network construction unit 101 are realized by either hardware or software or the like. However, the configuration is not limited to this: some components of the neural network construction unit 101 may be realized by dedicated hardware and others by software or the like. For example, the functions of some components can be realized by the processing circuit 10 as dedicated hardware, while the functions of other components can be realized by the processing circuit 10 as the processor 11 reading and executing a program stored in the memory 12.

As described above, the neural network construction unit 101 can realize each of the functions described above by hardware, software or the like, or a combination of these.

<Embodiment 2>
FIG. 11 is a block diagram showing the configuration of the neural network device 100 according to Embodiment 2. In FIG. 11, elements identical or equivalent to those shown in Embodiment 1 (FIG. 1) are denoted by the same reference numerals, and their description is omitted here.

As shown in FIG. 11, the neural network device 100 according to Embodiment 2 includes a neural network execution unit 901 in addition to the neural network construction unit 101. The neural network execution unit 901 has a storage unit 905, a CPU 902, an FPGA 903, a memory 904, and a data acquisition circuit 906.

Based on the program and circuit information created by the neural network construction unit 101, the neural network execution unit 901 executes the arithmetic processing of a neural network in which the CPU 902 and the FPGA 903 cooperate.

The storage unit 905 stores the program and circuit information created by the neural network construction unit 101. The CPU 902 reads the program stored in the storage unit 905 and, based on that program, performs the neural network arithmetic processing assigned to the CPU 902 and controls the FPGA 903. The FPGA 903 reads the circuit information stored in the storage unit 905, configures arithmetic circuits based on that circuit information, and performs the neural network arithmetic processing assigned to the FPGA 903.

The memory 904 relays data exchanged between the CPU 902 and the FPGA 903. More specifically, the CPU 902 stores in the memory 904 the input data for an operation that uses a circuit built on the FPGA 903, and the FPGA 903 reads this input data and uses it for the operation on the circuit. Likewise, the FPGA 903 stores the result of the operation in the memory 904, and the CPU 902 reads that result from the memory 904 and uses it for software processing.

The data acquisition circuit 906 is a circuit used when the FPGA 903 reads data from the memory 904. In the present embodiment, the data acquisition circuit 906 is built on the FPGA 903 as one of the arithmetic circuits.

In general, when an FPGA reads data from an external memory to perform an operation, it starts acquiring the data after receiving a notification that the required data has been stored at a predetermined location in the memory. The data acquisition circuit 906 makes this notification step unnecessary. Specifically, the size of the input data for each circuit on the FPGA 903 is set in the data acquisition circuit 906 in advance, and once data of that size is present in the memory 904, the data acquisition circuit 906 automatically transfers it to the FPGA 903. Because the input data size is fixed at the stage where the neural network construction unit 101 determines the circuit configuration, it can be calculated by the data acquisition circuit control data generation unit 303 of the neural network construction unit 101. The neural network construction unit 101 includes the input data size of each circuit, as calculated by the data acquisition circuit control data generation unit 303, in the circuit information and stores it in the storage unit 905.
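The size-triggered transfer described above can be modeled in software as follows. This is purely a behavioral sketch: the class name `DataAcquisitionModel`, its methods, and the byte-buffer representation of the memory 904 are hypothetical, and the actual data acquisition circuit 906 is hardware built on the FPGA 903.

```python
class DataAcquisitionModel:
    """Behavioral model: transfer input data once the preset size is available."""

    def __init__(self, input_sizes):
        # input_sizes: circuit name -> expected input size in bytes
        # (in the disclosure this size is carried in the circuit information).
        self.input_sizes = dict(input_sizes)
        self.buffers = {name: bytearray() for name in self.input_sizes}
        self.transferred = []  # (circuit name, data) pairs handed to the FPGA side

    def write(self, circuit: str, data: bytes) -> None:
        """CPU side stores data in the shared memory; no notification is sent."""
        buffer = self.buffers[circuit]
        buffer.extend(data)
        size = self.input_sizes[circuit]
        # As soon as a full input is present, it is transferred automatically.
        while len(buffer) >= size:
            self.transferred.append((circuit, bytes(buffer[:size])))
            del buffer[:size]


model = DataAcquisitionModel({"conv0": 4})
model.write("conv0", b"\x01\x02")   # only 2 of 4 bytes present: nothing is transferred
partial = list(model.transferred)
model.write("conv0", b"\x03\x04")   # the 4-byte input is complete and is transferred
```

The point of the design is that the writer never signals completion explicitly; knowing the expected size in advance is enough for the reader to decide when the input is ready.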

Although the CPU 902, the FPGA 903, and the memory 904 are shown as separate blocks in FIG. 11, they may be implemented as a single-chip SoC (System-on-a-Chip) that incorporates all of them.

The embodiments may be freely combined, and each embodiment may be modified or omitted as appropriate.

The above description is illustrative in all respects, and it is understood that countless variations not illustrated here can be envisaged.

100 neural network device, 101 neural network construction unit, 102 neural network analysis unit, 103 neural network operation method output unit, 104 storage unit, 201 network structure analysis unit, 202 neural network dividing unit, 301 control program creation unit, 302 arithmetic circuit creation unit, 303 data acquisition circuit control data generation unit, 401 operation structure classification unit, 402 convolution layer analysis unit, 403 activation layer analysis unit, 501 convolution layer circuitization unit, 502 activation layer circuitization unit, 503 circuit scale calculation unit, 504 circuitization location determination unit, 901 neural network execution unit, 902 CPU, 903 FPGA, 904 memory, 905 storage unit, 906 data acquisition circuit, 10 processing circuit, 11 processor, 12 memory.

Claims

A neural network device comprising:
a neural network analysis unit that determines, for each operation constituting a neural network, an operation method, namely whether the operation is to be circuitized or processed in software; and
a neural network operation method output unit that creates and outputs circuit information for circuitizing the operations determined to be circuitized and a program for software processing of the operations determined to be processed in software,
wherein the neural network analysis unit includes:
a network structure analysis unit that analyzes the operation structure of the neural network; and
a neural network dividing unit that determines, for each operation obtained by dividing the neural network, whether the operation is to be circuitized or processed in software,
the network structure analysis unit includes:
an operation structure classification unit that classifies each layer of the neural network according to the type of operation constituting the layer; and
a convolution layer analysis unit that identifies the parameters of the convolution operation for each layer classified by the operation structure classification unit as a layer that performs a convolution operation, and
the neural network dividing unit includes
a convolution layer circuitization unit that groups layers having the same or similar parameters based on the parameters identified by the convolution layer analysis unit.

The neural network device according to claim 1,
wherein the neural network dividing unit further includes:
a circuit scale calculation unit that calculates, for each convolution operation of the layers grouped by the convolution layer circuitization unit, the circuit scale that results when the convolution operation is circuitized; and
a circuitization location determination unit that determines the operations to be circuitized based on the circuit scale calculated by the circuit scale calculation unit.

The neural network device according to claim 1 or 2,
wherein the neural network operation method output unit includes:
an arithmetic circuit creation unit that creates the circuit information; and
a control program creation unit that creates the program, and
the program created by the control program creation unit includes a control program for managing the input and output of an arithmetic circuit constructed based on the circuit information.

The neural network device according to claim 2,
wherein the network structure analysis unit further includes
an activation layer analysis unit that identifies the activation function used in each layer classified by the operation structure classification unit as a layer that performs processing based on an activation function,
the neural network dividing unit further includes
an activation layer circuitization unit that groups identical activation functions among the activation functions used in the layers classified by the operation structure classification unit as layers that perform processing based on an activation function, and that stores, for each grouped activation function that can be linearly approximated, a linear approximation function obtained by linearly approximating it, in association with that activation function, and
the circuit scale calculation unit further calculates the circuit scale that results when each of the grouped activation functions and the linear approximation functions is circuitized.

The neural network device according to any one of claims 1 to 4,
further comprising a neural network execution unit that executes neural network processing based on the circuit information and the program output by the neural network operation method output unit,
wherein the neural network execution unit includes:
a storage unit that stores the circuit information and the program;
a CPU that executes the program;
an FPGA that constructs an arithmetic circuit based on the circuit information and executes operations using the arithmetic circuit; and
a memory that relays data between the CPU and the FPGA.

The neural network device according to claim 5,
wherein the neural network execution unit further includes
a data acquisition circuit that automatically acquires, from the memory, data passed from the CPU to the FPGA via the memory and reads the data into the FPGA, and
the neural network operation method output unit further includes
a data acquisition circuit control data generation unit that creates data for controlling the data acquisition circuit based on the circuit information.

The neural network device according to claim 6,
wherein the data for controlling the data acquisition circuit is data indicating the size of the input data of the arithmetic circuit.