JP6973651B2

JP6973651B2 - Arithmetic optimization device, method, and program

Info

Publication number: JP6973651B2
Application number: JP2020537921A
Authority: JP
Inventors: 芙美代鷹野; 崇竹中; 誠也柴田; 浩明井上
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2018-08-21
Filing date: 2018-08-21
Publication date: 2021-12-01
Anticipated expiration: 2038-08-21
Also published as: WO2020039493A1; JPWO2020039493A1

Description

The present invention relates to an arithmetic optimization device, an arithmetic optimization method, and an arithmetic optimization program for optimizing operations that use a discriminant model such as a neural network.

Inference on given data is sometimes performed by using a model. Such a model is called a discriminant model. For example, when image data is given, the object represented by the image data (the object shown in the image) may be inferred from the image data and the discriminant model.

A neural network is a known example of a discriminant model. A neural network is a model in which a plurality of layers are connected, and each layer is composed of one or more units (neurons). When inference processing is performed with a neural network, input data is supplied to the input layer and operations are performed in the forward direction from the input layer side to the output layer side, which yields an inference result for the input data.

A neural network is trained by deep learning.

A tool for performing operations using a neural network is described in, for example, Non-Patent Document 1.

Some existing tools for performing operations using a neural network can change the arithmetic precision of all layers of the neural network uniformly. For example, some tools can set the operations of all layers of a neural network to floating-point operations or to integer operations. Such a setting is made by the user (a human).

Non-Patent Document 1: "NVIDIA TensorRT", [online], NVIDIA Corporation, [retrieved July 3, 2018], Internet <URL: https://developer.nvidia.com/tensorrt>

As described above, some existing tools for performing operations using a neural network can change the arithmetic precision of all layers of the neural network uniformly. However, such tools cannot set the arithmetic precision of each layer individually. It has therefore been difficult to optimize operations that use a neural network.

For example, when the operations of all layers of a neural network are set to floating-point operations, the operations can be performed with high precision, but efficiency in terms of power consumption and the like decreases. Conversely, when the operations of all layers are set to integer operations, efficiency in terms of power consumption and the like improves, but the arithmetic precision decreases.

In addition, with existing tools, the user has had to decide whether the operations of the entire neural network are performed as floating-point operations or as integer operations.

Accordingly, an object of the present invention is to provide an arithmetic optimization device, an arithmetic optimization method, and an arithmetic optimization program that can automatically determine the arithmetic precision of each layer of a discriminant model so that operations using the discriminant model can be optimized.

An arithmetic optimization device according to the present invention includes: explanatory variable value acquisition means for acquiring values of predetermined explanatory variables for each application pattern, where an application pattern is information that defines, for operations using a discriminant model in which a plurality of layers each composed of one or more units are connected, to which layers a first arithmetic circuit that performs operations with a first arithmetic precision is applied and to which layers a second arithmetic circuit that performs operations with a second arithmetic precision higher than the first arithmetic precision is applied; objective function calculation means for calculating, for each application pattern, the value of an objective function expressed by the predetermined explanatory variables; and application pattern determination means for determining the application pattern that minimizes the value of the objective function.

An arithmetic optimization method according to the present invention includes: acquiring values of predetermined explanatory variables for each application pattern, where an application pattern is information that defines, for operations using a discriminant model in which a plurality of layers each composed of one or more units are connected, to which layers a first arithmetic circuit that performs operations with a first arithmetic precision is applied and to which layers a second arithmetic circuit that performs operations with a second arithmetic precision higher than the first arithmetic precision is applied; calculating, for each application pattern, the value of an objective function expressed by the predetermined explanatory variables; and determining the application pattern that minimizes the value of the objective function.

An arithmetic optimization program according to the present invention causes a computer to execute: explanatory variable value acquisition processing for acquiring values of predetermined explanatory variables for each application pattern, where an application pattern is information that defines, for operations using a discriminant model in which a plurality of layers each composed of one or more units are connected, to which layers a first arithmetic circuit that performs operations with a first arithmetic precision is applied and to which layers a second arithmetic circuit that performs operations with a second arithmetic precision higher than the first arithmetic precision is applied; objective function calculation processing for calculating, for each application pattern, the value of an objective function expressed by the predetermined explanatory variables; and application pattern determination processing for determining the application pattern that minimizes the value of the objective function.

According to the present invention, the arithmetic precision of each layer of a discriminant model can be determined automatically so that operations using the discriminant model can be optimized.

A schematic diagram showing inference processing using a neural network.
An explanatory diagram showing an example of the inputs and outputs of one unit and its connections with other units.
A schematic diagram showing an example of a processing device that executes the operations of some layers of a neural network with low precision and the operations of the remaining layers with high precision.
A schematic configuration diagram showing an example of a low-precision arithmetic circuit.
A block diagram showing a configuration example of a MAC.
A flowchart showing an example of the progress of inference processing using a neural network.
A block diagram showing a configuration example of the arithmetic optimization device of the present invention.
A schematic diagram showing examples of application patterns.
A block diagram showing a configuration example of an arithmetic optimization device that includes a processing device.
A block diagram showing a configuration example of an arithmetic optimization device that includes a design information storage unit.
A flowchart showing an example of the progress of processing by the arithmetic optimization device according to an embodiment of the present invention.
A schematic block diagram showing a configuration example of a computer according to an embodiment of the present invention or a modification thereof.
A block diagram showing an outline of the arithmetic optimization device of the present invention.

The arithmetic optimization device of the present invention determines the arithmetic precision of each layer in a discriminant model in which a plurality of layers each composed of one or more units are connected. A neural network is an example of such a discriminant model. The following description takes the case where the discriminant model is a neural network as an example. However, the discriminant model is not limited to a neural network.

The following description also takes, as an example of processing using a neural network, processing that infers the content indicated by given input data. For example, it describes processing in which image data is given and the object represented by the image data (the object shown in the image) is inferred from the image data and the neural network.

However, processing using a neural network is not limited to the inference processing described above and also includes, for example, processing that updates the parameters of each layer of the neural network. The embodiment described later takes as an example the case where the arithmetic optimization device of the present invention determines the arithmetic precision of each layer of the neural network when inference processing is performed, but the present invention is also applicable to other processing such as the parameter update processing mentioned above.

FIG. 1 is a schematic diagram showing inference processing using a neural network. In FIG. 1, each unit 51, which corresponds to a neuron in the neural network, is drawn as an ellipse. Each layer contains one or more units. A line segment 52 (a line connecting units in the figure) represents an inter-unit connection. An arrow 53 (the thick right-pointing arrow in the figure) schematically represents the inference processing. FIG. 1 shows an example of a feedforward neural network in which the input to each unit 51 is the output of the units in the preceding layer, but the input to each unit 51 is not limited to this. For example, when time-series information is held, the input to each unit 51 can include the output of the units in the preceding layer at the previous time step, as in a recurrent neural network. Even in such a case, the direction of the inference processing is regarded as the direction from the input layer toward the output layer (the forward direction). Inference processing performed in a predetermined order starting from the input layer in this way is also called "forward propagation". In the following description, the input layer is referred to as the 0th layer and the output layer as the n-th layer.

FIG. 2 is an explanatory diagram showing an example of the inputs and outputs of one unit 51 and its connections with other units. FIG. 2(a) shows an example of the inputs and output of one unit 51, and FIG. 2(b) shows an example of the connections between units 51 arranged in two layers. As shown in FIG. 2(a), when one unit 51 has four inputs (x_1 to x_4) and one output (z), the operation of the unit 51 is expressed, for example, by equation (1A). Here, f() denotes an activation function.

z = f(u) ... (1A)
where u = a + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 ... (1B)

In equation (1B), a denotes the intercept, and w_1 to w_4 denote parameters such as the weights corresponding to the respective inputs (x_1 to x_4).
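Equations (1A) and (1B) can be illustrated with a short sketch. The function names, the choice of ReLU as the activation function f(), and all numeric values below are assumptions made for illustration; the text does not fix a particular activation function.

```python
def unit_output(inputs, weights, intercept, activation):
    """Compute z = f(u) with u = a + sum_k w_k * x_k, as in equations (1A) and (1B)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    u = intercept + sum(w * x for w, x in zip(weights, inputs))
    return activation(u)


def relu(u):
    # Hypothetical choice of activation function f(); not specified in the text.
    return max(0.0, u)


# Four inputs x_1..x_4 and weights w_1..w_4 as in FIG. 2(a); the values are made up.
x = [1.0, 2.0, 3.0, 4.0]
w = [0.5, -1.0, 0.25, 0.0]
a = 1.0
z = unit_output(x, w, a, relu)
print(z)
```

With these values, u = 1.0 + 0.5 - 2.0 + 0.75 + 0.0 = 0.25, so the unit outputs 0.25.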

On the other hand, as shown in FIG. 2(b), when the units 51 are connected between two adjacent layers, the outputs (z_1 to z_4) of the units 51 in the later of the two layers for the inputs to those units (x_1 to x_4) are expressed, for example, as follows. Here, i is the identifier of a unit within the same layer (i = 1 to 3 in this example).

z_i = f(u_i) ... (2A)
where u_i = a + w_{i,1} x_1 + w_{i,2} x_2 + w_{i,3} x_3 + w_{i,4} x_4 ... (2B)

Below, equation (2B) is sometimes simplified and written as u_i = Σ w_{i,k} * x_k, with the intercept a omitted. The intercept a can also be regarded as the coefficient (one of the parameters) of a constant term whose value is 1. Here, k denotes the identifier of an input to each unit 51 in the layer; more specifically, k can be said to denote the identifier of the other unit that supplies that input. When the inputs to each unit 51 in the layer are only the outputs of the units in the preceding layer, the simplified expression can also be written as u_i^(L) = Σ w_{i,k}^(L) * x_k^(L-1), where L denotes the layer identifier. In these expressions, w_{i,k} corresponds to the parameters of each unit i in the layer (the L-th layer); more specifically, it corresponds to the weight of the connection between unit i and another unit k (the inter-unit connection). Below, without distinguishing individual units, the function that determines the output value of a unit (the activation function) is sometimes simplified and written as z = Σ w * x.
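The simplified per-layer expression u_i^(L) = Σ w_{i,k}^(L) * x_k^(L-1) amounts to one weighted sum per unit of the layer. The following is a minimal sketch under the assumption that the weights of a layer are held as a list of rows, one row per unit i; the function name, the identity activation, and the values are illustrative only.

```python
def layer_forward(weights, inputs, activation):
    """Return [z_1, ..., z_m] with z_i = f(sum_k w[i][k] * x[k]) (intercept omitted)."""
    outputs = []
    for row in weights:  # one row of weights w_{i,1..K} per unit i
        if len(row) != len(inputs):
            raise ValueError("each weight row must match the number of inputs")
        u_i = sum(w_ik * x_k for w_ik, x_k in zip(row, inputs))
        outputs.append(activation(u_i))
    return outputs


def identity(u):
    # Hypothetical activation used only to keep the arithmetic easy to follow.
    return u


# Three units (i = 1 to 3), each receiving four inputs x_1..x_4; values are made up.
w_layer = [
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 1],
]
x_prev = [1, 2, 3, 4]
z_layer = layer_forward(w_layer, x_prev, identity)
print(z_layer)
```

Here the three units produce 1, 2 + 4 = 6, and 1 + 2 + 3 + 4 = 10.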

In the above example, the operation of obtaining the output z from the input x for each unit 51 of a given layer corresponds to the inference processing in that layer.

Before the embodiment of the present invention is described, an example of a processing device that executes the operations of some layers of a neural network with low precision and the operations of the remaining layers with high precision is described. FIG. 3 is a schematic diagram showing an example of such a processing device. The processing device 18 includes, for example, a low-precision arithmetic circuit 5, a high-precision arithmetic circuit 6, a first memory 7, a second memory 8, and a third memory 9. The low-precision arithmetic circuit 5, the high-precision arithmetic circuit 6, the first memory 7, the second memory 8, and the third memory 9 are connected via, for example, a bus 10.

In the inference processing, the low-precision arithmetic circuit 5 executes the operations of some of the layers of the neural network with the first arithmetic precision.

The first memory 7 is a memory used by the low-precision arithmetic circuit 5 when it executes operations. The low-precision arithmetic circuit 5 executes operations while accessing the first memory 7 as needed.

In the inference processing, the high-precision arithmetic circuit 6 executes the operations of the remaining layers of the neural network with the second arithmetic precision, which is higher than the first arithmetic precision.

The second memory 8 is a memory used by the high-precision arithmetic circuit 6 when it executes operations. The high-precision arithmetic circuit 6 executes operations while accessing the second memory 8 as needed.

The first memory 7 and the second memory 8 may be implemented as separate memories or as a single memory. When they are implemented as a single memory, it suffices that the single memory is divided into an access area for the low-precision arithmetic circuit 5 and an access area for the high-precision arithmetic circuit 6.

The third memory 9 is a data transfer memory used when the low-precision arithmetic circuit 5 and the high-precision arithmetic circuit 6 exchange data. The third memory 9 need not be provided. That is, the low-precision arithmetic circuit 5 and the high-precision arithmetic circuit 6 may exchange data by communication without going through the third memory 9 (the data transfer memory).

The arithmetic precision of the high-precision arithmetic circuit 6 (the second arithmetic precision) is higher than the arithmetic precision of the low-precision arithmetic circuit 5 (the first arithmetic precision). Here, a measure of the breadth and granularity of the range of the numerical data used in operations (more specifically, a measure of the breadth and granularity of the range of numerical data determined by the bit width, the handling of the decimal point, and the like in the arithmetic circuit) is called "precision" or "arithmetic precision".

The following description takes as an example the case where the arithmetic precision of the low-precision arithmetic circuit 5 is 8-bit integer arithmetic and the arithmetic precision of the high-precision arithmetic circuit 6 is 32-bit floating-point arithmetic. However, the arithmetic precisions of the two circuits are not limited to this example; it suffices that the arithmetic precision of the high-precision arithmetic circuit 6 is higher than that of the low-precision arithmetic circuit 5.

The low-precision arithmetic circuit 5 and the high-precision arithmetic circuit 6 are implemented in, for example, a GPU (Graphics Processing Unit).

FIG. 4 is a schematic configuration diagram showing an example of the low-precision arithmetic circuit 5. As illustrated in FIG. 4, the low-precision arithmetic circuit 5 may have, for example, a configuration in which a plurality of MACs (Multiplier-Accumulators) 221 are connected in parallel.

Similarly, the high-precision arithmetic circuit 6 may also have a configuration in which a plurality of MACs are connected in parallel, as illustrated in FIG. 4. However, the arithmetic precision of the MACs provided in the high-precision arithmetic circuit 6 is higher than that of the MACs 221 provided in the low-precision arithmetic circuit 5.

A MAC is one example of an arithmetic unit provided in the low-precision arithmetic circuit 5 or the high-precision arithmetic circuit 6.

FIG. 5 is a block diagram showing a configuration example of the MAC 221. The MAC 221 may include a multiplier 234, an adder 235, storage elements 231 to 233 that hold three inputs, and a storage element 236 that holds one output. The MAC 221 illustrated in FIG. 5 is an arithmetic circuit that, on receiving three variables a, w, and x, calculates one output variable z = a + w * x. In this example, z corresponds to the output of a unit, a and w correspond to parameters, and x corresponds to the input of the unit. The MAC 221 receives the three variables w, x, and a via the storage elements 231, 232, and 233, respectively. The calculated z is sent to the outside via the storage element 236. In such a configuration, the arithmetic precision of the MAC 221 is determined by the bit width of the multiplier 234 and the adder 235 and by the handling of the decimal point (for example, floating point or fixed point). For example, in the MAC 221 provided in the low-precision arithmetic circuit 5, it suffices that the operations by the multiplier 234 and the adder 235 correspond to the arithmetic precision of the low-precision arithmetic circuit 5 (for example, 8-bit integer arithmetic).

The MAC provided in the high-precision arithmetic circuit 6 can also be represented in the same way as the configuration shown in FIG. 5. However, in the MAC provided in the high-precision arithmetic circuit 6, the operations by the multiplier 234 and the adder 235 correspond to the arithmetic precision of the high-precision arithmetic circuit 6 (for example, 32-bit floating-point arithmetic).
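The effect of arithmetic precision on the MAC operation z = a + w * x can be sketched in software. The following is only an illustration, not a description of the circuits in FIG. 5: it assumes that the 8-bit integer MAC saturates its result to the signed 8-bit range, and it emulates 32-bit floating point by rounding through Python's `struct` module. The function names are hypothetical.

```python
import struct


def to_float32(value):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def saturate_int8(value):
    """Clamp an integer to the signed 8-bit range [-128, 127] (assumed behaviour)."""
    return max(-128, min(127, int(value)))


def mac_int8(a, w, x):
    """z = a + w * x with the result saturated to 8 bits."""
    return saturate_int8(a + w * x)


def mac_float32(a, w, x):
    """z = a + w * x with every intermediate value rounded to single precision."""
    product = to_float32(to_float32(w) * to_float32(x))
    return to_float32(to_float32(a) + product)


# Same operands on both MACs: the true result 210 lies outside the 8-bit range.
z_low = mac_int8(10, 20, 10)
z_high = mac_float32(10.0, 20.0, 10.0)
print(z_low, z_high)
```

In this example the 8-bit result is clamped to 127, while the single-precision result is 210.0, which illustrates why a narrower value range can cost accuracy in exchange for cheaper arithmetic.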

The configurations of the low-precision arithmetic circuit 5 and the high-precision arithmetic circuit 6 are not limited to the configuration illustrated in FIG. 4. The low-precision arithmetic circuit 5 and the high-precision arithmetic circuit 6 may be implemented with a configuration different from that shown in FIG. 4. For example, they may include arithmetic units other than MACs.

FIG. 6 is a flowchart showing an example of the progress of inference processing using a neural network.

When input data is given to the low-precision arithmetic circuit 5 (step S111), the low-precision arithmetic circuit 5 performs forward propagation from the first layer to the (k-1)-th layer of the neural network with the first arithmetic precision (step S112). That is, the low-precision arithmetic circuit 5 executes, with the first arithmetic precision, the inference operations that calculate the output of each unit included in each layer from the first layer to the (k-1)-th layer.

Next, the low-precision arithmetic circuit 5 stores the result of step S112 in the third memory 9 (step S113). Specifically, the low-precision arithmetic circuit 5 stores the output from each unit of the (k-1)-th layer in the third memory 9.

Next, the high-precision arithmetic circuit 6 reads the result of step S112 (the output from each unit of the (k-1)-th layer) from the third memory 9 (step S114).

In steps S113 and S114, the low-precision arithmetic circuit 5 and the high-precision arithmetic circuit 6 thus exchange data (the result of step S112, specifically the output from each unit of the (k-1)-th layer) via the third memory 9.

The low-precision arithmetic circuit 5 and the high-precision arithmetic circuit 6 may instead exchange data directly by communication, without going through the third memory 9.

After step S114, the high-precision arithmetic circuit 6 performs forward propagation from the k-th layer to the n-th layer of the neural network with the second arithmetic precision (step S115). That is, the high-precision arithmetic circuit 6 executes, with the second arithmetic precision, the inference operations that calculate the output of each unit included in each layer from the k-th layer to the n-th layer.

In the processing shown in FIG. 6, the input layer of the neural network is the 0th layer and the n-th layer is the output layer. The (k-1)-th layer is an intermediate layer that comes after the input layer (0th layer) and before the output layer (n-th layer). That is, k is an integer satisfying 0 < k-1 < n.

The output of the units of the n-th layer obtained in step S115 can be said to represent the inference result.
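The flow of steps S111 to S115 can be sketched as follows. This is an illustration under several assumptions that are not in the text: the low-precision layers are emulated by rounding inputs and weights to saturated 8-bit integers, the high-precision layers use ordinary Python floats, activation functions are omitted, the third memory 9 is modelled as a plain list, and all names and values are hypothetical.

```python
def clamp_int8(value):
    # Assumed quantization: round to nearest integer and saturate to [-128, 127].
    return max(-128, min(127, round(value)))


def low_precision_layer(weights, inputs):
    """Weighted sums computed on 8-bit integer operands (emulated)."""
    q_inputs = [clamp_int8(x) for x in inputs]
    return [clamp_int8(sum(clamp_int8(w) * x for w, x in zip(row, q_inputs)))
            for row in weights]


def high_precision_layer(weights, inputs):
    """Weighted sums computed in floating point."""
    return [sum(float(w) * float(x) for w, x in zip(row, inputs)) for row in weights]


def run_inference(layers, input_data, k):
    """layers maps layer index 1..n to a weight matrix; layer k is the first high-precision layer."""
    n = len(layers)
    if not 0 < k - 1 < n:
        raise ValueError("k must satisfy 0 < k-1 < n")
    x = list(input_data)                      # step S111: input data is given
    for index in range(1, k):                 # step S112: layers 1 .. k-1, low precision
        x = low_precision_layer(layers[index], x)
    third_memory = list(x)                    # step S113: store outputs of layer k-1
    x = list(third_memory)                    # step S114: read them back
    for index in range(k, n + 1):             # step S115: layers k .. n, high precision
        x = high_precision_layer(layers[index], x)
    return x


# A made-up network with n = 3 layers; layer 1 runs in low precision (k = 2).
example_layers = {
    1: [[1, 1], [1, -1]],
    2: [[0.5, 0.5], [1.0, 0.0]],
    3: [[1.0, -1.0]],
}
result = run_inference(example_layers, [3.4, 1.2], k=2)
print(result)
```

In this example the inputs 3.4 and 1.2 are quantized to 3 and 1, layer 1 yields [4, 2], layer 2 yields [3.0, 4.0], and the output layer yields [-1.0].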

Hereinafter, an embodiment of the present invention is described with reference to the drawings.

The present embodiment takes as an example the case where the arithmetic optimization device of the present invention determines the arithmetic precision of each layer in a neural network. As described above, the processing using the neural network is, by way of example, processing that infers the content indicated by given input data, such as processing in which image data is given and the object represented by the image data (the object shown in the image) is inferred from the image data and the neural network. However, the present invention is also applicable to other processing using a neural network.

FIG. 7 is a block diagram showing a configuration example of the arithmetic optimization device of the present invention. The arithmetic optimization device of the present invention includes a discriminant model storage unit 21, a data storage unit 22, an explanatory variable value acquisition unit 23, an objective function storage unit 24, an objective function calculation unit 25, a calculation result storage unit 26, and an application pattern determination unit 27.

The discriminant model storage unit 21 is a storage device that stores a neural network as the discriminant model.

The data storage unit 22 is a storage device that stores data to be subjected to inference processing using the neural network (for example, image data for which the object shown in the image is to be inferred). The data storage unit 22 stores a plurality of pieces (N pieces) of data to be inferred, and also stores the correct answer data of the inference result corresponding to each piece of data. For example, the data storage unit 22 stores N pieces of image data and the correct answer data corresponding to each piece of image data (data indicating the object actually shown in the image).

目的関数記憶部２４は、所定の説明変数（以下、単に説明変数と記す。）で表される目的関数を記憶する。目的関数を表わす式は、予め定められる。本実施形態では、少なくとも、ニューラルネットワークを用いた推論処理における「推論精度」と「処理速度」とを、上記の説明変数として用いるものとする。以下の説明では、説明を簡単にするために、まず、「推論精度」と「処理速度」とを説明変数とする場合について説明する。目的関数が、「推論精度」および「処理速度」に加え、さらに他の説明変数によって表されてもよいが、この場合については、後述する。 The objective function storage unit 24 stores an objective function represented by a predetermined explanatory variable (hereinafter, simply referred to as an explanatory variable). The formula representing the objective function is predetermined. In this embodiment, at least "inference accuracy" and "processing speed" in inference processing using a neural network are used as the above explanatory variables. In the following description, in order to simplify the explanation, first, a case where "inference accuracy" and "processing speed" are used as explanatory variables will be described. The objective function may be represented by other explanatory variables in addition to "inference accuracy" and "processing speed", but this case will be described later.

ここで、「推論精度」とは、推論処理の演算結果（換言すれば、推論結果）の正確さである。 Here, the "inference accuracy" is the accuracy of the operation result (in other words, the inference result) of the inference process.

目的関数記憶部２４は、目的関数として、例えば、以下の式（３）で表される関数を記憶する。 The objective function storage unit 24 stores, for example, a function represented by the following equation (3) as an objective function.

Objective function = "inference accuracy" × α + "processing speed" × β ... (3)

「推論精度」および「処理速度」は、説明変数である。αは、「推論精度」の係数であり、βは、「処理速度」の係数である。αおよびβの値は、予め決定されている。本実施形態では、αおよびβがいずれも、正の値として定められている場合を例にして説明する。 "Inference accuracy" and "processing speed" are explanatory variables. α is a coefficient of “inference accuracy” and β is a coefficient of “processing speed”. The values of α and β are predetermined. In the present embodiment, the case where both α and β are defined as positive values will be described as an example.
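As a minimal sketch of equation (3), the objective function can be written as a weighted sum of the two explanatory variables. The function name `objective` and the numeric values in the usage line are illustrative assumptions; the source only states that α and β are predetermined and, in this embodiment, positive.

```python
def objective(inference_accuracy: float, processing_speed: float,
              alpha: float, beta: float) -> float:
    """Equation (3): "inference accuracy" x alpha + "processing speed" x beta."""
    # This embodiment assumes that both coefficients are positive.
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta are assumed to be positive")
    return inference_accuracy * alpha + processing_speed * beta


# Illustrative values only (not taken from the source).
value = objective(inference_accuracy=1.25, processing_speed=0.5, alpha=2.0, beta=4.0)
```

The ratio of α to β expresses how strongly inference accuracy is weighted relative to processing speed.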

説明変数値取得部２３は、目的関数記憶部２４に記憶されている目的関数において用いられている説明変数の値を取得する。本例では、ニューラルネットワークを用いた推論処理における「推論精度」および「処理速度」の値を取得する。 The explanatory variable value acquisition unit 23 acquires the value of the explanatory variable used in the objective function stored in the objective function storage unit 24. In this example, the values of "inference accuracy" and "processing speed" in the inference processing using the neural network are acquired.

Further, the explanatory variable value acquisition unit 23 stores a plurality of types of application patterns in advance. An application pattern is information that defines, for an operation using a discrimination model (in this embodiment, a neural network), to which layers of the neural network the low-precision arithmetic circuit 5 (see FIG. 3) is applied and to which layers the high-precision arithmetic circuit 6 (see FIG. 3) is applied. In this embodiment, the low-precision arithmetic circuit 5 is applied from the first layer onward, and the circuit applied to the layers may be switched from the low-precision arithmetic circuit 5 to the high-precision arithmetic circuit 6 at a boundary between any two layers. However, for the sake of simplicity, the case where this switching occurs at most once will be described as an example. The high-precision arithmetic circuit 6 may also be applied to all layers from the first layer onward.

Therefore, in the present embodiment, a case in which the low-precision arithmetic circuit 5 is applied from the first layer to the p-th layer, the high-precision arithmetic circuit 6 is applied from the (p+1)-th layer to the q-th layer, and the low-precision arithmetic circuit 5 is applied again from the (q+1)-th layer to the n-th layer (output layer) is excluded from the application patterns. However, in the present invention, such a case may be included in the application patterns.

図８は、適用パターンの例を示す模式図である。図８に示す各矩形は、ニューラルネットワークの各層を表わしている。 FIG. 8 is a schematic diagram showing an example of an application pattern. Each rectangle shown in FIG. 8 represents each layer of the neural network.

図８に示す適用パターン１は、第１層から第ｎ層までの全ての層に低精度演算回路５を適用することを定めている。換言すれば、適用パターン１は、第１層から第ｎ層までの全ての層の演算を低精度演算回路５が実行することを定めている。 The application pattern 1 shown in FIG. 8 defines that the low-precision arithmetic circuit 5 is applied to all layers from the first layer to the nth layer. In other words, the application pattern 1 defines that the low-precision arithmetic circuit 5 executes the arithmetic of all the layers from the first layer to the nth layer.

The application pattern 2 shown in FIG. 8 defines that the low-precision arithmetic circuit 5 is applied to each layer from the first layer to the (n-1)-th layer, and the high-precision arithmetic circuit 6 is applied to the n-th layer. In other words, the application pattern 2 defines that the low-precision arithmetic circuit 5 executes the operations of each layer from the first layer to the (n-1)-th layer, and the high-precision arithmetic circuit 6 executes the operation of the n-th layer.

図８に示す適用パターン３は、第１層から第ｎ−２層までの各層に低精度演算回路５を適用し、第ｎ−１層および第ｎ層に高精度演算回路６を適用することを定めている。 In the application pattern 3 shown in FIG. 8, the low-precision arithmetic circuit 5 is applied to each layer from the first layer to the n-2th layer, and the high-precision arithmetic circuit 6 is applied to the n-1th layer and the nth layer. Is defined.

The application pattern X-1 shown in FIG. 8 defines that the low-precision arithmetic circuit 5 is applied to the first layer and the high-precision arithmetic circuit 6 is applied to each layer from the second layer to the n-th layer. In other words, the application pattern X-1 defines that the low-precision arithmetic circuit 5 executes the operation of the first layer, and the high-precision arithmetic circuit 6 executes the operations of each layer from the second layer to the n-th layer.

図８に示す適用パターンＸは、第１層から第ｎ層までの全ての層に高精度演算回路６を適用することを定めている。換言すれば、適用パターンＸは、第１層から第ｎ層までの全ての層の演算を高精度演算回路６が実行することを定めている。 The application pattern X shown in FIG. 8 defines that the high-precision arithmetic circuit 6 is applied to all the layers from the first layer to the nth layer. In other words, the application pattern X defines that the high-precision arithmetic circuit 6 executes the arithmetic of all the layers from the first layer to the nth layer.

図８に例示するような種々の適用パターンは、予め決定されていて、説明変数値取得部２３は、個々の適用パターンを予め記憶している。そして、説明変数値取得部２３は、個々の適用パターン毎に、説明変数「推論精度」の値、および、説明変数「処理速度」の値を取得する。 Various application patterns as illustrated in FIG. 8 are predetermined, and the explanatory variable value acquisition unit 23 stores each application pattern in advance. Then, the explanatory variable value acquisition unit 23 acquires the value of the explanatory variable “inference accuracy” and the value of the explanatory variable “processing speed” for each application pattern.
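The application patterns described for FIG. 8 can be enumerated programmatically. The following is a sketch under the simplification used in this embodiment (the circuit is switched at most once, from low precision to high precision). The names `LOW`, `HIGH` and `application_patterns` are hypothetical, and the count X = n + 1 is inferred from the numbering of the patterns in FIG. 8 rather than stated in the source.

```python
from typing import List

LOW, HIGH = "low", "high"  # low-precision circuit 5 / high-precision circuit 6


def application_patterns(n_layers: int) -> List[List[str]]:
    """Return application patterns 1..X for an n-layer network.

    The pattern at list index k assigns the high-precision circuit to the
    last k layers and the low-precision circuit to all layers before them,
    so index 0 is pattern 1 (all low) and index n_layers is pattern X (all high).
    """
    if n_layers < 1:
        raise ValueError("the network needs at least one layer")
    return [[LOW] * (n_layers - k) + [HIGH] * k for k in range(n_layers + 1)]


patterns = application_patterns(3)
```

For a three-layer network this yields four patterns, from all low precision to all high precision. Patterns that return to the low-precision circuit after a high-precision layer are not generated, matching the exclusion described above.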

適用パターンが異なれば、説明変数（本例では、「推論精度」および「処理速度」）の値も異なる。 Different application patterns have different values for the explanatory variables (in this example, "inference accuracy" and "processing speed").

There are two modes in which the explanatory variable value acquisition unit 23 acquires the values of the explanatory variables (in this example, "inference accuracy" and "processing speed"). In the first mode, the explanatory variable value acquisition unit 23 causes an actually existing processing device 18 (see FIG. 3) to execute inference processing, and acquires the values of "inference accuracy" and "processing speed" by actual measurement. In the second mode, the explanatory variable value acquisition unit 23 acquires the values of "inference accuracy" and "processing speed" by simulation. That is, the first mode acquires the values of the explanatory variables by actual measurement, and the second mode acquires them by simulation.

説明変数値取得部２３が実測により説明変数の値を取得する場合、演算最適化装置は、図９に示すように、処理装置１８を備えていてもよい。処理装置１８の構成や動作は、図３等を参照して既に説明しているので、ここでは説明を省略する。 When the explanatory variable value acquisition unit 23 acquires the value of the explanatory variable by actual measurement, the arithmetic optimization device may include a processing device 18 as shown in FIG. Since the configuration and operation of the processing device 18 have already been described with reference to FIG. 3 and the like, the description thereof will be omitted here.

Further, the processing device 18 may still be in the design stage and may not actually exist yet. In that case, as shown in FIG. 10, the arithmetic optimization device may include a design information storage unit 19. The design information storage unit 19 is a storage device that stores the design information of the processing device 18. Examples of design information include the number of arithmetic units (for example, MACs) provided in the low-precision arithmetic circuit 5 in the processing device 18 and the number of arithmetic units (for example, MACs) provided in the high-precision arithmetic circuit 6 in the processing device 18. However, the design information is not limited to these examples. The explanatory variable value acquisition unit 23 may acquire the values of the explanatory variables by simulation based on the design information stored in the design information storage unit 19.

まず、説明変数値取得部２３が実測により説明変数の値を取得する場合の動作について説明する。ここでは、図９に示すように、演算最適化装置が、処理装置１８を備えている場合を例にして説明する。 First, the operation when the explanatory variable value acquisition unit 23 acquires the value of the explanatory variable by actual measurement will be described. Here, as shown in FIG. 9, a case where the arithmetic optimization apparatus includes the processing apparatus 18 will be described as an example.

The operation in which the explanatory variable value acquisition unit 23 acquires the value of "processing speed" by actual measurement will be described. The explanatory variable value acquisition unit 23 specifies an application pattern to the processing device 18. Then, the explanatory variable value acquisition unit 23 inputs the neural network stored in the discrimination model storage unit 21 and the data stored in the data storage unit 22 to the processing device 18, causes the processing device 18 to execute inference processing, and measures the processing speed at which the processing device 18 performs the inference processing on that data. As a result, the explanatory variable value acquisition unit 23 acquires the value of the processing speed. At this time, the processing device 18 executes the inference processing with an operation according to the specified application pattern.

処理速度は、例えば、１つのデータに対する推論処理時間（換言すれば、１秒当たりに処理可能なデータ数の逆数）である。あるいは、説明変数値取得部２３は、処理速度の値として、例えば、レイテンシまたはスループットの値を取得してもよい。この点は、シミュレーションによって処理速度の値を取得する場合においても同様である。 The processing speed is, for example, the inference processing time for one data (in other words, the reciprocal of the number of data that can be processed per second). Alternatively, the explanatory variable value acquisition unit 23 may acquire, for example, a latency or throughput value as the processing speed value. This point is the same even when the processing speed value is acquired by simulation.

なお、説明変数値取得部２３は、１つのデータに関して、処理装置１８に推論処理を実行させることで、処理速度の値を取得することができる。 The explanatory variable value acquisition unit 23 can acquire the value of the processing speed by causing the processing device 18 to execute the inference processing for one data.

説明変数値取得部２３は、指定する適用パターンを順次、変更し、適用パターン毎に、実測によって処理速度の値を取得する。 The explanatory variable value acquisition unit 23 sequentially changes the designated application pattern, and acquires the processing speed value by actual measurement for each application pattern.
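The measurement described above can be sketched as a simple wall-clock timing of one inference run. Here `run_inference` is a hypothetical callable standing in for the processing device 18; the source does not define any programming interface for that device, so its signature is an assumption.

```python
import time
from typing import Callable, Sequence


def measure_processing_time(
    run_inference: Callable[[Sequence[str], object], object],
    pattern: Sequence[str],
    sample: object,
) -> float:
    """Return the inference processing time in seconds for one data item.

    run_inference is assumed to execute inference on `sample` under the
    given application pattern and to return when the result is available.
    """
    start = time.perf_counter()
    run_inference(pattern, sample)
    return time.perf_counter() - start


# Usage with a stand-in device that does nothing (illustration only).
elapsed = measure_processing_time(lambda pattern, sample: None, ("low", "high"), 0)
```

Because the measured quantity is a time per data item, a smaller value indicates faster processing. A single run can be noisy in practice, so averaging over several runs may be preferable, although the source notes that one data item is sufficient.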

The operation in which the explanatory variable value acquisition unit 23 acquires the value of "inference accuracy" by actual measurement will be described. In this case, the explanatory variable value acquisition unit 23 may operate, for example, as follows. The explanatory variable value acquisition unit 23 specifies an application pattern to the processing device 18. Then, the explanatory variable value acquisition unit 23 inputs the neural network stored in the discrimination model storage unit 21 to the processing device 18. Further, the explanatory variable value acquisition unit 23 inputs each of the plurality of (N) data items stored in the data storage unit 22 to the processing device 18, and causes the processing device 18 to derive an inference result for each data item. That is, the explanatory variable value acquisition unit 23 causes the processing device 18 to execute the inference processing N times. At this time, the processing device 18 executes the inference processing with an operation according to the specified application pattern. As a result, N inference results are obtained. The explanatory variable value acquisition unit 23 collates each inference result with the correct answer data stored in the data storage unit 22, calculates the ratio of the number of inference runs that yielded the correct answer to the N inference runs, and then calculates the reciprocal of that ratio. The reciprocal of the ratio corresponds to the value of the inference accuracy. The operation in which the explanatory variable value acquisition unit 23 acquires the value of the inference accuracy by actual measurement is not limited to the above example.
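The calculation of the inference accuracy value described above (the reciprocal of the ratio of correct results) can be sketched as follows. The function name and the handling of the case with no correct results are assumptions; the source does not address that case.

```python
from typing import Sequence


def inference_accuracy_value(predictions: Sequence[object],
                             answers: Sequence[object]) -> float:
    """Return N / (number of correct results); smaller values are better."""
    if not predictions or len(predictions) != len(answers):
        raise ValueError("need N >= 1 predictions and N matching answers")
    correct = sum(1 for p, a in zip(predictions, answers) if p == a)
    if correct == 0:
        # Assumption: treat "no correct result" as the worst possible value.
        return float("inf")
    return len(predictions) / correct


# Three of four results match the correct answer data.
value = inference_accuracy_value(["cat", "dog", "cat", "bird"],
                                 ["cat", "dog", "dog", "bird"])
```

With this definition a perfect classifier gives 1.0 and the value grows as accuracy drops, which is why the objective function is later minimized rather than maximized.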

説明変数値取得部２３は、指定する適用パターンを順次、変更し、適用パターン毎に、実測によって、推論精度の値を取得する。 The explanatory variable value acquisition unit 23 sequentially changes the designated application pattern, and acquires the inference accuracy value by actual measurement for each application pattern.

Next, the operation in which the explanatory variable value acquisition unit 23 acquires the values of the explanatory variables by simulation will be described. Here, as shown in FIG. 10, a case where the arithmetic optimization device includes the design information storage unit 19 will be described as an example. In this example, it is assumed that the number of arithmetic units (for example, MACs) provided in the low-precision arithmetic circuit 5 (see FIG. 3) in the processing device 18 and the number of arithmetic units (for example, MACs) provided in the high-precision arithmetic circuit 6 (see FIG. 3) in the processing device 18 are stored in the design information storage unit 19 as design information.

The operation in which the explanatory variable value acquisition unit 23 acquires the value of "processing speed" by simulation will be described. In this example, the explanatory variable value acquisition unit 23 holds in advance a function for obtaining the value of "processing speed" (hereinafter referred to as a processing speed function). The processing speed function is predetermined. The processing speed function takes as variables, for example, the number of arithmetic units provided in the low-precision arithmetic circuit 5, the number of arithmetic units provided in the high-precision arithmetic circuit 6, the memory access amount (number of memory accesses) when the low-precision arithmetic circuit 5 accesses the first memory 7, the memory access amount (number of memory accesses) when the high-precision arithmetic circuit 6 accesses the second memory 8, and the amount of data exchanged between the low-precision arithmetic circuit 5 and the high-precision arithmetic circuit 6 (hereinafter sometimes referred to as the data exchange amount). Hereinafter, the case where the processing speed function is expressed in terms of these variables will be described as an example. However, the variables used in the processing speed function are not limited to the above example.

なお、データ授受量は、例えば、授受されるデータの個数と、データ１個当たりのバイト数との積によって表される。この場合の単位は、例えば、バイトである。 The amount of data exchanged is represented by, for example, the product of the number of data exchanged and the number of bytes per data. The unit in this case is, for example, bytes.

The explanatory variable value acquisition unit 23 may calculate the value of the processing speed by substituting the value of each of the above variables into the processing speed function. Among the variables, the number of arithmetic units provided in the low-precision arithmetic circuit 5 and the number of arithmetic units provided in the high-precision arithmetic circuit 6 may be taken from the design information. The memory access amount (number of memory accesses) when the low-precision arithmetic circuit 5 accesses the first memory 7, the memory access amount (number of memory accesses) when the high-precision arithmetic circuit 6 accesses the second memory 8, and the data exchange amount between the low-precision arithmetic circuit 5 and the high-precision arithmetic circuit 6 may be derived as follows: the explanatory variable value acquisition unit 23 selects an application pattern and simulates the operation of the processing device 18 that is determined from the design information stored in the design information storage unit 19 and that corresponds to the selected application pattern. The explanatory variable value acquisition unit 23 may then calculate the value of the processing speed by substituting the numbers of arithmetic units, the memory access amounts derived for the selected application pattern, and the data exchange amount between the low-precision arithmetic circuit 5 and the high-precision arithmetic circuit 6 into the processing speed function. As a result, the explanatory variable value acquisition unit 23 can acquire the value of the processing speed based on simulation.
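The source names the variables of the processing speed function but does not give its form. The sketch below is therefore purely illustrative: the additive cost model, the extra parameters `low_work`, `high_work`, `access_cost` and `byte_cost`, and all numeric values are assumptions, not the function used in the patent. Only the data exchange amount (number of items times bytes per item) follows the text directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedInputs:
    low_units: int          # arithmetic units (e.g. MACs) in low-precision circuit 5
    high_units: int         # arithmetic units (e.g. MACs) in high-precision circuit 6
    low_mem_accesses: int   # accesses by circuit 5 to the first memory 7
    high_mem_accesses: int  # accesses by circuit 6 to the second memory 8
    exchanged_bytes: int    # data exchange amount between circuits 5 and 6


def data_exchange_bytes(item_count: int, bytes_per_item: int) -> int:
    """Data exchange amount: number of items times bytes per item."""
    return item_count * bytes_per_item


def processing_speed(x: SpeedInputs, *, low_work: float, high_work: float,
                     access_cost: float, byte_cost: float) -> float:
    """Hypothetical processing speed function (time per data item).

    Assumed model: work divided across the arithmetic units, plus a fixed
    cost per memory access and per exchanged byte.
    """
    if x.low_units < 1 or x.high_units < 1:
        raise ValueError("each circuit is assumed to have at least one unit")
    compute = low_work / x.low_units + high_work / x.high_units
    memory = access_cost * (x.low_mem_accesses + x.high_mem_accesses)
    transfer = byte_cost * x.exchanged_bytes
    return compute + memory + transfer


inputs = SpeedInputs(4, 2, 10, 6, data_exchange_bytes(256, 4))
speed = processing_speed(inputs, low_work=8.0, high_work=4.0,
                         access_cost=0.5, byte_cost=0.25)
```

In such a model the unit counts come from the design information, while the access counts and exchanged bytes change with the selected application pattern, which is what makes the estimated processing speed pattern-dependent.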

説明変数値取得部２３は、選択する適用パターンを順次、変更し、適用パターン毎に、シミュレーションに基づく処理速度の値を計算する。 The explanatory variable value acquisition unit 23 sequentially changes the application pattern to be selected, and calculates the processing speed value based on the simulation for each application pattern.

The operation in which the explanatory variable value acquisition unit 23 acquires the value of "inference accuracy" by simulation will be described. In this case, the explanatory variable value acquisition unit 23 may operate, for example, as follows. The explanatory variable value acquisition unit 23 selects an application pattern. Then, for each of the plurality of (in this example, N) data items stored in the data storage unit 22, the explanatory variable value acquisition unit 23 derives an inference result for the data item by simulating the operation of the processing device 18 that is determined from the design information and that corresponds to the selected application pattern. As a result, N inference results are obtained. The explanatory variable value acquisition unit 23 calculates the ratio of the number of inference results that match the correct answer data to the number of inference results (N), and then calculates the reciprocal of that ratio. The reciprocal of the ratio corresponds to the value of the inference accuracy. The operation in which the explanatory variable value acquisition unit 23 acquires the value of the inference accuracy by simulation is not limited to the above example.

説明変数値取得部２３は、指定する適用パターンを順次、変更し、適用パターン毎に、シミュレーションによって、推論精度の値を取得する。 The explanatory variable value acquisition unit 23 sequentially changes the designated application pattern, and acquires the inference accuracy value by simulation for each application pattern.

本発明では、説明変数値取得部２３は、説明変数の値を、実測によって取得してもよく、あるいは、シミュレーションによって取得してもよい。いずれの場合であっても、説明変数値取得部２３は、適用パターン毎に説明変数（本例では、「推論精度」および「処理速度」）の値を取得する。 In the present invention, the explanatory variable value acquisition unit 23 may acquire the value of the explanatory variable by actual measurement or by simulation. In any case, the explanatory variable value acquisition unit 23 acquires the values of the explanatory variables (in this example, “inference accuracy” and “processing speed”) for each application pattern.

When the processing speed is expressed, as in the above example, by the inference processing time for one data item (in other words, the reciprocal of the number of data items that can be processed per second), a smaller processing speed value is preferable. Similarly, when the inference accuracy is expressed by the reciprocal of the ratio of the number of inference runs that yielded the correct answer to the N inference runs (in other words, the reciprocal of the ratio of the number of inference results that match the correct answer data to the number of inference results (N)), a smaller inference accuracy value is preferable.

The objective function calculation unit 25 calculates the value of the objective function by substituting the values of the explanatory variables (in this example, "inference accuracy" and "processing speed") acquired by the explanatory variable value acquisition unit 23 for each application pattern into the expression representing the objective function (in this example, the above equation (3)). The objective function calculation unit 25 performs this calculation for each application pattern.

計算結果記憶部２６は、適用パターン毎に計算された目的関数の値を記憶する記憶装置である。目的関数計算部２５は、適用パターン毎に目的関数の値を計算し、適用パターン毎の目的関数の値を、計算結果記憶部２６に記憶させる。 The calculation result storage unit 26 is a storage device that stores the value of the objective function calculated for each application pattern. The objective function calculation unit 25 calculates the value of the objective function for each application pattern, and stores the value of the objective function for each application pattern in the calculation result storage unit 26.

前述のように、本例では、処理速度を示す値が小さい方が好ましく、同様に、推論精度を示す値が小さいほど好ましい。従って、式(３）に例示するように表される目的関数の値が小さいほど好ましい。従って、目的関数の値が最小となる適用パターンが最も好ましい適用パターン（すなわち、最適な適用パターン）であると言える。 As described above, in this example, it is preferable that the value indicating the processing speed is small, and similarly, it is preferable that the value indicating the inference accuracy is small. Therefore, it is preferable that the value of the objective function represented by the equation (3) is smaller. Therefore, it can be said that the application pattern in which the value of the objective function is the minimum is the most preferable application pattern (that is, the optimum application pattern).

適用パターン決定部２７は、計算結果記憶部２６に記憶された適用パターン毎の目的関数の値を参照し、目的関数の値が最小となる適用パターンを決定する。前述のように、目的関数の値が最小となる適用パターンは、最適な適用パターンである。 The application pattern determination unit 27 refers to the value of the objective function for each application pattern stored in the calculation result storage unit 26, and determines the application pattern in which the value of the objective function is the minimum. As described above, the application pattern that minimizes the value of the objective function is the optimum application pattern.

Here, an application pattern is information that defines, for an operation using a discrimination model (in this embodiment, a neural network), to which layers of the neural network the low-precision arithmetic circuit 5 (see FIG. 3) is applied and to which layers the high-precision arithmetic circuit 6 (see FIG. 3) is applied. Therefore, determining the application pattern makes it possible to optimize the operation using the neural network. Moreover, since it is determined which of the low-precision arithmetic circuit 5 and the high-precision arithmetic circuit 6 is applied to each layer of the neural network, it can be determined whether the arithmetic precision of each layer is the first arithmetic precision (for example, 8-bit integer arithmetic by the low-precision arithmetic circuit 5) or the second arithmetic precision (for example, 32-bit floating-point arithmetic by the high-precision arithmetic circuit 6).

説明変数値取得部２３、目的関数計算部２５および適用パターン決定部２７は、例えば、演算最適化プログラムに従って動作するコンピュータのＣＰＵ（Central Processing Unit ）によって実現される。この場合、ＣＰＵが、プログラム記憶装置等のプログラム記録媒体から演算最適化プログラムを読み込む。そして、ＣＰＵは、その演算最適化プログラムに従って、説明変数値取得部２３、目的関数計算部２５および適用パターン決定部２７として動作すればよい。 The explanatory variable value acquisition unit 23, the objective function calculation unit 25, and the application pattern determination unit 27 are realized by, for example, a CPU (Central Processing Unit) of a computer that operates according to a calculation optimization program. In this case, the CPU reads the calculation optimization program from the program recording medium such as the program storage device. Then, the CPU may operate as the explanatory variable value acquisition unit 23, the objective function calculation unit 25, and the application pattern determination unit 27 according to the calculation optimization program.

Next, an example of the processing flow of the embodiment of the present invention will be described. FIG. 11 is a flowchart showing an example of the processing flow of the arithmetic optimization device according to the embodiment of the present invention. Here, a case is described as an example in which the objective function storage unit 24 stores the objective function represented by the above equation (3), and the explanatory variable value acquisition unit 23 acquires the value of "inference accuracy" and the value of "processing speed" as the values of the explanatory variables. Descriptions of matters already explained are omitted as appropriate.

まず、説明変数値取得部２３は、予め記憶している複数の適用パターンの中から、未選択の適用パターンを１つ選択する（ステップＳ１）。 First, the explanatory variable value acquisition unit 23 selects one unselected application pattern from the plurality of application patterns stored in advance (step S1).

次に、説明変数値取得部２３は、ステップＳ１で選択した適用パターンのもとでの推論処理の動作における説明変数の値を取得する（ステップＳ２）。本例では、説明変数値取得部２３は、ステップＳ１で選択した適用パターンのもとでの動作における、「推論精度」の値、および、「処理速度」の値をそれぞれ取得する。 Next, the explanatory variable value acquisition unit 23 acquires the value of the explanatory variable in the operation of the inference processing under the application pattern selected in step S1 (step S2). In this example, the explanatory variable value acquisition unit 23 acquires the value of “inference accuracy” and the value of “processing speed” in the operation under the application pattern selected in step S1, respectively.

The explanatory variable value acquisition unit 23 may acquire the values of the explanatory variables by actual measurement or by simulation. The operations for acquiring the values of "inference accuracy" and "processing speed" by actual measurement, and the operations for acquiring them by simulation, have already been described, so their description is omitted here.

After step S2, the objective function calculation unit 25 calculates the value of the objective function by substituting the values of the explanatory variables acquired in step S2 for the selected application pattern (in this example, the value of "inference accuracy" and the value of "processing speed") into the expression representing the objective function (in this example, the above equation (3)) (step S3). Then, the objective function calculation unit 25 associates the application pattern selected in step S1 with the value of the objective function and stores them in the calculation result storage unit 26.

次に、説明変数値取得部２３は、予め記憶している全ての適用パターンがステップＳ１で選択済みになっているか否かを判定する（ステップＳ４）。 Next, the explanatory variable value acquisition unit 23 determines whether or not all the application patterns stored in advance have been selected in step S1 (step S4).

未選択の適用パターンが存在する場合には（ステップＳ４のＮｏ）、演算最適化装置は、ステップＳ１以降の処理を繰り返す。 If there is an unselected application pattern (No in step S4), the arithmetic optimization device repeats the processes after step S1.

全ての適用パターンが選択済みとなっている場合には（ステップＳ４のＹｅｓ）、適用パターン決定部２７は、計算結果記憶部２６に記憶された適用パターン毎の目的関数の値を参照し、目的関数の値が最小となる適用パターンを決定する（ステップＳ５）。ステップＳ５で処理を終了する。 When all the application patterns have been selected (Yes in step S4), the application pattern determination unit 27 refers to the objective function value stored for each application pattern in the calculation result storage unit 26 and determines the application pattern that minimizes the value of the objective function (step S5). The process ends with step S5.
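Steps S1 to S5 amount to an exhaustive search over the stored application patterns. The following is a minimal sketch of that loop, not the patented implementation. The function `acquire`, its toy numbers, and the coefficient values `ALPHA` and `BETA` are hypothetical; the coefficients are assumed to be negative so that higher accuracy and higher speed lower the objective value, which is what a minimizing search requires.

```python
from itertools import product

ALPHA = -1.0   # hypothetical coefficient of "inference accuracy"
BETA = -0.01   # hypothetical coefficient of "processing speed"


def objective(accuracy, speed):
    """Equation (3): accuracy x alpha + speed x beta."""
    return accuracy * ALPHA + speed * BETA


def acquire(pattern):
    """Hypothetical stand-in for step S2 (measurement or simulation).

    Toy model: each layer on the high-precision circuit adds a little
    accuracy and costs some speed.
    """
    n_high = sum(1 for circuit in pattern if circuit == "high")
    accuracy = 0.80 + 0.05 * n_high
    speed = 100.0 - 20.0 * n_high
    return accuracy, speed


def choose_pattern(patterns, acquire_fn):
    results = {}  # plays the role of the calculation result storage unit 26
    for pattern in patterns:                           # S1, repeated until S4 is Yes
        accuracy, speed = acquire_fn(pattern)          # S2
        results[pattern] = objective(accuracy, speed)  # S3
    best = min(results, key=results.get)               # S5
    return best, results


# Every assignment of "low" or "high" to a three-layer network.
patterns = list(product(("low", "high"), repeat=3))
best, results = choose_pattern(patterns, acquire)
```

With L layers and two circuits there are 2^L candidate patterns, so this exhaustive form is practical only when the set of stored patterns is small.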

既に説明したように、適用パターンが決定されることで、ニューラルネットワークを用いた演算を最適化することができる。そして、ニューラルネットワークの個々の層に、低精度演算回路５および高精度演算回路６のどちらを適用するのかが決定されるので、ニューラルネットワークの各層の演算精度を、第１の演算精度（例えば、低精度演算回路５による８ビットの整数演算）とするのか、第２の演算精度（例えば、高精度演算回路６による３２ビットの浮動小数点演算）とするのかを決定することができる。 As already described, determining the application pattern makes it possible to optimize the operation using the neural network. Because the pattern fixes whether the low-precision arithmetic circuit 5 or the high-precision arithmetic circuit 6 is applied to each individual layer of the neural network, it also determines whether the calculation accuracy of each layer is the first calculation accuracy (for example, 8-bit integer arithmetic by the low-precision arithmetic circuit 5) or the second calculation accuracy (for example, 32-bit floating-point arithmetic by the high-precision arithmetic circuit 6).

また、本実施形態では、上記のステップＳ１〜Ｓ５の処理によって、適用パターンを決定するので、自動的に適用パターンを決定することができる。従って、ニューラルネットワークの各層の演算精度を、第１の演算精度とするのか、第２の演算精度とするのかを自動的に決定することができる。 Further, in the present embodiment, since the application pattern is determined by the above steps S1 to S5, the application pattern can be automatically determined. Therefore, it is possible to automatically determine whether the calculation accuracy of each layer of the neural network is the first calculation accuracy or the second calculation accuracy.

次に、本発明の実施形態の変形例として、目的関数を、「推論精度」および「処理速度」に加えさらに他の説明変数によって表した場合を説明する。なお、以下に示す変形例の説明では、既に説明した事項については、適宜、説明を省略する。 Next, as a modification of the embodiment of the present invention, a case where the objective function is represented by other explanatory variables in addition to "inference accuracy" and "processing speed" will be described. In the description of the modification shown below, the description of the matters already described will be omitted as appropriate.

目的関数は、「推論精度」および「処理速度」に加えて、さらに、「低精度演算回路５と高精度演算回路６との間で授受されるデータ量」も説明変数として、表されてもよい。以下、「低精度演算回路５と高精度演算回路６との間で授受されるデータ量」を、単に、データ授受量と記す。既に説明したように、データ授受量は、例えば、授受されるデータの個数と、データ１個当たりのバイト数との積によって表される。 In addition to "inference accuracy" and "processing speed", the objective function may also use "the amount of data exchanged between the low-precision arithmetic circuit 5 and the high-precision arithmetic circuit 6" as an explanatory variable. Hereinafter, "the amount of data exchanged between the low-precision arithmetic circuit 5 and the high-precision arithmetic circuit 6" is simply referred to as the data transfer amount. As already described, the data transfer amount is expressed, for example, as the product of the number of data items exchanged and the number of bytes per data item.
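The product described above can be sketched as follows. The text only states that the data transfer amount is the number of items exchanged times the bytes per item. The rule used here to count items (the outputs of a layer cross between circuits whenever the next layer is assigned to the other circuit) and the names `pattern`, `outputs_per_layer`, and `bytes_per_item` are illustrative assumptions.

```python
def data_transfer_amount(pattern, outputs_per_layer, bytes_per_item):
    """Bytes exchanged between the two circuits for one input.

    pattern:            circuit assigned to each layer, e.g. ("low", "high")
    outputs_per_layer:  number of values each layer hands to the next layer
    bytes_per_item:     size of one exchanged value in bytes
    """
    items = 0
    for i in range(len(pattern) - 1):
        # Assumption: data crosses circuits only where adjacent layers differ.
        if pattern[i] != pattern[i + 1]:
            items += outputs_per_layer[i]
    return items * bytes_per_item


amount = data_transfer_amount(
    ("low", "high", "high", "low"), [1000, 500, 200, 10], bytes_per_item=4
)
```

Here two boundaries (after the first and third layers) contribute 1000 and 200 items, giving 4800 bytes; a pattern that keeps every layer on one circuit gives 0.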

本例では、目的関数記憶部２４は、目的関数として、例えば、以下の式（４）で表される関数を記憶すればよい。 In this example, the objective function storage unit 24 may store, for example, a function represented by the following equation (4) as the objective function.

目的関数＝「推論精度」×α＋「処理速度」×β＋「データ授受量」×γ　・・・（４）
Objective function = "inference accuracy" × α + "processing speed" × β + "data transfer amount" × γ ... (4)

γは、「データ授受量」の係数であり、予め決定されている。本例では、γが正の値として定められている場合を例にして説明する。 γ is a coefficient of “data transfer amount” and is predetermined. In this example, the case where γ is defined as a positive value will be described as an example.

本変形例では、説明変数値取得部２３は、「推論精度」および「処理速度」の他に、「データ授受量」の値も適用パターン毎に取得する。 In this modification, the explanatory variable value acquisition unit 23 acquires the value of the “data transfer amount” for each application pattern in addition to the “inference accuracy” and the “processing speed”.

説明変数値取得部２３が「データ授受量」の値を実測によって取得する動作を説明する。説明変数値取得部２３は、処理装置１８に対して適用パターンを指定する。そして、説明変数値取得部２３は、判別モデル記憶部２１に記憶されているニューラルネットワークと、データ記憶部２２に記憶されているデータとを、処理装置１８に入力し、処理装置１８に推論処理を実行させ、処理装置１８がそのデータに対する推論処理を行う際のデータ授受量を計測すればよい。この結果、説明変数値取得部２３は、データ授受量の値を取得する。また、このとき、処理装置１８は、指定された適用パターンに応じた動作で、推論処理を実行する。なお、説明変数値取得部２３は、１つのデータに関して、処理装置１８に推論処理を実行させることで、データ授受量の値を取得することができる。 The operation in which the explanatory variable value acquisition unit 23 acquires the value of the "data transfer amount" by actual measurement will now be described. The explanatory variable value acquisition unit 23 specifies an application pattern to the processing device 18. It then inputs the neural network stored in the discrimination model storage unit 21 and the data stored in the data storage unit 22 into the processing device 18, causes the processing device 18 to execute inference processing, and measures the data transfer amount while the processing device 18 performs inference processing on that data. As a result, the explanatory variable value acquisition unit 23 acquires the value of the data transfer amount. At this time, the processing device 18 executes the inference processing with an operation that follows the specified application pattern. The explanatory variable value acquisition unit 23 can acquire the value of the data transfer amount by having the processing device 18 execute inference processing on a single piece of data.

説明変数値取得部２３は、指定する適用パターンを順次、変更し、適用パターン毎に、実測によってデータ授受量の値を取得する。 The explanatory variable value acquisition unit 23 sequentially changes the designated application pattern, and acquires the value of the data transfer amount by actual measurement for each application pattern.

説明変数値取得部２３が「データ授受量」の値をシミュレーションによって取得する動作を説明する。説明変数値取得部２３は、適用パターンを選択し、設計情報記憶部１９に記憶された設計情報から定まる処理装置１８の動作であって選択した適用パターンに応じた動作を模擬することによって、データ授受量の値を導出すればよい。なお、説明変数値取得部２３は、１つのデータに関して、処理装置１８の動作を模擬することで、データ授受量の値を導出することができる。 The operation in which the explanatory variable value acquisition unit 23 acquires the value of the "data transfer amount" by simulation will now be described. The explanatory variable value acquisition unit 23 selects an application pattern and derives the value of the data transfer amount by simulating the operation of the processing device 18 that is determined from the design information stored in the design information storage unit 19 and that corresponds to the selected application pattern. The explanatory variable value acquisition unit 23 can derive the value of the data transfer amount by simulating the operation of the processing device 18 for a single piece of data.

説明変数値取得部２３は、選択する適用パターンを順次、変更し、適用パターン毎に、シミュレーションによってデータ授受量の値を導出する。 The explanatory variable value acquisition unit 23 sequentially changes the application pattern to be selected, and derives the value of the data transfer amount by simulation for each application pattern.

本変形例では、目的関数計算部２５は、「推論精度」の値、「処理速度」の値、および、「データ授受量」の値を式（４）に代入することによって、適用パターン毎に目的関数の値を計算すればよい。 In this modification, the objective function calculation unit 25 substitutes the value of "inference accuracy", the value of "processing speed", and the value of "data transfer amount" into the equation (4) for each application pattern. You just have to calculate the value of the objective function.
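To illustrate how the γ term in equation (4) can change the selected pattern, here is a small sketch. All numbers, the pattern labels "A" and "B", and the coefficient values are hypothetical; α and β are assumed negative and γ positive, the latter as stated in the text.

```python
def objective_eq4(accuracy, speed, transfer, alpha, beta, gamma):
    """Equation (4): accuracy x alpha + speed x beta + transfer x gamma."""
    return accuracy * alpha + speed * beta + transfer * gamma


# Hypothetical explanatory variable values for two application patterns.
# Pattern A mixes circuits (more accurate, but exchanges 4800 bytes);
# pattern B keeps every layer on one circuit (no exchange).
values = {
    "A": {"accuracy": 0.90, "speed": 50.0, "transfer": 4800},
    "B": {"accuracy": 0.88, "speed": 50.0, "transfer": 0},
}


def best_pattern(gamma, alpha=-1.0, beta=-0.01):
    scores = {
        name: objective_eq4(v["accuracy"], v["speed"], v["transfer"],
                            alpha, beta, gamma)
        for name, v in values.items()
    }
    return min(scores, key=scores.get)


without_transfer = best_pattern(gamma=0.0)   # transfer ignored
with_transfer = best_pattern(gamma=1e-5)     # transfer penalized
```

With γ = 0 the more accurate pattern A has the smaller objective value; with the positive γ above, the transfer penalty outweighs the accuracy gain and pattern B is selected.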

その他の点に関しては、上記の実施形態と同様である。 Other points are the same as those in the above embodiment.

本変形例によれば、「データ授受量」も加味して、ニューラルネットワークの各層の演算精度を、第１の演算精度とするのか、第２の演算精度とするのかを決定することができる。 According to this modification, it is possible to determine whether the calculation accuracy of each layer of the neural network is the first calculation accuracy or the second calculation accuracy in consideration of the “data transfer amount”.

また、目的関数は、「推論精度」および「処理速度」に加えて、さらに、「処理装置１８の回路規模（以下、単に回路規模と記す。）」を説明変数として、表されてもよい。 Further, the objective function may be expressed using "the circuit scale of the processing device 18 (hereinafter, simply referred to as the circuit scale)" as an explanatory variable in addition to the "inference accuracy" and the "processing speed".

本例では、目的関数記憶部２４は、目的関数として、例えば、以下の式（５）で表される関数を記憶すればよい。 In this example, the objective function storage unit 24 may store, for example, a function represented by the following equation (5) as the objective function.

目的関数＝「推論精度」×α＋「処理速度」×β＋「回路規模」×δ　・・・（５）
Objective function = "inference accuracy" × α + "processing speed" × β + "circuit scale" × δ ... (5)

δは、「回路規模」の係数であり、予め決定されている。本例では、δが正の値として定められている場合を例にして説明する。 δ is a coefficient of “circuit scale” and is predetermined. In this example, the case where δ is defined as a positive value will be described as an example.

以下の説明では、低精度演算回路５に含まれる演算器（例えば、ＭＡＣ）、および、高精度演算回路６に含まれる演算器（例えば、ＭＡＣ）の個数を、低精度演算回路５に含まれる演算器、または、高精度演算回路６に含まれる演算器を基準として表した値を、「回路規模」とする場合を例にして説明する。本例では、低精度演算回路５に含まれる演算器を基準とするものとして説明する。低精度演算回路５に含まれる演算器を基準とする場合、高精度演算回路６に含まれる演算器の個数を、低精度演算回路５に含まれる演算器の何個分に相当するかという値に変換して表わす。また、高精度演算回路６に含まれる１個の演算器が、低精度演算回路５に含まれる演算器何個分に相当するかは、高精度演算回路６に含まれる１個の演算器の占有面積が、低精度演算回路５に含まれる演算器何個分の占有面積に相当するかによって求めればよい。以下、説明を簡単にするために、高精度演算回路６に含まれる１個の演算器が、低精度演算回路５に含まれる演算器Ｊ個分に相当するものとして説明する。 The following description takes as an example the case where the "circuit scale" is the total number of arithmetic units (for example, MACs) included in the low-precision arithmetic circuit 5 and in the high-precision arithmetic circuit 6, expressed relative to either an arithmetic unit of the low-precision arithmetic circuit 5 or an arithmetic unit of the high-precision arithmetic circuit 6. In this example, the arithmetic unit of the low-precision arithmetic circuit 5 is used as the reference. In that case, the number of arithmetic units included in the high-precision arithmetic circuit 6 is converted into the equivalent number of arithmetic units of the low-precision arithmetic circuit 5. How many arithmetic units of the low-precision arithmetic circuit 5 one arithmetic unit of the high-precision arithmetic circuit 6 corresponds to may be obtained from how many low-precision arithmetic units would occupy the same area as one high-precision arithmetic unit. Hereinafter, for simplicity, one arithmetic unit included in the high-precision arithmetic circuit 6 is assumed to correspond to J arithmetic units included in the low-precision arithmetic circuit 5.

本変形例では、説明変数値取得部２３は、「推論精度」および「処理速度」の他に、「回路規模」の値も取得する。 In this modification, the explanatory variable value acquisition unit 23 acquires the value of the “circuit scale” in addition to the “inference accuracy” and the “processing speed”.

説明変数値取得部２３が「回路規模」の値を実測によって取得する動作を説明する。処理装置１８が存在する場合には、その処理装置１８内の低精度演算回路５に含まれる演算器の個数、高精度演算回路６に含まれる演算器の個数、および、高精度演算回路６に含まれる１個の演算器が、低精度演算回路５に含まれる演算器何個分に相当するかという情報は、既知の情報である。説明変数値取得部２３は、例えば、この既知の情報を、予め記憶しているものとする。また、説明を簡単にするために、高精度演算回路６に含まれる１個の演算器が、低精度演算回路５に含まれる演算器Ｊ個分に相当するものとして説明する。 The operation in which the explanatory variable value acquisition unit 23 acquires the value of the "circuit scale" by actual measurement will now be described. When the processing device 18 exists, the number of arithmetic units included in its low-precision arithmetic circuit 5, the number of arithmetic units included in its high-precision arithmetic circuit 6, and the number of low-precision arithmetic units to which one high-precision arithmetic unit corresponds are all known information. The explanatory variable value acquisition unit 23 is assumed, for example, to store this known information in advance. For simplicity, one arithmetic unit included in the high-precision arithmetic circuit 6 is again assumed to correspond to J arithmetic units included in the low-precision arithmetic circuit 5.

この場合、説明変数値取得部２３は、以下に示す式（６）の計算によって、「回路規模」の値を計算すればよい。 In this case, the explanatory variable value acquisition unit 23 may calculate the value of the “circuit scale” by the calculation of the following equation (6).

回路規模＝「低精度演算回路５に含まれる演算器の個数」＋「高精度演算回路６に含まれる演算器の個数」×Ｊ　・・・（６）
Circuit scale = "number of arithmetic units included in the low-precision arithmetic circuit 5" + "number of arithmetic units included in the high-precision arithmetic circuit 6" × J ... (6)

なお、上記の例では、回路規模の値は、適用パターンに依存しないので、説明変数値取得部２３は、回路規模の値を、各適用パターンで共通の値として算出してよい。 In the above example, since the value of the circuit scale does not depend on the application pattern, the explanatory variable value acquisition unit 23 may calculate the value of the circuit scale as a value common to each application pattern.
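Equation (6) and the area-based derivation of J can be sketched as below. The unit counts and areas are made-up illustration values, and the helper name `low_unit_equivalents` is hypothetical.

```python
def low_unit_equivalents(area_high_unit, area_low_unit):
    """J: how many low-precision units occupy the area of one high-precision unit."""
    return area_high_unit / area_low_unit


def circuit_scale(n_low_units, n_high_units, j):
    """Equation (6): low-precision unit count + high-precision unit count x J."""
    return n_low_units + n_high_units * j


j = low_unit_equivalents(area_high_unit=4.0, area_low_unit=1.0)
scale = circuit_scale(n_low_units=64, n_high_units=16, j=j)
```

Because none of the inputs depends on the application pattern, the result can be computed once and reused for every pattern, as the text notes.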

説明変数値取得部２３が「回路規模」の値をシミュレーションによって取得する動作を説明する。この場合、設計情報記憶部１９（図１０参照）が記憶する設計情報に、低精度演算回路５に含まれる演算器の個数の設計値、高精度演算回路６に含まれる演算器の個数の設計値、および、高精度演算回路６に含まれる１個の演算器が、低精度演算回路５に含まれる演算器何個分に相当するかという設計値を含めておけばよい。本例においても、高精度演算回路６に含まれる１個の演算器が、低精度演算回路５に含まれる演算器Ｊ個分に相当するものとして説明する。 The operation in which the explanatory variable value acquisition unit 23 acquires the value of the “circuit scale” by simulation will be described. In this case, the design information stored in the design information storage unit 19 (see FIG. 10) includes the design value of the number of arithmetic units included in the low-precision arithmetic circuit 5 and the design of the number of arithmetic units included in the high-precision arithmetic circuit 6. It suffices to include the value and the design value of how many arithmetic units included in the low-precision arithmetic circuit 5 correspond to one arithmetic unit included in the high-precision arithmetic circuit 6. Also in this example, one arithmetic unit included in the high-precision arithmetic circuit 6 will be described as corresponding to J arithmetic units included in the low-precision arithmetic circuit 5.

この場合、説明変数値取得部２３は、以下に示す式（７）の計算によって、「回路規模」の値を計算すればよい。 In this case, the explanatory variable value acquisition unit 23 may calculate the value of the "circuit scale" by the calculation of the following equation (7).

回路規模＝「低精度演算回路５に含まれる演算器の個数の設計値」＋「高精度演算回路６に含まれる演算器の個数の設計値」×Ｊ　・・・（７）
Circuit scale = "design value of the number of arithmetic units included in the low-precision arithmetic circuit 5" + "design value of the number of arithmetic units included in the high-precision arithmetic circuit 6" × J ... (7)

また、シミュレーションによって「回路規模」の値を取得する場合、説明変数値取得部２３は、演算器の個数を低精度演算回路５に含まれる演算器等を基準として表した値とは異なる値で求めてもよい。例えば、説明変数値取得部２３は、「回路規模」の値を求めるための関数（以下、回路規模関数と記す。）によって、回路規模の値を計算してもよい。この場合、説明変数値取得部２３は、回路規模関数を予め保持する。また、回路規模関数は、予め定められている。回路規模関数は、例えば、低精度演算回路５に設けられる演算器の数、高精度演算回路６に設けられる演算器の数、低精度演算回路５がアクセスする第１メモリ７（図３参照）のメモリサイズ、高精度演算回路６がアクセスする第２メモリ８（図３参照）のメモリサイズ、および、データ授受量（低精度演算回路５と高精度演算回路６との間で授受されるデータ量）を変数とする。以下、回路規模関数が、上記の各変数で表される場合を例にして説明する。ただし、回路規模関数で用いられる変数は、上記の例に限定されない。第１メモリ７のメモリサイズ、および、第２メモリ８のメモリサイズは、設計情報として設計情報記憶部１９に記憶させておけばよい。 When the value of the "circuit scale" is acquired by simulation, the explanatory variable value acquisition unit 23 may also obtain it as a value other than the number of arithmetic units expressed relative to, for example, the arithmetic units of the low-precision arithmetic circuit 5. For example, the explanatory variable value acquisition unit 23 may calculate the circuit scale with a function for obtaining the value of the "circuit scale" (hereinafter referred to as the circuit scale function). In this case, the circuit scale function is determined in advance and the explanatory variable value acquisition unit 23 holds it in advance. The variables of the circuit scale function are, for example, the number of arithmetic units provided in the low-precision arithmetic circuit 5, the number of arithmetic units provided in the high-precision arithmetic circuit 6, the memory size of the first memory 7 accessed by the low-precision arithmetic circuit 5 (see FIG. 3), the memory size of the second memory 8 accessed by the high-precision arithmetic circuit 6 (see FIG. 3), and the data transfer amount (the amount of data exchanged between the low-precision arithmetic circuit 5 and the high-precision arithmetic circuit 6). The following description takes as an example the case where the circuit scale function is expressed with these variables. However, the variables used in the circuit scale function are not limited to this example. The memory size of the first memory 7 and the memory size of the second memory 8 may be stored in the design information storage unit 19 as design information.

説明変数値取得部２３は、上記の各変数の値を回路規模関数に代入することによって、回路規模の値を計算すればよい。ここで、変数のうち、低精度演算回路５に設けられる演算器の数、高精度演算回路６に設けられる演算器の数、第１メモリ７のメモリサイズ、および、第２メモリ８のメモリサイズは、設計情報で定められた値を用いればよい。データ授受量に関しては、説明変数値取得部２３が適用パターンを選択し、設計情報記憶部１９に記憶された設計情報から定まる処理装置１８の動作であって選択した適用パターンに応じた動作を模擬することによって、導出すればよい。説明変数値取得部２３は、上記の演算器の数やメモリサイズ、および、選択した適用パターンに基づいて導出したデータ授受量を回路規模関数に代入することによって、回路規模の値を計算すればよい。また、この場合、説明変数値取得部２３は、選択する適用パターンを順次、変更し、適用パターン毎に、シミュレーションに基づく回路規模の値を計算する。 The explanatory variable value acquisition unit 23 may calculate the value of the circuit scale by substituting the value of each of the above variables into the circuit scale function. Among these variables, the number of arithmetic units provided in the low-precision arithmetic circuit 5, the number of arithmetic units provided in the high-precision arithmetic circuit 6, the memory size of the first memory 7, and the memory size of the second memory 8 may take the values defined in the design information. The data transfer amount may be derived by having the explanatory variable value acquisition unit 23 select an application pattern and simulate the operation of the processing device 18 that is determined from the design information stored in the design information storage unit 19 and that corresponds to the selected application pattern. The explanatory variable value acquisition unit 23 may then calculate the value of the circuit scale by substituting the numbers of arithmetic units, the memory sizes, and the data transfer amount derived for the selected application pattern into the circuit scale function. In this case, the explanatory variable value acquisition unit 23 changes the selected application pattern in sequence and calculates the simulation-based value of the circuit scale for each application pattern.
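The text names the variables of the circuit scale function but does not give its form. The sketch below assumes, purely for illustration, a weighted sum; the `DesignInfo` container, the weights, and all numeric values are hypothetical. It shows why the result becomes pattern-dependent: only the data transfer amount varies between application patterns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignInfo:
    """Pattern-independent values taken from the design information."""
    n_low_units: int    # arithmetic units in the low-precision circuit
    n_high_units: int   # arithmetic units in the high-precision circuit
    mem1_bytes: int     # size of the first memory
    mem2_bytes: int     # size of the second memory


def circuit_scale_function(design, transfer_bytes,
                           w_low=1.0, w_high=4.0, w_mem=0.001, w_transfer=0.01):
    """Hypothetical weighted-sum circuit scale function (form assumed)."""
    return (w_low * design.n_low_units
            + w_high * design.n_high_units
            + w_mem * (design.mem1_bytes + design.mem2_bytes)
            + w_transfer * transfer_bytes)


design = DesignInfo(n_low_units=64, n_high_units=16,
                    mem1_bytes=65536, mem2_bytes=262144)

# The transfer amount would come from simulating each application pattern.
scale_mixed = circuit_scale_function(design, transfer_bytes=4800)
scale_single = circuit_scale_function(design, transfer_bytes=0)
```

Under these assumed weights, a pattern that exchanges more data between the circuits yields a larger circuit scale, so the value must be recomputed per pattern.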

本変形例では、目的関数計算部２５は、「推論精度」の値、「処理速度」の値、および、「回路規模」の値を式（５）に代入することによって、適用パターン毎に目的関数の値を計算すればよい。 In this modification, the objective function calculation unit 25 may calculate the value of the objective function for each application pattern by substituting the value of "inference accuracy", the value of "processing speed", and the value of "circuit scale" into equation (5).

本変形例によれば、「回路規模」も加味して、ニューラルネットワークの各層の演算精度を、第１の演算精度とするのか、第２の演算精度とするのかを決定することができる。 According to this modification, it is possible to determine whether the calculation accuracy of each layer of the neural network is the first calculation accuracy or the second calculation accuracy in consideration of the “circuit scale”.

また、目的関数は、「推論精度」および「処理速度」に加えて、さらに、「処理装置１８の消費電力（以下、単に消費電力と記す。）」を説明変数として、表されてもよい。 Further, the objective function may be expressed with "power consumption of the processing device 18 (hereinafter, simply referred to as power consumption)" as an explanatory variable in addition to "inference accuracy" and "processing speed".

本例では、目的関数記憶部２４は、目的関数として、例えば、以下の式（８）で表される関数を記憶すればよい。 In this example, the objective function storage unit 24 may store, for example, a function represented by the following equation (8) as the objective function.

目的関数＝「推論精度」×α＋「処理速度」×β＋「消費電力」×ε　・・・（８）
Objective function = "inference accuracy" × α + "processing speed" × β + "power consumption" × ε ... (8)

εは、「消費電力」の係数であり、予め決定されている。本例では、εが正の値として定められている場合を例にして説明する。 ε is a coefficient of "power consumption" and is predetermined. In this example, the case where ε is defined as a positive value will be described as an example.

本変形例では、説明変数値取得部２３は、「推論精度」および「処理速度」の他に、「消費電力」の値も適用パターン毎に取得する。 In this modification, the explanatory variable value acquisition unit 23 acquires the value of "power consumption" in addition to the "inference accuracy" and "processing speed" for each application pattern.

説明変数値取得部２３が「消費電力」の値を実測によって取得する動作を説明する。説明変数値取得部２３は、処理装置１８に対して適用パターンを指定する。そして、説明変数値取得部２３は、判別モデル記憶部２１に記憶されているニューラルネットワークと、データ記憶部２２に記憶されているデータとを、処理装置１８に入力し、処理装置１８に推論処理を実行させ、処理装置１８がそのデータに対する推論処理を行う際の消費電力を計測すればよい。この結果、説明変数値取得部２３は、消費電力の値を取得する。また、このとき、処理装置１８は、指定された適用パターンに応じた動作で、推論処理を実行する。なお、説明変数値取得部２３は、１つのデータに関して、処理装置１８に推論処理を実行させることで、消費電力の値を取得することができる。 The operation in which the explanatory variable value acquisition unit 23 acquires the value of "power consumption" by actual measurement will now be described. The explanatory variable value acquisition unit 23 specifies an application pattern to the processing device 18. It then inputs the neural network stored in the discrimination model storage unit 21 and the data stored in the data storage unit 22 into the processing device 18, causes the processing device 18 to execute inference processing, and measures the power consumed while the processing device 18 performs inference processing on that data. As a result, the explanatory variable value acquisition unit 23 acquires the value of the power consumption. At this time, the processing device 18 executes the inference processing with an operation that follows the specified application pattern. The explanatory variable value acquisition unit 23 can acquire the value of the power consumption by having the processing device 18 execute inference processing on a single piece of data.

説明変数値取得部２３は、指定する適用パターンを順次、変更し、適用パターン毎に、実測によって消費電力の値を取得する。 The explanatory variable value acquisition unit 23 sequentially changes the designated application pattern, and acquires the value of the power consumption by actual measurement for each application pattern.

説明変数値取得部２３が「消費電力」の値をシミュレーションによって取得する動作を説明する。「消費電力」の値をシミュレーションによって導出する場合、設計段階で定められている、消費電力値導出に必要なデータを、設計情報記憶部１９に記憶される設計情報に含めておく。説明変数値取得部２３は、適用パターンを選択し、設計情報記憶部１９に記憶された設計情報から定まる処理装置１８の動作であって選択した適用パターンに応じた動作を模擬することによって、消費電力の値を導出すればよい。なお、説明変数値取得部２３は、１つのデータに関して、処理装置１８の動作を模擬することで、消費電力の値を導出することができる。 The operation in which the explanatory variable value acquisition unit 23 acquires the value of "power consumption" by simulation will now be described. When the value of "power consumption" is derived by simulation, the data needed to derive the power consumption value, which is defined at the design stage, is included in the design information stored in the design information storage unit 19. The explanatory variable value acquisition unit 23 selects an application pattern and derives the value of the power consumption by simulating the operation of the processing device 18 that is determined from the design information stored in the design information storage unit 19 and that corresponds to the selected application pattern. The explanatory variable value acquisition unit 23 can derive the value of the power consumption by simulating the operation of the processing device 18 for a single piece of data.

説明変数値取得部２３は、選択する適用パターンを順次、変更し、適用パターン毎に、シミュレーションによって消費電力の値を導出する。 The explanatory variable value acquisition unit 23 sequentially changes the application pattern to be selected, and derives the power consumption value by simulation for each application pattern.

本変形例では、目的関数計算部２５は、「推論精度」の値、「処理速度」の値、および、「消費電力」の値を式（８）に代入することによって、適用パターン毎に目的関数の値を計算すればよい。 In this modification, the objective function calculation unit 25 may calculate the value of the objective function for each application pattern by substituting the value of "inference accuracy", the value of "processing speed", and the value of "power consumption" into equation (8).

本変形例によれば、「消費電力」も加味して、ニューラルネットワークの各層の演算精度を、第１の演算精度とするのか、第２の演算精度とするのかを決定することができる。 According to this modification, it is possible to determine whether the calculation accuracy of each layer of the neural network is the first calculation accuracy or the second calculation accuracy in consideration of "power consumption".

上記の各変形例では、目的関数が、「推論精度」および「処理速度」に加えて、さらに、「データ授受量」、「回路規模」および「消費電力」のいずれかを説明変数として表される場合を説明した。目的関数は、「推論精度」および「処理速度」に加えて、さらに、「データ授受量」、「回路規模」および「消費電力」のうちの任意の１つ以上の説明変数によって表されていてもよい。 Each of the above modifications describes a case where, in addition to "inference accuracy" and "processing speed", the objective function uses one of "data transfer amount", "circuit scale", and "power consumption" as an explanatory variable. In addition to "inference accuracy" and "processing speed", the objective function may be expressed with any one or more of the explanatory variables "data transfer amount", "circuit scale", and "power consumption".

目的関数が、「推論精度」、「処理速度」、「データ授受量」、「回路規模」および「消費電力」を説明変数として表されていてもよい。この場合、目的関数記憶部２４は、目的関数として、例えば、以下の式（９）で表される関数を記憶すればよい。 The objective function may be expressed with "inference accuracy", "processing speed", "data transfer amount", "circuit scale", and "power consumption" as explanatory variables. In this case, the objective function storage unit 24 may store, for example, a function represented by the following equation (9) as the objective function.

目的関数＝「推論精度」×α＋「処理速度」×β＋「データ授受量」×γ＋「回路規模」×δ＋「消費電力」×ε　・・・（９）
Objective function = "inference accuracy" × α + "processing speed" × β + "data transfer amount" × γ + "circuit scale" × δ + "power consumption" × ε ... (9)

この場合、説明変数値取得部２３は、実測により、または、シミュレーションにより、各説明変数（「推論精度」、「処理速度」、「データ授受量」、「回路規模」および「消費電力」）の値を、適用パターン毎に取得すればよい。 In this case, the explanatory variable value acquisition unit 23 may acquire the value of each explanatory variable ("inference accuracy", "processing speed", "data transfer amount", "circuit scale", and "power consumption") for each application pattern, either by actual measurement or by simulation.

また、目的関数計算部２５は、説明変数値取得部２３によって取得された各説明変数の値を式（９）に代入することによって、適用パターン毎に目的関数の値を計算すればよい。 Further, the objective function calculation unit 25 may calculate the value of the objective function for each application pattern by substituting the value of each explanatory variable acquired by the explanatory variable value acquisition unit 23 into the equation (9).

この場合、「データ授受量」、「回路規模」および「消費電力」も加味して、ニューラルネットワークの各層の演算精度を、第１の演算精度とするのか、第２の演算精度とするのかを決定することができる。 In this case, it is possible to determine whether the calculation accuracy of each layer of the neural network is the first calculation accuracy or the second calculation accuracy while also taking "data transfer amount", "circuit scale", and "power consumption" into account.

なお、式（９）において、「データ授受量」×γの項が含まれていなくてもよい。この場合、説明変数値取得部２３は、「データ授受量」の値を取得しなくてよい。 It should be noted that the term “data transfer amount” × γ may not be included in the equation (9). In this case, the explanatory variable value acquisition unit 23 does not have to acquire the value of the “data transfer amount”.

また、式（９）において、「回路規模」×δの項が含まれていなくてもよい。この場合、説明変数値取得部２３は、「回路規模」の値を取得しなくてよい。 Further, the term “circuit scale” × δ may not be included in the equation (9). In this case, the explanatory variable value acquisition unit 23 does not have to acquire the value of the “circuit scale”.

また、式（９）において、「消費電力」×εの項が設けられていなくてもよい。この場合、説明変数値取得部２３は、「消費電力」の値を取得しなくてよい。 Further, in the equation (9), the term “power consumption” × ε may not be provided. In this case, the explanatory variable value acquisition unit 23 does not have to acquire the value of "power consumption".
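Equation (9) and its reduced forms can be handled uniformly by treating the objective as a sum over whichever terms are configured. This is a sketch under assumptions: the dictionary keys, the coefficient values, and the helper `required_variables` are hypothetical, and the signs follow the convention that the objective is minimized.

```python
def objective(values, coefficients):
    """Sum of (explanatory variable value x coefficient) over configured terms."""
    return sum(values[name] * coef for name, coef in coefficients.items())


def required_variables(coefficients):
    """Only variables that have a term in the objective need to be acquired."""
    return set(coefficients)


# Equation (9) with the data transfer and circuit scale terms left out,
# which reduces it to the form of equation (8).
coefficients = {"accuracy": -1.0, "speed": -0.01, "power": 0.1}

needed = required_variables(coefficients)
value = objective({"accuracy": 0.9, "speed": 50.0, "power": 2.0}, coefficients)
```

Dropping a term from `coefficients` removes the corresponding variable from `needed`, mirroring the statement that the acquisition unit need not acquire a value whose term is absent.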

図１２は、本発明の実施形態またはその変形例に係るコンピュータの構成例を示す概略ブロック図である。コンピュータ１０００は、ＣＰＵ１００１と、主記憶装置１００２と、補助記憶装置１００３と、インタフェース１００４とを備える。 FIG. 12 is a schematic block diagram showing a configuration example of a computer according to an embodiment of the present invention or a modification thereof. The computer 1000 includes a CPU 1001, a main storage device 1002, an auxiliary storage device 1003, and an interface 1004.

本発明の演算最適化装置は、コンピュータ１０００に実装される。演算最適化装置の動作は、演算最適化プログラムの形式で補助記憶装置１００３に記憶されている。ＣＰＵ１００１は、その演算最適化プログラムを補助記憶装置１００３から読み出して主記憶装置１００２に展開し、その演算最適化プログラムに従って、上記の実施形態やその変形例で説明した処理を実行する。 The arithmetic optimization device of the present invention is mounted on the computer 1000. The operation of the calculation optimization device is stored in the auxiliary storage device 1003 in the form of a calculation optimization program. The CPU 1001 reads the calculation optimization program from the auxiliary storage device 1003, deploys it to the main storage device 1002, and executes the processing described in the above embodiment and its modification according to the calculation optimization program.

補助記憶装置１００３は、一時的でない有形の媒体の例である。一時的でない有形の媒体の他の例として、インタフェース１００４を介して接続される磁気ディスク、光磁気ディスク、ＣＤ−ＲＯＭ（Compact Disk Read Only Memory ）、ＤＶＤ−ＲＯＭ（Digital Versatile Disk Read Only Memory ）、半導体メモリ等が挙げられる。また、プログラムが通信回線によってコンピュータ１０００に配信される場合、配信を受けたコンピュータ１０００がそのプログラムを主記憶装置１００２に展開し、上記の処理を実行してもよい。 The auxiliary storage device 1003 is an example of a non-transitory tangible medium. Other examples of non-transitory tangible media include a magnetic disk, a magneto-optical disk, a CD-ROM (Compact Disk Read Only Memory), a DVD-ROM (Digital Versatile Disk Read Only Memory), and a semiconductor memory connected via the interface 1004. When the program is distributed to the computer 1000 over a communication line, the computer 1000 that receives it may load the program into the main storage device 1002 and execute the above processing.

また、プログラムは、前述の処理の一部を実現するためのものであってもよい。さらに、プログラムは、補助記憶装置１００３に既に記憶されている他のプログラムとの組み合わせで前述の処理を実現する差分プログラムであってもよい。 Further, the program may be for realizing a part of the above-mentioned processing. Further, the program may be a difference program that realizes the above-mentioned processing in combination with another program already stored in the auxiliary storage device 1003.

また、各構成要素の一部または全部は、汎用または専用の回路（circuitry ）、プロセッサ等やこれらの組み合わせによって実現されてもよい。これらは、単一のチップによって構成されてもよいし、バスを介して接続される複数のチップによって構成されてもよい。各構成要素の一部または全部は、上述した回路等とプログラムとの組み合わせによって実現されてもよい。 Further, a part or all of each component may be realized by a general-purpose or dedicated circuitry, a processor, or a combination thereof. These may be composed of a single chip or may be composed of a plurality of chips connected via a bus. A part or all of each component may be realized by the combination of the circuit or the like and the program described above.

各構成要素の一部または全部が複数の情報処理装置や回路等により実現される場合には、複数の情報処理装置や回路等は集中配置されてもよいし、分散配置されてもよい。例えば、情報処理装置や回路等は、クライアントアンドサーバシステム、クラウドコンピューティングシステム等、各々が通信ネットワークを介して接続される形態として実現されてもよい。 When a part or all of each component is realized by a plurality of information processing devices and circuits, the plurality of information processing devices and circuits may be centrally arranged or distributed. For example, the information processing device, the circuit, and the like may be realized as a form in which each is connected via a communication network, such as a client-and-server system and a cloud computing system.

次に、本発明の概要について説明する。図１３は、本発明の演算最適化装置の概要を示すブロック図である。本発明の演算最適化装置は、説明変数値取得手段７３と、目的関数計算手段７５と、適用パターン決定手段７７とを備える。 Next, the outline of the present invention will be described. FIG. 13 is a block diagram showing an outline of the calculation optimization device of the present invention. The arithmetic optimization apparatus of the present invention includes explanatory variable value acquisition means 73, objective function calculation means 75, and application pattern determination means 77.

説明変数値取得手段７３（例えば、説明変数値取得部２３）は、１つ以上のユニットでそれぞれ構成された複数の層が結合された判別モデル（例えば、ニューラルネットワーク）を用いた演算で、第１の演算精度で演算を行う第１の演算回路（例えば、低精度演算回路５）をどの層に適用し、第１の演算精度よりも高い第２の演算精度で演算を行う第２の演算回路（例えば、高精度演算回路６）をどの層に適用するかを定めた情報である適用パターン毎に、所定の説明変数の値を取得する。 The explanatory variable value acquisition means 73 (for example, the explanatory variable value acquisition unit 23) acquires the value of a predetermined explanatory variable for each application pattern. An application pattern is information that defines, for an operation using a discrimination model (for example, a neural network) in which a plurality of layers each composed of one or more units are connected, to which layers a first arithmetic circuit (for example, the low-precision arithmetic circuit 5) that operates with a first calculation accuracy is applied and to which layers a second arithmetic circuit (for example, the high-precision arithmetic circuit 6) that operates with a second calculation accuracy higher than the first is applied.

目的関数計算手段７５（例えば、目的関数計算部２５）は、所定の説明変数で表される目的関数の値を、適用パターン毎に計算する。 The objective function calculation means 75 (for example, the objective function calculation unit 25) calculates the value of the objective function represented by a predetermined explanatory variable for each application pattern.

適用パターン決定手段７７(例えば、適用パターン決定部２７）は、目的関数の値が最小となる適用パターンを決定する。 The application pattern determination means 77 (for example, the application pattern determination unit 27) determines the application pattern that minimizes the value of the objective function.

そのような構成によって、判別モデルを用いた演算を最適化できるように、判別モデルの各層における演算精度を自動的に定めることができる。 With such a configuration, the arithmetic precision used in each layer of the discrimination model can be determined automatically so that the operation using the discrimination model is optimized.
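The three means described above can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: it assumes that every assignment of the two circuits to the layers is a candidate application pattern, and the names `enumerate_patterns`, `choose_pattern`, `acquire`, and `objective` are hypothetical, introduced only for illustration.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# One entry per layer: "low" = first (low-precision) circuit,
# "high" = second (high-precision) circuit. The labels are illustrative.
Pattern = Tuple[str, ...]
Variables = Dict[str, float]


def enumerate_patterns(num_layers: int) -> List[Pattern]:
    """List every assignment of the two circuits to the layers (2**num_layers)."""
    return list(product(("low", "high"), repeat=num_layers))


def choose_pattern(
    num_layers: int,
    acquire: Callable[[Pattern], Variables],
    objective: Callable[[Variables], float],
) -> Tuple[Pattern, float]:
    """Return the application pattern with the smallest objective value.

    `acquire` plays the role of the explanatory variable value acquisition
    means, `objective` the objective function calculation means, and the
    minimum search the application pattern determination means.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    best_pattern: Pattern = ()
    best_value = float("inf")
    for pattern in enumerate_patterns(num_layers):
        variables = acquire(pattern)      # explanatory variable values
        value = objective(variables)      # objective function value
        if value < best_value:
            best_pattern, best_value = pattern, value
    return best_pattern, best_value
```

Exhaustive enumeration grows as 2^N in the number of layers N, so this form is only practical for models with a modest number of layers.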

目的関数は、少なくとも、判別モデルを用いた演算の処理速度、および、演算結果の正確さを所定の説明変数として表されていてもよい。 The objective function may be expressed with at least the processing speed of the operation using the discrimination model and the accuracy of the operation result as the predetermined explanatory variables.

目的関数は、第１の演算回路と第２の演算回路との間で授受されるデータ量を所定の説明変数として表されていてもよい。 The objective function may be expressed with the amount of data exchanged between the first arithmetic circuit and the second arithmetic circuit as a predetermined explanatory variable.

目的関数は、判別モデルを用いた演算を行う回路の回路規模を所定の説明変数として表されていてもよい。 The objective function may be expressed with the circuit scale of the circuit that performs the operation using the discrimination model as a predetermined explanatory variable.

目的関数は、判別モデルを用いた演算での消費電力を所定の説明変数として表されていてもよい。 The objective function may be expressed with the power consumption of the operation using the discrimination model as a predetermined explanatory variable.
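One way to combine the explanatory variables listed above into a single value to be minimized is a weighted sum. The sketch below assumes that form; this passage does not state the functional form or any weights, so `ExplanatoryVariables`, `Weights`, and `objective` are hypothetical. Because the objective is minimized, processing speed is represented by processing time and accuracy by the error rate (1 - accuracy).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExplanatoryVariables:
    processing_time: float          # lower is better (stands in for processing speed)
    accuracy: float                 # fraction of correct results, 0.0 to 1.0
    transferred_data: float = 0.0   # data exchanged between the two circuits
    circuit_scale: float = 0.0      # size of the circuit performing the operation
    power: float = 0.0              # power consumption of the operation


@dataclass(frozen=True)
class Weights:
    # Illustrative weights; units and magnitudes are assumptions.
    time: float = 1.0
    error: float = 1.0
    transfer: float = 0.0
    scale: float = 0.0
    power: float = 0.0


def objective(v: ExplanatoryVariables, w: Weights) -> float:
    """Weighted sum of the explanatory variables; smaller is better."""
    return (
        w.time * v.processing_time
        + w.error * (1.0 - v.accuracy)
        + w.transfer * v.transferred_data
        + w.scale * v.circuit_scale
        + w.power * v.power
    )
```

Leaving a weight at zero drops that explanatory variable from the objective, which mirrors the optional nature of the data amount, circuit scale, and power consumption terms.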

説明変数値取得手段７３が、所定の説明変数の値を実測により取得する構成であってもよい。 The explanatory variable value acquisition means 73 may be configured to acquire the value of a predetermined explanatory variable by actual measurement.

説明変数値取得手段７３が、所定の説明変数の値をシミュレーションにより取得する構成であってもよい。 The explanatory variable value acquisition means 73 may be configured to acquire the value of a predetermined explanatory variable by simulation.
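As an illustration of acquisition by simulation, the sketch below estimates two explanatory variables for one application pattern from per-layer figures. The cost model is an assumption made for this example: layers run sequentially, data crosses between the circuits only where adjacent layers use different circuits, and the inputs `low_times`, `high_times`, and `boundary_sizes` are hypothetical values that would have to come from elsewhere.

```python
from typing import Sequence


def simulate_processing_time(
    pattern: Sequence[str],
    low_times: Sequence[float],
    high_times: Sequence[float],
) -> float:
    """Sum the per-layer time of whichever circuit each layer is assigned to."""
    if not (len(pattern) == len(low_times) == len(high_times)):
        raise ValueError("one time per layer is required for each circuit")
    return sum(
        low if circuit == "low" else high
        for circuit, low, high in zip(pattern, low_times, high_times)
    )


def simulate_transferred_data(
    pattern: Sequence[str],
    boundary_sizes: Sequence[float],
) -> float:
    """Sum the data passed between adjacent layers that use different circuits.

    boundary_sizes[i] is the amount of data passed from layer i to layer i + 1.
    """
    if not pattern or len(boundary_sizes) != len(pattern) - 1:
        raise ValueError("one boundary size per pair of adjacent layers is required")
    return sum(
        size
        for left, right, size in zip(pattern, pattern[1:], boundary_sizes)
        if left != right
    )
```

Acquisition by actual measurement would replace these estimates with values observed while running the operation on the circuits under the given application pattern.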

以上、実施形態を参照して本願発明を説明したが、本願発明は上記の実施形態に限定されるものではない。本願発明の構成や詳細には、本願発明のスコープ内で当業者が理解し得る様々な変更をすることができる。 Although the invention of the present application has been described above with reference to the embodiments, the invention of the present application is not limited to the above-described embodiment. Various changes that can be understood by those skilled in the art can be made within the scope of the present invention in terms of the configuration and details of the present invention.

Industrial Applicability

本発明は、例えば、ニューラルネットワーク等の判別モデルを用いた演算を最適化する演算最適化装置に好適に適用される。 The present invention is suitably applied to, for example, an operation optimization device that optimizes an operation using a discrimination model such as a neural network.

５低精度演算回路
６高精度演算回路
７第１メモリ
８第２メモリ
９第３メモリ
１８処理装置
１９設計情報記憶部
２１判別モデル記憶部
２２データ記憶部
２３説明変数値取得部
２４目的関数記憶部
２５目的関数計算部
２６計算結果記憶部
２７適用パターン決定部

5 Low-precision arithmetic circuit
6 High-precision arithmetic circuit
7 First memory
8 Second memory
9 Third memory
18 Processing device
19 Design information storage unit
21 Discrimination model storage unit
22 Data storage unit
23 Explanatory variable value acquisition unit
24 Objective function storage unit
25 Objective function calculation unit
26 Calculation result storage unit
27 Application pattern determination unit

Claims

An arithmetic optimization device comprising:
explanatory variable value acquisition means for acquiring the value of a predetermined explanatory variable for each application pattern, the application pattern being information that defines, in an operation using a discrimination model in which a plurality of layers each composed of one or more units are connected, to which layer a first arithmetic circuit that operates at a first arithmetic precision is applied and to which layer a second arithmetic circuit that operates at a second arithmetic precision higher than the first arithmetic precision is applied;
objective function calculation means for calculating, for each application pattern, the value of an objective function expressed by the predetermined explanatory variable; and
application pattern determination means for determining the application pattern that minimizes the value of the objective function.

The arithmetic optimization device according to claim 1, wherein the objective function is expressed with at least the processing speed of the operation using the discrimination model and the accuracy of the operation result as the predetermined explanatory variables.

The arithmetic optimization device according to claim 2, wherein the objective function is expressed with the amount of data exchanged between the first arithmetic circuit and the second arithmetic circuit as a predetermined explanatory variable.

The arithmetic optimization device according to claim 2 or 3, wherein the objective function is expressed with the circuit scale of a circuit that performs the operation using the discrimination model as a predetermined explanatory variable.

The arithmetic optimization device according to any one of claims 2 to 4, wherein the objective function is expressed with the power consumption of the operation using the discrimination model as a predetermined explanatory variable.

The arithmetic optimization device according to any one of claims 1 to 5, wherein the explanatory variable value acquisition means acquires the value of a predetermined explanatory variable by actual measurement.

The arithmetic optimization device according to any one of claims 1 to 5, wherein the explanatory variable value acquisition means acquires the value of a predetermined explanatory variable by simulation.

An arithmetic optimization method comprising:
acquiring the value of a predetermined explanatory variable for each application pattern, the application pattern being information that defines, in an operation using a discrimination model in which a plurality of layers each composed of one or more units are connected, to which layer a first arithmetic circuit that operates at a first arithmetic precision is applied and to which layer a second arithmetic circuit that operates at a second arithmetic precision higher than the first arithmetic precision is applied;
calculating, for each application pattern, the value of an objective function expressed by the predetermined explanatory variable; and
determining the application pattern that minimizes the value of the objective function.

An arithmetic optimization program for causing a computer to execute:
an explanatory variable value acquisition process of acquiring the value of a predetermined explanatory variable for each application pattern, the application pattern being information that defines, in an operation using a discrimination model in which a plurality of layers each composed of one or more units are connected, to which layer a first arithmetic circuit that operates at a first arithmetic precision is applied and to which layer a second arithmetic circuit that operates at a second arithmetic precision higher than the first arithmetic precision is applied;
an objective function calculation process of calculating, for each application pattern, the value of an objective function expressed by the predetermined explanatory variable; and
an application pattern determination process of determining the application pattern that minimizes the value of the objective function.