JP2021168042A

JP2021168042A - Information processing device, information processing method, and program

Info

Publication number: JP2021168042A
Application number: JP2020071101A
Authority: JP
Inventors: 達也森; Tatsuya Mori; 暢彦大鳥羽; Nobuhiko Otoba
Original assignee: ARAYA KK
Current assignee: ARAYA KK
Priority date: 2020-04-10
Filing date: 2020-04-10
Publication date: 2021-10-21

Abstract

To efficiently compress a trained model so as to contribute to faster computation. SOLUTION: An information processing device includes an input/output calculation unit 14, a compression weight matrix calculation unit 15, and a re-learning unit 16. The input/output calculation unit 14 performs inference on inference data with a trained model and extracts the input data and output data of the matrix operation in a specific layer to be compressed. The compression weight matrix calculation unit 15 applies, to the matrix of that layer, a zero/non-zero pattern in which the elements at specific indices of the matrix are zero, computes the input data extracted by the calculation unit 14 with the resulting compression weight matrix, and obtains a compression weight matrix with optimized weights by performing an operation that reduces the error between that output and the output data extracted by the calculation unit 14. The re-learning unit 16 retrains, using correct-answer data, the trained model in which the compression weight matrix obtained by the compression weight matrix calculation unit 15 is applied to the specific layer, while keeping the zero positions at zero. SELECTED DRAWING: Figure 4

Description

The present invention relates to an information processing device and an information processing method that perform neural network computations for artificial intelligence, and to a program that executes the information processing method; more particularly, it relates to a technique for reducing the amount of computation when performing neural network computations.

In general, among neural networks, deep neural networks (hereinafter "DNN") and convolutional neural networks (hereinafter "CNN"), which have particularly high recognition and prediction performance, are provided as applications for smartphones, automotive equipment, home appliances, factory equipment, robots, and the like, through Internet services, via the cloud, or by being embedded in devices.

Patent Document 1: WO 2019/092900

Neural networks such as DNNs and CNNs, which are widely used to realize conventional artificial intelligence functions, require a large amount of computation, so it is necessary to provide large-scale servers as computing resources or to install additional units such as graphics processing units (hereinafter "GPU"). As a result, introducing AI equipment and implementing it in devices becomes expensive, and a large amount of power is consumed.

It has therefore been proposed to reduce the amount of computation in each layer of the trained model constituting a neural network when performing neural network computations. For example, Patent Document 1 describes a technique that reduces the amount of computation by deleting some of the elements of the matrix when performing the matrix operation in each layer of the trained model constituting the neural network.

Deleting some elements of a matrix is equivalent to performing the matrix operation with the deleted values set to zero. Setting some matrix elements to zero makes the operations involving those elements unnecessary, which reduces both the amount of computation and the memory capacity the computation requires.

Thus, one known conventional compression process sets the matrix elements with small absolute values to zero.
In this compression technique, for example, the elements of the 3-row × 3-column weight matrix shown on the left side of FIG. 16 whose values are less than 1 (here, 0.1 and 0.2) are replaced with "0", converting it into the compressed weight matrix shown on the right side of FIG. 16.
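The conventional technique just described can be sketched in a few lines of Python. This is an illustrative sketch, not code from this publication or from Patent Document 1: the function name `magnitude_prune` and the matrix values are hypothetical (the actual values of FIG. 16 are not reproduced here), and the threshold of 1 follows the "less than 1" example in the text.

```python
def magnitude_prune(weights, threshold):
    """Conventional magnitude pruning (hypothetical helper).

    Returns a new matrix in which every element whose absolute value is
    below `threshold` is replaced by 0.0; the input is left unchanged.
    """
    return [[0.0 if abs(w) < threshold else w for w in row] for row in weights]


# Hypothetical 3 x 3 weight matrix; 0.1, 0.2 and 0.1 fall below the threshold.
W = [
    [1.5, 0.1, 2.0],
    [0.2, 3.0, 1.2],
    [2.5, 1.1, 0.1],
]
W_pruned = magnitude_prune(W, threshold=1.0)
print(W_pruned)
```

Note that where the zeros land is decided entirely by the trained values, not by any layout chosen for fast computation.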

However, when compression is based on the weight values in this way, the positions to be deleted depend on the trained model, and the deletion may not contribute to faster computation.
To increase computation speed, the positions whose values are set to zero should preferably be positions that contribute to faster computation. However, speeding up neural network computation in this way while maintaining the inference accuracy of the trained model has not been achieved conventionally.

Furthermore, no method has conventionally been known for determining the compression ratio of each layer so as to suppress accuracy degradation of the model as a whole. Whether the chosen per-layer compression ratios actually suppressed the accuracy degradation of the whole model could therefore not be known until a process involving re-learning had been executed and a compressed model had actually been created. It has consequently been difficult to compress a trained model efficiently while suppressing accuracy degradation.

An object of the present invention is to provide an information processing device, an information processing method, and a program capable of efficiently compressing a trained model so as to contribute to faster computation.

The information processing device of the present invention compresses a trained model that is composed of a plurality of layers and is applied to inference data to perform neural network computations. The device includes an input/output calculation unit that performs the computation on the inference data with the trained model and extracts the input data and output data of the matrix operation in a specific layer to be compressed.
The information processing device of the present invention further includes a compression weight matrix calculation unit and a re-learning unit. The compression weight matrix calculation unit uses a compression weight matrix obtained by applying, to the matrix of the specific layer, a zero/non-zero pattern in which the elements at specific indices of the matrix are set to zero and all other elements keep their original values. It computes the input data extracted by the input/output calculation unit with this compression weight matrix and performs an operation that reduces the error between the resulting output data and the output data extracted by the input/output calculation unit, thereby obtaining a compression weight matrix with optimized weights. The re-learning unit retrains, using correct-answer data, the trained model in which the weight matrix obtained by the compression weight matrix calculation unit is applied to the specific layer, while keeping the zero positions of the compression weight matrix at zero.
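Keeping the zero positions at zero during re-learning can be enforced in several ways; the sketch below assumes one simple approach, re-applying the zero/non-zero pattern as a 0/1 mask after every gradient step. The function `masked_sgd_step`, its arguments, and the gradient values are hypothetical plain-Python illustrations, not the re-learning unit's actual implementation.

```python
def masked_sgd_step(weights, grads, mask, lr):
    """One gradient-descent step that keeps pruned positions at zero.

    weights, grads and mask are equally shaped lists of lists; mask[j][k] is 1
    where the weight is trainable and 0 where the zero/non-zero pattern fixes
    it at zero. The update is applied, then the mask zeroes fixed positions.
    """
    return [
        [(w - lr * g) * m for w, g, m in zip(w_row, g_row, m_row)]
        for w_row, g_row, m_row in zip(weights, grads, mask)
    ]


weights = [[0.5, 0.0], [0.0, -0.3]]
grads = [[0.1, 0.2], [0.4, -0.1]]  # hypothetical gradients of a loss on correct-answer data
mask = [[1, 0], [0, 1]]
weights = masked_sgd_step(weights, grads, mask, lr=0.5)
```

Masking after the update (rather than only masking the gradient) also guards against any position drifting away from zero through numerical noise.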

The information processing method of the present invention compresses a trained model that is composed of a plurality of layers and is applied to inference data to perform neural network computations. The method includes: a calculation procedure of performing the computation on the inference data with the trained model and extracting the input data and output data of the matrix operation in a specific layer to be compressed; a compression weight matrix calculation procedure of computing the input data extracted in the calculation procedure with a compression weight matrix obtained by applying, to the matrix of the specific layer, a zero/non-zero pattern in which the elements at specific indices of the matrix are set to zero and all other elements keep their original values, and performing an operation that minimizes the error between the resulting output data and the output data extracted in the calculation procedure, thereby obtaining a compression weight matrix with optimized weights; and a re-learning procedure of retraining, using correct-answer data, the trained model in which the weight matrix obtained in the compression weight matrix calculation procedure is applied to the specific layer, while keeping the zero positions of the compression weight matrix at zero.

The program of the present invention causes a computer to execute each processing procedure of the above information processing method as a series of steps.

According to the present invention, each layer can be compressed with a pattern suited to reducing the amount of computation, so the trained model can be compressed efficiently.

FIG. 1 is a diagram showing an example of the layer structure of a trained model.
FIG. 2 is a diagram showing an example of a convolution operation using the weight matrix of a trained model.
FIG. 3 is a diagram showing an example in which a zero pattern is applied to part of the weight matrix of FIG. 2.
FIG. 4 is a block diagram showing a configuration example of an information processing device according to a first embodiment of the present invention.
FIG. 5 is a diagram showing an example of a re-learning unit according to the first embodiment of the present invention.
FIG. 6 is a block diagram showing a hardware configuration example of the information processing device according to the first embodiment of the present invention.
FIG. 7 is a flowchart showing the flow of a processing procedure according to the first embodiment of the present invention.
FIG. 8 is a block diagram showing a configuration example of an information processing device according to a second embodiment of the present invention.
FIG. 9 is a characteristic diagram showing an example of the change in accuracy caused by compressing each layer in the second embodiment of the present invention.
FIG. 10 is a flowchart showing the flow of a processing procedure according to the second embodiment of the present invention.
FIG. 11 is a characteristic diagram comparing training with the weights kept at zero against re-learning after linear regression.
FIG. 12 is a diagram showing a specific example of a model when the present invention is applied to image recognition.
FIG. 13 is a diagram showing an example in which the model shown in FIG. 12 is compressed.
FIG. 14 is a diagram showing an example in which a specific layer of a model is re-learned.
FIG. 15 is a diagram showing an example of data used in the processing performed by the weight matrix selection unit according to the present invention.
FIG. 16 is a diagram showing an example in which a conventional weight matrix is compressed.

Embodiments of the present invention are described below.
The present invention compresses a trained model that is composed of a plurality of layers and is applied to inference data to perform neural network computations.

[Trained model layer structure and weight matrix compression]
First, an example of a trained model that performs neural network computations, to which the present invention is applied, is described with reference to FIGS. 1 to 3.
FIG. 1 shows an example of the layer structure of the trained model.
As shown in FIG. 1, the trained model has a plurality of layers: a first layer L1, a second layer L2, a third layer L3, ..., and an n-th layer Ln (n is an arbitrary integer). A computation is performed in each of the layers L1 to Ln.

As shown, for example, in the enlarged view of the third layer L3 in FIG. 1, the computation in each of the layers L1 to Ln can be divided into a linear matrix calculation and a computation by a non-linear activation function. However, not every layer needs to include both a linear matrix calculation and a non-linear activation function.

For a layer with both a linear matrix calculation and a non-linear activation function, when the input to the layer is x, the linear matrix calculation f(x) is performed first to obtain a value y. The value y is then converted into a value z by the non-linear activation function, and z becomes the output of the layer.
These computations are executed in order in each of the layers L1 to Ln, and the inference result is obtained as the output of the final n-th layer Ln.

For example, when the third layer L3 is compressed, the linear matrix calculation f(x) is converted into a compressed linear matrix calculation fc(x).
If f(x) and fc(x) had the same input/output relationship, compression would not reduce inference accuracy.

FIG. 2 shows an example of the matrix representation of the linear matrix calculation f(x).
Here, as shown in the upper left of FIG. 2, when a convolution with iw = 18, ih = 18, ic = 10, and stride = 1 is applied to the input x, the input is expanded into a matrix with n = kw × kh × ic columns (the matrix starting from $x_{11}$), as shown in the lower left of FIG. 2. The number of rows produced by sliding the kernel with the given stride is denoted m.

As for the weight matrix, as shown in the upper center of FIG. 2, filters of size kw = 3, kh = 3, ic = 10 are prepared, one for each of the filter_num filters.
As shown in the lower center of FIG. 2, this weight matrix is represented as a matrix with oc = filter_num = l columns (the matrix starting from $w_{11}$).
The output y obtained by multiplying the input x by the weight matrix is then obtained as shown in the upper right of FIG. 2 and is represented, as shown in the lower right of FIG. 2, as a matrix (the matrix starting from $y_{11}$).
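The expansion of a convolution into a single matrix product, as in FIG. 2, is commonly called im2col. The sketch below assumes no padding and flattens each receptive field in (row, column, channel) order; the publication specifies neither, so the functions `im2col` and `matmul`, the ordering, and the resulting m are assumptions. Under these assumptions an 18 × 18 × 10 input with a 3 × 3 kernel and stride 1 gives m = 16 × 16 = 256 rows and n = 3 × 3 × 10 = 90 columns.

```python
def im2col(image, kh, kw, stride=1):
    """Unfold an ih x iw x ic image (nested lists) into an m x n matrix.

    Each row is one kernel position; its n = kh * kw * ic entries are the
    receptive field flattened in (row, column, channel) order. No padding.
    """
    ih, iw, ic = len(image), len(image[0]), len(image[0][0])
    rows = []
    for top in range(0, ih - kh + 1, stride):
        for left in range(0, iw - kw + 1, stride):
            rows.append([image[top + dy][left + dx][c]
                         for dy in range(kh) for dx in range(kw) for c in range(ic)])
    return rows


def matmul(a, b):
    """(m x n) @ (n x l) -> (m x l) for lists of lists."""
    return [[sum(a_ij * b[j][k] for j, a_ij in enumerate(row)) for k in range(len(b[0]))]
            for row in a]


# Toy example: a 3 x 3 single-channel image and one 2 x 2 filter.
image = [[[float(3 * r + c)] for c in range(3)] for r in range(3)]
cols = im2col(image, kh=2, kw=2)          # m = 4 kernel positions, n = 4
weights = [[1.0], [0.0], [0.0], [1.0]]    # n x l weight matrix with l = 1 filter
out = matmul(cols, weights)               # m x l convolution output

# Dimensions from FIG. 2 (all-zero input; only the shape matters here).
big = im2col([[[0.0] * 10 for _ in range(18)] for _ in range(18)], kh=3, kw=3)
print(len(big), len(big[0]))  # 256 90
```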

FIG. 3 shows an example in which the linear matrix calculation f(x) shown in FIG. 2 is performed as a compressed linear matrix calculation.
In the example of FIG. 3, as shown in the lower center of FIG. 3, some positions in the weight matrix (the matrix starting from $w'_{11}$) are replaced with zero (0), and the non-zero positions keep the same values as in the weight matrix shown in the lower center of FIG. 2. In the following description, this pattern of zeros and non-zeros in the weight matrix is referred to as a "zero/non-zero pattern".
The input matrix is multiplied by the weight matrix shown in the lower center of FIG. 3 to obtain the output matrix shown in the lower right of FIG. 3 (the matrix starting from $y'_{11}$).

The matrix operation that produces the output $y'_{11}$ uses a weight matrix with a zero/non-zero pattern, so the multiplications and additions involving zeros can be omitted. The number of operations needed to obtain $y'_{11}$ is therefore smaller than the number needed to obtain $y_{11}$ shown in FIG. 2. Provided the zero positions are positions that contribute to faster computation, this matrix operation can thus be said to help reduce the amount of computation of the neural network.
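To make the saving concrete, the sketch below multiplies an input matrix by a weight matrix while skipping every multiply-add whose weight is zero, and counts how many multiply-adds are actually performed. The function `masked_matmul_count` and the matrices are hypothetical; on real hardware the achieved speed-up also depends on where the zeros sit, as the text notes.

```python
def masked_matmul_count(x, w):
    """Compute y = x @ w, skipping multiply-adds whose weight is zero.

    Returns (y, macs), where macs is the number of multiply-adds performed.
    """
    m, n, l = len(x), len(w), len(w[0])
    # For each output column (filter) k, list the rows j with a non-zero weight.
    nonzero = [[j for j in range(n) if w[j][k] != 0.0] for k in range(l)]
    y = [[0.0] * l for _ in range(m)]
    macs = 0
    for i in range(m):
        for k in range(l):
            acc = 0.0
            for j in nonzero[k]:
                acc += x[i][j] * w[j][k]
                macs += 1
            y[i][k] = acc
    return y, macs


x = [[1.0, 1.0, 1.0, 1.0],
     [2.0, 0.0, 1.0, 0.0]]
w_dense = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
w_sparse = [[1.0, 0.0], [0.0, 4.0], [5.0, 0.0], [0.0, 8.0]]  # half the weights zeroed

_, macs_dense = masked_matmul_count(x, w_dense)
y_sparse, macs_sparse = masked_matmul_count(x, w_sparse)
print(macs_dense, macs_sparse)  # 16 8
```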

Comparing the uncompressed output y with the compressed output y′, the results obtained with the k-th column (= the k-th filter) of the weight matrix and the i-th row of the input x are the linear expressions shown in Equation 1, where w denotes the values of the weight matrix.
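Equation 1 itself was lost in extraction. Reading x as the m × n expanded input and the k-th column of the weight matrix as the k-th filter, it plausibly takes the following form (a reconstruction, not the original typesetting), where $w'_{jk} = 0$ at every position zeroed by the zero/non-zero pattern:

```latex
y_{ik} = \sum_{j=1}^{n} x_{ij}\, w_{jk}, \qquad
y'_{ik} = \sum_{j=1}^{n} x_{ij}\, w'_{jk}, \qquad i = 1, \ldots, m
```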

If there are enough samples of the input x and the output y, the values $w'_{jk}$ of the compressed weight matrix can be obtained by the least squares method.
The least squares method finds the values $w'_{jk}$ of the compressed weight matrix that minimize the sum of the errors $\sum_{i=1}^{m} (y_{ik} - y'_{ik})^2$ computed from multiple input data.
That is, using the input data and output data of the uncompressed matrix of the relevant layer and the input data and output data of the compression weight matrix to which the zero/non-zero pattern is applied, a least squares computation minimizes the error and yields appropriate values for the weights at the non-zero positions of the compression weight matrix.
However, as explained in the embodiments described later, the least squares method is only one example; another operation that reduces the error may be performed instead to obtain a compression weight matrix with optimized weights. The least squares method minimizes the error, but depending on the method applied, the error is not necessarily minimized.
In each embodiment of the present invention described below, the least squares method or a similar method is applied to one or more layers to obtain the values $w'_{jk}$ of the compressed weight matrix that minimize the sum of the errors, and using this compressed matrix reduces the amount of matrix computation of the trained model.
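A minimal sketch of this per-filter refit, assuming plain Python lists and solving the normal equations by Gaussian elimination: for each filter k it keeps only the input columns j that the zero/non-zero pattern leaves non-zero, fits $w'_{jk}$ to the uncompressed outputs $y_{ik}$ by least squares, and leaves all other entries at exactly zero. The names `solve` and `compress_layer` are hypothetical, the sketch does not handle a singular system, and a production version would use a numerically robust library solver.

```python
def solve(a, b):
    """Solve the square system a @ t = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix [a | b]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            f = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= f * aug[col][c]
    t = [0.0] * size
    for r in range(size - 1, -1, -1):  # back substitution
        t[r] = (aug[r][size] - sum(aug[r][c] * t[c] for c in range(r + 1, size))) / aug[r][r]
    return t


def compress_layer(x, y, mask):
    """Refit the non-zero weights of every filter by least squares.

    x:    m x n inputs of the layer's matrix operation
    y:    m x l outputs of the uncompressed layer
    mask: n x l zero/non-zero pattern (1 = keep, 0 = fixed at zero)
    Returns the n x l compressed weight matrix w'.
    """
    n, l = len(mask), len(mask[0])
    w_c = [[0.0] * l for _ in range(n)]
    for k in range(l):
        keep = [j for j in range(n) if mask[j][k]]  # surviving inputs of filter k
        xs = [[row[j] for j in keep] for row in x]  # drop columns multiplied by zero
        # Normal equations (Xs^T Xs) theta = Xs^T y_k for this filter.
        gram = [[sum(r[a] * r[b] for r in xs) for b in range(len(keep))]
                for a in range(len(keep))]
        rhs = [sum(r[a] * y_i[k] for r, y_i in zip(xs, y)) for a in range(len(keep))]
        for j, value in zip(keep, solve(gram, rhs)):
            w_c[j][k] = value
    return w_c
```

Because naive pruning (zeroing the original weights) is one feasible choice on the same support, the refit error on these samples can never exceed the error of simply zeroing the original weights.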

[First embodiment]
Next, the process of compressing a trained model in the first embodiment of the present invention is described with reference to FIGS. 4 to 7.
FIG. 4 is a block diagram showing a configuration example of the information processing device 10 of this embodiment.
The information processing device 10 of this embodiment includes an inference data input unit 11, a trained model input unit 12, a correct answer data input unit 13, an input/output calculation unit 14, a compression weight matrix calculation unit 15, a re-learning unit 16, a re-learned model output unit 17, and a compression rate designation unit 18.
The inference data input unit 11 outputs inference data used for inference processing. The trained model input unit 12 outputs a model trained in the past. The correct answer data input unit 13 outputs correct-answer data for the inference data output from the inference data input unit 11.

The input/output calculation unit 14 applies the trained model supplied from the trained model input unit 12 to the inference data input from the inference data input unit 11 and executes inference. As described with reference to FIG. 1, the trained model consists of a plurality of layers, and each layer performs a linear matrix operation and a non-linear activation function.

The correct-answer data obtained from the correct answer data input unit 13 indicates the correct answers for the inference data input from the inference data input unit 11. For example, when the trained model supplied from the trained model input unit 12 is used to analyze image data and identify the objects contained in the images, the image data is supplied to the input/output calculation unit 14 as inference data, and the correct-answer data from the correct answer data input unit 13 indicates the correct identity of the object contained in each image.

The inference data used for computation in the input/output calculation unit 14 is preferably the data that was used to train the trained model supplied from the trained model input unit 12, but it need not be the training data as long as it is of the same kind. For example, even if the trained model was trained on photographs of dogs taken in Japan, photographs of dogs taken in the United States may be used as inference data.

When performing inference on the inference data with the trained model, the input/output calculation unit 14 can extract the input data and output data of each layer (each layer to be compressed).
The input data and output data of the one or more layers to be compressed, extracted by the input/output calculation unit 14, are supplied to the compression weight matrix calculation unit 15.
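The extraction step can be pictured as a forward pass that records the input and output of the target layer's linear matrix calculation. The sketch below is a hypothetical plain-Python illustration (ReLU activations after every layer are assumed, and the function name `forward_with_capture` is invented); it captures the output before the activation, since that is the matrix operation being compressed. In practice the rows of `x` would be many inference samples, or their im2col rows for a convolution layer.

```python
def forward_with_capture(x, layers, target):
    """Run a ReLU network layer by layer.

    Returns (final_output, captured), where captured["input"] and
    captured["output"] are the input to and the pre-activation output of
    the linear matrix calculation of layer index `target`.
    """
    captured = {}
    for idx, w in enumerate(layers):
        # Linear matrix calculation f(x) = x @ w.
        y = [[sum(a * w[j][k] for j, a in enumerate(row)) for k in range(len(w[0]))]
             for row in x]
        if idx == target:
            captured["input"], captured["output"] = x, y
        x = [[max(0.0, v) for v in row] for row in y]  # non-linear activation (ReLU assumed)
    return x, captured


W1 = [[2.0, -1.0], [0.5, 1.0]]
W2 = [[3.0], [4.0]]
out, cap = forward_with_capture([[1.0, -2.0]], [W1, W2], target=1)
print(cap["input"], cap["output"], out)
```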

The compression weight matrix calculation unit 15 takes as input the input data and output data of the layer to be compressed, computed by the input/output calculation unit 14, together with one or more zero/non-zero patterns, and obtains output data. The zero/non-zero patterns are prepared in advance in the compression rate designation unit 18 and supplied from it to the compression weight matrix calculation unit 15. When a compression rate is designated, for example by a user operation, the compression rate designation unit 18 selects one or more zero/non-zero patterns matching that compression rate and supplies them to the compression weight matrix calculation unit 15. The compression rate may be designated by a user operation or automatically.
The compression weight matrix calculation unit 15 feeds the input data to the compression weight matrix obtained by applying the supplied zero/non-zero pattern to the weight matrix, and obtains the output data of the compression weight matrix.
As already described, a zero/non-zero pattern sets the weight coefficients at a specific set of positions in the weight matrix to zero and sets the weight coefficients at the other positions to (non-zero) values obtained by a computation such as the least squares method.

Zero/non-zero patterns with gamma = 2, gamma = 4, gamma = 8, and gamma = 16 are prepared, differing in the number of zeros in the matrix. For the gamma = 2 pattern, the ratio (non-zero)/(zero + non-zero) is 1/2. Likewise, for gamma = 4, gamma = 8, and gamma = 16, the ratio (non-zero)/(zero + non-zero) is 1/4, 1/8, and 1/16, respectively.
For example, in the gamma = 4 pattern, 1/4 of the weight matrix entries are non-zero coefficients and the remaining 3/4 are zero. The gamma = 16 pattern therefore has the highest compression ratio.
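The publication does not state which positions each gamma pattern zeroes. Purely as a hypothetical example, the sketch below builds an n × l pattern in which filter k keeps input index j when (j + k) mod gamma = 0, so the kept positions are staggered across filters and exactly 1/gamma of the entries are non-zero whenever l is a multiple of gamma. The names `make_pattern` and `density`, the layout rule, and l = 16 are assumptions.

```python
def make_pattern(n, l, gamma):
    """Hypothetical zero/non-zero pattern for an n x l weight matrix.

    Entry (j, k) is 1 (non-zero) when (j + k) % gamma == 0 and 0 otherwise.
    """
    return [[1 if (j + k) % gamma == 0 else 0 for k in range(l)] for j in range(n)]


def density(mask):
    """Fraction of non-zero entries: (non-zero) / (zero + non-zero)."""
    return sum(map(sum, mask)) / sum(len(row) for row in mask)


densities = {}
for gamma in (2, 4, 8, 16):
    # n = 3 * 3 * 10 as in FIG. 2; l = 16 filters is an assumption.
    densities[gamma] = density(make_pattern(n=90, l=16, gamma=gamma))
    print(gamma, densities[gamma])
```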

Next, a method for obtaining the optimal weight coefficient values by the least squares method is described.
Assume, for example, that multiple input data $x = (x_1, \ldots, x_n)$ and output data $Y = (Y_1, \ldots, Y_n)$ have been obtained, where $x_i = (x_{i1}, \ldots, x_{id}) \in \mathbb{R}^d$. With parameters $\theta = (\theta_1, \ldots, \theta_d)$, the following multidimensional linear regression model is assumed; the parameter vector $\theta$ corresponds to one filter of the weight matrix. When obtaining the parameters, the elements of $\theta$ that are zero can be removed in advance, and the columns of $x$ that would be multiplied by those zero elements can be removed as well.
$$Y_i = x_{i1}\theta_1 + x_{i2}\theta_2 + \cdots + x_{id}\theta_d + \varepsilon_i, \qquad i = 1, \ldots, n$$
Expressed in matrix form, this multidimensional linear regression model becomes Equation 2.

Hence $Y = X\theta + \varepsilon$.
The least squares estimator can be obtained from the extremum condition on the parameters that minimize the squared error shown in Equation 3.

Differentiating the squared error with respect to the parameter $\theta_t$ and computing the extremum condition gives Equation 4.

Expressing this with the matrices of the input X, the output Y, and the parameters $\theta$ gives Equation 5.

Therefore, the least squares solution is given by Equation 6.
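Equations 2 to 6 did not survive extraction. The standard least squares derivation that matches the surrounding text, written with the symbols defined above (n samples, d parameters, design matrix X whose i-th row is $x_i$), is reconstructed below; treat it as a reading aid rather than the original typesetting.

```latex
% Equation 2: matrix form of the regression model
\begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}
= \begin{pmatrix} x_{11} & \cdots & x_{1d} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nd} \end{pmatrix}
  \begin{pmatrix} \theta_1 \\ \vdots \\ \theta_d \end{pmatrix}
+ \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix},
\qquad Y = X\theta + \varepsilon

% Equation 3: squared error
J(\theta) = \sum_{i=1}^{n} \Bigl( Y_i - \sum_{j=1}^{d} x_{ij}\theta_j \Bigr)^2

% Equation 4: extremum condition
\frac{\partial J}{\partial \theta_t}
= -2 \sum_{i=1}^{n} x_{it} \Bigl( Y_i - \sum_{j=1}^{d} x_{ij}\theta_j \Bigr) = 0,
\qquad t = 1, \ldots, d

% Equation 5: normal equations
X^\top X \, \theta = X^\top Y

% Equation 6: least squares solution (when X^\top X is invertible)
\hat{\theta} = \left( X^\top X \right)^{-1} X^\top Y
```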

Here, the optimal weights of the compression weight matrix are obtained by the least squares method, but another method may be used to obtain a global solution.
The least squares method attempts to solve linear or non-linear classification/regression problems by minimizing the squared error of the output. Other formulations can also minimize the error, such as minimizing the absolute error of the output or, for classification problems, minimizing the cross-entropy error.
In other words, applying the least squares method in this embodiment to obtain the optimal values of the non-zero weights of the compression weight matrix amounts to using the squared error as the measure of the output error. Another measure may be used instead of the squared error.

When the regression problem is linear (a linear regression problem), there are two main ways to solve the squared-error minimization problem (the least squares method): a closed-form method and various gradient descent methods. Because the objective function is convex (strictly convex when XᵀX is non-singular), the problem has a unique solution. In particular, the closed-form solution and the solutions obtained by the gradient descent methods agree. The gradient descent methods referred to here are of three kinds: batch gradient descent (the entire training data is used at once to compute the gradient), stochastic gradient descent (the gradient is computed for each training sample individually), and mini-batch gradient descent, which lies between the two.

The closed-form method and the gradient descent methods each have strengths and weaknesses. Unlike the gradient descent methods, computing the closed-form solution requires no search over hyperparameters (parameters of the learning algorithm). Mini-batch and stochastic gradient descent, by contrast, cope more easily with data that have many samples and many features. Most importantly, the gradient descent methods also apply to classification and regression problems that are not linear. Classification and regression problems in machine learning, and in deep learning in particular, are generally non-linear problems with many samples and features, and a (local) solution is usually found with mini-batch gradient descent.

Therefore, with stochastic gradient descent, a method similar to least squares can be used even when the layer contains non-linear operations, provided they are few and close to linear. Examples are the activation functions relu and relu6.

An example of finding the global solution by stochastic gradient descent is described here. In general, linear regression problems can be solved iteratively with the various gradient descent methods. This approach implicitly solves Equation 7 below, which is equivalent to the squared-error minimization problem.

The iterative gradient descent solution, on the other hand, repeatedly applies the correction of Equation 9 and converges to the global solution. This is also called batch gradient descent.
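A minimal sketch of this iteration follows. It assumes that Equation 9 is the usual batch update θ ← θ − η·(1/n)·Xᵀ(Xθ − Y) for the squared error. The learning rate η (`lr`), the iteration count, and the data are illustrative choices. The check confirms that the iterate reaches the closed-form solution.

```python
import numpy as np

def batch_gradient_descent(X, Y, lr=0.1, n_iter=500):
    """Iterative least squares: theta <- theta - lr * (1/n) X^T (X theta - Y).

    Every update uses the whole data matrix X (batch gradient descent).
    """
    n, p = X.shape
    theta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - Y) / n   # gradient of (1/2n) ||X theta - Y||^2
        theta -= lr * grad
    return theta

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
Y = X @ np.array([2.0, -1.0, 0.5])
theta_gd = batch_gradient_descent(X, Y)
theta_cf = np.linalg.solve(X.T @ X, X.T @ Y)         # closed-form reference
print(np.allclose(theta_gd, theta_cf, atol=1e-6))    # True
```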

Stochastic gradient descent replaces the entire data matrix X with a randomly sampled subset of the data.
Its advantages are that it copes easily with cases where both the number of samples and the number of features are large, and that it also applies to non-linear regression problems. However, it requires several hyperparameters to be set, such as the learning rate, the number of iterations, and the batch size.

In this way, the compression weight matrix calculation unit 15 obtains optimum weight coefficient values for the compression weight matrix compressed by applying the zero/non-zero pattern.
For example, as shown on the left side of FIG. 5, consider a trained model having layers L1, L2, L3, and L4. When all four layers L1 to L4 are to be compressed, the compression weight matrix calculation unit 15 obtains, for the layers L1 to L4, compression weight matrices W1′ to W4′ to which a zero/non-zero pattern has been applied. The weight coefficients of W1′ to W4′ are optimized by the least squares method or the like.
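The sketch below shows one way to compute such a compression weight matrix. It assumes a layer computes y = Wx with W of shape (out, in), and that the zero/non-zero pattern is given as a boolean mask. Because the squared output error separates over output rows, each row becomes an independent least-squares problem over its permitted inputs. The function name `compressed_weight_matrix`, the random stand-in "trained" weights, and the every-other-column pattern are hypothetical.

```python
import numpy as np

def compressed_weight_matrix(X_in, Y_out, mask):
    """Fit W' (out x in) with W'[j, k] = 0 wherever mask[j, k] is False.

    X_in:  (n, in)  inputs to the layer, taken from the trained model
    Y_out: (n, out) outputs of the same layer before compression
    mask:  (out, in) boolean zero/non-zero pattern (True = weight may be non-zero)
    """
    n_out, n_in = mask.shape
    W = np.zeros((n_out, n_in))
    for j in range(n_out):
        cols = np.flatnonzero(mask[j])
        if cols.size == 0:
            continue                      # this row is entirely zero under the pattern
        w, *_ = np.linalg.lstsq(X_in[:, cols], Y_out[:, j], rcond=None)
        W[j, cols] = w
    return W

def output_error(W, X_in, Y_out):
    # Mean squared difference from the uncompressed layer output.
    return np.mean((X_in @ W.T - Y_out) ** 2)

rng = np.random.default_rng(0)
W_dense = rng.normal(size=(4, 8))         # stand-in for a trained layer's weights
X_in = rng.normal(size=(500, 8))
Y_out = X_in @ W_dense.T                  # layer output before compression
mask = np.zeros((4, 8), dtype=bool)
mask[:, ::2] = True                       # hypothetical pattern keeping every other input

W_fit = compressed_weight_matrix(X_in, Y_out, mask)
W_naive = np.where(mask, W_dense, 0.0)    # zeroing weights without refitting
print(output_error(W_fit, X_in, Y_out) <= output_error(W_naive, X_in, Y_out))   # True
```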

The compression weight matrices W1′ to W4′ obtained by the compression weight matrix calculation unit 15 are sent to the re-learning unit 16.
The re-learning unit 16 takes the model built from the compression weight matrices W1′ to W4′ as its initial value and performs inference on the inference data from the inference data input unit 11. It then compares this model's inference results with the correct answer data supplied by the correct answer data input unit 13. Based on this comparison, as shown on the right side of FIG. 5, the re-learning unit 16 retrains the model including W1′ to W4′ and supplies the retrained model to the re-learning model output unit 17.
During retraining, the re-learning unit 16 keeps the zero positions of each compression weight matrix (to which a zero/non-zero pattern has been applied) fixed at zero, so that the compressed state is preserved after retraining.
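A common way to keep zero positions at zero during retraining is to mask the gradient. The minimal NumPy sketch below does this for a single linear layer trained on a squared-error loss. The function names are hypothetical, and the targets `T` are a made-up stand-in for the correct answer data. In a deep-learning framework, the same effect is usually obtained by multiplying the gradient, or the weights after each step, by the mask.

```python
import numpy as np

def loss(W, X, T):
    # 0.5 * mean over samples of the squared output error.
    return 0.5 * np.mean(np.sum((X @ W.T - T) ** 2, axis=1))

def masked_retrain_step(W, mask, X, T, lr=0.01):
    """One gradient step on `loss` that leaves pruned (mask == False) weights at zero."""
    grad = (X @ W.T - T).T @ X / len(X)   # dL/dW, shape (out, in)
    return W - lr * grad * mask           # masked entries receive a zero update

rng = np.random.default_rng(0)
mask = rng.random((3, 6)) < 0.5                      # hypothetical zero/non-zero pattern
W = np.where(mask, rng.normal(size=(3, 6)), 0.0)     # compressed starting weights
X = rng.normal(size=(64, 6))
T = X @ rng.normal(size=(3, 6)).T                    # hypothetical "correct answer" targets

loss_before = loss(W, X, T)
for _ in range(100):
    W = masked_retrain_step(W, mask, X, T)
print(np.all(W[~mask] == 0.0), loss(W, X, T) < loss_before)   # True True
```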

Retraining in the re-learning unit 16 in this way yields a retrained model that has been trained to reduce the error of the model as a whole, and this model is output by the re-learning model output unit 17. The weight matrices W1′ to W4′ obtained by the compression weight matrix calculation unit 15 minimize the error only for the linear part of each target layer. They therefore do not reduce either the error in the non-linear part of each layer or the error of the model as a whole. Retraining in the re-learning unit 16 is therefore necessary, and it yields an optimum retrained model.

FIG. 6 shows an example hardware configuration in which the information processing device 10 shown in FIG. 4 is implemented on a computer.
The computer serving as the information processing device 10 includes a CPU (Central Processing Unit) 10a, a ROM (Read Only Memory) 10b, and a RAM (Random Access Memory) 10c, each connected to a bus. The information processing device 10 further includes a non-volatile storage 10d, a network interface 10e, an input unit 10f, and a display unit 10g.

The CPU 10a is an arithmetic processing unit. It reads from the ROM 10b the program code of the software that performs the processing of the input/output calculation unit 14, the compression weight matrix calculation unit 15, and the re-learning unit 16, and executes it. Variables, parameters, and the like generated during processing are temporarily written to the RAM 10c.

A large-capacity storage device such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive) is used as the non-volatile storage 10d. The non-volatile storage 10d stores data such as the correct answer data, the inference data, and the trained model.

A NIC (Network Interface Card), for example, is used as the network interface 10e.
The input unit 10f accepts input operations from the user operating the information processing device 10, for example an operation specifying which layers of the trained model are to be compressed. The layers to be compressed may, however, also be selected automatically by computation within the information processing device 10.
The display unit 10g displays the calculation results of the information processing device 10.

Implementing the information processing device 10 on the computer shown in FIG. 6 is only one example, and the device may instead be implemented on other equipment that performs arithmetic processing. For example, some or all of the functions of the information processing device 10 may be realized in hardware such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).
Likewise, providing the input unit 10f and the display unit 10g is only an example. The information processing device 10 may be configured as a computer that lacks either or both of them.

FIG. 7 is a flowchart showing how the information processing device 10 obtains a compressed retrained model.
First, the inference data, the trained model, and the correct answer data are prepared in the inference data input unit 11, the trained model input unit 12, and the correct answer data input unit 13, respectively (step S11). The input/output calculation unit 14 then uses the trained model to compute each layer, including the layers to be compressed (step S12).

Next, the input/output calculation unit 14 extracts from the computed trained model the input data and output data of the layers to be compressed and supplies them to the compression weight matrix calculation unit 15 (step S13). When every layer of the trained model is to be compressed, the input/output calculation unit 14 supplies the input and output data of all layers to the compression weight matrix calculation unit 15. Which layers are compressed is set, for example, by a user operation.

The compression weight matrix calculation unit 15 applies one or more zero/non-zero patterns prepared in advance to the linear weight matrix of each layer to be compressed to obtain a compression weight matrix. It then sets the non-zero weight coefficients of that matrix to appropriate values by the least squares method (step S14). The zero/non-zero pattern to apply is determined by the compression rate specified in the compression rate designation unit 18.

After the compression weight matrix calculation unit 15 has obtained the compressed weight matrices of the layers to be compressed, the re-learning unit 16 retrains the entire model containing those matrices to obtain a retrained model (step S15). During retraining, the zero positions of each compression weight matrix are kept at zero so that the compressed state is preserved.

The compression weight matrices that the re-learning unit 16 uses for retraining need not all be computed from the same trained model.
For example, the re-learning unit 16 may repeat the following loop:
1. Take the trained, uncompressed model as the model to be compressed.
2. Compute the input data x and output data y of the i-th layer.
3. Compute the compression weight matrix of the i-th layer.
4. Set the weights of the i-th layer to the compression weight matrix and retrain.
5. Take the weights retrained in step 4 as the model to be compressed.
6. Set i = i + 1 and repeat steps 1 to 5.
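The following compact sketch illustrates the loop in steps 1 to 6 for a hypothetical toy network of linear layers with ReLU between them. Step 3 fits each layer's linear (pre-activation) output by masked least squares. For brevity, the retraining in steps 4 and 5 is simplified to masked gradient descent on the final layer only, with targets `T` standing in for the correct answer data. All names, sizes, and masks are assumptions for illustration.

```python
import numpy as np

def forward(weights, X, upto):
    """Return the input to layer `upto` (upto = len(weights) gives the network output)."""
    h = X
    for k in range(upto):
        h = h @ weights[k].T
        if k < len(weights) - 1:
            h = np.maximum(h, 0.0)             # ReLU between layers, none after the last
    return h

def masked_lstsq(h, y, mask):
    """Fit W (out x in) so that h @ W.T approximates y, with W zero wherever mask is False."""
    W = np.zeros(mask.shape)
    for j in range(mask.shape[0]):
        cols = np.flatnonzero(mask[j])
        if cols.size:
            W[j, cols] = np.linalg.lstsq(h[:, cols], y[:, j], rcond=None)[0]
    return W

def finetune_last_layer(weights, masks, X, T, lr=0.05, steps=50):
    """Simplified retraining: masked gradient descent on the last layer only."""
    h = forward(weights, X, len(weights) - 1)
    W = weights[-1]
    m = masks[-1] if masks[-1] is not None else np.ones(W.shape, dtype=bool)
    for _ in range(steps):
        W = W - lr * ((h @ W.T - T).T @ h / len(X)) * m
    return weights[:-1] + [W]

def compress_layer_by_layer(weights, masks, X, T):
    weights = [W.copy() for W in weights]      # 1. start from the trained, uncompressed model
    applied = [None] * len(weights)
    for i in range(len(weights)):              # 6. i = i + 1 and repeat
        h = forward(weights, X, i)             # 2. input x of layer i ...
        y = h @ weights[i].T                   #    ... and its linear output y
        weights[i] = masked_lstsq(h, y, masks[i])              # 3. compression weight matrix
        applied[i] = masks[i]
        weights = finetune_last_layer(weights, applied, X, T)  # 4.-5. retrain and keep
    return weights

rng = np.random.default_rng(0)
sizes = [6, 8, 3]
weights = [rng.normal(size=(o, i)) / np.sqrt(i) for i, o in zip(sizes[:-1], sizes[1:])]
masks = [rng.random(W.shape) < 0.5 for W in weights]   # hypothetical patterns
X = rng.normal(size=(200, 6))
T = forward(weights, X, len(weights))                  # stand-in for correct answer data
compressed = compress_layer_by_layer(weights, masks, X, T)
print(all(np.all(W[~m] == 0) for W, m in zip(compressed, masks)))   # True
```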

As described above, in this embodiment the least squares method or a similar technique is applied to one or more layers to obtain compressed weight matrices that minimize the error. The model is then retrained with the compressed matrices, which yields a trained model whose amount of computation is appropriately reduced.

[Second embodiment]
Next, the processing for compressing a trained model in a second embodiment of the present invention will be described with reference to FIGS. 8 and 9.
FIG. 8 is a block diagram showing an example configuration of the information processing device 20 of this embodiment.
The information processing device 20 of this embodiment includes an inference data input unit 21, a trained model input unit 22, a correct answer data input unit 23, an input/output calculation unit 24, a compression weight matrix calculation unit 25, a weight matrix selection unit 26, a re-learning unit 27, a re-learning model output unit 28, and a compression rate designation unit 29.

The input/output calculation unit 24 and the compression weight matrix calculation unit 25 perform the same processing as the input/output calculation unit 14 and the compression weight matrix calculation unit 15 described in the first embodiment. The compression rate designation unit 29 likewise performs the same processing as the compression rate designation unit 18. In this embodiment, however, it is preferable to select zero/non-zero patterns of several compression rates and supply them to the compression weight matrix calculation unit 25. This lets the weight matrix selection unit 26, described below, choose for each layer among the results of patterns with different compression rates.
The inference data, trained model, and correct answer data input from the inference data input unit 21, the trained model input unit 22, and the correct answer data input unit 23 are also the same as those input from the inference data input unit 11, the trained model input unit 12, and the correct answer data input unit 13 shown in FIG. 4.

This embodiment differs from the first embodiment in that the output data obtained by the compression weight matrix calculation unit 25 with the plurality of zero/non-zero patterns are supplied to the weight matrix selection unit 26. Relatedly, the weight matrix selection unit 26 receives the correct answer data from the correct answer data input unit 23. It also receives the output data that the input/output calculation unit 24 computed with the target layer's matrix before compression. If the weight matrix selection unit 26 does not need the correct answer data, however, the correct answer data input unit 23 does not supply it.

The weight matrix selection unit 26 evaluates the quality of the compression weight matrices calculated by the compression weight matrix calculation unit 25 and selects the optimum one. For example, it obtains the output data produced with several zero/non-zero patterns of different compression rates and compares each with the output data of the same layer before compression. It then selects the compression weight matrix whose pattern has the highest compression rate among those whose accuracy is at or above a predetermined threshold.

FIG. 9 shows an example in which the accuracy for each layer varies with the compression rate. The horizontal axis of FIG. 9 indicates the layer, and the vertical axis indicates the model's accuracy when that layer alone is compressed. Accuracy values closer to 1 indicate higher accuracy, and values closer to 0 indicate lower accuracy.
Data G0 in FIG. 9 is the accuracy of the trained model with no layer compressed. In the example of FIG. 9, the uncompressed data G0 has an accuracy of about 0.7 for every layer.

Data G2, G4, G8, and G16 in FIG. 9 show the accuracy at compression rates gamma = 2, gamma = 4, gamma = 8, and gamma = 16. These correspond to ratios (non-zero)/(zero + non-zero) of 1/2, 1/4, 1/8, and 1/16, respectively.
Suppose, for example, that the accuracy threshold is 0.6, and consider which of G0, G2, G4, G8, and G16 to choose.
For the first layer, shown at the left end, the weight matrix selection unit 26 selects the weights with compression rate G2, that is, gamma = 2.

For the second layer, the selected compression weight matrix uses the gamma = 8 zero/non-zero pattern, shown as data G8, because it has the highest compression rate among the patterns whose accuracy exceeds 0.6.
In this way, for each layer the weight matrix selection unit 26 selects the compression weight matrix that uses the zero/non-zero pattern with the highest compression rate among those exceeding the threshold of 0.6. Making this selection for all layers produces a model containing the compression weight matrices.
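The per-layer rule can be sketched as follows. The accuracy values are made-up numbers chosen to mirror the narrative (first layer picks gamma = 2, second picks gamma = 8), not values read from FIG. 9. Gamma 1 stands for the uncompressed layer (G0). The sketch uses "at or above the threshold"; the text also speaks of "exceeding" it.

```python
def select_gamma(acc_by_gamma, threshold):
    """Return the largest gamma (highest compression) whose accuracy meets the threshold.

    acc_by_gamma maps gamma -> model accuracy when only this layer is compressed;
    gamma 1 denotes the uncompressed layer. Falls back to 1 if nothing qualifies.
    """
    ok = [g for g, acc in acc_by_gamma.items() if acc >= threshold]
    return max(ok, default=1)

# Hypothetical per-layer accuracies (illustrative only, not values from FIG. 9).
layers = {
    "layer_1": {1: 0.70, 2: 0.65, 4: 0.55, 8: 0.40, 16: 0.30},
    "layer_2": {1: 0.70, 2: 0.69, 4: 0.66, 8: 0.62, 16: 0.50},
}
choice = {name: select_gamma(acc, threshold=0.6) for name, acc in layers.items()}
print(choice)   # {'layer_1': 2, 'layer_2': 8}
```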

In the example above, the weight matrix selection unit 26 uses accuracy to select a compression weight matrix with a high-compression-rate zero/non-zero pattern. The degree of approximation is not limited to model accuracy, however, and other indices of the degree of approximation, described later, may be used.
In addition, among compression weight matrices with the same ratio of zeros to non-zeros, the weight matrix selection unit 26 selects, for example, the one with the better degree of approximation.

The degree of approximation is obtained, for example, as follows. The input data are fed to the compression weight matrix calculation unit 25, and the error is computed between the values it outputs and the output data without compression. Other indices of the degree of approximation can also be used, such as the coefficient of determination or the correlation coefficient. Another option is to insert the compression weight matrix into the model, input the inference data, and use the error with respect to the correct answer data.
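The sketch below computes three of these measures: output error, coefficient of determination, and correlation coefficient. It compares a layer's compressed output with its uncompressed output over all elements. The function name, the flattening over all outputs, and the tiny example arrays are assumptions.

```python
import numpy as np

def approximation_scores(y_ref, y_approx):
    """Compare a compressed layer's output with the uncompressed output.

    Returns the mean squared error, the coefficient of determination (R^2),
    and the Pearson correlation coefficient, computed over all output elements.
    """
    r = np.ravel(y_ref)
    a = np.ravel(y_approx)
    mse = float(np.mean((r - a) ** 2))
    r2 = float(1.0 - np.sum((r - a) ** 2) / np.sum((r - r.mean()) ** 2))
    corr = float(np.corrcoef(r, a)[0, 1])
    return {"mse": mse, "r2": r2, "corr": corr}

y_ref = np.array([1.0, 2.0, 3.0, 4.0])      # uncompressed layer output (illustrative)
y_approx = np.array([1.0, 2.0, 3.0, 5.0])   # compressed layer output (illustrative)
scores = approximation_scores(y_ref, y_approx)
print(scores["mse"])   # 0.25
```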

When compression weight matrices of several layers are compared and they are expected to reduce the number of operations by a similar amount, the more accurate compression weight matrix is selected. The weight matrix selection unit 26 preferably weighs each of these criteria and selects the compression weight matrix judged to be best.
When making that selection, which criterion takes priority may be decided by a user instruction or the like. For example, priority may be given to the degree of approximation or to reducing the amount of computation, depending on the situation.

The model data generated by the weight matrix selection unit 26 are sent to the re-learning unit 27. Taking this model as its initial value, the re-learning unit 27 performs inference on the inference data from the inference data input unit 21. It compares the inference results of the model containing the compressed weight matrices with the correct answer data supplied by the correct answer data input unit 23 and retrains the model accordingly. It then supplies the retrained model to the re-learning model output unit 28. Here too, the zero positions of each compression weight matrix are kept at zero during retraining so that the compressed state is preserved.

FIG. 10 is a flowchart showing how the information processing device 20 obtains a compressed retrained model.
First, the inference data input unit 21, the trained model input unit 22, and the correct answer data input unit 23 prepare the inference data, the trained model, and the correct answer data, respectively (step S21). The input/output calculation unit 24 then uses the trained model to compute each layer, including the layers to be compressed (step S22).

Next, the input/output calculation unit 24 extracts from the computed trained model the input data and output data of the layers to be compressed and supplies them to the compression weight matrix calculation unit 25 (step S23). When every layer of the trained model is to be compressed, the input/output calculation unit 24 supplies the input and output data of all layers to the compression weight matrix calculation unit 25. Which layers are compressed is set, for example, by a user operation.

The compression weight matrix calculation unit 25 applies one or more zero/non-zero patterns prepared in advance to the linear weight matrix of each layer to be compressed to obtain compression weight matrices. It then finds appropriate values for their non-zero weight coefficients by the least squares method or the like (step S24).
The output data of the resulting compression weight matrices are supplied to the weight matrix selection unit 26.
The weight matrix selection unit 26 compares these output data with the output data of the corresponding layer from the input/output calculation unit 24. Based on the degree of approximation between the two, the proportion of zeros in the applied zero/non-zero pattern, the reduction in computation, and so on, it selects an appropriate compression weight matrix that satisfies the given conditions (step S25).

Once the weight matrix selection unit 26 has selected the linear weight matrices of the layers to be compressed, the re-learning unit 27 retrains the entire model containing them to obtain a retrained model (step S26). During retraining, the zero positions of each compression weight matrix are kept at zero so that the compressed state is preserved.

As described above, even when the weight matrix selection unit 26 selects the linear weight matrices of the layers to be compressed, a trained model with an appropriately reduced amount of computation can be obtained, as in the first embodiment.

FIG. 11 compares the accuracy obtained by two approaches. In the first, the model is trained with the zero positions of the compression weight matrix simply kept at zero, without linear regression. In the second, the compression weight matrix is first computed by the least squares method or the like and the model is then retrained.
The horizontal axis of FIG. 11 is the number of epochs, and the vertical axis is the accuracy.
Trained model d1 is obtained by computing the compression weight matrix with the least squares method or the like and then retraining. Trained model d2 is obtained by simply keeping the zero positions of the compression weight matrix at zero. At every number of epochs, the inference accuracy of d1 is higher than that of d2. This shows that the processing of the present invention yields a trained model whose amount of computation is appropriately reduced.

[Example of compressing a concrete model]
Next, a concrete example of compressing a trained model with the present invention will be described with reference to FIGS. 12 to 14.
First, as shown in FIG. 12, a trained model that recognizes 10 types of objects, such as airplanes and cars, in images is obtained. Suppose that applying this trained model to the image shown at the upper left gives the inference results 0.90 for airplane, 0.01 for car, ..., and 0.01 for horse.

The prepared trained model has layers such as an input layer (input_1), two-dimensional convolution layers (conv_1, conv_2, conv_3), a tensor reshape layer (flatten_1), and a fully connected layer (dense_1). Weight coefficients are set for the convolution layers and the fully connected layer.
The numerical values on the right side of FIG. 12 are example values of the weight matrix of the two-dimensional convolution layer conv_1.

FIG. 13 shows example data for a specific layer of the trained model shown in FIG. 12 compressed at ratios of gamma = 2, 4, 8 and 16. As the compression ratio increases, the number of zero elements increases.
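The effect of gamma on the number of zeros can be illustrated with a toy pattern. This section does not define gamma precisely. This sketch assumes that gamma means roughly 1/gamma of the entries remain non-zero, which is one possible reading of claim 4's ratio of zero to non-zero elements. The "stripe" arrangement produced by the hypothetical `stripe_pattern` is also an assumption, not the pattern used in FIG. 13.

```python
def stripe_pattern(rows, cols, gamma):
    """Hypothetical zero/non-zero pattern: entry (i, j) may be non-zero (True)
    only when (j - i) is a multiple of gamma, so about 1/gamma of entries survive."""
    return [[(j - i) % gamma == 0 for j in range(cols)] for i in range(rows)]


def apply_pattern(w, mask):
    """Zero every weight whose mask entry is False."""
    return [[v if keep else 0.0 for v, keep in zip(row, mrow)] for row, mrow in zip(w, mask)]


w = [[1.0] * 16 for _ in range(16)]  # stand-in for a 16 x 16 weight matrix
zero_counts = {}
for gamma in (2, 4, 8, 16):
    compressed = apply_pattern(w, stripe_pattern(16, 16, gamma))
    zero_counts[gamma] = sum(v == 0.0 for row in compressed for v in row)
print(zero_counts)  # {2: 128, 4: 192, 8: 224, 16: 240}
```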

FIG. 14 shows an example in which a layer of the trained model with a specific compression ratio has been selected and then further modified by retraining. FIG. 14 shows a weight matrix for gamma = 8, but it differs from the gamma = 8 weight matrix shown in FIG. 13.
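Retraining "while keeping the zero positions at zero" can be pictured as a gradient update that touches only the unmasked entries. This is a minimal sketch, not the patent's training procedure. It assumes a single linear layer, a squared-error loss against one correct-answer target and plain gradient descent. The names `forward`, `loss` and `masked_sgd_step` are hypothetical.

```python
def forward(w, x):
    """Linear layer y = W x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def loss(w, x, target):
    """0.5 * squared error against the correct-answer target."""
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(forward(w, x), target))


def masked_sgd_step(w, mask, x, target, lr=0.1):
    """One gradient step on 0.5 * ||W x - target||^2 that updates only the
    entries allowed by the mask; masked entries are held at exactly zero."""
    err = [yi - ti for yi, ti in zip(forward(w, x), target)]
    return [
        [wij - lr * err[i] * x[j] if mask[i][j] else 0.0 for j, wij in enumerate(row)]
        for i, row in enumerate(w)
    ]


mask = [[True, False, True, False], [False, True, False, True]]
w = [[0.5, 0.0, -0.3, 0.0], [0.0, 0.2, 0.0, 0.4]]  # compressed starting weights
x, target = [1.0, 0.5, -0.5, 2.0], [1.0, 0.0]       # one "correct answer" pair
before = loss(w, x, target)
for _ in range(50):
    w = masked_sgd_step(w, mask, x, target)
after = loss(w, x, target)
print(before, after)
```

In deep-learning frameworks, a common way to get the same effect is to multiply the weights (or their gradients) by the fixed mask after each optimizer step. That is general practice, not something this document specifies.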

FIG. 15 is a diagram showing a specific example in which the weight matrix selection unit 26 selects a compression weight matrix according to the present invention in the second embodiment.
FIG. 15 covers the three layers (conv_1, conv_2, conv_3). Each layer is either left uncompressed or compressed at one of the compression ratios. For each combination, the table lists the total number of parameters removed, the average degree of approximation across the layers, and the minimum degree of approximation.
The weight matrix selection unit 26 first considers only the combinations whose total number of parameter reductions is equal to or greater than a predetermined value. From these it selects, for example, the combination with the highest average degree of approximation across the layers, or the combination whose minimum degree of approximation is highest.
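The selection rule above can be sketched as a brute-force search over per-layer options. Every number below is hypothetical and is not FIG. 15's data. The degree of approximation is taken as a given per-layer score, because this section does not define how it is measured. `select` and `summarize` are illustrative names.

```python
from itertools import product

# Hypothetical per-layer options: (option, parameters removed, degree of approximation).
candidates = {
    "conv_1": [("none", 0, 1.00), ("gamma=2", 400, 0.95), ("gamma=4", 600, 0.80)],
    "conv_2": [("none", 0, 1.00), ("gamma=2", 9000, 0.90), ("gamma=4", 13500, 0.70)],
    "conv_3": [("none", 0, 1.00), ("gamma=2", 18000, 0.85), ("gamma=4", 27000, 0.60)],
}


def summarize(combo):
    """Total parameters removed, average and minimum approximation of one combination."""
    removed = sum(c[1] for c in combo)
    approx = [c[2] for c in combo]
    return removed, sum(approx) / len(approx), min(approx)


def select(candidates, min_removed, key="average"):
    """Among combinations removing at least min_removed parameters, return the
    one with the highest average (key="average") or minimum (key="minimum")."""
    layers = list(candidates)
    best = None
    for combo in product(*(candidates[name] for name in layers)):
        removed, avg, low = summarize(combo)
        if removed < min_removed:
            continue  # does not meet the reduction target
        score = avg if key == "average" else low
        if best is None or score > best[0]:
            best = (score, dict(zip(layers, (c[0] for c in combo))))
    return best


print(select(candidates, 27000, "average"))
print(select(candidates, 27000, "minimum"))
```

Exhaustive enumeration grows as the product of the option counts per layer. It is only practical for a handful of layers, which is one reason to prune the table as described next.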

Alternatively, the selection may use both the average degree of approximation and the minimum degree of approximation. For example, consider the combinations whose total number of parameter reductions is at least the predetermined value, and suppose the largest minimum degree of approximation among them is 0.43. The weight matrices can then be chosen from the data whose minimum degree of approximation lies between 0.05 (a predetermined value) and 0.43, taking those with the highest average degree of approximation across the layers.
However, the data shown in FIG. 15 need not be created for all layers. For example, further raising the proportion of zeros in the layer that already has the minimum degree of approximation usually lowers that minimum further. Such calculations can therefore be omitted.
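The two-stage variant can be sketched on rows shaped like FIG. 15's table. The rows and the function name `select_two_stage` are hypothetical. The 0.05 floor comes from the example in the text.

```python
def select_two_stage(rows, min_removed, floor=0.05):
    """rows: (label, parameters removed, average approximation, minimum approximation).
    Stage 1: among rows meeting the reduction target, find the largest minimum.
    Stage 2: keep rows whose minimum lies in [floor, that largest minimum] and
    return the one with the highest average approximation."""
    eligible = [r for r in rows if r[1] >= min_removed]
    best_low = max(r[3] for r in eligible)
    pool = [r for r in eligible if floor <= r[3] <= best_low]
    return best_low, max(pool, key=lambda r: r[2])


# Hypothetical FIG. 15-style rows (not the patent's data).
rows = [
    ("A", 30000, 0.70, 0.43),
    ("B", 32000, 0.80, 0.20),
    ("C", 31000, 0.85, 0.03),
    ("D", 10000, 0.95, 0.90),
]
best_low, choice = select_two_stage(rows, min_removed=27000)
print(best_low, choice[0])  # 0.43 B
```

Row C has the best average but falls below the 0.05 floor. Row D is excluded because it removes too few parameters. That leaves B as the pick.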

The embodiments of the present invention have been explained using two-dimensional matrices. However, the matrices need not actually be two-dimensional, and it goes without saying that the present invention can also be applied to matrices of three or more dimensions.
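One common way to apply a matrix-based procedure to a higher-dimensional weight tensor is the following; it is an assumption and is not specified in the source. Reshape a convolution kernel of shape kh x kw x c_in x c_out into a (kh*kw*c_in) x c_out matrix, apply the zero/non-zero pattern in that 2-D view, and reshape back. The helpers `kernel_to_matrix` and `matrix_to_kernel` are hypothetical.

```python
def kernel_to_matrix(kernel):
    """Flatten a kh x kw x c_in x c_out nested-list kernel into a
    (kh*kw*c_in) x c_out matrix; row index = (a*kw + b)*c_in + c."""
    return [list(vec) for plane in kernel for row in plane for vec in row]


def matrix_to_kernel(mat, kh, kw, c_in):
    """Inverse of kernel_to_matrix."""
    return [[[list(mat[(a * kw + b) * c_in + c]) for c in range(c_in)]
             for b in range(kw)] for a in range(kh)]


kh, kw, c_in, c_out = 3, 3, 2, 4
# Non-zero toy values so that every zero below comes from the mask.
kernel = [[[[float(((a * kw + b) * c_in + c) * c_out + d + 1) for d in range(c_out)]
            for c in range(c_in)] for b in range(kw)] for a in range(kh)]
mat = kernel_to_matrix(kernel)  # 18 x 4 matrix
mask = [[(j - i) % 2 == 0 for j in range(c_out)] for i in range(len(mat))]
masked = [[v if keep else 0.0 for v, keep in zip(r, m)] for r, m in zip(mat, mask)]
compressed_kernel = matrix_to_kernel(masked, kh, kw, c_in)
```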
Although the embodiments of the present invention have been described above, the present invention is not limited to these embodiments. Needless to say, it also includes other application examples and modifications, as long as they do not depart from the scope of the matters described in the claims.

10 ... Information processing device, 10a ... CPU, 10b ... RAM, 10c ... ROM, 10d ... Non-volatile storage, 10e ... Network interface, 10f ... Input unit, 10g ... Display unit, 11 ... Inference data input unit, 12 ... Trained model input unit, 13 ... Correct answer data input unit, 14 ... Input/output calculation unit, 15 ... Compression weight matrix calculation unit, 16 ... Re-learning unit, 17 ... Re-learning model output unit, 18 ... Compression rate specification unit, 20 ... Information processing device, 21 ... Inference data input unit, 22 ... Trained model input unit, 23 ... Correct answer data input unit, 24 ... Input/output calculation unit, 25 ... Compression weight matrix calculation unit, 26 ... Weight matrix selection unit, 27 ... Re-learning unit, 28 ... Re-learning model output unit, 29 ... Compression rate specification unit

Claims

An information processing device that compresses a trained model composed of a plurality of layers, used when a neural network operation is performed by applying the trained model to inference data to be inferred, the information processing device comprising:
an input/output calculation unit that performs an operation on the inference data using the trained model and extracts the input data and output data of a matrix operation performed, during that operation, in a specific layer to be compressed;
a compression weight matrix calculation unit that obtains a compression weight matrix with optimized weights by computing the input data extracted by the input/output calculation unit with a compression weight matrix, and by performing an operation that reduces the error between the resulting output data and the output data extracted by the input/output calculation unit, the compression weight matrix being obtained by applying to the matrix of the specific layer a zero/non-zero pattern in which the elements at specific subscripts of the matrix are zero; and
a re-learning unit that retrains, using correct answer data, the trained model in which the compression weight matrix obtained by the compression weight matrix calculation unit is applied to the specific layer, while keeping the zero positions of the compression weight matrix at zero.

The information processing apparatus according to claim 1, wherein the operation for reducing the error performed by the compression weight matrix calculation unit is a process of obtaining, by the least squares method, the weights that minimize the error.

The compression weight matrix calculation unit prepares one or more patterns as the zero/non-zero pattern used in the calculation.
A weight matrix selection unit is further provided that compares the output data of the calculation results for the plurality of patterns, obtained by the compression weight matrix calculation unit, with the output data extracted by the input/output calculation unit. It then selects either one of the compression weight matrices or the uncompressed weight matrix.
The information processing apparatus according to claim 1, wherein the re-learning unit retrains the trained model in which the weight matrix selected by the weight matrix selection unit is applied to the specific layer.

The plurality of patterns include a plurality of zero/non-zero patterns having different compression ratios, the compression ratio being the ratio between the number of zero elements and the number of non-zero elements in the matrix.
The information processing apparatus according to claim 3, wherein the weight matrix selection unit selects one of the compression weight matrices by comprehensively judging two factors. The first is the degree of approximation between the output data of the calculation results with the weight matrices to which the respective zero/non-zero patterns are applied and the output data extracted by the input/output calculation unit. The second is the compression ratio, which represents the proportion of zeros.

An information processing method that compresses a trained model composed of a plurality of layers, used when a neural network operation is performed by applying the trained model to inference data to be inferred, the method comprising:
a calculation procedure of performing the calculation on the inference data with the trained model and extracting the input data and output data of a matrix operation performed, during that calculation, in a specific layer to be compressed;
a compression weight matrix calculation procedure of obtaining a compression weight matrix with optimized weights by computing the input data extracted in the calculation procedure with a compression weight matrix, and by performing an operation that reduces the error between the resulting output data and the output data extracted in the calculation procedure, the compression weight matrix being obtained by applying to the matrix of the specific layer a zero/non-zero pattern in which the elements at specific subscripts of the matrix are zero; and
a re-learning procedure of retraining, using correct answer data, the trained model in which the weight matrix obtained in the compression weight matrix calculation procedure is applied to the specific layer, while keeping the zero positions of the compression weight matrix at zero.

One or more patterns are prepared as the zero/non-zero pattern used in the calculation of the compression weight matrix calculation procedure.
The method further includes a weight matrix selection procedure of comparing the output data of the calculation results for the plurality of patterns, obtained in the compression weight matrix calculation procedure, with the output data extracted in the calculation procedure, and then selecting either one of the compression weight matrices or the uncompressed weight matrix.
The information processing method according to claim 5, wherein, in the re-learning procedure, the trained model in which the weight matrix selected in the weight matrix selection procedure is applied to the specific layer is retrained.

A program that causes a computer to execute information processing for compressing a trained model composed of a plurality of layers, used when a neural network operation is performed by applying the trained model to inference data to be inferred, the program causing the computer to execute:
a calculation step of performing the calculation on the inference data with the trained model and extracting the input data and output data of a matrix operation performed, during that calculation, in a specific layer to be compressed;
a compression weight matrix calculation step of obtaining a compression weight matrix with optimized weights by computing the input data extracted in the calculation step with a compression weight matrix, and by performing an operation that reduces the error between the resulting output data and the output data extracted in the calculation step, the compression weight matrix being obtained by applying to the matrix of the specific layer a zero/non-zero pattern in which the elements at specific subscripts of the matrix are zero; and
a re-learning step of retraining, using correct answer data, the trained model in which the weight matrix obtained in the compression weight matrix calculation step is applied to the specific layer, while keeping the zero positions of the compression weight matrix at zero.

One or more patterns are prepared as the zero/non-zero pattern used in the calculation of the compression weight matrix calculation step.
The steps executed by the computer further include a weight matrix selection step of comparing the output data of the calculation results for the plurality of patterns, obtained in the compression weight matrix calculation step, with the output data extracted in the calculation step, and then selecting either one of the compression weight matrices or the uncompressed weight matrix.
The program according to claim 7, wherein, in the re-learning step, the trained model in which the weight matrix selected in the weight matrix selection step is applied to the specific layer is retrained.