JP2022054660A

JP2022054660A - Neural network weight saving device, neural network weight saving method and program

Info

Publication number: JP2022054660A
Application number: JP2020161812A
Authority: JP
Inventors: 康平山本; Kohei Yamamoto; 素子加賀谷; Motoko Kagaya
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2020-09-28
Filing date: 2020-09-28
Publication date: 2022-04-07
Anticipated expiration: 2040-09-28
Also published as: JP6992864B1

Abstract

To improve the accuracy of a neural network while suppressing a decrease in its processing efficiency. SOLUTION: Provided is a neural network weight reduction device that performs four steps. First, it introduces into a modification target layer of a first neural network a first quantization function including a trainable first coefficient, a second quantization function including a trainable second coefficient, and a channel attenuation function including a per-channel trainable third coefficient, thereby generating a second neural network. Second, it trains the weight parameters of the first neural network together with the first, second, and third coefficients through learning based on the second neural network. Third, it retrains the weight parameters through re-learning based on the trained second neural network. Fourth, it outputs a third neural network obtained by deleting from the re-learned second neural network the channel attenuation function and the weight parameters of the redundant channels identified by the trained third coefficient in the modification target layer. SELECTED DRAWING: Figure 1

Description

The present invention relates to a neural network weight reduction device, a neural network weight reduction method, and a program.

In recent years, neural networks have been used in many fields. For example, general neural network models for object recognition or object position detection are known. In such models, the operations in the convolutional layers and fully connected layers use feature values and weight parameters that are each represented as 16- to 32-bit floating-point numbers.

A binarized neural network is one form of quantized neural network. In a binarized neural network, the feature values and weight parameters used in the convolutional or fully connected layers can each be represented by 1 bit, that is, a binary value expressed as -1 or 1. This allows the floating-point operations in those layers to be replaced with bit operations. Bit operations run with lower power consumption and at higher speed than floating-point operations, and they also reduce memory usage. Deep learning models are therefore known to run efficiently even on devices with limited computing resources, such as FPGAs (Field Programmable Gate Arrays) and mobile terminals.
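The claim that binarization turns floating-point arithmetic into bit operations can be illustrated with the usual XOR/popcount identity. The following is a minimal sketch and is not taken from the patent. The bit encoding and function names are assumptions. If +1 is stored as bit 1 and -1 as bit 0, every position where two vectors agree contributes +1 and every position where they differ contributes -1. The dot product of two length-n vectors is therefore n - 2·popcount(a XOR b).

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int: +1 -> bit 1, -1 -> bit 0."""
    word = 0
    for k, v in enumerate(values):
        if v == 1:
            word |= 1 << k
        elif v != -1:
            raise ValueError("binarized values must be +1 or -1")
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +/-1 vectors of length n.

    Agreeing positions contribute +1 and differing positions contribute -1,
    so the sum equals n - 2 * (number of differing bits).
    """
    differing = bin(a_bits ^ b_bits).count("1")
    return n - 2 * differing


def float_dot(a, b):
    """Reference dot product computed with ordinary arithmetic."""
    return sum(x * y for x, y in zip(a, b))


a = [1, -1, -1, 1, 1, -1]
b = [1, 1, -1, -1, 1, 1]
print(binary_dot(pack_bits(a), pack_bits(b), len(a)), float_dot(a, b))
```

On hardware, the XOR and the popcount each act on a whole machine word at once. This is where the speed and memory gains come from.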

For example, a method for constructing a binarized neural network has been disclosed (see, for example, Non-Patent Document 1). In that method, every convolutional layer and fully connected layer converts its floating-point weight parameters into binary values of -1 or 1 using a sign function. It also uses a sign function to convert the input feature values into binary values of -1 or 1.
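The sign-function binarization attributed to Non-Patent Document 1 can be sketched as below. Treating 0 as +1 is an assumption made here so that every value lands in {-1, +1}. The helper names are hypothetical.

```python
def sign_binarize(values):
    """Map each real value to +1 (if >= 0) or -1 (if < 0); 0 -> +1 is an assumption."""
    return [1 if v >= 0 else -1 for v in values]


def binarized_neuron(weights, inputs):
    """Inner product of a binarized weight vector and a binarized input vector."""
    wb = sign_binarize(weights)
    xb = sign_binarize(inputs)
    return sum(w * x for w, x in zip(wb, xb))


weights = [0.7, -0.2, 0.05, -1.3]
inputs = [0.4, 0.9, -0.6, -0.1]
print(sign_binarize(weights))  # [1, -1, 1, -1]
print(binarized_neuron(weights, inputs))
```

Only the signs survive, so all magnitude information in the weights and inputs is discarded. This shows one way in which binarization can introduce a large quantization error.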

A method for constructing a quantized neural network has also been disclosed (see, for example, Non-Patent Document 2). In that method, several combinations of weight parameters and input feature values with different numbers of quantization bits (bit precisions) are defined in advance for each layer. The convolutional neural network is then trained to select the single best combination for each layer from among these candidates.

Patent Document 1: Japanese Unexamined Patent Publication No. 2019-212206

Non-Patent Document 1: Itay Hubara and 4 others, "Binarized Neural Networks", [online], Neural Information Processing Systems (2016), [retrieved September 16, 2020], Internet <http://papers.nips.cc/paper/6573-binarized-neural-networks>

Non-Patent Document 2: Bichen Wu and 5 others, "Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search", [online], 2018, [retrieved September 16, 2020], Internet <https://arxiv.org/abs/1812.00090>

Non-Patent Document 3: Benoit Jacob and 7 others, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference", [online], 2017, [retrieved September 16, 2020], Internet <https://arxiv.org/abs/1712.05877>

The method disclosed in Non-Patent Document 1 has two drawbacks. First, binarizing the data input to the convolutional or fully connected layer (for example, feature values and weight parameters) produces a large error (quantization error). This can significantly degrade the accuracy of the quantized neural network model. Second, the number of channels is fixed, so the quantized neural network may contain redundant channels.

The method disclosed in Non-Patent Document 2 can estimate the number of quantization bits for each layer. During training, however, it must load into memory a convolutional layer for every combination of weight parameters and input feature values, and it must repeat forward and backward propagation through all of those layers many times. As a result, the method takes a long time to converge, and defining all the combinations in advance may be difficult.
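A rough sketch of why pre-defining combinations is expensive follows. Suppose each layer offers every pairing of candidate weight bit widths and activation bit widths. Then the number of candidate convolutions held in memory, and run in every forward and backward pass, grows with the product of the two candidate sets. The candidate bit widths, the full cross product, and the layer count below are hypothetical examples, not values taken from Non-Patent Document 2.

```python
from itertools import product


def candidate_pairs(weight_bits, activation_bits):
    """All (weight bit width, activation bit width) pairs offered to one layer."""
    return list(product(weight_bits, activation_bits))


def total_candidate_layers(num_layers, weight_bits, activation_bits):
    """Candidate convolutions that must be instantiated across the whole network."""
    return num_layers * len(candidate_pairs(weight_bits, activation_bits))


weight_bits = [1, 2, 4, 8]
activation_bits = [1, 2, 4, 8]
print(len(candidate_pairs(weight_bits, activation_bits)))        # 16 per layer
print(total_candidate_layers(20, weight_bits, activation_bits))  # 320 in total
```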

It is therefore desirable to provide a technique for constructing a neural network that improves processing efficiency and suppresses accuracy degradation. The technique should also reduce the amount of data that must be prepared in advance, such as the convolutional layers for each of the combinations in Non-Patent Document 2.

To solve the above problem, one aspect of the present invention provides a neural network weight reduction device including:
- an input unit that acquires a first neural network including a plurality of processing layers;
- a modification unit that identifies at least one of the plurality of processing layers as a modification target layer, and generates a second neural network by introducing into the modification target layer a first quantization function including a trainable first coefficient, a second quantization function including a trainable second coefficient, and a channel attenuation function including a per-channel trainable third coefficient;
- a learning unit that trains the weight parameters of the first neural network, the first coefficient, the second coefficient, and the third coefficient through learning based on the second neural network;
- a re-learning unit that retrains the weight parameters through re-learning based on the second neural network after learning; and
- an output unit that outputs a third neural network obtained by deleting from the second neural network after re-learning the channel attenuation function and the weight parameters of redundant channels, which are determined by the trained third coefficient in the modification target layer.

The learning unit may perform the first learning, which trains the weight parameters, and the second learning, which trains the first, second, and third coefficients, one at a time.
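The one-at-a-time scheme can be illustrated with a toy model. This is a minimal sketch under stated assumptions:
- A single scalar weight w stands in for the weight parameters.
- A single scalar coefficient g stands in for the first, second, and third coefficients.
- The model y_hat = g·w·x with a squared-error loss is hypothetical.
- Strict alternation on every step is also an assumption; the text does not specify the switching schedule.

The sketch shows only that each update step touches one group of parameters while the other group stays frozen.

```python
def loss_and_grads(w, g, data):
    """Mean squared error of the toy model y_hat = g * w * x and its gradients."""
    n = len(data)
    loss = grad_w = grad_g = 0.0
    for x, y in data:
        err = g * w * x - y
        loss += err * err / n
        grad_w += 2 * err * g * x / n  # d(loss)/dw
        grad_g += 2 * err * w * x / n  # d(loss)/dg
    return loss, grad_w, grad_g


def alternate_training(data, w=0.1, g=1.0, lr=0.05, rounds=200):
    """Update w (first learning) and g (second learning) one at a time, never together."""
    history = []
    for r in range(rounds):
        loss, grad_w, grad_g = loss_and_grads(w, g, data)
        history.append(loss)
        if r % 2 == 0:
            w -= lr * grad_w  # first learning: the coefficient g stays frozen
        else:
            g -= lr * grad_g  # second learning: the weight w stays frozen
    return w, g, history


data = [(x, 3.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)]
w, g, history = alternate_training(data)
print(round(g * w, 4), history[0], history[-1])
```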

During the second learning, the channel attenuation function may multiply the input to the modification target layer, channel by channel, by a value corresponding to the third coefficient.

During the first learning, the channel attenuation function may set to zero the inputs to the modification target layer that belong to channels whose value corresponding to the third coefficient is below a predetermined threshold.

The redundant channels may be the channels whose value corresponding to the trained third coefficient is below the predetermined threshold.
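The two behaviours of the channel attenuation function, and the test for redundant channels, can be sketched as follows. The text says only "a value corresponding to the third coefficient". The following choices are all assumptions made for illustration:
- mapping α to that value with a sigmoid;
- a threshold of 0.5;
- representing an input as one list of values per channel;
- passing the non-zeroed channels through unchanged during the first learning.

```python
import math


def channel_value(alpha_c):
    """Hypothetical mapping from the trainable coefficient alpha_c to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-alpha_c))


def channel_attenuation(x, alpha, phase, threshold=0.5):
    """Apply the channel attenuation function to x, a list of per-channel value lists.

    phase == "second": scale every value in channel c by channel_value(alpha[c]).
    phase == "first":  zero channels whose channel_value(alpha[c]) < threshold and
                       (assumption) pass the remaining channels through unchanged.
    """
    if len(x) != len(alpha):
        raise ValueError("need one coefficient per channel")
    out = []
    for values, a in zip(x, alpha):
        s = channel_value(a)
        if phase == "second":
            out.append([s * v for v in values])
        elif phase == "first":
            out.append([0.0] * len(values) if s < threshold else list(values))
        else:
            raise ValueError("phase must be 'first' or 'second'")
    return out


def redundant_channels(alpha, threshold=0.5):
    """Indices of channels whose value falls below the threshold after training."""
    return [c for c, a in enumerate(alpha) if channel_value(a) < threshold]


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
alpha = [2.0, -3.0, 0.0]
print(channel_attenuation(x, alpha, "first"))
print(redundant_channels(alpha))
```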

During the second learning, the channel attenuation function may multiply the input to the modification target layer both by the value corresponding to the third coefficient and by an adjustment parameter whose value decreases stepwise.

During the second learning, the learning unit may decrease the adjustment parameter stepwise by learning with a loss function that incorporates the adjustment parameter.

During the second learning, the learning unit may decrease the adjustment parameter stepwise according to a predetermined schedule.
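A predefined stepwise schedule for the adjustment parameter might look like the sketch below. The step-decay form, its constants, and the floor value are hypothetical, because the text does not specify the schedule. The helper scale_channel applies, to a single channel, the two multiplications described for the second learning.

```python
def step_schedule(epoch, initial=1.0, factor=0.5, step_size=10, minimum=0.01):
    """Hypothetical predefined schedule: multiply by `factor` every `step_size` epochs,
    never going below `minimum`."""
    return max(minimum, initial * factor ** (epoch // step_size))


def scale_channel(values, channel_value, adjustment):
    """Multiply one channel by its coefficient-derived value and by the adjustment parameter."""
    return [v * channel_value * adjustment for v in values]


for epoch in (0, 9, 10, 25, 100):
    print(epoch, step_schedule(epoch))
```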

The modification unit may introduce the channel attenuation function and the first quantization function so that both are applied to the input to the modification target layer.

The first quantization function may apply a first normalization to the output of the channel attenuation function and then multiply the result by the first coefficient.

The first normalization may include a transformation that maps the output of the channel attenuation function into a first value range.
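One possible reading of the first quantization function is sketched below. It proceeds in three steps:
1. Clip the attenuated input into [0, 1]. This is an assumed choice of "first value range".
2. Round the result onto 2^bits evenly spaced levels. This is the discretization step implied by the definition of a quantization function; the bit width is an assumption.
3. Multiply by the trainable γ1.

Only the forward pass is shown. Training through round() would need a gradient approximation such as a straight-through estimator. That approximation is also an assumption and is not described in the text.

```python
def normalize_to_unit(values):
    """First normalization (assumed form): clip every value into [0, 1]."""
    return [min(1.0, max(0.0, v)) for v in values]


def uniform_quantize(values, bits):
    """Round values in [0, 1] to the nearest of 2**bits evenly spaced levels.
    Note: Python's round() resolves exact ties to the even integer."""
    levels = 2 ** bits - 1
    return [round(v * levels) / levels for v in values]


def first_quantization(values, gamma1, bits=2):
    """Normalize, quantize, then scale by the trainable coefficient gamma1."""
    return [gamma1 * q for q in uniform_quantize(normalize_to_unit(values), bits)]


print(first_quantization([-0.4, 0.2, 0.45, 0.9, 1.7], gamma1=2.0))
```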

The modification unit may introduce the second quantization function so that it is applied to the weight parameters of the modification target layer.

The second quantization function may apply a second normalization to the weight parameters of the modification target layer and then multiply the result by the second coefficient.

The second normalization may include a transformation that maps the weight parameters of the modification target layer into a second value range.
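The second quantization function applied to the weights can be sketched with a swapped-in normalization modeled on the tanh-based weight normalization popularized by DoReFa-Net. That normalization computes tanh, divides by the largest magnitude, and so maps the weights into [-1, 1]. This choice, the symmetric level placement, and the bit width are all assumptions. The text requires only a transformation into a "second value range" followed by multiplication by γ2.

```python
import math


def normalize_weights(weights):
    """Second normalization (assumed, DoReFa-style): apply tanh, then divide by the
    largest |tanh| so that every weight lands in [-1, 1]."""
    t = [math.tanh(w) for w in weights]
    peak = max(abs(v) for v in t) or 1.0
    return [v / peak for v in t]


def symmetric_quantize(values, bits):
    """Quantize values in [-1, 1] onto 2**bits evenly spaced levels spanning [-1, 1]."""
    levels = 2 ** bits - 1
    return [2 * round((v + 1) / 2 * levels) / levels - 1 for v in values]


def second_quantization(weights, gamma2, bits=2):
    """Normalize the weights, quantize them, then scale by the trainable gamma2."""
    return [gamma2 * q for q in symmetric_quantize(normalize_weights(weights), bits)]


print(second_quantization([-2.0, -0.1, 0.3, 3.0], gamma2=0.5))
```

With 2 bits, the weights land on the four levels {-1, -1/3, 1/3, 1} before scaling. This particular level layout therefore has no exact zero.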

The modification target layer may include at least one of a convolutional layer and a fully connected layer.

The re-learning unit may retrain the weight parameters while the first, second, and third coefficients are each fixed at their trained values.

Another aspect of the present invention provides a neural network weight reduction method including:
- acquiring a first neural network including a plurality of processing layers;
- identifying at least one of the plurality of processing layers as a modification target layer, and generating a second neural network by introducing into it a first quantization function including a trainable first coefficient, a second quantization function including a trainable second coefficient, and a channel attenuation function including a per-channel trainable third coefficient;
- training the weight parameters of the first neural network, the first coefficient, the second coefficient, and the third coefficient through learning based on the second neural network;
- retraining the weight parameters through re-learning based on the second neural network after learning; and
- outputting a third neural network obtained by deleting from the second neural network after re-learning the channel attenuation function and the weight parameters of redundant channels, which are determined by the trained third coefficient in the modification target layer.

Yet another aspect of the present invention provides a program that causes a computer to function as a neural network weight reduction device including:
- an input unit that acquires a first neural network including a plurality of processing layers;
- a modification unit that identifies at least one of the plurality of processing layers as a modification target layer, and generates a second neural network by introducing into it a first quantization function including a trainable first coefficient, a second quantization function including a trainable second coefficient, and a channel attenuation function including a per-channel trainable third coefficient;
- a learning unit that trains the weight parameters of the first neural network, the first coefficient, the second coefficient, and the third coefficient through learning based on the second neural network;
- a re-learning unit that retrains the weight parameters through re-learning based on the second neural network after learning; and
- an output unit that outputs a third neural network obtained by deleting from the second neural network after re-learning the channel attenuation function and the weight parameters of redundant channels, which are determined by the trained third coefficient in the modification target layer.

As described above, the present invention provides a technique for constructing a neural network that improves processing efficiency and suppresses accuracy degradation while reducing the amount of data that must be prepared in advance.

FIG. 1 is a diagram showing an example functional configuration of the neural network weight reduction device according to the embodiment of the present invention.
FIG. 2 is a diagram showing an example of a neural network to be reduced in weight.
FIG. 3 is a diagram showing a general configuration example of the convolutional layer of the l-th layer.
FIG. 4 is a flowchart showing an operation example of the modification unit.
FIG. 5 is a diagram explaining an example of introducing the channel attenuation function and the quantization functions into the convolutional layer of the l-th layer.
FIG. 6 is a flowchart showing an operation example of the learning unit.
FIG. 7 is a diagram explaining a modification of the coefficient update.
FIG. 8 is a diagram showing the hardware configuration of an information processing device, as an example of the neural network weight reduction device according to the embodiment of the present invention.

Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are given the same reference numerals, and their description is not repeated.

In this specification and the drawings, several components with substantially the same functional configuration may be distinguished by appending different numbers to the same reference numeral. When they need not be distinguished, only the shared reference numeral is used. Similarly, similar components of different embodiments may be distinguished by appending different letters to the same reference numeral. When they need not be distinguished, only the shared reference numeral is used.

(1. Details of the embodiment)
Next, the embodiment of the present invention will be described in detail.

(1-1. Configuration)
First, a configuration example of the neural network weight reduction device according to the embodiment of the present invention will be described. FIG. 1 is a diagram showing a functional configuration example of this device. As shown in FIG. 1, the neural network weight reduction device 10 according to the embodiment includes an input unit 100, a modification unit 101, a learning unit 102, a re-learning unit 103, and an output unit 104.

The neural network weight reduction device 10 includes an arithmetic unit such as a CPU (Central Processing Unit). Its functions can be realized by the CPU loading a program stored in a memory (not shown) into a RAM (Random Access Memory) and executing it. A computer-readable recording medium on which the program is recorded may also be provided. Alternatively, the device 10 may be built from dedicated hardware or from a combination of several pieces of hardware.

(Input unit 100)
The input unit 100 acquires the neural network to be reduced in weight (the first neural network) and the data used to train it (the training data set). For example, the input unit 100 may acquire both by reading them from a memory (not shown). The neural network to be reduced in weight may be, for example, the structure (model structure) of the neural network before training.

FIG. 2 is a diagram showing an example of a neural network to be reduced in weight. As shown in FIG. 2, this network consists of a first layer through an N-th layer, where N is an integer of 2 or more. Input data enters the first layer, and output data leaves the N-th layer. Each of the first to N-th layers contains a processing layer. An activation function is inserted in the layer that follows each of them, and each processing layer passes its output to the activation function in that following layer.

In the example shown in FIG. 2, the processing layers in the first to (N-1)-th layers are convolutional layers, and the processing layer in the N-th layer is a fully connected layer. The types of processing layers are not limited to this example, however. For example, the neural network to be reduced in weight may contain one or more layers of only one of these two types, or one or more layers of each type. It may also contain processing layers other than convolutional and fully connected layers.

FIG. 2 also shows the weight parameters w^1 to w^N used by the processing layers in the first to N-th layers of the neural network to be reduced in weight. The embodiment assumes that each processing layer of this network uses feature values and weight parameters that are each represented as 16- to 32-bit floating-point numbers. The formats of the feature values and weight parameters are not limited to this example, however.

Returning to FIG. 1, the description continues. The input unit 100 outputs the acquired neural network to be reduced in weight and the training data set to the modification unit 101.

(Modification unit 101)
The modification unit 101 identifies at least one processing layer of the neural network received from the input unit 100 as a modification target layer. Here, it is assumed that all the convolutional layers in the first to (N-1)-th layers and the fully connected layer in the N-th layer are identified as modification target layers. However, the modification unit 101 may instead identify only some of the convolutional and fully connected layers as modification target layers. In other words, the modification target layers may include at least one of a convolutional layer and a fully connected layer.

For example, the modification unit 101 may identify only a predetermined subset of the convolutional and fully connected layers as modification target layers. As one example, the first convolutional layer (the one in the first layer) and the last convolutional layer (the one in the (N-1)-th layer) may affect the accuracy of the neural network more than the other convolutional layers do. These two layers therefore need not be identified as modification target layers.
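The layer-selection rule in this example can be sketched as a small function over a list of layer types. The string labels and the 0-based indexing are hypothetical conventions for this illustration.

```python
def select_target_layers(layer_types, skip_first_and_last_conv=True):
    """Return indices of conv/fc layers to modify, optionally leaving the first and
    last convolutional layers unmodified."""
    candidates = [i for i, t in enumerate(layer_types) if t in ("conv", "fc")]
    if not skip_first_and_last_conv:
        return candidates
    conv = [i for i, t in enumerate(layer_types) if t == "conv"]
    excluded = {conv[0], conv[-1]} if conv else set()
    return [i for i in candidates if i not in excluded]


layers = ["conv", "conv", "conv", "conv", "fc"]  # layers 1..N-1 are conv, layer N is fc
print(select_target_layers(layers))         # [1, 2, 4]
print(select_target_layers(layers, False))  # [0, 1, 2, 3, 4]
```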

The modification unit 101 introduces three functions into the modification target layer, or into each modification target layer if several are identified:
- a first quantization function including a trainable coefficient γ1 (the first coefficient);
- a second quantization function including a trainable coefficient γ2 (the second coefficient); and
- a channel attenuation function including a per-channel trainable coefficient α (the third coefficient).

Here, a quantization function means, for example, a function that converts continuous values into discrete values. In this way, the modification unit 101 generates the neural network to be trained (the second neural network). It outputs this neural network and the training data set to the learning unit 102.

(Learning unit 102)
The learning unit 102 uses the training data set received from the modification unit 101 to train the neural network received from the modification unit 101, for example by error backpropagation. This trains the weight parameters, the coefficient γ1 in the first quantization function, the coefficient γ2 in the second quantization function, and the coefficient α in the channel attenuation function.

As described in detail later, the learning unit 102 preferably performs the first learning, which trains the weight parameters, and the second learning, which trains the coefficients γ1, γ2, and α, one at a time. The initial values of the weight parameters may be random numbers. If trained weight parameters of the neural network to be reduced in weight are available, they may be used as the initial values instead. After learning, the learning unit 102 outputs the trained neural network and the training data set to the re-learning unit 103.

(Re-learning unit 103)
The re-learning unit 103 uses the training data set received from the learning unit 102 to re-learn the trained neural network received from the learning unit 102. For example, the re-learning unit 103 initializes the weight parameters and then re-learns by error backpropagation or a similar method, which retrains the weight parameters. After re-learning, the re-learning unit 103 outputs the neural network to the output unit 104.
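The re-learning stage can be sketched with a toy model y_hat = g·w·x. Here the scalar g stands in for the coefficients γ1, γ2, and α, fixed at their trained values, and w stands in for the weight parameters. The model, the initialization range, and the learning rate are hypothetical. Only w is re-initialized and updated.

```python
import random


def relearn(data, trained_g, lr=0.05, steps=200, seed=0):
    """Re-learning sketch for the toy model y_hat = g * w * x.

    The weight w is re-initialized and trained; the coefficient g is held fixed
    at the value obtained in the learning stage and is never updated.
    """
    rng = random.Random(seed)
    w = rng.uniform(-0.1, 0.1)  # hypothetical re-initialization range
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (trained_g * w * x - y) * trained_g * x for x, y in data) / n
        w -= lr * grad_w
    loss = sum((trained_g * w * x - y) ** 2 for x, y in data) / n
    return w, loss


data = [(x, 3.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)]
w, loss = relearn(data, trained_g=1.5)
print(w, loss)
```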

(Output unit 104)
The output unit 104 takes the re-learned neural network received from the re-learning unit 103 and deletes from it the channel attenuation function and the weight parameters of the redundant channels determined by the trained coefficient α in each modification target layer. The result is the neural network to be output (the third neural network), which the output unit 104 then outputs. The neural network may be output in any manner. For example, the output unit 104 may output it to a recording medium so that it is recorded there. Alternatively, the output unit 104 may output it to a communication device, which transmits it to another device.

(1-2. Explanation of operation)
Next, an example of the operation of the neural network weight reduction device 10 according to the embodiment of the present invention will be described. As described above, the input unit 100 acquires the neural network to be reduced in weight (FIG. 2) and the training data set. Here, as an example, assume that two-dimensional images are used as training data. The operation performed by the l-th convolutional layer of the neural network to be reduced in weight is then expressed by Equation (1) below.
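Equation (1) itself is not reproduced in this text. The following is a plausible reconstruction based on the symbol definitions in the next paragraph, omitting the spatial indices of x^{l+1} as the text notes. It is offered as a sketch, not as the patent's exact formula.

```latex
x_i^{l+1} = f\left( \sum_{j} \sum_{n} \sum_{m} w_{i,j,n,m}^{l} \, x_{j,n,m}^{l} \right)
```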

Here, x^l denotes the input features to the l-th convolutional layer and w^l denotes the weight parameters used by that layer. The subscripts i, j, n and m denote the output channel, input channel, image width and image height, respectively, and f() denotes the activation function. In Equation (1), the subscripts for the image width and height of the input features x_i^{l+1} to the (l+1)-th layer are omitted. As Equation (1) shows, the activation function is applied after the inner product of the weight parameters and the input features has been computed. For example, a ramp function (ReLU) may be used as the activation function. Batch normalization may also be applied before the activation function.
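The computation in Equation (1) can be sketched in plain Python for a single output position. The function and argument names are hypothetical, and ReLU is assumed as f() since the text gives the ramp function only as one option. The sketch shows the order of operations the text describes: first the inner product over input channel j and the window offsets n and m, then the activation.

```python
def relu(v: float) -> float:
    """Ramp function (ReLU), one possible choice of the activation f()."""
    return v if v > 0.0 else 0.0


def conv_output(x, w, i, f=relu):
    """Output x_i^{l+1} of output channel i at one spatial position.

    x[j][n][m]    : input features x^l inside the receptive field
                    (input channel j, width offset n, height offset m)
    w[i][j][n][m] : weight parameters w^l of the l-th convolutional layer
    The triple sum is the inner product of w_i^l and x^l; f is applied after it.
    """
    acc = 0.0
    for j, plane in enumerate(x):
        for n, row in enumerate(plane):
            for m, value in enumerate(row):
                acc += w[i][j][n][m] * value
    return f(acc)
```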

FIG. 3 shows a typical configuration of the l-th convolutional layer, the convolutional layer 202. The convolutional layer 202 receives, as input features x^l, the output of the previous layer after the activation function has been applied to it. The convolutional layer 202 has weight parameters w^l and computes the inner product of the input features x^l and the weight parameters w^l. The result is output to the next layer. The input unit 100 outputs the neural network to be reduced in weight and the training data set to the correction unit 101.

FIG. 4 is a flowchart showing an example of the operation of the correction unit 101. The correction unit 101 identifies the correction target layers in the neural network to be reduced in weight that was input from the input unit 100 (S100). Here, assume that the convolutional layers in the first to (N-1)-th layers and the fully connected layer in the N-th layer are all identified as correction target layers. The correction unit 101 introduces three functions into each correction target layer (S101):
- a first quantization function including a trainable coefficient γ1;
- a second quantization function including a trainable coefficient γ2;
- a channel attenuation function including a per-channel trainable coefficient α.

As an example, the introduction of the channel attenuation function and the quantization functions into the l-th convolutional layer is described below.

FIG. 5 illustrates an example of introducing the channel attenuation function and the quantization functions into the l-th convolutional layer. FIG. 5 shows the convolutional layer 202 of the l-th layer, the input features x^l that it receives, and the weight parameters w^l that it holds.

As shown in FIG. 5, the correction unit 101 introduces a channel attenuation function 204 and a quantization function 205 (first quantization function). Both are applied to the input (input features x^l) of the l-th convolutional layer 202. The channel attenuation function 204 includes a per-channel trainable coefficient α^l. The quantization function 205 includes a trainable coefficient γ1^l.

The correction unit 101 also introduces a quantization function 206 (second quantization function), which is applied to the weight parameters w^l of the l-th convolutional layer 202. The quantization function 206 includes a trainable coefficient γ2^l.

The channel attenuation function 204 attenuates the value of each channel of the input (input features x^l) to the l-th convolutional layer 202. As described later, two learning phases are performed one at a time:
- the first learning, which trains the weight parameters w^l (hereinafter also simply "weight parameter training");
- the second learning, which trains the coefficients α^l, γ1^l and γ2^l (hereinafter also simply "coefficient training").

The channel attenuation function 204 includes one process applied during weight parameter training and another applied during coefficient training.

More specifically, during coefficient training the channel attenuation function 204 multiplies the input (input features x^l) to the l-th convolutional layer 202, channel by channel, by a value corresponding to the coefficient α^l. It also multiplies that input by an adjustment parameter η^l whose value decreases stepwise during coefficient training.

For example, if the input features x^l have C channels, x^l can be written as x_i^l (i = 1, 2, ..., C). The coefficient α^l can then be expressed as a vector α_i^l (i = 1, 2, ..., C) with one element per channel. The value corresponding to α^l may be obtained by applying the softmax function to α^l. The value corresponding to the coefficient α_i^l of channel i can then be written as softmax_i(α^l). As an example, the channel attenuation function 204 includes, as its coefficient-training process, the process expressed by Equation (2) below.

As described later, the values of α^l come to differ between channels as α^l is trained. More specifically, a channel whose value of α^l is closer to 0 can be regarded as contributing less to the accuracy of the neural network, and is therefore more likely to be a redundant channel. Also as described later, the adjustment parameter η^l decreases stepwise during coefficient training while remaining 0 or greater. A smaller η^l is thought to widen the differences in α^l between channels, which is expected to make redundant channels easier to identify.
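Equation (2) is not reproduced in this text, so the following is only a minimal sketch of the coefficient-training process under a stated assumption. The sketch composes the two multiplications described above as X_i = η · softmax_i(α) · x_i. The exact form used in the patent may differ, and the function names are hypothetical. Each channel is represented as a flat list of feature values.

```python
import math


def softmax(alpha):
    """Numerically stable softmax over the per-channel coefficients alpha^l."""
    top = max(alpha)
    exps = [math.exp(a - top) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]


def attenuate_for_coefficient_training(x, alpha, eta):
    """Channel attenuation during coefficient training (sketch of Equation (2)).

    ASSUMPTION: X_i = eta * softmax_i(alpha) * x_i. The patent's exact
    Equation (2) is not reproduced in the text.
    x is a list of C channels, each a flat list of feature values.
    """
    weights = softmax(alpha)
    return [[eta * weights[i] * v for v in channel] for i, channel in enumerate(x)]
```

Using softmax makes the per-channel factors compete with one another. Raising one channel's α therefore lowers the share of the others, which is how values near 0 can single out redundant channels.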

During weight parameter training, the channel attenuation function 204 instead works as a mask on the input (input features x^l) of the l-th convolutional layer 202. It sets to zero the input of every channel whose value corresponding to α^l falls below a predetermined threshold δ. The threshold δ may be any given non-negative value. As an example, the channel attenuation function 204 includes, as its weight-parameter-training process, the process expressed by Equation (3) below.

That is, when softmax_i(α^l) falls below the threshold δ, channel i is regarded as a redundant channel, and the input features x_i^l of channel i are set to zero.
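The masking rule just described can be sketched as follows. Since Equation (3) is not reproduced, the sketch assumes that channels at or above the threshold pass through unchanged. The function names are hypothetical.

```python
import math


def softmax(alpha):
    """Numerically stable softmax over the per-channel coefficients alpha^l."""
    top = max(alpha)
    exps = [math.exp(a - top) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]


def attenuate_for_weight_training(x, alpha, delta):
    """Channel attenuation during weight parameter training (sketch of Equation (3)).

    Channels with softmax_i(alpha) < delta are treated as redundant and their
    input features are set to zero.
    ASSUMPTION: the other channels pass through unchanged.
    """
    weights = softmax(alpha)
    return [
        [0.0 for _ in channel] if weights[i] < delta else list(channel)
        for i, channel in enumerate(x)
    ]
```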

The quantization function 205 first normalizes the output X^l of the channel attenuation function 204 (first normalization) and then multiplies the result by the coefficient γ1^l. The normalization may include a transformation that maps X^l into a predetermined range (first range). Here, assume that this transformation divides X^l by max|X^l|, the maximum absolute value of X^l over all channels of the l-th layer.

As an example, suppose the quantization function 205 quantizes the output X^l of the channel attenuation function 204 to a k-bit signed integer. The quantization function 205 then includes the process expressed by Equation (4) below.

In Equation (4), Round is a function that rounds a value to an integer (for example, by rounding half up). β1 is the reciprocal of (2^{k-1} - 1)/max(|X^l|), that is, β1 = max(|X^l|)/(2^{k-1} - 1). In other words, the quantize function in Equation (4) rounds the value to an integer and then multiplies it by β1, a floating-point number, to return to floating point. The quantization function 205 may take this form during the learning stage. However, applying the multiplication by β1 after the operation of the convolutional layer 202 does not change the output to the next layer. Therefore, in the inference stage, β1 may be moved to after the convolutional layer 202. The convolutional layer 202 then receives the integers produced by the Round function, which can reduce the load of the convolution operation.
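A minimal sketch of the quantize function follows. Its form is pieced together from the description, because Equation (4) itself is not reproduced: quantize(X) = β1 · Round(γ1 · (2^{k-1} - 1) · X / max|X|), with β1 = max|X|/(2^{k-1} - 1). The same function applies to weights in Equation (5) with γ2 and β2. The function returns the integers and β separately, so that β can be applied inside the quantizer during learning or after the layer during inference. The names `quantize` and `dequantize` are hypothetical.

```python
def quantize(values, gamma, k, signed=True):
    """Sketch of the quantize function of Equation (4) (or (5) for weights).

    Assumed form: q = Round(gamma * (2^(k-1) - 1) * v / max|v|) and
    beta = max|v| / (2^(k-1) - 1). For the unsigned variant, 2^(k-1) is replaced
    by 2^k. Note: Python's round() rounds halves to even, not half up; the text
    leaves the rounding mode open ("for example").
    Returns (integers, beta).
    """
    levels = (2 ** (k - 1) - 1) if signed else (2 ** k - 1)
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return [0 for _ in values], 0.0
    beta = peak / levels
    # First normalization to [-1, 1], scale by the trainable gamma, then round.
    ints = [round(gamma * (v / peak) * levels) for v in values]
    return ints, beta


def dequantize(ints, beta):
    """Learning-stage form: multiply by beta to return to floating point."""
    return [beta * q for q in ints]
```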

Non-Patent Document 3 mentioned above also describes quantization, but in the quantization disclosed there, γ1 = 1 (a fixed value). By contrast, the quantization in Equation (4) places a trainable γ1 inside the quantize function. Training γ1 allows the optimal number of quantization bits to be estimated. For example, when γ1 = 1 and k = 8 bits, the maximum value after applying the Round function is 2^7 - 1. When γ1 = 2^-4 and k = 8 bits, the maximum value after applying the Round function is 2^3, so the rounded values can be expressed with 4 bits.
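The bit-count arithmetic in this example can be checked with a short sketch. It assumes a normalized input of magnitude 1 and counts only the magnitude bits, which matches the 4-bit figure above; a sign bit would add one more. The helper names are hypothetical.

```python
def max_rounded_value(gamma, k):
    """Largest integer Round() can produce for a signed k-bit quantizer
    when the normalized input has magnitude 1 and the trainable coefficient is gamma."""
    return round(gamma * (2 ** (k - 1) - 1))


def magnitude_bits(n):
    """Bits needed for the magnitude of n (sign bit not counted)."""
    return abs(n).bit_length()


for gamma in (1.0, 2.0 ** -4):
    peak = max_rounded_value(gamma, 8)
    print(gamma, peak, magnitude_bits(peak))
# prints "1.0 127 7", then "0.0625 8 4"
```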

The description above assumes that the quantization function 205 quantizes to a k-bit signed integer. However, the quantization function 205 may instead quantize to a k-bit unsigned integer. In that case, 2^{k-1} in Equation (4) may be replaced with 2^k.

The quantization function 206 first normalizes the weight parameters w^l (second normalization) and then multiplies the result by the coefficient γ2^l. The normalization may include a transformation that maps w^l into a predetermined range (second range). Here, assume that this transformation divides w^l by max|w^l|, the maximum absolute value of w^l over all channels of the l-th layer.

As an example, suppose the quantization function 206 quantizes the weight parameters w^l to k-bit signed integers. The quantization function 206 then includes the process expressed by Equation (5) below.

In Equation (5), the Round function is the same as in Equation (4). β2 is the reciprocal of (2^{k-1} - 1)/max(|w^l|), that is, β2 = max(|w^l|)/(2^{k-1} - 1). Like the quantize function in Equation (4), the quantize function in Equation (5) rounds the value to an integer and then multiplies it by β2, a floating-point number, to return to floating point. The quantization function 206 may take this form during the learning stage. In the inference stage, β2 may be moved to after the convolutional layer 202.
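The claim that β1 and β2 can be moved to after the layer follows from the linearity of the inner product: Σ (β1·q_x)(β2·q_w) = β1·β2·Σ q_x·q_w. The sketch below checks this numerically with γ1 = γ2 = 1 for brevity. The input and weight values are illustrative, not taken from the patent. The integer-only inner product is what the convolutional layer would compute in the inference stage.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def quantize_signed(values, k):
    """Symmetric signed k-bit quantization (gamma = 1 for brevity).
    Returns integers q and the scale beta = max|values| / (2^(k-1) - 1)."""
    levels = 2 ** (k - 1) - 1
    beta = max(abs(v) for v in values) / levels
    return [round(v / beta) for v in values], beta


x = [0.8, -0.3, 0.5, 0.1]   # input features (illustrative values)
w = [0.2, 0.7, -0.4, 0.9]   # weight parameters (illustrative values)
qx, beta1 = quantize_signed(x, 8)
qw, beta2 = quantize_signed(w, 8)

# Learning-stage form: dequantize both operands first, then take the inner product.
float_path = dot([beta1 * q for q in qx], [beta2 * q for q in qw])
# Inference-stage form: integer-only inner product, betas applied after the layer.
int_path = beta1 * beta2 * dot(qx, qw)
```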

Like the quantization function 205, the quantization function 206 may instead quantize to k-bit unsigned integers. In that case, 2^{k-1} in Equation (5) may be replaced with 2^k.

Returning to FIG. 4, the description continues. The correction unit 101 checks whether any correction target layer still lacks the channel attenuation function 204, the quantization function 205 and the quantization function 206 (S102). If such a layer remains ("NO" in S102), the correction unit 101 executes S101 for that layer. Once these functions have been introduced into all correction target layers ("YES" in S102), the correction unit 101 ends the correction.
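The S100 to S102 loop can be sketched as follows, using a hypothetical `Layer` record in place of a real framework's layer objects. The initial coefficient values (α = 0, so the softmax starts uniform, and η = γ1 = γ2 = 1) are assumptions for illustration only; the text does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """Hypothetical stand-in for one processing layer of the network."""
    name: str
    kind: str                      # e.g. "conv", "fc", "pool"
    num_in_channels: int
    wrappers: dict = field(default_factory=dict)


def correct_network(layers, k=8):
    """Sketch of S100-S102: identify target layers, then attach trainable coefficients."""
    # S100: here every convolutional and fully connected layer is a target.
    targets = [layer for layer in layers if layer.kind in ("conv", "fc")]
    for layer in targets:          # S102: repeat until every target is handled
        # S101: channel attenuation (one alpha per input channel, plus eta) and the
        # two quantization functions (gamma1 on inputs, gamma2 on weights).
        # ASSUMPTION: the initial values below are illustrative only.
        layer.wrappers = {
            "alpha": [0.0] * layer.num_in_channels,
            "eta": 1.0,
            "gamma1": 1.0,
            "gamma2": 1.0,
            "k": k,
        }
    return targets
```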

Returning to FIG. 1, the description continues. The correction unit 101 outputs the neural network to be trained, together with the training data set, to the learning unit 102. This is the network generated by introducing the channel attenuation function 204, the quantization function 205 and the quantization function 206. As described above, the learning unit 102 then trains this network on the training data set input from the correction unit 101. This trains the weight parameters w and the coefficients α, γ1 and γ2.

FIG. 6 is a flowchart showing an example of the operation of the learning unit 102. As described above, the learning unit 102 performs weight parameter training and coefficient training one at a time. First, the learning unit 102 initializes the weight parameters w of the neural network to be trained (S110) and performs weight parameter training. More specifically, the learning unit 102 keeps the coefficients α, γ1 and γ2 fixed and updates the weight parameters w (S111). The update uses error back-propagation based on the loss function, for example stochastic gradient descent with back-propagation. During weight parameter training, the input features of redundant channels are set to zero (Equation (3)).

The loss function used in the embodiment of the present invention is not limited to a specific function; any loss function used in ordinary neural networks may be used. For example, the learning unit 102 may compute, over the training data set, the difference between the output of the neural network to be trained and the correct value. It may then use the mean squared error of these differences as the loss function.

Next, the learning unit 102 determines whether the number of updates of the weight parameters w has reached a predetermined number (S112). For example, the updates may be counted in iterations, and the predetermined number may be an iteration threshold such as 5. If the predetermined number has not been reached ("NO" in S112), the learning unit 102 returns to S111.

If the number of updates of w has reached the predetermined number ("YES" in S112), the learning unit 102 performs coefficient training. More specifically, the learning unit 102 keeps the weight parameters w fixed and updates the coefficients α, γ1 and γ2 (S113). The update uses error back-propagation, for example stochastic gradient descent with back-propagation, based on a loss function to which regularization terms have been added. For example, the regularized loss function can be expressed by Equation (6) below.

In Equation (6), the first term, the loss function L, is not limited to a specific function, just as in weight parameter training. The second, third and fourth terms are regularization terms. λ1, λ2 and λ3 are coefficients that determine the strength of regularization and may be any given non-negative values. The second term is the sum, over all correction target layers, of the L1 norm of the adjustment parameter η^l. Because the loss includes this η term, training on it lets the learning unit 102 decrease η stepwise.

However, η need not be decreased stepwise in this way. For example, during coefficient training, the learning unit 102 may decrease η stepwise according to a predetermined schedule. One such schedule lowers η by a fixed amount every predetermined number of iterations, for example by 0.001 per iteration. As noted above, decreasing η stepwise is expected to help identify redundant channels.
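A predetermined schedule of the kind just described can be sketched as follows. The function name and its parameters are hypothetical. The clamp at 0 reflects the statement that η stays within the range of 0 or more.

```python
def eta_schedule(eta0, step, num_iterations, every=1):
    """Predetermined schedule for the adjustment parameter eta.

    Lowers eta by `step` every `every` iterations and never lets it drop
    below 0. Returns the value of eta after each iteration.
    """
    values = []
    eta = eta0
    for it in range(1, num_iterations + 1):
        if it % every == 0:
            eta = max(0.0, eta - step)
        values.append(eta)
    return values
```

For instance, `eta_schedule(1.0, 0.001, 5)` follows the text's example of subtracting 0.001 per iteration, ending near 0.995.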

The third term is the sum, over all correction target layers, of the L1 norm of the coefficient γ1^l in the quantization function 205; that is, it is a constraint term on γ1. Similarly, the fourth term is the sum, over all correction target layers, of the L1 norm of the coefficient γ2^l in the quantization function 206; that is, it is a constraint term on γ2.
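Equation (6) is not reproduced in the text. From the four terms described, it plausibly reads L_total = L + λ1 Σ_l ||η^l||_1 + λ2 Σ_l ||γ1^l||_1 + λ3 Σ_l ||γ2^l||_1. The pairing of λ1, λ2 and λ3 with the second, third and fourth terms is an assumption based on their order. The per-layer dictionary layout below is also hypothetical. The sketch accepts either scalars or lists for each coefficient.

```python
def l1(v):
    """L1 norm of a scalar or of a list of values."""
    if isinstance(v, (list, tuple)):
        return sum(abs(e) for e in v)
    return abs(v)


def regularized_loss(task_loss, layers, lam1, lam2, lam3):
    """Sketch of Equation (6): task loss L plus L1 penalties on eta^l, gamma1^l
    and gamma2^l summed over all correction target layers.

    `layers` is a list of dicts with keys "eta", "gamma1", "gamma2"
    (a hypothetical layout).
    """
    reg_eta = sum(l1(layer["eta"]) for layer in layers)
    reg_g1 = sum(l1(layer["gamma1"]) for layer in layers)
    reg_g2 = sum(l1(layer["gamma2"]) for layer in layers)
    return task_loss + lam1 * reg_eta + lam2 * reg_g1 + lam3 * reg_g2
```

The γ penalties push γ1 and γ2 toward smaller values, and therefore toward fewer quantization bits. The task loss L pulls them back up wherever accuracy suffers.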

The loss function L is expected to decrease as the number of quantization bits increases. If γ1 and γ2 were updated using L alone, they would therefore grow, and the number of quantization bits would not be kept down. Adding these constraint terms to L does two things. It suppresses accuracy degradation of the neural network, and it lets the training estimate a number of quantization bits no larger than necessary.

In this way, coefficient training by the learning unit 102 estimates two quantities simultaneously: the number of channels (that is, the number of non-redundant channels) and the number of quantization bits. An optimal combination can therefore be sought while taking the trade-off between them into account. The alternative is to estimate them independently, for example by quantizing a model after channel reduction, or by reducing the channels of an already-quantized model. Compared with that approach, joint estimation is expected to yield a neural network that better suppresses both accuracy degradation and loss of processing efficiency.

Next, the learning unit 102 determines whether the number of updates of the coefficients γ1, γ2 and α has reached a predetermined number (S114). For example, the updates may be counted in iterations, and the predetermined number may be an iteration threshold such as 3. If the predetermined number has not been reached ("NO" in S114), the learning unit 102 returns to S113.

If the number of updates of the coefficients γ1, γ2 and α has reached the predetermined number ("YES" in S114), the learning unit 102 determines whether the regularized loss function has converged (S115). If it has not converged ("NO" in S115), the learning unit 102 returns to S111. If it has converged ("YES" in S115), the learning unit 102 ends the training of the neural network to be trained. For example, the regularized loss function may be judged to have converged when its value, or its change, falls below a threshold.
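The control flow of S111 to S115 can be sketched as a loop over caller-supplied hooks. The sketch assumes that S110 initialization happens beforehand and uses the "change below a threshold" convergence test, one of the options mentioned above. The hook names and default counts are hypothetical; the defaults of 5 and 3 simply echo the example thresholds in the text.

```python
def train_alternating(update_weights, update_coefficients, regularized_loss,
                      weight_steps=5, coeff_steps=3, tol=1e-4, max_rounds=1000):
    """Sketch of S111-S115 in FIG. 6 (S110 initialization done by the caller).

    update_weights()      : one SGD step on w with alpha, gamma1, gamma2 fixed (S111)
    update_coefficients() : one SGD step on alpha, gamma1, gamma2 with w fixed (S113)
    regularized_loss()    : current value of the Equation (6) loss (S115)
    All three callables are hypothetical hooks. Returns the number of rounds run.
    """
    previous = float("inf")
    for round_index in range(max_rounds):
        for _ in range(weight_steps):      # S111 repeated until S112 says "YES"
            update_weights()
        for _ in range(coeff_steps):       # S113 repeated until S114 says "YES"
            update_coefficients()
        current = regularized_loss()
        if abs(previous - current) < tol:  # S115: change fell below the threshold
            return round_index + 1
        previous = current
    return max_rounds
```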

The trained neural network and the training data set are output to the re-learning unit 103.

The re-learning unit 103 re-trains the trained neural network input from the learning unit 102, using the training data set input from the learning unit 102. More specifically, the re-learning unit 103 initializes the weight parameters w and fixes γ1, γ2 and α at the values obtained by the learning unit 102. It then updates w by error back-propagation based on the loss function, for example stochastic gradient descent with back-propagation. This yields weight parameters w that are optimal for the determined number of channels and number of quantization bits, and is expected to further improve the accuracy of the neural network.

After re-learning by the re-learning unit 103, the neural network is output to the output unit 104.

The output unit 104 takes the re-trained neural network input from the re-learning unit 103 and deletes the channel attenuation function 204 that was introduced into the correction target layers. It also deletes, in each correction target layer, the weight parameters of the redundant channels identified by the post-training coefficient α. The result is the neural network to be output. A redundant channel may be a channel whose value corresponding to the post-training coefficient α falls below the threshold δ. For example, if softmax_i(α^l) after training falls below δ, channel i of the l-th layer is regarded as redundant, and the weight parameters w_i^l of channel i are deleted from the l-th layer.
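The deletion step can be sketched for one layer as follows. Because α^l has one entry per input channel of layer l, the sketch interprets "channel i" as input channel i. It removes the weights w[o][i] for every output channel o. Removing the matching output channel of the previous layer would be a natural follow-up but is not shown. This interpretation and the function names are assumptions.

```python
import math


def softmax(alpha):
    """Numerically stable softmax over the per-channel coefficients alpha^l."""
    top = max(alpha)
    exps = [math.exp(a - top) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]


def prune_redundant_channels(w, alpha, delta):
    """Delete the weights of redundant channels of one correction target layer.

    w[o][i] holds the weights (any nested structure) linking output channel o
    to input channel i. alpha holds one trained coefficient per input channel.
    ASSUMPTION: "channel i" means input channel i of this layer.
    Returns the pruned weights and the indices of the channels that were kept.
    """
    scores = softmax(alpha)
    keep = [i for i, s in enumerate(scores) if s >= delta]
    pruned = [[row[i] for i in keep] for row in w]
    return pruned, keep
```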

The output-target neural network is not limited to this example, and various modifications may be applied. For example, the output unit 104 may merge the trained coefficient γ1^l with the number of quantization bits k set as the initial value (for example, when γ1^l = 2^-4 and k = 8 in equation (4), (2^-4) × (2^8-1) -1 may be merged into 2^3). Similarly, the output unit 104 may merge the trained coefficient γ2^l with k.

Furthermore, as described above, the output unit 104 may move β1, which was part of the quantization function 205 during the learning stage, to the position after the convolution layer 202. In the inference stage, the convolution layer 202 then no longer contains β1, which is represented in floating point. This can reduce the computational load of the convolution performed by the convolution layer 202. Similarly, the output unit 104 may move β2, which was part of the quantization function 206 during the learning stage, to the position after the convolution layer 202. The output unit 104 outputs the output-target neural network generated in this way.
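Moving β1 and β2 behind the convolution works because convolution is linear. If the quantized input and weights are integer codes multiplied by scale factors, those scales can be factored out of the sum. The short Python sketch below checks this on a 1-D example. It assumes that β1 and β2 are single scalars per tensor, since per-input-channel scales could not be factored out this way. All names and values are illustrative.

```python
def conv1d(x, w):
    """'Valid' 1-D convolution (cross-correlation, as used in most NN libraries)."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

# Hypothetical quantized operands: integer codes plus per-tensor float scales.
x_int = [3, 0, 7, 2, 5]    # quantized activation codes
w_int = [1, -2, 1]         # quantized weight codes
beta1, beta2 = 0.125, 0.5  # assumed floating-point scale factors

# Learning-stage view: the scales sit inside the quantization functions.
before = conv1d([beta1 * v for v in x_int], [beta2 * v for v in w_int])

# Inference-stage view: integer-only convolution, scales applied afterwards.
after = [beta1 * beta2 * y for y in conv1d(x_int, w_int)]

print(before)  # → [0.625, -0.75, 0.5]
print(after)   # → [0.625, -0.75, 0.5]
```

In the second form, the inner loop multiplies only integers. The single floating-point multiplication per output can then be handled after the convolution layer.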

(1-3. Explanation of the effects)
An embodiment of the present invention provides a neural network weight reduction device 10 comprising an input unit 100, a correction unit 101, a learning unit 102, a re-learning unit 103, and an output unit 104. The input unit 100 acquires the neural network to be reduced in weight, which includes a plurality of processing layers. The correction unit 101 then identifies at least one of those processing layers as the correction target layer. It introduces three functions into that layer: a quantization function 205 containing a trainable coefficient γ1, a quantization function 206 containing a trainable coefficient γ2, and a channel attenuation function 204 containing a trainable per-channel coefficient α. This generates the training-target neural network.
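A forward pass through a correction target layer with these three functions might look like the following Python sketch. Equation (4) is not reproduced in this section, so the quantizers are an assumption. A PACT-style clipped uniform quantizer stands in for quantization function 205: it normalizes into [0, 1], rounds to 2^k - 1 levels, and rescales by γ1. A symmetric signed quantizer stands in for quantization function 206: it normalizes into [-1, 1] and rescales by γ2. The layer is fully connected for brevity, and the per-channel attenuation factors are passed in directly. All function names are hypothetical.

```python
def quantize_activation(x, gamma1, k):
    """Assumed PACT-style quantizer: normalize into [0, 1], round to 2**k - 1 levels,
    then multiply by gamma1. Python's round() rounds halves to even."""
    levels = 2 ** k - 1
    normalized = min(max(x / gamma1, 0.0), 1.0)
    return gamma1 * round(normalized * levels) / levels

def quantize_weight(w, gamma2, k):
    """Assumed symmetric quantizer: normalize into [-1, 1], round to 2**(k-1) - 1
    positive levels, then multiply by gamma2."""
    levels = 2 ** (k - 1) - 1
    normalized = min(max(w / gamma2, -1.0), 1.0)
    return gamma2 * round(normalized * levels) / levels

def target_layer(x, weights, decay, gamma1, gamma2, k):
    """Fully connected correction target layer (illustrative).

    x[i]: input of channel i; weights[o][i]: weight from input channel i to output o;
    decay[i]: attenuation factor of channel i (e.g. derived from alpha).
    """
    # Channel attenuation first, then activation quantization, both on the input.
    xq = [quantize_activation(xi * di, gamma1, k) for xi, di in zip(x, decay)]
    # Weight quantization, then the usual weighted sum for each output.
    return [sum(quantize_weight(wi, gamma2, k) * xi for wi, xi in zip(row, xq))
            for row in weights]

out = target_layer([0.4, 2.0], [[0.9, 5.0]], decay=[1.0, 0.0], gamma1=1.0, gamma2=1.0, k=4)
# Channel 1 is fully attenuated, so only channel 0 contributes: (6/7) * (6/15).
```

A channel whose attenuation factor is 0 contributes nothing to the output. This is why its weights can later be deleted without changing the result.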

The learning unit 102 trains the weight parameter w of the neural network to be reduced in weight, together with the coefficients γ1, γ2, and α, by learning based on the training-target neural network. The re-learning unit 103 retrains the weight parameter w by re-learning based on the trained training-target neural network. The output unit 104 then deletes two things from the re-learned training-target neural network: the channel attenuation function 204, and the weight parameters w of the redundant channels determined by the trained coefficient α in the correction target layer. It outputs the resulting network as the output-target neural network.

With this configuration, the number of channels (that is, the number of channels other than redundant channels) and the number of quantization bits can be estimated simultaneously. Optimal values for both can therefore be found while taking into account the trade-off between them. Compared with estimating the two independently, this makes it possible to build a neural network that better suppresses both accuracy degradation and loss of processing efficiency.

This concludes the detailed description of the embodiment of the present invention.

(2. Various modifications)
Preferred embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to these examples. A person having ordinary skill in the technical field of the present invention could clearly conceive of various changes or modifications within the scope of the technical ideas described in the claims. These are naturally understood to fall within the technical scope of the present invention as well.

The description above mainly covered the case in which the learning unit 102 updates the coefficients α, γ1, and γ2 independently. However, the learning unit 102 may instead generate α, γ1, and γ2 using a neural network separate from the training-target neural network (a fourth neural network). This modification of the coefficient update is described with reference to FIG. 7.

FIG. 7 is a diagram for explaining a modification of the coefficient update. Referring to FIG. 7, a neural network 209 (fourth neural network) is provided separately from the training-target neural network. The configuration of the neural network 209 is not particularly limited; for example, it may include at least one of a convolution layer and a fully connected layer. In coefficient training, the learning unit 102 updates the weight parameters of the neural network 209 by error backpropagation based on the loss function to which the regularization term has been added.

The learning unit 102 may also feed the neural network 209 with data based on the input to the correction target layer (the input feature x^l). It may then generate the coefficients α, γ1, and γ2 from the output that the neural network 209 produces for that data. In this case, α, γ1, and γ2 depend on the input to the correction target layer. The neural network 209 may receive the same data that is input to the correction target layer. Alternatively, it may receive a uniquely determined statistic as a representative value, such as the mean of the input to the correction target layer.
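The coefficient generator of FIG. 7 can be sketched in Python as a single linear layer. Its input is the per-channel mean of x^l, which serves as the representative statistic mentioned above. Several details are assumptions: the single-layer structure, the softplus that keeps γ1 and γ2 positive, and the parameter names. The source leaves the configuration of the neural network 209 open.

```python
import math

def channel_means(x):
    """x[c] holds the activations of channel c; returns one mean per channel."""
    return [sum(values) / len(values) for values in x]

def linear(v, weight, bias):
    """Dense layer: sum over i of weight[o][i] * v[i], plus bias[o]."""
    return [sum(w * vi for w, vi in zip(row, v)) + b for row, b in zip(weight, bias)]

def softplus(v):
    """log(1 + e^v); keeps the generated quantization coefficients positive."""
    return math.log1p(math.exp(v))

def generate_coefficients(x, params):
    """Hypothetical coefficient generator standing in for neural network 209."""
    stats = channel_means(x)  # representative statistic of the input x^l
    alpha = linear(stats, params["W_alpha"], params["b_alpha"])  # one value per channel
    g = linear(stats, params["W_gamma"], params["b_gamma"])      # two outputs
    gamma1, gamma2 = softplus(g[0]), softplus(g[1])
    return alpha, gamma1, gamma2

params = {
    "W_alpha": [[1.0, 0.0], [0.0, 1.0]], "b_alpha": [0.0, 0.5],
    "W_gamma": [[0.0, 0.0], [0.0, 0.0]], "b_gamma": [0.0, 0.0],
}
alpha, gamma1, gamma2 = generate_coefficients([[1.0, 3.0], [0.0, 0.0]], params)
# alpha == [2.0, 0.5]; gamma1 == gamma2 == log(2)
```

In this variant, backpropagation would update the entries of `params` rather than α, γ1 and γ2 directly. As a result, the coefficients change with each input.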

(3. Hardware configuration example)
Next, an example hardware configuration of the neural network weight reduction device 10 according to the embodiment of the present invention is described, using an information processing device 900 as the example. This configuration is only one example of the hardware configuration of the neural network weight reduction device 10. Unnecessary components may therefore be removed from the configuration of the information processing device 900 described below, or new components may be added.

FIG. 8 is a diagram showing the hardware configuration of an information processing device 900 as an example of the neural network weight reduction device 10 according to the embodiment of the present invention. The information processing device 900 includes a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 902, a RAM (Random Access Memory) 903, a host bus 904, a bridge 905, an external bus 906, an interface 907, an input device 908, an output device 909, a storage device 910, and a communication device 911.

The CPU 901 functions as an arithmetic processing device and a control device, and controls overall operation of the information processing device 900 according to various programs. The CPU 901 may also be a microprocessor. The ROM 902 stores programs, operation parameters, and the like used by the CPU 901. The RAM 903 temporarily stores programs used by the CPU 901 during execution, parameters that change as appropriate during that execution, and the like. These components are interconnected by the host bus 904, which consists of a CPU bus or the like.

The host bus 904 is connected via the bridge 905 to the external bus 906, such as a PCI (Peripheral Component Interconnect/Interface) bus. The host bus 904, the bridge 905, and the external bus 906 need not be configured separately; their functions may be implemented on a single bus.

The input device 908 consists of two parts: input means through which the user enters information, such as a mouse, keyboard, touch panel, buttons, microphone, switches, and levers; and an input control circuit that generates an input signal from the user's input and outputs it to the CPU 901. By operating the input device 908, a user of the information processing device 900 can input various data and instruct processing operations.

The output device 909 includes, for example, display devices such as a CRT (Cathode Ray Tube) display, a liquid crystal display (LCD), an OLED (Organic Light Emitting Diode) device, and lamps, as well as audio output devices such as speakers.

The storage device 910 is a device for storing data. It may include a storage medium, a recording device that records data on the storage medium, a reading device that reads data from the storage medium, a deletion device that deletes data recorded on the storage medium, and the like. The storage device 910 is, for example, an HDD (Hard Disk Drive). It drives a hard disk and stores the programs executed by the CPU 901 and various data.

The communication device 911 is a communication interface consisting of, for example, a communication device for connecting to a network. The communication device 911 may support either wireless or wired communication.

This concludes the description of the example hardware configuration of the neural network weight reduction device 10 according to the embodiment of the present invention.

10 Neural network weight reduction device
100 Input unit
101 Correction unit
102 Learning unit
103 Re-learning unit
104 Output unit

Claims

1. A neural network weight reduction device comprising:
an input unit that acquires a first neural network including a plurality of processing layers;
a correction unit that identifies at least one of the plurality of processing layers as a correction target layer and generates a second neural network by introducing into the correction target layer a first quantization function including a trainable first coefficient, a second quantization function including a trainable second coefficient, and a channel attenuation function including a trainable third coefficient for each channel;
a learning unit that trains a weight parameter of the first neural network, the first coefficient, the second coefficient, and the third coefficient by learning based on the second neural network;
a re-learning unit that retrains the weight parameter by re-learning based on the second neural network after the learning; and
an output unit that outputs a third neural network obtained by deleting, from the second neural network after the re-learning, the channel attenuation function and the weight parameters of redundant channels corresponding to the trained third coefficient in the correction target layer.

2. The neural network weight reduction device according to claim 1, wherein the learning unit performs, in turn, first learning that trains the weight parameter and second learning that trains the first coefficient, the second coefficient, and the third coefficient.

3. The neural network weight reduction device according to claim 2, wherein, in the second learning, the channel attenuation function includes a process of multiplying the input to the correction target layer, channel by channel, by a value corresponding to the third coefficient.

4. The neural network weight reduction device according to claim 3, wherein, in the first learning, the channel attenuation function includes a process of setting to zero those inputs to the correction target layer that correspond to channels whose value corresponding to the third coefficient is below a predetermined threshold.

5. The neural network weight reduction device according to claim 4, wherein the redundant channel is a channel whose value corresponding to the trained third coefficient is below the predetermined threshold.

6. The neural network weight reduction device according to any one of claims 3 to 5, wherein, in the second learning, the channel attenuation function includes a process of multiplying the input to the correction target layer by the value corresponding to the third coefficient and by an adjustment parameter whose value is gradually reduced.

7. The neural network weight reduction device according to claim 6, wherein, in the second learning, the learning unit gradually reduces the adjustment parameter by performing learning based on a loss function that incorporates the adjustment parameter.

8. The neural network weight reduction device according to claim 6, wherein, in the second learning, the learning unit gradually reduces the adjustment parameter according to a predetermined schedule.

9. The neural network weight reduction device according to any one of claims 1 to 8, wherein the correction unit introduces the channel attenuation function and the first quantization function such that both are applied to the input to the correction target layer.

10. The neural network weight reduction device according to claim 9, wherein the first quantization function includes a process of applying a first normalization to the output of the channel attenuation function and then multiplying the result by the first coefficient.

11. The neural network weight reduction device according to claim 10, wherein the first normalization includes a transformation that maps the output of the channel attenuation function into a first range.

12. The neural network weight reduction device according to any one of claims 1 to 11, wherein the correction unit introduces the second quantization function such that it is applied to the weight parameter of the correction target layer.

13. The neural network weight reduction device according to claim 12, wherein the second quantization function includes a process of applying a second normalization to the weight parameter of the correction target layer and then multiplying the result by the second coefficient.

14. The neural network weight reduction device according to claim 13, wherein the second normalization includes a transformation that maps the weight parameter of the correction target layer into a second range.

15. The neural network weight reduction device according to any one of claims 1 to 14, wherein the correction target layer includes at least one of a convolution layer and a fully connected layer.

16. The neural network weight reduction device according to any one of claims 1 to 15, wherein the re-learning unit retrains the weight parameter with the first coefficient fixed to the trained first coefficient, the second coefficient fixed to the trained second coefficient, and the third coefficient fixed to the trained third coefficient.

17. A neural network weight reduction method comprising:
acquiring a first neural network including a plurality of processing layers;
identifying at least one of the plurality of processing layers as a correction target layer, and generating a second neural network by introducing into the correction target layer a first quantization function including a trainable first coefficient, a second quantization function including a trainable second coefficient, and a channel attenuation function including a trainable third coefficient for each channel;
training a weight parameter of the first neural network, the first coefficient, the second coefficient, and the third coefficient by learning based on the second neural network;
retraining the weight parameter by re-learning based on the second neural network after the learning; and
outputting a third neural network obtained by deleting, from the second neural network after the re-learning, the channel attenuation function and the weight parameters of redundant channels corresponding to the trained third coefficient in the correction target layer.

18. A program causing a computer to function as a neural network weight reduction device comprising:
an input unit that acquires a first neural network including a plurality of processing layers;
a correction unit that identifies at least one of the plurality of processing layers as a correction target layer and generates a second neural network by introducing into the correction target layer a first quantization function including a trainable first coefficient, a second quantization function including a trainable second coefficient, and a channel attenuation function including a trainable third coefficient for each channel;
a learning unit that trains a weight parameter of the first neural network, the first coefficient, the second coefficient, and the third coefficient by learning based on the second neural network;
a re-learning unit that retrains the weight parameter by re-learning based on the second neural network after the learning; and
an output unit that outputs a third neural network obtained by deleting, from the second neural network after the re-learning, the channel attenuation function and the weight parameters of redundant channels corresponding to the trained third coefficient in the correction target layer.
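Claims 3, 4, 6 and 8 describe a channel attenuation function that behaves differently in the two learning phases. The Python sketch below follows their wording under three stated assumptions. First, the "value corresponding to the third coefficient" is taken to be softmax_i(α). Second, the adjustment parameter τ is applied as an extra multiplier, as claim 6 reads. Third, the predetermined schedule of claim 8 is a linear decay. The claims fix none of these specifics, and all function names are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def channel_decay(x, alpha, phase, delta, tau=1.0):
    """Channel attenuation for the input x (one value per channel)."""
    s = softmax(alpha)  # assumed "value corresponding to the third coefficient"
    if phase == "second":
        # Second learning (coefficient training): multiply each channel by s_i
        # and by the adjustment parameter tau (claims 3 and 6).
        return [xi * si * tau for xi, si in zip(x, s)]
    if phase == "first":
        # First learning (weight training): zero the channels with s_i < delta
        # and pass the others through unchanged (claim 4).
        return [0.0 if si < delta else xi for xi, si in zip(x, s)]
    raise ValueError(f"unknown phase: {phase!r}")

def linear_schedule(step, total_steps, tau_start=1.0, tau_end=0.1):
    """Predetermined schedule that gradually reduces tau (claim 8); linear decay is assumed."""
    frac = min(step / total_steps, 1.0)
    return tau_start + (tau_end - tau_start) * frac

alpha = [2.0, 2.0, -3.0, 1.0]
print(channel_decay([1.0] * 4, alpha, "first", delta=0.05))  # → [1.0, 1.0, 0.0, 1.0]
```

Claim 5 uses the same threshold test on the trained coefficients. The channels zeroed in the first learning are therefore the same channels whose weights the output unit later deletes.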