JP7438517B2

JP7438517B2 - Neural network compression method, neural network compression device, computer program, and method for producing compressed neural network data

Info

Publication number: JP7438517B2
Application number: JP2019137019A
Authority: JP
Inventors: 俊和和田; 幸司菅間
Original assignee: WAKAYAMA UNIVERSITY
Current assignee: WAKAYAMA UNIVERSITY
Priority date: 2019-07-25
Filing date: 2019-07-25
Publication date: 2024-02-27
Anticipated expiration: 2039-07-25
Also published as: JP2021022050A

Description

TECHNICAL FIELD

This disclosure relates to the compression of neural networks.

Methods for compressing neural networks such as deep neural networks (DNNs) include methods that perform pruning followed by reconstruction.

In a fully connected neural network (FCN), pruning is performed by deleting neurons (together with the weights connected to those neurons); in a convolutional neural network (CNN), it is performed by deleting channels (see Non-Patent Document 1). Deleting a channel means deleting all of the weights that belong to that channel.

Reconstruction is performed after pruning. In reconstruction, the weight parameters of the neural network are adjusted so that the output approaches the output before pruning. For example, in an FCN, reconstruction adjusts the weight parameters of the connections between neurons, and in a CNN, reconstruction adjusts the weight parameters of the filters (kernels).

Yihui He, Xiangyu Zhang, Jian Sun, “Channel Pruning for Accelerating Very Deep Neural Networks,” Proc. of ICCV2017, 2017

The present inventors have found that conventional compression methods that perform pruning and reconstruction have the following problem: pruning is not performed in such a way that the error remaining after reconstruction (the reconstruction error) is minimized. This problem is explained below. For simplicity, the explanation assumes an FCN.

When pruning, the neurons to be deleted (the pruning targets) must be selected. A neuron is selected for deletion by focusing on the error that its deletion causes in the layer following the layer in which it exists. Specifically, the neuron whose deletion causes the smallest error in the next layer is selected for deletion. For example, suppose that deleting neuron A_1 causes an error E_1 in the next layer and deleting neuron A_2 causes an error E_2. If E_1 is smaller than E_2, neuron A_1 is selected for deletion in preference to neuron A_2.

In the reconstruction after pruning, the weight parameters of the connections from the remaining (undeleted) neurons to the neurons of the next layer are adjusted. The weights are adjusted so that the error caused in the next layer by the deletion is minimized. For example, if deleting neuron A_1 in pruning causes an error E_1 in the next layer, the weights are adjusted in reconstruction so that E_1 becomes as small as possible. The error E_1 minimized by this weight adjustment is called the reconstruction error E_1r.

In the conventional compression method described above, pruning is not performed so that the reconstruction error E_1r is minimized. For example, as described above, assume that deleting neuron A_1 causes an error E_1 in the next layer and leaves a reconstruction error E_1r, and that deleting neuron A_2 causes an error E_2 in the next layer and leaves a reconstruction error E_2r. Even if E_1 is smaller than E_2, the reconstruction error E_1r may be larger than the reconstruction error E_2r. In other words, even if the error caused by deletion is the smallest, there is no guarantee that the reconstruction error is the smallest.
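The ordering problem described above can be illustrated with a small numerical sketch. It is not taken from the patent: the matrices `X` (columns are per-neuron output vectors) and `W` (outgoing weights), the helper names, and the choice of the squared Frobenius norm as the error measure are all assumptions made for illustration. Neuron 0 has a small output that no other neuron can reproduce, while neuron 1 has a large output that is a multiple of neuron 2.

```python
import numpy as np

def deletion_error(X, W, j):
    """Error at the next layer if neuron j is removed and no weight is adjusted
    (assumed here to be the squared Frobenius norm of its lost contribution)."""
    return float(np.linalg.norm(np.outer(X[:, j], W[j])) ** 2)

def reconstruction_error(X, W, j):
    """Error left after the outgoing weights of the remaining neurons are refit
    by least squares to approximate the original next-layer input Y."""
    Y = X @ W
    X_rest = np.delete(X, j, axis=1)
    W_fit, *_ = np.linalg.lstsq(X_rest, Y, rcond=None)
    return float(np.linalg.norm(Y - X_rest @ W_fit) ** 2)

# Hypothetical data: 3 samples (rows), 4 neurons (columns), 1 next-layer neuron.
X = np.array([[0.0, 2.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0, 0.0]])
W = np.ones((4, 1))

E1, E2 = deletion_error(X, W, 0), deletion_error(X, W, 1)
E1r, E2r = reconstruction_error(X, W, 0), reconstruction_error(X, W, 1)
print(E1, E2)    # deletion errors: neuron 0 looks cheaper to delete
print(E1r, E2r)  # reconstruction errors: neuron 1 is actually cheaper
```

In this toy case the deletion error favours removing neuron 0, yet only neuron 1 can be fully compensated by the remaining neurons, so the two criteria rank the candidates in opposite order.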

A solution to the above problem is therefore desired. In the present disclosure, the problem is solved by pruning in such a way that the reconstruction error is minimized. Further details are described in the embodiments below.

FIG. 1 is a configuration diagram of a neural network compression device and a neural network utilization device.
FIG. 2 is an explanatory diagram of the configuration of the neural network and the formulation of the propagation amount Y.
FIG. 3 is a flowchart of compression processing according to a comparative example.
FIG. 4 is a flowchart of compression processing according to the embodiment.
FIG. 5 shows the greedy algorithm for compression processing according to the embodiment.
FIG. 6 is a diagram showing the projection of x_j onto the subspace U.
FIG. 7 shows a method for calculating the residual r_j.
FIG. 8 shows a method for calculating the residual r_j.
FIG. 9 shows a method for calculating the reconstruction error when deleting another neuron.
FIG. 10 shows a method for calculating the reconstruction error when deleting another neuron.
FIG. 11 shows the application of REAP to convolutional layers.
FIG. 12 shows a method for calculating the projection residual r_j by applying Gram-Schmidt orthogonalization.
FIG. 13 is a diagram showing the experimental results.

<1. Overview of neural network compression method, neural network compression device, computer program, and method for producing compressed neural network data>

(1) The neural network compression method according to the embodiment includes a step in which a selected pruning target is pruned and the pruned neural network is reconstructed. The pruning target is selected such that the reconstruction error caused by pruning and reconstruction is minimized. As a result, pruning is performed so that the reconstruction error is minimized.

(2) The pruning may be pruning of neurons in a fully connected layer.

(3) The pruning may be pruning of channels in a convolutional layer.

(4) The reconstruction error is preferably calculated based on the projection residual obtained when the behavior vector of the pruning target is projected onto the subspace spanned by the behavior vectors of the pruning units other than the pruning target.

(5) The projection residual is preferably calculated based on the biorthogonal basis of the behavior vector of the pruning target.

(6) The projection residual may be calculated by iteratively applying Gram-Schmidt orthogonalization.

(7) The calculation of the reconstruction error is preferably accelerated by parallel computation or the like.

(8) The neural network compression device according to the embodiment is configured to prune a selected pruning target and to reconstruct the pruned neural network. The device is configured such that the pruning target is selected so that the reconstruction error caused by pruning and reconstruction is minimized.

(9) The computer program according to the embodiment is a computer program for causing a computer to function as a neural network compression device. The neural network compression device is configured to prune a selected pruning target and to reconstruct the pruned neural network, and in the neural network compression device the pruning target is selected so that the reconstruction error caused by pruning and reconstruction is minimized.

(10) The method for producing compressed neural network data according to the embodiment includes a compression step, in which a selected pruning target is pruned and the pruned neural network is reconstructed, and a step of outputting the data of the neural network compressed by the compression step. In the compression step, the pruning target is selected so that the reconstruction error caused by pruning and reconstruction is minimized.

<2. Examples of neural network compression method, neural network compression device, computer program, and method for producing compressed neural network data>

FIG. 1 shows a neural network compression device (hereinafter, "compression device") 10 and a neural network utilization device (hereinafter, "utilization device") 100 according to the embodiment. The compression device 10 executes a compression process 21 that compresses a neural network N1 to reduce its size. The method carried out by executing the compression process 21 is also a method for producing a compressed neural network, or a method for producing compressed neural network data.

A neural network is an artificial computational mechanism in which multiple artificial neurons (also called "nodes") are connected. The neural network is, for example, a deep neural network (DNN). The DNN may be, for example, a fully connected neural network (FCN) or a convolutional neural network (CNN). Hereinafter, the neural network N1 to be compressed is called the "original neural network", and the compressed neural network N2 is called the "compressed neural network". The compression device 10 according to the embodiment can also execute processing for machine learning (deep learning) of the original neural network N1. The compression device 10 compresses the trained original neural network N1.

The compression device 10 is a computer having one or more processors 20 and a storage device 30. The one or more processors 20 include, for example, a graphics processing unit (GPU), and may further include a CPU. Massively parallel computing hardware such as a GPU is well suited to the large volume of computation required to process large neural networks.

The storage device 30 stores a computer program 31 executed by the processor 20. The processor 20 performs the compression process 21 by executing the computer program 31. The compression process 21 includes pruning and reconstruction.

The storage device 30 can store data N20 (compressed neural network data) representing the compressed neural network N2 produced by the compression process 21. The compressed neural network data N20 consists of the various parameters (weights, connection relationships, and so on) that express the compressed neural network N2. The compression device 10 can output the compressed neural network data N20 to a neural network engine or the like. When the compressed neural network data N20 is loaded into a neural network engine, it causes that engine to function as the compressed neural network N2.

The utilization device 100 has a neural network engine that loads the compressed neural network data N20 and functions as the compressed neural network N2. The neural network engine includes, for example, a processor 200 and a storage device 300. The processor 200 may be, for example, a low-power CPU in an embedded system. Because the compressed neural network data N20 is smaller than the data of the original neural network N1, it can be processed by a low-power CPU.

An embedded system is not a general-purpose computer system but a computer system dedicated to a specific purpose, for example the computer system in household devices such as smartphones and home appliances, industrial equipment such as industrial robots, various medical devices, vehicles such as automobiles and drones, and other equipment. Embedded systems often use a low-power CPU as the processor, but the compressed neural network data N20 is easy to execute on such a CPU because its data size is small.

The compressed neural network N2 is used for applications such as image and audio conversion, segmentation, and classification. More specifically, it can be used to extract required information from images of a target, for example counting customers in a store, analyzing gender and age groups, counting vehicles, and analyzing vehicle types. The original neural network N1 is large and computationally expensive, which makes it difficult to run on an embedded system, whereas the compressed neural network N2 has been reduced in size and is therefore easy to run on an embedded system.

The compression process 21 is described below. For ease of understanding, the compression process 21 is first explained for a fully connected neural network (FCN), and it is then explained that the same compression process 21 can be applied to a convolutional neural network (CNN).

FIG. 2 shows a layer l of a fully connected neural network (FCN), which is the original neural network N1, and a layer l+1, which is the layer following layer l. The two layers (layer l and layer l+1) are shown as representatives. Each layer of the FCN is a fully connected layer in which artificial neurons (hereinafter simply "neurons") arranged in layers are connected between layers. The circles in each layer are neurons. Layer l contains c neurons, and layer l+1 contains C neurons.

Equation (1) in FIG. 2 formulates the propagation amount Y from layer l to the next layer l+1 when data is given to the neural network. Here, Y is an N×C matrix representing the internal activations of the C neurons of layer l+1. In other words, Y is also the input to layer l+1 supplied by layer l.

When N pieces of data (for example, N images) are given to the c neurons of layer l, each neuron of layer l produces N outputs. The output of the i-th neuron in layer l is denoted x_i, an N-dimensional vector. x_i represents the output (behavior) of the i-th neuron when the N pieces of data are given to it; that is, x_i is the behavior vector (neuron behavior vector) of the i-th neuron. The data given to the neural network may be data other than images, such as audio data. Data such as image data is given to the neural network in order to capture the behavior of each neuron.

In FIG. 2, w_i is a C-dimensional weight vector consisting of the weights (weight coefficients) of the connections from the i-th neuron of layer l to the C neurons of layer l+1.

In this case, the propagation amount Y to the next layer l+1 is formulated, as shown in equation (1) in FIG. 2, from the outputs x_i of the neurons in layer l and the weight vectors w_i from layer l to the next layer l+1.
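FIG. 2 is not reproduced in this text. From the definitions above (x_i is N-dimensional, w_i is C-dimensional, Y is N×C), equation (1) can plausibly be written as a sum of outer products; the stacked matrices X and W below are notation introduced here for convenience and are an assumption.

```latex
Y \;=\; \sum_{i=1}^{c} x_i\, w_i^{\top} \;=\; X W,
\qquad
X = \begin{bmatrix} x_1 & \cdots & x_c \end{bmatrix} \in \mathbb{R}^{N \times c},
\quad
W = \begin{bmatrix} w_1 & \cdots & w_c \end{bmatrix}^{\top} \in \mathbb{R}^{c \times C}
```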

The purpose of the compression process 21 according to the embodiment is to reduce the number of neurons by a desired amount while changing Y as little as possible. If Y changes only slightly when neurons are removed, the performance of the original neural network N1 can be maintained; that is, loss of accuracy can be prevented even though the neural network is compressed. As noted above, the pruning unit is the neuron in an FCN and the channel in a CNN. The pruning unit is the unit of deletion.

Before the compression process 21 according to the embodiment is described, a compression process 121 according to a comparative example is described. FIG. 3 shows the compression process 121 according to the comparative example. The compression process 121 has a pruning step 122 and a reconstruction step 123. In the comparative example, the pruning step 122 and the reconstruction step 123 are completely separate.

In the pruning step 122, a neuron to be deleted (the pruning target) is selected from the neurons (the pruning units) contained in a layer l, and the selected neuron is deleted. When the neuron to be deleted is selected from layer l, the neuron whose deletion has the smallest influence on the propagation amount Y to the next layer l+1 is selected as the pruning target. For this selection, neurons are selected using Lasso regression according to equation (2) in FIG. 3 (see Non-Patent Document 1, in which channels are selected). In the comparative example, the weights are not adjusted when the neurons are selected.

In the comparative example, the L1 norm of an importance vector β is added to the penalty term for the error in the internal activation at the next layer l+1. Solving for the importance vector β under this L1 penalty yields a β with many zero elements while keeping the activation error at the next layer l+1 low, and this determines which neurons should be deleted. That is, optimizing equation (2) yields an optimized neuron importance vector β^*; if its i-th element β^*_i is 0, the i-th neuron is unnecessary and is selected for deletion.

In the comparative example, the number of neurons deleted is controlled by fine-tuning the hyperparameter λ. Increasing λ increases the number of neurons deleted, and decreasing λ reduces it. Because the number of neurons deleted is controlled only indirectly through λ, it is difficult to control that number in the comparative example.

In the reconstruction step 123, the weights are adjusted (optimized) so that the Y supplied to the next layer l+1 by the neurons remaining in layer l after pruning approaches the Y before pruning (the original Y). The weights are adjusted according to equation (3) in FIG. 3, which finds the weight vectors that minimize the reconstruction error. The reconstruction error here is based on the difference between Y before pruning and Y after pruning with the adjusted weights.
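The reconstruction step can be sketched as an ordinary least-squares fit. This is an illustrative reading of the description rather than the patent's own code: FIG. 3 is not reproduced here, and the function name `reconstruct_weights`, the stacked matrices `X` and `W`, and the squared Frobenius norm as the error measure are assumptions.

```python
import numpy as np

def reconstruct_weights(X, W, keep):
    """Refit the outgoing weights of the kept neurons so that their combined
    contribution approximates the original next-layer input Y = X @ W.

    X    : (N, c) array, column i is the behavior vector x_i of neuron i
    W    : (c, C) array, row i is the outgoing weight vector w_i
    keep : list of indices of the neurons that remain after pruning
    """
    Y = X @ W                      # propagation amount before pruning
    X_keep = X[:, keep]            # behavior vectors of the remaining neurons
    W_new, *_ = np.linalg.lstsq(X_keep, Y, rcond=None)
    err = float(np.linalg.norm(Y - X_keep @ W_new) ** 2)
    return W_new, err

# Hypothetical usage with random data: 50 samples, 6 neurons, 4 next-layer neurons.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 6))
W_demo = rng.normal(size=(6, 4))
W_adj, err_adj = reconstruct_weights(X_demo, W_demo, keep=[0, 1, 2, 4, 5])
```

Because the old weights of the kept neurons are one feasible choice, the refit error can never exceed the error obtained by simply dropping the deleted neuron without adjustment.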

In the comparative example, the pruning target is selected so as to minimize the error before reconstruction; it is not selected so that the error after reconstruction is smallest. That is, selection of the pruning target is based on the error before reconstruction, while reconstruction is based on the error after reconstruction, so pruning and reconstruction are performed under different criteria. In addition, the comparative example uses Lasso, and controlling the number of neurons deleted requires manual fine-tuning of λ. Controlling the number of neurons deleted is therefore difficult in the comparative example.

FIG. 4 shows the compression process 21 according to the embodiment. Hereinafter, the compression process 21 according to the embodiment is called REAP (Reconstruction Error Aware Pruning).

Whereas the comparative example performs pruning and reconstruction under different criteria, REAP performs them under the same criterion. That is, in REAP the network is both pruned and reconstructed so that the reconstruction error is minimized. In REAP, the neurons to be deleted are determined according to equation (4) in FIG. 4. Z^* in equation (4) denotes the neurons that remain in layer l after deletion. In equation (4), the weight vectors w_i are optimized before Z' is fixed to Z^*. Equation (4) therefore yields the set of neurons that minimizes the reconstruction error of the propagation amount Y to the next layer l+1.
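FIG. 4 is not reproduced in this text. Based only on the description above, equation (4) can be sketched as a nested minimization over the retained neuron set Z' and the adjusted weights; the cardinality constraint with D deleted neurons, the primed weight symbols, and the Frobenius norm are assumptions made here to write the idea down, and the exact form in the figure may differ.

```latex
Z^{*} \;=\; \operatorname*{arg\,min}_{\substack{Z' \subset Z \\ |Z'| = c - D}}
\;\; \min_{\{w'_i\}_{i \in Z'}}
\left\| \, Y - \sum_{i \in Z'} x_i \, {w'_i}^{\top} \right\|_F^{2}
```

The inner minimization is the reconstruction (weight adjustment) and the outer minimization is the pruning, which is what it means for both to use the same criterion.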

Because REAP selects the neurons to be pruned so that the reconstruction error is minimized, it has an advantage over the comparative example, in which the reconstruction error is not necessarily minimized.

Equation (4) is a combinatorial optimization problem and is solved with a greedy algorithm. FIG. 5 shows the algorithm for solving equation (4). First, in step S1, the j-th neuron in layer l is tentatively deleted. In step S2, Y is reconstructed using only the neurons remaining in layer l, and the error (reconstruction error) is calculated. The reconstruction error is obtained by evaluating the cost function P(Z') shown as equation (5) in FIG. 5. Once the reconstruction error has been obtained, the tentatively deleted j-th neuron is restored. In the loop indicated by step S3, steps S1 and S2 are repeated for every j (j∈Z); the neuron that gives the smallest value of P(Z') is selected as the pruning target and is then actually deleted.

One pass of the step S3 loop deletes one neuron. In the loop indicated by step S4, the step S3 loop is executed again using only the remaining neurons. Each time the step S3 loop is executed again, another neuron is deleted.

The number of neurons deleted from layer l is determined by the number of times the step S4 loop is executed. To delete a desired number D of neurons, the step S4 loop is simply executed D times. In REAP, the number of neurons deleted is therefore easy to control.
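The greedy procedure of steps S1 to S4 can be sketched as follows. This is a naive illustration that solves one least-squares problem per candidate neuron; it does not use the faster residual-based computation of the later figures. The function names, the matrices `X` and `W`, and the squared Frobenius norm standing in for the cost P(Z') are assumptions.

```python
import numpy as np

def recon_error(X, Y, keep):
    """Assumed stand-in for P(Z'): squared error after refitting the weights
    of the neurons in `keep` to the original target Y."""
    X_keep = X[:, keep]
    W_fit, *_ = np.linalg.lstsq(X_keep, Y, rcond=None)
    return float(np.linalg.norm(Y - X_keep @ W_fit) ** 2)

def reap_greedy(X, W, D):
    """Delete D neurons one at a time, each time removing the neuron whose
    deletion leaves the smallest reconstruction error."""
    Y = X @ W                                   # original propagation amount
    keep = list(range(X.shape[1]))
    for _ in range(D):                          # outer loop (step S4)
        best_j, best_err = None, np.inf
        for j in keep:                          # inner loop (step S3)
            trial = [i for i in keep if i != j] # S1: tentatively delete neuron j
            err = recon_error(X, Y, trial)      # S2: reconstruct and measure error
            if err < best_err:                  # neuron j is implicitly restored
                best_j, best_err = j, err
        keep.remove(best_j)                     # actually delete the best candidate
    W_new, *_ = np.linalg.lstsq(X[:, keep], Y, rcond=None)
    return keep, W_new
```

The argument `D` directly fixes how many neurons are removed, which mirrors the point made above that the pruning amount is set by the loop count rather than by a tuned regularization strength.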

REAP requires more computation than the comparative example. Because REAP applies the least-squares method when obtaining the reconstruction error, it must solve systems of simultaneous equations, and there are as many systems to solve as there are neurons in the layer. For example, if layer l has c neurons and the next layer l+1 has C neurons, the number of weight parameters is c×C. When one neuron is deleted, the coefficient matrix has size (c-1)C×(c-1)C. The time complexity of solving one system is therefore O(c^3 C^3). Since the system must be solved c times, the overall time complexity is O(c^4 C^3).

The least-squares problem actually being solved is this: when the behavior vector x_j representing the behavior of a certain neuron is removed, approximate the original propagation amount Y to the next layer l+1 by a linear combination of the behavior vectors x_i (i∈Z') of the remaining neuron set Z'. As shown in FIG. 6, the error of this approximation originates from the error r_j that arises when x_j is expressed as a linear combination of the x_i (i∈Z'). Therefore, once this error r_j has been minimized, the error in the activation (propagation amount Y) at the next layer l+1 can be calculated simply by multiplying the vector representing r_j by the weight vector w_i^T to the next layer l+1.

That is, to calculate the reconstruction error caused by deleting the j-th neuron, the cost function shown in equation (6-1) in FIG. 6 must be evaluated. As described later, the cost function of equation (6-1) can be expressed as equation (6-2). Therefore, if the projection residual r_j obtained when x_j is projected onto the subspace spanned by the x_i (i∈Z') can be calculated without using a huge coefficient matrix, the reconstruction error can be calculated efficiently. As one example, the residual r_j is calculated as in equation (6-3).
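The relationship between the projection residual and the reconstruction error can be checked numerically. The sketch below assumes, as one reading of the description (FIG. 6 is not reproduced here), that the reconstruction error for deleting neuron j equals the squared norm of r_j times the squared norm of that neuron's outgoing weight vector w_j. The function names and the matrices `X` and `W` are illustrative.

```python
import numpy as np

def projection_residual(X, j):
    """Residual r_j of projecting x_j onto the span of the other behavior vectors."""
    x_j = X[:, j]
    X_rest = np.delete(X, j, axis=1)
    coef, *_ = np.linalg.lstsq(X_rest, x_j, rcond=None)
    return x_j - X_rest @ coef

def error_from_residual(X, W, j):
    """Reconstruction error from the residual alone: ||r_j||^2 * ||w_j||^2.
    Only an (N, c-1) least-squares problem with a single right-hand side is solved."""
    r_j = projection_residual(X, j)
    return float((r_j @ r_j) * (W[j] @ W[j]))

def error_from_full_refit(X, W, j):
    """Reference value: refit all outgoing weights of the remaining neurons to Y."""
    Y = X @ W
    X_rest = np.delete(X, j, axis=1)
    W_fit, *_ = np.linalg.lstsq(X_rest, Y, rcond=None)
    return float(np.linalg.norm(Y - X_rest @ W_fit) ** 2)

# Hypothetical random layer: 30 samples, 5 neurons, 3 next-layer neurons.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 5))
W_demo = rng.normal(size=(5, 3))
fast = [error_from_residual(X_demo, W_demo, j) for j in range(5)]
slow = [error_from_full_refit(X_demo, W_demo, j) for j in range(5)]
```

The two lists agree up to rounding because the part of Y that the remaining neurons cannot express is exactly the outer product of r_j and w_j.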

An example of how to obtain the residual r_j is explained in detail in FIGS. 7 and 8. The calculation method shown in FIGS. 7 and 8 exploits the fact that the residual r_j is linearly dependent on the biorthogonal basis of x_j. That is, the residual r_j is the projection of x_j onto the biorthogonal basis of x_j, and it is calculated from that biorthogonal basis. With the calculation method shown in FIGS. 7 and 8, the residual r_j can be calculated efficiently without using the coefficient matrix of the simultaneous equations. FIGS. 9 and 10 explain a method for efficiently calculating the reconstruction error for deleting another neuron after one neuron has been deleted.
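One standard way to realize this idea is sketched below. It is an illustration under stated assumptions and is not confirmed to match the procedure of FIGS. 7 and 8. It assumes that the behavior vectors are the columns of a full-column-rank matrix X, and it uses the fact that the dual (biorthogonal) basis vector b_j is orthogonal to every x_i with i≠j, so that r_j = b_j / ||b_j||^2. The function name is hypothetical.

```python
import numpy as np

def residuals_via_dual_basis(X):
    """Return a matrix whose column j is the projection residual r_j.

    Assumes X (samples x c) has full column rank.
    """
    G_inv = np.linalg.inv(X.T @ X)   # inverse Gram matrix, size c x c
    B = X @ G_inv                    # column j is the dual basis vector b_j
    # ||b_j||^2 equals G_inv[j, j], so dividing column j by it gives r_j.
    return B / np.diag(G_inv)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 5))
    R = residuals_via_dual_basis(X)
    # Each residual is orthogonal to all the other behavior vectors.
    print(np.round(X.T @ R, 6))
```

Only a c×c matrix is inverted here, rather than a (c-1)C×(c-1)C coefficient matrix, and all c residuals are obtained at once.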

To increase speed, the reconstruction errors are preferably calculated in parallel. Parallel calculation yields the reconstruction errors quickly. However, parallelization increases the amount of memory needed to store the coefficient matrices required to solve the least squares problems.

Therefore, to calculate the reconstruction errors efficiently, it is preferable to reduce the space complexity, which corresponds to memory consumption. The space complexity (memory consumption) is O(c^2 C^2) per system of simultaneous equations, and becomes O(c^3 C^2) when c systems are calculated simultaneously in parallel. To reduce the space complexity, it is preferable to perform the least squares calculation without using the coefficient matrix of the simultaneous equations.
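To give a feel for these orders of magnitude, the sketch below evaluates the size of the (c-1)C×(c-1)C coefficient matrix for concrete layer widths. The 4-byte element size (single precision) and the function name are assumptions made for illustration, and dense storage is assumed.

```python
def coefficient_matrix_cost(c, C, bytes_per_value=4):
    """Return (matrix side, bytes for one system, bytes for c systems held in parallel)."""
    side = (c - 1) * C                       # unknowns after deleting one neuron
    per_system = side * side * bytes_per_value
    return side, per_system, c * per_system

if __name__ == "__main__":
    for c, C in [(64, 64), (512, 512)]:
        side, one, parallel = coefficient_matrix_cost(c, C)
        print(f"c={c}, C={C}: side={side}, "
              f"one system={one / 2**30:.2f} GiB, "
              f"{c} systems={parallel / 2**30:.2f} GiB")
```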

FIG. 12 shows an example of obtaining the residual r_j by parallel calculation. In FIG. 12, the residual r_j is calculated efficiently by iteratively applying Gram-Schmidt orthogonalization. With the calculation method shown in FIG. 12, the residuals r_j can be calculated efficiently in parallel without using the coefficient matrix of the simultaneous equations. When the number of neurons is very large (as a rough guide, 4096 or more, depending on the execution environment), the solution using Gram-Schmidt is faster.
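The idea of obtaining r_j by repeated Gram-Schmidt steps can be sketched as follows. This is a minimal illustration using the modified Gram-Schmidt variant, not the specific procedure of the figure. The function name, the column layout of X, and the tolerance `eps` are assumptions.

```python
import numpy as np

def residual_gram_schmidt(X, j, eps=1e-12):
    """Projection residual of column j onto the span of the other columns of X."""
    basis = []
    for i in range(X.shape[1]):
        if i == j:
            continue
        v = X[:, i].astype(float)            # astype returns a copy
        for q in basis:                      # remove components along earlier directions
            v -= (q @ v) * q
        norm = np.linalg.norm(v)
        if norm > eps:                       # skip numerically dependent vectors
            basis.append(v / norm)
    r = X[:, j].astype(float)
    for q in basis:                          # strip everything explained by the others
        r -= (q @ r) * q
    return r

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.normal(size=(30, 5))
    # The call for each j is independent of the others, so the calls
    # can be distributed over parallel workers.
    residuals = [residual_gram_schmidt(X, j) for j in range(X.shape[1])]
    print([float(np.linalg.norm(r)) for r in residuals])
```

No coefficient matrix is formed; the working memory per candidate is a set of vectors of the same length as the behavior vectors.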

FIG. 11 explains that REAP can be applied to pruning and reconstruction in convolutional layers. As shown in equation (19) in FIG. 11, the sliding window operation in a convolutional layer is expressed as a sum of matrix multiplications. Because equation (19) has the same form as equation (1) for the fully connected layer, REAP can be applied to convolutional layers in the same way as to fully connected layers.
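The statement that a sliding window operation is a sum of matrix multiplications can be illustrated with a small sketch. Equation (19) itself is only in FIG. 11, so the formulation below is an assumption: each input channel is unfolded into a matrix of flattened patches (stride 1, no padding), multiplied by that channel's filter weights, and the products are summed over the input channels. Function names are hypothetical.

```python
import numpy as np

def im2col_channel(x, kh, kw):
    """Unfold one channel (H, W) into rows of flattened kh x kw patches."""
    H, W = x.shape
    rows = [x[a:a + kh, b:b + kw].ravel()
            for a in range(H - kh + 1)
            for b in range(W - kw + 1)]
    return np.array(rows)

def conv_as_matmul_sum(x, w):
    """x: (c_in, H, W), w: (c_out, c_in, kh, kw). Returns (num_patches, c_out)."""
    c_out, c_in, kh, kw = w.shape
    # One matrix product per input channel; deleting channel k drops one term.
    return sum(im2col_channel(x[k], kh, kw) @ w[:, k].reshape(c_out, -1).T
               for k in range(c_in))

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    x = rng.normal(size=(3, 5, 5))
    w = rng.normal(size=(2, 3, 3, 3))
    print(conv_as_matmul_sum(x, w).shape)
```

In this form each input channel plays the role that a single neuron plays in the fully connected case, which is why channel pruning can reuse the same residual-based error.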

FIG. 13 shows the results of an experiment in which a neural network was compressed by REAP and by the comparative example. The experiment used VGG16 trained on the ImageNet dataset as the original neural network N1. The image recognition accuracy (correct answer rate) was determined both for the case where REAP was applied as the compression process 21 on the original neural network N1 and for the case where the comparative example was applied.

The horizontal axis of FIG. 13 shows FLOPs (the number of floating-point operations), and the vertical axis shows the accuracy. Smaller FLOPs mean that the compressed neural network N2 performs fewer operations and that the compression ratio is higher. As shown in FIG. 13, in the comparative example the accuracy dropped to about 0.7 (70%) as the compression ratio was increased (as more neurons were deleted), whereas with REAP the accuracy dropped only to about 0.8 (80%) even as the compression ratio was increased. REAP therefore suppresses the loss of accuracy caused by compression more effectively.

<3. Additional notes>
The present invention is not limited to the above embodiments, and various modifications are possible.

10: Neural network compression device
20: Processor
21: Compression processing
30: Storage device
31: Computer program
100: Neural network utilization device
121: Compression processing
122: Pruning process
123: Reconstruction process
200: Processor
300: Storage device
N1: Original neural network
N2: Compressed neural network
N20: Compressed neural network data

Claims

A neural network compression method comprising causing a computer to execute
a step of performing pruning on a selected pruning target and reconstruction that adjusts weight parameters of the pruned neural network,
wherein, in the step, the pruning target is selected such that a reconstruction error is minimized, the reconstruction error being caused by the reconstruction that adjusts the weight parameters of the pruned neural network and being calculated based on the adjusted weight parameters, and
the pruning target selected such that the reconstruction error is minimized is the one for which the reconstruction error is smallest when each of the pruning targets selectable as the pruning target in the neural network, before the selected pruning target is pruned, is pruned.

The neural network compression method according to claim 1, wherein the pruning is pruning of neurons in a fully connected layer.

The neural network compression method according to claim 1, wherein the pruning is channel pruning in a convolutional layer.

The neural network compression method according to claim 1, wherein the reconstruction error is calculated based on a projection residual obtained when the behavior vector of the pruning target is projected onto a subspace spanned by the behavior vectors of the pruning units other than the pruning target.

The neural network compression method according to claim 4, wherein the projection residual is calculated based on a biorthogonal basis of a behavior vector to be pruned.

The neural network compression method according to claim 4, wherein the projection residual is calculated by iteratively applying Gram-Schmidt orthogonalization.

The neural network compression method according to any one of claims 1 to 6, wherein the reconstruction error is calculated in parallel.

A neural network compression device that performs pruning on a selected pruning target and reconstruction that adjusts weight parameters of the pruned neural network,
wherein the device is configured such that the pruning target is selected so that a reconstruction error is minimized, the reconstruction error being caused by the reconstruction that adjusts the weight parameters of the pruned neural network and being calculated based on the adjusted weight parameters, and
the pruning target selected such that the reconstruction error is minimized is the one for which the reconstruction error is smallest when each of the pruning targets selectable as the pruning target in the neural network, before the selected pruning target is pruned, is pruned.

A computer program for causing a computer to function as a neural network compression device,
wherein the neural network compression device is configured to perform pruning on a selected pruning target and reconstruction that adjusts weight parameters of the pruned neural network,
in the neural network compression device, the pruning target is selected such that a reconstruction error is minimized, the reconstruction error being caused by the reconstruction that adjusts the weight parameters of the pruned neural network and being calculated based on the adjusted weight parameters, and
the pruning target selected such that the reconstruction error is minimized is the one for which the reconstruction error is smallest when each of the pruning targets selectable as the pruning target in the neural network, before the selected pruning target is pruned, is pruned.

A method for producing compressed neural network data, comprising causing a computer to execute:
a compression step of performing pruning on a selected pruning target and reconstruction that adjusts weight parameters of the pruned neural network; and
a step of outputting the neural network data compressed by the compression step,
wherein, in the compression step, the pruning target is selected such that a reconstruction error is minimized, the reconstruction error being caused by the reconstruction that adjusts the weight parameters of the pruned neural network and being calculated based on the adjusted weight parameters, and
the pruning target selected such that the reconstruction error is minimized is the one for which the reconstruction error is smallest when each of the pruning targets selectable as the pruning target in the neural network, before the selected pruning target is pruned, is pruned.