CN113392970A - Automatic neural network pruning method based on feature map activation rate - Google Patents

Automatic neural network pruning method based on feature map activation rate

Info

Publication number
CN113392970A
Authority
CN
China
Prior art keywords
neural network
feature map
layer
convolution kernel
activation rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110528976.6A
Other languages
Chinese (zh)
Inventor
张晋侨
姜晓栋
顾成飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Panchip Microelectronics Co ltd
Original Assignee
Shanghai Panchip Microelectronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Panchip Microelectronics Co ltd filed Critical Shanghai Panchip Microelectronics Co ltd
Priority to CN202110528976.6A priority Critical patent/CN113392970A/en
Publication of CN113392970A publication Critical patent/CN113392970A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention provides an automatic neural network pruning method based on the feature map activation rate, which relates to the field of neural networks and provides a novel automatic pruning algorithm. With the technical scheme provided by the invention, model compression can be carried out for different tasks; the algorithm compresses the model as much as possible while guaranteeing accuracy, and reduces the computational cost at inference time.

Description

Automatic neural network pruning method based on feature map activation rate
Technical Field
The invention belongs to the field of neural networks, and particularly relates to an automatic neural network pruning method based on the feature map activation rate.
Background
Early neural network pruning algorithms usually require the pruning rate of each layer to be preset manually, and therefore cannot reach an optimal solution. With an automatic pruning algorithm, a suitable pruning rate for each layer can be searched by setting a single hyper-parameter, so that the model size is compressed and the inference speed is increased as much as possible while the model accuracy is guaranteed.
Disclosure of Invention
In view of the above defects of the prior art, the technical problem to be solved by the present invention is how to compress the model size and increase the inference speed as much as possible while ensuring the model accuracy.
To achieve this purpose, the invention provides an automatic neural network pruning method based on the feature map activation rate. The method, which relates to the field of neural networks, is a novel automatic pruning algorithm that adaptively prunes the redundant filters of each layer according to a single hyper-parameter, thereby reducing the redundant information in the model and greatly improving the inference speed of the model and its operating efficiency on edge devices.
Further, the hyper-parameter is the activation rate A.
Further, A is in the range of 0 to 1.
Further, when the activation rate of a feature map is smaller than A, the corresponding convolution kernel is judged to be a redundant convolution kernel.
Further, the redundant convolution kernels may be pruned from the network.
Further, the method comprises the steps of:
step 1, acquiring the feature maps;
step 2, judging the redundant convolution kernels;
step 3, pruning.
Further, in step 1, data are input into the neural network model to be pruned for forward inference, and the feature maps output by the convolutional layers are obtained.
Further, in step 2, the feature maps of each layer are passed through an activation layer from top to bottom, and the non-zero ratio of each channel of the output feature map is calculated; if the non-zero ratio of the i-th channel is smaller than A, the i-th convolution kernel of the convolutional layer is the redundant convolution kernel.
Further, the activation layer is a ReLU layer.
Further, in step 3, the redundant convolution kernels of each convolutional layer are pruned from the neural network model, and fine-tuning is then performed with the original training data to finally obtain the pruned model.
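As a concrete illustration of the redundancy test described above, the following minimal NumPy sketch (not part of the patent; function and variable names are illustrative) computes the post-ReLU non-zero ratio of each channel of a single feature map and flags the channels whose ratio falls below the activation rate A:

```python
import numpy as np

def redundant_channels(feature_map: np.ndarray, A: float) -> list:
    """Return indices of channels whose post-ReLU non-zero ratio is below A.

    feature_map: array of shape (C, H, W), the output of one convolutional layer.
    A: the activation-rate hyper-parameter, with 0 < A < 1.
    """
    activated = np.maximum(feature_map, 0.0)  # ReLU activation layer
    # Non-zero ratio per channel: fraction of activated entries that are > 0.
    ratios = (activated > 0).reshape(activated.shape[0], -1).mean(axis=1)
    return [i for i, r in enumerate(ratios) if r < A]

# Example: channel 0 is fully positive, channel 1 is fully negative (dies under ReLU).
fm = np.stack([np.ones((4, 4)), -np.ones((4, 4))])
print(redundant_channels(fm, A=0.5))  # → [1]
```

The i-th output channel corresponds one-to-one with the i-th convolution kernel, which is why a low-activation channel identifies a prunable kernel.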
Compared with the prior art, the invention has the following beneficial effects:
(1) With the technical scheme provided by this proposal, the pruning rate of each layer can be searched through a single hyper-parameter, which avoids the limitation of traditional pruning techniques that the compression rate must be set manually, and compresses the model more thoroughly.
(2) In addition, while guaranteeing the model accuracy, the algorithm can greatly compress the model parameters, reduce the amount of computation, and lower the computational overhead during inference.
The conception, specific structure and technical effects of the present invention are further described below with reference to the accompanying drawings, so that the objects, features and effects of the invention can be fully understood.
Drawings
FIG. 1 is a flowchart of an algorithm in an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but do not limit the invention in any way. It should be noted that various changes and modifications that are obvious to those skilled in the art can be made without departing from the spirit of the invention; all of these fall within the scope of protection of the present invention.
As shown in FIG. 1, the algorithm first sets a hyper-parameter, the activation rate A, which ranges from 0 to 1. When the activation rate of a feature map is smaller than this value, the corresponding convolution kernel is judged to be a redundant convolution kernel that can be pruned from the network.
The specific operation is as follows:
and inputting the data into a neural network model to be pruned for forward reasoning to obtain a characteristic diagram output by the convolutional layer. From top to bottom, each layer of feature map is passed through an active layer (typically a Relu layer), then a non-zero ratio of each channel of the output feature map is calculated, and if the non-zero ratio of the ith channel is less than A, the ith convolution kernel of the convolutional layer is a redundant convolution kernel.
Finally, the redundant convolution kernels of each convolutional layer are pruned from the network model, and fine-tuning is then performed with the original training data to finally obtain the pruned model.
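The pruning step can be viewed as weight surgery on consecutive convolutional layers: removing the i-th kernel of one layer also removes the i-th input channel of the next. The patent does not prescribe an implementation, so the following NumPy sketch is an assumption, with illustrative names and shapes:

```python
import numpy as np

def prune_conv_pair(w_cur, b_cur, w_next, redundant):
    """Remove redundant output kernels from the current conv layer and the
    matching input channels from the following layer.

    w_cur:  (C_out, C_in, k, k) weights of the layer being pruned.
    b_cur:  (C_out,) biases of that layer.
    w_next: (C_next, C_out, k, k) weights of the following conv layer.
    redundant: indices of kernels whose activation rate fell below A.
    """
    keep = [i for i in range(w_cur.shape[0]) if i not in set(redundant)]
    # Slice out the kept kernels, and the matching input channels downstream.
    return w_cur[keep], b_cur[keep], w_next[:, keep]

w1 = np.zeros((4, 3, 3, 3))   # 4 kernels over 3 input channels
b1 = np.zeros(4)
w2 = np.zeros((8, 4, 3, 3))   # next layer consumes 4 channels
w1p, b1p, w2p = prune_conv_pair(w1, b1, w2, redundant=[1, 3])
print(w1p.shape, b1p.shape, w2p.shape)  # → (2, 3, 3, 3) (2,) (8, 2, 3, 3)
```

After this surgery the network is a genuinely smaller model; fine-tuning on the original training data then recovers any accuracy lost by the removal, as step 3 describes.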
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the specific embodiments described above; various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments of the present application, and the features of those embodiments, may be combined with one another arbitrarily provided they do not conflict.

Claims (10)

1. An automatic neural network pruning method based on feature map activation rate relates to the field of neural networks, and provides a novel automatic neural network pruning algorithm.
2. The method of claim 1, wherein the hyper-parameter is activation rate A.
3. The method of claim 2, wherein A is in the range of 0 to 1.
4. The method of claim 3, wherein when the activation rate of a feature map is less than A, the convolution kernel is determined to be a redundant convolution kernel.
5. The method of claim 4, wherein the redundant convolution kernels are pruned from the network.
6. The method of claim 5, wherein the method comprises the steps of:
step 1, acquiring the characteristic diagram;
step 2, judging the redundant convolution kernel;
step 3, pruning.
7. The method as claimed in claim 6, wherein in step 1, data are input into the neural network model to be pruned for forward inference, and the feature maps output by the convolutional layers are obtained.
8. The method according to claim 7, wherein in step 2, the feature maps of each layer are passed through the activation layer from top to bottom, and the non-zero ratio of each channel of the output feature map is calculated; if the non-zero ratio of the i-th channel is smaller than A, the i-th convolution kernel of the convolutional layer is the redundant convolution kernel.
9. The method of claim 8, wherein the activation layer is a ReLU layer.
10. The method according to claim 9, wherein in step 3, the redundant convolution kernels of each convolutional layer are pruned from the neural network model, and fine-tuning is then performed with the original training data to obtain the pruned model.
CN202110528976.6A 2021-05-14 2021-05-14 Automatic neural network pruning method based on feature map activation rate Pending CN113392970A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110528976.6A CN113392970A (en) 2021-05-14 2021-05-14 Automatic neural network pruning method based on feature map activation rate

Publications (1)

Publication Number Publication Date
CN113392970A (en) 2021-09-14

Family

ID=77617108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110528976.6A Pending CN113392970A (en) 2021-05-14 2021-05-14 Automatic neural network pruning method based on feature map activation rate

Country Status (1)

Country Link
CN (1) CN113392970A (en)

Similar Documents

Publication Publication Date Title
CN110874631B (en) Convolutional neural network pruning method based on feature map sparsification
CN111489364B (en) Medical image segmentation method based on lightweight full convolution neural network
CN111738401A (en) Model optimization method, grouping compression method, corresponding device and equipment
CN110363297A (en) Neural metwork training and image processing method, device, equipment and medium
CN107292458B (en) Prediction method and prediction device applied to neural network chip
CN109214353B (en) Training method and device for rapid detection of face image based on pruning model
CN111814973B (en) Memory computing system suitable for neural ordinary differential equation network computing
CN112733964B (en) Convolutional neural network quantization method for reinforcement learning automatic perception weight distribution
CN112052951A (en) Pruning neural network method, system, equipment and readable storage medium
CN110909874A (en) Convolution operation optimization method and device of neural network model
CN114154646A (en) Efficiency optimization method for federal learning in mobile edge network
CN113177580A (en) Image classification system based on channel importance pruning and binary quantization
CN112861996A (en) Deep neural network model compression method and device, electronic equipment and storage medium
CN110555518A (en) Channel pruning method and system based on feature map importance score
CN113392970A (en) Automatic neural network pruning method based on feature map activation rate
CN109389216A (en) The dynamic tailor method, apparatus and storage medium of neural network
CN112766397A (en) Classification network and implementation method and device thereof
CN114004327A (en) Adaptive quantization method of neural network accelerator suitable for running on FPGA
CN112949814A (en) Compression and acceleration method and device of convolutional neural network and embedded equipment
CN116187420A (en) Training method, system, equipment and medium for lightweight deep neural network
WO2023045297A1 (en) Image super-resolution method and apparatus, and computer device and readable medium
CN113516240A (en) Neural network structured progressive pruning method and system
CN114372565A (en) Target detection network compression method for edge device
CN112381206A (en) Deep neural network compression method, system, storage medium and computer equipment
CN113033628A (en) Self-adaptive neural network compression method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210914