CN113033628A - Self-adaptive neural network compression method - Google Patents

Self-adaptive neural network compression method

Info

Publication number
CN113033628A
Authority
CN
China
Prior art keywords
neural network
threshold
adaptive
value
weight
Legal status (assumed, not a legal conclusion)
Pending
Application number
CN202110255097.0A
Other languages
Chinese (zh)
Inventor
侯向辉
袁智龙
沈宁
袁晨
李泽昊
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (assumed): 2021-03-09
Filing date: 2021-03-09
Publication date: 2021-06-25
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202110255097.0A
Publication of CN113033628A
Current legal status: Pending

Classifications

    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/23 Clustering techniques
    • G06N 3/045 Combinations of networks
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an adaptive neural network compression method. The first part presets an expected accuracy; the second part uses an adaptive pruning algorithm to remove most of the unnecessary connections in the neural network; and the third part uses an adaptive weight-sharing algorithm to share weights and further reduce the number of parameters. The invention can reduce the parameter count of a neural network while maintaining the accuracy of the original model, compress the network automatically and effectively, and minimize human intervention.

Description

Self-adaptive neural network compression method
Technical Field
The invention relates to the technical field of computer neural networks, in particular to a self-adaptive neural network compression method.
Background
Research in the field of deep learning has continued to heat up in recent years and has achieved tremendous success in computer vision, speech recognition, and natural language processing. Neural networks obtain their strong learning ability at the cost of an explosion in the number of parameters. In 1998, LeNet classified handwritten digits with about 1M parameters; by 2012, AlexNet, the champion on the ImageNet dataset, had reached 60M parameters. In the following years the parameter counts of neural networks kept growing, and the parameters of VGG-16 occupy more than 500 MB of storage. Although these parameters give the networks strong capability, they consume large amounts of storage, memory, and computational resources. This over-parameterization means that some neural networks can be deployed only on specialized servers. Existing algorithms for compressing neural networks can effectively reduce the number of parameters, but they require manual intervention to pre-configure empirical parameters. Experiments show that neural networks are quite sensitive to these empirical parameters, which strongly influence the final compression result. How to compress a neural network automatically and effectively while minimizing human intervention is therefore a practical problem of great research value: in the prior art, the compression result depends heavily on manually tuned empirical parameters.
Disclosure of Invention
The invention aims to overcome the above defects and provides an adaptive neural network compression method that reduces the number of parameters of a neural network while maintaining the accuracy of the original model, compresses the network automatically and effectively, and minimizes human intervention.
The invention achieves the above aim through the following technical scheme: an adaptive neural network compression method comprising the following steps:
(1) presetting a desired accuracy;
(2) training a neural network, and pruning the neural network by using a self-adaptive pruning algorithm;
(3) based on the pruned neural network, using an adaptive weight-sharing algorithm to share weights among the remaining parameters, thereby further reducing the number of parameters and completing the compression of the neural network.
Preferably, the step (2) uses a bisection method to calculate the pruning threshold, specifically as follows:
(2.1) training a common neural network by using a conventional method;
(2.2) calculating a suitable threshold for each layer of the neural network by bisection, such that the accuracy of the neural network is not significantly degraded after the connections whose absolute values are smaller than the threshold are removed;
(2.3) removing the connections in the neural network whose absolute values are smaller than the threshold calculated in step (2.2);
(2.4) continuing to train the connections that have not been removed, and repeating steps (2.2) and (2.3) until no further connections can be removed from the neural network.
Preferably, the step (2.2) is specifically as follows:
(2.2.1) setting upper and lower bounds of upper two halves: the lower bound of the dichotomy is 0, and the upper bound of the dichotomy is the maximum value of the absolute value in the current layer;
(2.2.2) dividing a proper threshold value for a certain layer of the neural network according to a preset accuracy rate; the expected accuracy rate needs to be preset in the process of calculating the threshold, and then a threshold is calculated in the upper and lower boundaries of the dichotomy, so that the accuracy rate on the test set is not lower than the expected accuracy rate after all the connections smaller than the threshold in the original network are removed.
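A minimal sketch of this per-layer bisection search is given below (Python/NumPy). It is illustrative only: the names find_prune_threshold, evaluate_accuracy and expected_acc are assumptions and do not appear in the patent; evaluate_accuracy stands for a test-set evaluation of the network with the given layer's weights replaced by the masked weights.

import numpy as np

def find_prune_threshold(weights, evaluate_accuracy, expected_acc,
                         tol=1e-4, max_iter=50):
    """Bisection search (steps 2.2.1-2.2.2) for the largest pruning threshold
    that keeps test-set accuracy at or above expected_acc. Illustrative sketch."""
    lo, hi = 0.0, float(np.abs(weights).max())   # step (2.2.1): bounds of the bisection
    best = 0.0
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        mid = (lo + hi) / 2.0
        mask = np.abs(weights) >= mid            # connections kept at candidate threshold
        acc = evaluate_accuracy(weights * mask)  # accuracy with small weights removed
        if acc >= expected_acc:                  # still accurate enough: try a larger threshold
            best, lo = mid, mid
        else:                                    # accuracy dropped too far: lower the threshold
            hi = mid
    return best

The search relies on the approximately monotonic relationship between the threshold and the post-pruning accuracy illustrated in FIG. 2.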
Preferably, the step (3) is specifically as follows:
(3.1) searching for a suitable threshold for each layer of the neural network model according to the expected accuracy, such that after weight sharing is performed with that threshold the accuracy on the test set is not lower than the expected accuracy;
(3.2) realizing weight sharing with a clustering algorithm in which, for each cluster, the mean of the elements in the cluster is taken as the cluster center; for a given threshold, the distance from every element in a cluster to its cluster center must not exceed this threshold:
$$\lvert w - c_i \rvert \le t \quad \text{for all } w \in C_i, \qquad c_i = \frac{1}{\lvert C_i \rvert}\sum_{w \in C_i} w$$
where $C_i$ denotes the i-th cluster, $c_i$ its center (the mean of its elements), and $t$ the selected threshold;
(3.3) fine-tuning the cluster centroid values of each layer; in each training round, forward propagation is performed first, and during backpropagation the gradient of a centroid is the sum of the gradients of all connections assigned to that centroid.
Preferably, in step (3), weight sharing further compresses the neural network by reducing the number of bits consumed for storage: by letting multiple connections with similar weights share the same weight value, the network can be stored by recording, for each connection, only the index of the weight class it belongs to (plus the k shared values themselves) rather than its individual value. If k classes are used for weight sharing, then $\lceil \log_2 k \rceil$ bits are needed to store the class index of each weight; if the network originally needs to store n weights of b bits each (n·b bits in total), the compression ratio r can be calculated as:
$$r = \frac{n b}{n \lceil \log_2 k \rceil + k b}$$
where the k·b term accounts for storing the k shared weight values themselves.
Preferably, the step (3.2) is specifically as follows:
(3.2.1) sorting the values in the original set;
(3.2.2) maintaining head and tail pointers over the sorted values, and advancing the tail pointer one element at a time until the difference between the elements pointed to by the head and tail pointers exceeds the selected threshold; at that point, the elements between the head pointer and the tail pointer are grouped into one cluster and the head pointer is moved to the tail pointer; these steps are executed in a loop until all elements have been clustered.
The invention has the following beneficial effects: it reduces the number of parameters of a neural network while maintaining the accuracy of the original model, compresses the network automatically and effectively, and minimizes human intervention.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of the approximately monotonic relationship between the pruning threshold and the accuracy in an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to specific examples, but the scope of the invention is not limited thereto:
example (b): as shown in fig. 1, an adaptive neural network compression method consists of three parts: the first part is to preset a desired accuracy; the second part is to use an adaptive pruning algorithm to compress and remove most unnecessary connections in the neural network; and the third part is to use a self-adaptive weight sharing algorithm to realize weight sharing and further realize the reduction of parameters. Through these three parts, the algorithm compresses LeNet on the minst dataset, eliminating 94% of unnecessary connections at the cost of 0.3% accuracy. The method comprises the following steps:
step 1: presetting expected accuracy;
step 2: the neural network only learns important connections by using a self-adaptive pruning algorithm, and most unnecessary connections are removed; the method comprises the following specific steps:
step 2.1: a common neural network is trained using conventional methods. This step is not to learn the final weight of each connection in the network, but rather to learn the important connections in the network.
Step 2.2: calculate a suitable threshold for each layer of the network by bisection, so that the accuracy of the neural network does not significantly degrade after the connections whose absolute values are smaller than this threshold are removed. FIG. 2 illustrates the approximately monotonic relationship between the pruning threshold and the network accuracy after pruning.
Step 2.2.1: setting upper and lower limits of the upper half: the lower bound of the dichotomy is 0, and the upper bound of the dichotomy is the most significant of the absolute values in the current layer.
Step 2.2.2: and according to the preset accuracy, dividing a proper threshold value for a certain layer of the neural network.
Step 2.3: the connections in the neural network whose absolute value is smaller than the threshold value calculated in the second step are removed. After this step, the original dense network will become a sparse network.
Step 2.4: the connections in the network that are not removed are trained and the second step is then repeated.
Step 3: apply the adaptive weight-sharing algorithm to the remaining parameters to further reduce the number of parameters.
Step 3.1: search for a suitable threshold for each layer of the model according to the expected accuracy, so that after weight sharing with that threshold the accuracy on the test set is not lower than the expected accuracy.
Step 3.2: in the clustering algorithm constructed by the invention, a threshold is specified, and the distance from every element in a cluster to its cluster center must not exceed this threshold:
$$\lvert w - c_i \rvert \le t \quad \text{for all } w \in C_i, \qquad c_i = \frac{1}{\lvert C_i \rvert}\sum_{w \in C_i} w$$
Step 3.3: and finely adjusting the value of the clustering center of mass of each layer. Because the accuracy of the compressed neural network is guaranteed to be higher than the expected accuracy in the binary search process, the centroid does not need to be repeatedly adjusted, and even the centroid can be selected not to be adjusted.
Weight sharing can further compress the neural network by reducing the number of bits consumed in storage. By letting multiple connections with similar weights share the same weight value, the network can be stored by recording, for each connection, only the index of the weight class it belongs to (plus the k shared values themselves) rather than its individual value. If k classes are used for weight sharing, then $\lceil \log_2 k \rceil$ bits are needed to store the class index of each weight; if the network originally needs to store n weights of b bits each, the compression ratio r can be calculated as:
$$r = \frac{n b}{n \lceil \log_2 k \rceil + k b}$$
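As an illustrative numerical example (the numbers are assumptions, not taken from the patent): with n = 10000 surviving connections stored at b = 32 bits each and k = 16 shared weight classes, each connection needs only $\lceil \log_2 16 \rceil = 4$ bits for its class index, giving r = (10000 × 32) / (10000 × 4 + 16 × 32) ≈ 7.9, i.e. roughly an eight-fold reduction in storage for that layer.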
The step 3.2 is specifically as follows:
(3.2.1) sorting the values in the original set.
(3.2.2) maintaining head and tail pointers over the sorted values, and advancing the tail pointer one element at a time until the difference between the elements pointed to by the head and tail pointers exceeds the selected threshold. At that point, the elements between the head and tail pointers are grouped into one cluster, the head pointer is moved to the tail pointer, and the process is repeated until all elements have been clustered.
Obviously, the clustering algorithm constructed by the invention can ensure that the number of clusters is minimum. For each cluster, the mean of the elements within this cluster is chosen as the center of the cluster. Because the range within a cluster does not exceed the threshold, the distance of any element to the cluster center does not exceed the threshold.
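A minimal sketch of this sorted two-pointer clustering is given below (Python/NumPy, illustrative only; the function name adaptive_weight_clusters is hypothetical, and a non-empty 1-D array of surviving weights is assumed).

import numpy as np

def adaptive_weight_clusters(values, threshold):
    """Greedily group sorted values so that the range inside each cluster does
    not exceed `threshold`; return the cluster centers (means) and per-value labels.
    Because the within-cluster range is at most `threshold`, no element is farther
    than `threshold` from its cluster mean. Illustrative sketch."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)               # step 3.2.1: sort the original set
    sorted_vals = values[order]
    labels = np.empty(len(values), dtype=int)
    centers = []
    head = 0
    for tail in range(len(sorted_vals)):      # step 3.2.2: advance the tail pointer
        if sorted_vals[tail] - sorted_vals[head] > threshold:
            centers.append(sorted_vals[head:tail].mean())   # close the current cluster
            labels[order[head:tail]] = len(centers) - 1
            head = tail                        # a new cluster starts at the tail pointer
    centers.append(sorted_vals[head:].mean())                # close the final cluster
    labels[order[head:]] = len(centers) - 1
    return np.array(centers), labels

Because the elements are processed in sorted order and a cluster is closed only when its range would exceed the threshold, this greedy grouping uses the minimum possible number of clusters, as stated above.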
In summary, the pruning algorithm is optimized so that the neural network can be pruned automatically: an appropriate threshold is found by bisection, and unnecessary connections in the neural network are deleted without a noticeable drop in accuracy. The invention compresses LeNet on the MNIST dataset, eliminating 94% of the unnecessary connections at the cost of a 0.3% accuracy loss.
While the invention has been described in connection with specific embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. An adaptive neural network compression method, comprising the steps of:
(1) presetting a desired accuracy;
(2) training a neural network, and pruning the neural network by using a self-adaptive pruning algorithm;
(3) based on the pruned neural network, using an adaptive weight-sharing algorithm to share weights among the remaining parameters, thereby further reducing the number of parameters and completing the compression of the neural network.
2. The adaptive neural network compression method of claim 1, wherein: in the step (2), a bisection method is used to calculate the pruning threshold, specifically as follows:
(2.1) training a common neural network by using a conventional method;
(2.2) calculating a suitable threshold for each layer of the neural network by bisection, such that the accuracy of the neural network is not significantly degraded after the connections whose absolute values are smaller than the threshold are removed;
(2.3) removing the connections in the neural network whose absolute values are smaller than the threshold calculated in step (2.2);
(2.4) continuing to train the connections that have not been removed, and repeating steps (2.2) and (2.3) until no further connections can be removed from the neural network.
3. The adaptive neural network compression method of claim 2, wherein: the step (2.2) is specifically as follows:
(2.2.1) setting the upper and lower bounds of the bisection: the lower bound is 0 and the upper bound is the maximum absolute weight value in the current layer;
(2.2.2) finding a suitable threshold for the given layer of the neural network according to the preset accuracy: the expected accuracy is preset before the threshold calculation, and a threshold is then searched for between the bisection bounds such that, after all connections in the original network whose absolute values are smaller than the threshold are removed, the accuracy on the test set is not lower than the expected accuracy.
4. The adaptive neural network compression method of claim 1, wherein: the step (3) is specifically as follows:
(3.1) searching for a suitable threshold for each layer of the neural network model according to the expected accuracy, such that after weight sharing is performed with that threshold the accuracy on the test set is not lower than the expected accuracy;
(3.2) realizing weight sharing with a clustering algorithm in which, for each cluster, the mean of the elements in the cluster is taken as the cluster center; for a given threshold, the distance from every element in a cluster to its cluster center must not exceed this threshold:
$$\lvert w - c_i \rvert \le t \quad \text{for all } w \in C_i, \qquad c_i = \frac{1}{\lvert C_i \rvert}\sum_{w \in C_i} w$$
where $C_i$ denotes the i-th cluster, $c_i$ its center (the mean of its elements), and $t$ the selected threshold;
(3.3) fine-tuning the cluster centroid values of each layer; in each training round, forward propagation is performed first, and during backpropagation the gradient of a centroid is the sum of the gradients of all connections assigned to that centroid.
5. The adaptive neural network compression method of claim 4, wherein: in step (3), weight sharing further compresses the neural network by reducing the number of bits consumed in storage; by letting multiple connections with similar weights share the same weight value, the network is stored by recording, for each connection, only the index of the weight class it belongs to (plus the k shared values themselves) rather than its individual value; if k classes are used for weight sharing, then $\lceil \log_2 k \rceil$ bits are needed to store the class index of each weight; if the network originally needs to store n weights of b bits each, the compression ratio r is calculated as:
$$r = \frac{n b}{n \lceil \log_2 k \rceil + k b}$$
6. The adaptive neural network compression method of claim 1, wherein: the step (3.2) is specifically as follows:
(3.2.1) sorting the values in the original set;
(3.2.2) maintaining head and tail pointers over the sorted values, and advancing the tail pointer one element at a time until the difference between the elements pointed to by the head and tail pointers exceeds the selected threshold; at that point, the elements between the head pointer and the tail pointer are grouped into one cluster and the head pointer is moved to the tail pointer; these steps are executed in a loop until all elements have been clustered.
CN202110255097.0A 2021-03-09 2021-03-09 Self-adaptive neural network compression method Pending CN113033628A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110255097.0A CN113033628A (en) 2021-03-09 2021-03-09 Self-adaptive neural network compression method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110255097.0A CN113033628A (en) 2021-03-09 2021-03-09 Self-adaptive neural network compression method

Publications (1)

Publication Number Publication Date
CN113033628A 2021-06-25

Family

ID=76467283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110255097.0A Pending CN113033628A (en) 2021-03-09 2021-03-09 Self-adaptive neural network compression method

Country Status (1)

Country Link
CN (1) CN113033628A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116502698A (en) * 2023-06-29 2023-07-28 中国人民解放军国防科技大学 Network channel pruning rate self-adaptive adjustment method, device, equipment and storage medium
CN116502698B (en) * 2023-06-29 2023-08-29 中国人民解放军国防科技大学 Network channel pruning rate self-adaptive adjustment method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN106919942B (en) Accelerated compression method of deep convolution neural network for handwritten Chinese character recognition
CN108764471B (en) Neural network cross-layer pruning method based on feature redundancy analysis
CN109635935B (en) Model adaptive quantization method of deep convolutional neural network based on modular length clustering
CN111079781B (en) Lightweight convolutional neural network image recognition method based on low rank and sparse decomposition
CN108121975B (en) Face recognition method combining original data and generated data
CN113593611B (en) Voice classification network training method and device, computing equipment and storage medium
CN108647723B (en) Image classification method based on deep learning network
WO2020238237A1 (en) Power exponent quantization-based neural network compression method
CN111105007B (en) Compression acceleration method of deep convolutional neural network for target detection
CN111079899A (en) Neural network model compression method, system, device and medium
CN108319988B (en) Acceleration method of deep neural network for handwritten Chinese character recognition
CN105631416A (en) Method for carrying out face recognition by using novel density clustering
CN112215353B (en) Channel pruning method based on variational structure optimization network
CN113657421B (en) Convolutional neural network compression method and device, and image classification method and device
CN113283473B (en) CNN feature mapping pruning-based rapid underwater target identification method
CN109918507B (en) textCNN (text-based network communication network) improved text classification method
Yue et al. Face recognition based on histogram equalization and convolution neural network
CN112990420A (en) Pruning method for convolutional neural network model
CN113033628A (en) Self-adaptive neural network compression method
US7705754B2 (en) Method and system for the compression of probability tables
CN107562853A (en) A kind of method that streaming towards magnanimity internet text notebook data is clustered and showed
CN113762505B (en) Method for clustering pruning according to L2 norms of channels of convolutional neural network
CN112115837A (en) Target detection method based on YoloV3 and dual-threshold model compression
Li et al. A spectral clustering based filter-level pruning method for convolutional neural networks
CN116245162A (en) Neural network pruning method and system based on improved adaptive genetic algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination