CN110097177B - Network pruning method based on pseudo-twin network - Google Patents


Info

Publication number
CN110097177B
Authority
CN
China
Prior art keywords
network
pseudo
twin
target
training
Prior art date
Legal status
Active
Application number
CN201910400920.5A
Other languages
Chinese (zh)
Other versions
CN110097177A (en)
Inventor
闵锐
蓝海
Current Assignee
Electric Coreda Chengdu Technology Co ltd
Original Assignee
Electric Coreda Chengdu Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Electric Coreda Chengdu Technology Co ltd filed Critical Electric Coreda Chengdu Technology Co ltd
Priority to CN201910400920.5A
Publication of CN110097177A
Application granted
Publication of CN110097177B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods
    • G06N 3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of neural network model compression, and particularly relates to a network pruning method based on a pseudo-twin network. The invention provides, on the basis of the traditional ternary neural network, a universal method for clipping redundant connections in a large convolutional neural network. The method comprises the following steps: construct a ternary network Ternary-N with the same structure as the network N to be pruned; train both networks step by step on the same images, using a knowledge distillation algorithm during training; after training, clip the weights at the corresponding positions in N according to the weights in Ternary-N; finally, fine-tune N. The number of connections in the fine-tuned network is greatly reduced compared with the original network.

Description

Network pruning method based on pseudo-twin network
Technical Field
The invention belongs to the technical field of neural network model compression, and relates to a network pruning method based on a pseudo-twin network.
Background
Against the background of the rapid development of artificial intelligence, neural networks have become a key technology for realizing it. Among the many kinds of neural networks, convolutional neural networks have been a research focus because of their excellent performance in image classification and object detection. However, modern convolutional neural networks occupy too much memory and too many computational resources, which hinders their deployment in resource-constrained real-time hardware systems. How to compress the model size of a convolutional neural network and reduce its computational cost has therefore become key to deploying artificial-intelligence technology on such systems.
Compressing convolutional neural networks requires joint solutions from many disciplines, including but not limited to machine learning, optimization, computer architecture, data compression, indexing, and hardware design. Focusing on model compression at the level of network nodes, the practical methods proposed by researchers in recent years can be classified into four categories: parameter pruning and sharing, low-rank decomposition, transferred/compact convolution filters, and knowledge distillation.
Unstructured pruning is an important method for compressing the model size and reducing the computational load of convolutional neural networks; it compresses and accelerates them by clipping individual weights and connections.
Parameter sharing achieves model compression by having the weights of the network model share the same parameter set. One important method is network quantization, which compresses the original network by reducing the number of bits required to represent each weight. Some work applies k-means scalar quantization to the parameter values. It has also been shown that 8-bit quantization of parameters can yield significant speedup with minimal loss of precision. Other work uses 16-bit fixed-point representation with stochastic rounding in CNN training, which significantly reduces memory usage and floating-point operations with little loss in classification accuracy. In the extreme case of a 1-bit representation of each weight, i.e., a binary-weight neural network, there are also many works that directly train CNNs with binary weights, e.g., BinaryConnect, BinaryNet, and XNOR-Net.
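As a concrete illustration of such extreme quantization, a float weight matrix can be mapped to ternary values with a simple threshold rule. The sketch below is a minimal numpy version; the 0.7·mean|W| threshold is a common heuristic from the ternary-weight-network literature, not something this document specifies.

```python
import numpy as np

def ternarize(w, delta_factor=0.7):
    """Quantize a float weight tensor to {-1, 0, +1}.

    Weights with |w| below the threshold delta are zeroed; the rest
    keep only their sign. delta_factor = 0.7 is a heuristic from the
    ternary-weight-network literature (an assumption here).
    """
    delta = delta_factor * np.mean(np.abs(w))
    t = np.sign(w)
    t[np.abs(w) < delta] = 0.0
    return t

# Hypothetical weight values for illustration
w = np.array([[0.9, -0.05, -0.6],
              [0.02, 0.4, -0.8]])
tw = ternarize(w)
```

Small-magnitude weights (here -0.05 and 0.02) become 0, while the rest collapse to their sign.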
Network pruning achieves compression by removing non-critical connections or parameters from the network. Early network pruning methods were based on the magnitude of the weight values. Researchers later proposed the Optimal Brain Damage and Optimal Brain Surgeon methods, which reduce the number of connections using a clipping criterion based on the Hessian of the loss function; experiments showed that such clipping gives higher accuracy than magnitude-based clipping (e.g., weight-decay methods). More recent trends in network pruning are to prune redundant, uninformative weights in a pre-trained CNN model and to remove redundant neurons with data-free pruning methods. However, current pruning techniques still have significant problems with pruning rate and accuracy preservation.
Disclosure of Invention
The invention aims to solve the problems and provides a network pruning method based on a pseudo-twin network.
The technical scheme of the invention is as follows:
1) Read the network structure of the network to be pruned, and construct a network with the same structure as the pruned 32-bit network, except that all its weights are ternary while its activations remain 32-bit; that is, every weight parameter W of this network belongs to {-1, 0, 1}. This ternary network is called the pseudo-twin network.
2) Using the target dataset as input and the cross entropy between the network's output vector and the target's true distribution as the loss function, update the parameters of the pruned network by conventional stochastic gradient descent. The twin network is then trained by knowledge distillation. In knowledge distillation, let the output of the pruned network be T(i), the output of the twin network be S(i), and the true distribution of the targets be L(i). The cost function L_t of the pruned network is
L_t = -Σ_{i=1}^{n} L(i) log T(i)
The cost function L_s of the twin network is
L_s = -Σ_{i=1}^{n} T(i) log S(i)
This process is illustrated in fig. 1;
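In numpy terms, the two cost functions might be computed as follows. This is a sketch based on the definitions above; the original formulas are reproduced only as images, so the exact distillation form of L_s (here a cross entropy against the teacher's soft output) is an assumption, as are all the numeric values.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cross_entropy(p, q, eps=1e-12):
    """-sum_i p(i) * log q(i), with eps guarding against log(0)."""
    return -np.sum(p * np.log(q + eps))

L = np.array([0.0, 1.0, 0.0])            # true distribution L(i), n = 3 classes
T = softmax(np.array([1.0, 3.0, 0.5]))   # output T(i) of the pruned (teacher) network
S = softmax(np.array([0.8, 2.5, 0.7]))   # output S(i) of the twin (student) network

L_t = cross_entropy(L, T)  # teacher loss against the true labels
L_s = cross_entropy(T, S)  # student loss distilled from the teacher's soft output
```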
3) The training algorithm consists of the following three steps (see fig. 2):
1. Perform simple preprocessing on the image, including normalization and resizing;
Loop:
2. Input the training image into the pruned network and use its loss L_t to update the weights of the teacher network.
3. Input the same training image as in step 2 into the twin network and use L_s to update the ternary weights in the twin network.
When both L_t and L_s are below 0.01, the loop ends and the weights of the pruned network and the twin network are saved;
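The alternating loop can be sketched end to end with toy stand-ins: one linear softmax layer per network instead of a full CNN, and a continuous student in place of a re-quantized ternary one. A minimal numpy illustration of the alternating updates and the 0.01 exit criterion; the learning rate, data, sizes, and the distillation form of L_s are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Toy stand-ins for the pruned (teacher) network and the twin (student):
# hypothetical sizes and a single synthetic "preprocessed image".
n_in, n_cls = 4, 3
Wt = rng.normal(0.0, 0.1, (n_cls, n_in))  # teacher weights
Ws = rng.normal(0.0, 0.1, (n_cls, n_in))  # student weights
x = rng.normal(size=n_in)                 # one training input
y = np.array([0.0, 1.0, 0.0])             # true distribution L(i)

lr, eps = 0.5, 1e-12
L_t = L_s = np.inf
for _ in range(50000):
    # Step 2: teacher update with L_t = -sum_i L(i) log T(i)
    T = softmax(Wt @ x)
    L_t = -np.sum(y * np.log(T + eps))
    Wt -= lr * np.outer(T - y, x)         # gradient of softmax + cross entropy
    # Step 3: same input into the twin; update with L_s = -sum_i T(i) log S(i)
    S = softmax(Ws @ x)
    L_s = -np.sum(T * np.log(S + eps))
    Ws -= lr * np.outer(S - T, x)
    if L_t < 0.01 and L_s < 0.01:         # exit criterion from the text
        break
```

On this toy problem both losses fall below the 0.01 threshold and the loop exits early, mirroring the stopping rule described above.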
4) Using the twin network as a template, clip the pruned network. Specifically: multiply the absolute value of each convolution kernel of the pruned network elementwise with the corresponding kernel in the twin network to obtain a new kernel; clip the positions whose weight is 0 in the new kernel; and replace the kernel in the pruned network with the clipped new kernel. This process is illustrated in fig. 3;
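This clipping step can be sketched directly; the 3×3 kernel values below are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical 3x3 kernels: one from the pruned (target) network,
# one from its ternary pseudo-twin.
k_target = np.array([[ 0.42, -0.07,  0.31],
                     [-0.55,  0.03, -0.28],
                     [ 0.12, -0.61,  0.09]])
k_twin = np.array([[ 1.0,  0.0,  1.0],
                   [-1.0,  0.0, -1.0],
                   [ 0.0, -1.0,  0.0]])

# Elementwise product of |target kernel| and the twin kernel: wherever the
# twin holds a 0, the new kernel is 0 and that connection is clipped. The
# surviving weights take their magnitude from the target network and their
# sign from the twin.
k_new = np.abs(k_target) * k_twin
```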
5) Further train the clipped network to fine-tune its parameters. There are two fine-tuning methods:
1. Keep the surviving parameters of the original network and fine-tune the parameters of the ternary network (see fig. 4).
2. Keep the surviving parameters of the ternary network and fine-tune the parameters of the original network (see fig. 5).
The fine-tuned clipped network is the final pruned network.
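Either fine-tuning mode must keep the clipped positions at zero. One simple way to do this, shown below with a placeholder gradient rather than the document's actual procedure, is to mask the gradient update so that only surviving connections change.

```python
import numpy as np

# A clipped kernel: zeros mark pruned connections (hypothetical values).
k = np.array([[ 0.42, 0.0,  0.31],
              [-0.55, 0.0, -0.28]])
mask = (k != 0).astype(float)   # 1 where a connection survived, 0 where clipped

grad = np.full_like(k, 0.1)     # placeholder gradient from some loss
k -= 0.01 * grad * mask         # masked SGD step: clipped positions stay 0
```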
The advantages of the invention are that the accuracy of the pruned network does not drop significantly and that pruning is fast.
Drawings
FIG. 1 is a calculation of a teacher and student network cost function;
FIG. 2 is a training process for a teacher and student network;
FIG. 3 is an example of pruning a teacher network using a student network as a template;
FIG. 4 is an example of the first fine-tuning approach;
FIG. 5 is an example of the second fine-tuning approach.
Detailed Description
Taking the LeNet network and the MNIST dataset as an example:
The MNIST dataset consists of handwritten digit images: a training set of 55000 samples, a test set of 10000 samples, and a validation set of 5000 samples, each with a corresponding label. All digit images are size-normalized and centered in a fixed-size image of 28×28 pixels. In the original dataset each pixel is represented by a value between 0 and 255, where 0 is black, 255 is white, and anything in between is a shade of gray.
LeNet is a convolutional neural network proposed by LeCun et al. in the late 1980s for recognizing handwritten characters; it has 3 convolutional layers, 2 pooling layers, a fully-connected layer, and an output layer. The results of clipping it with our method are as follows:
tables 1 and 2 show the results of the first trimming mode, and tables 3 and 4 show the results of the second trimming mode:
TABLE 1
Network layer    Parameters    Pruning rate
Conv1 150 0.413
Conv2 2400 0.467
Conv3 48000 0.528
FC1 10080 0.401
FC2 840 0.407
Total 61470 0.503
TABLE 2
Model    Error rate
LeNet 1.39%
Ternary LeNet 1.97%
Pruned LeNet 1.49%
The following results keep the parameters of the ternary network unchanged:
TABLE 3
Network layer    Parameters    Pruning rate
Conv1 150 0.405
Conv2 2400 0.201
Conv3 48000 0.413
FC1 10080 0.436
FC2 840 0.423
Total 61470 0.408
TABLE 4
Model    Error rate
LeNet 1.39%
Ternary LeNet 1.97%
Pruned LeNet 1.37%
As can be seen from the table, le-Net is pruned about 40% of redundant connections and the error rate is also reduced. This demonstrates the feasibility of the process of the invention.
Taking the AlexNet network and the Cifar-10 dataset as an example:
The Cifar-10 dataset contains 60000 32×32 color images divided into 10 classes of 6000 images each. 50000 images are used for training and form 5 training batches of 10000 images; the remaining 10000 form a single test batch. The test batch contains exactly 1000 randomly selected images from each of the 10 classes; the remaining images are randomly arranged into the training batches. Note that within a single training batch the number of images per class is not necessarily equal, but over all training batches each class has exactly 5000 images.
AlexNet, a network proposed in 2012 for image recognition, has 5 convolutional layers and 3 fully-connected layers. The results of clipping it with our method are as follows:
Tables 5 and 6 show the results of the first fine-tuning method, and Tables 7 and 8 show the results of the second:
TABLE 5
[Table 5 is reproduced only as an image in the original publication.]
TABLE 6
Model Error Rate
AlexNet 0.232
Ternary AlexNet 0.253
Pruned AlexNet 0.181
TABLE 7
Layer Params Compression Rate
Conv1 4800 40.4%
Conv2 153600 39.4%
Conv3 110592 40.5%
Conv4 147456 41.2%
Conv5 147456 40.0%
Fc1 4718592 58.0%
Fc2 16777216 62.6%
Fc3 40960 51.2%
Total 22100672 62.0%
TABLE 8
[Table 8 is reproduced only as an image in the original publication.]
As can be seen from the tables, about 62% of AlexNet's redundant connections are pruned and the error rate drops by 3.4%, which also demonstrates the feasibility of our method.

Claims (3)

1. A network pruning method based on a pseudo-twin network is characterized by comprising the following steps:
s1, constructing a pseudo-twin network according to a pruned target network, wherein the network weight of the pseudo-twin network is three values;
s2, reading a training sample, training a target network by adopting a random gradient descent method, setting the output of the target network as T (i), and setting the cost function of the target network as follows:
L_t = -Σ_{i=1}^{n} L(i) log T(i)
where n is the number of classes, L (i) is the true distribution of the target;
updating the weights of the target network using the output T(i) of the target network;
training the pseudo-twin network on the same training sample; let the output of the pseudo-twin network be S(i), with the cost function:
L_s = -Σ_{i=1}^{n} T(i) log S(i)
updating the ternary weights in the pseudo-twin network by means of L_s;
the training process ends when L_t and L_s converge, and the weights of the target network and the pseudo-twin network are saved;
s3, taking the pseudo twin network as a template, and cutting the target network:
multiplying the absolute value of the convolution kernel of the target network with the convolution kernel in the pseudo-twin network to obtain a new convolution kernel, clipping the part with the weight value of 0 in the convolution kernel, and replacing the convolution kernel in the target network with the clipped new convolution kernel.
2. The pseudo-twin network-based network pruning method according to claim 1, further comprising:
and S4, reserving parameters left by the target network, and adjusting the parameters of the pseudo-twin network by using a random gradient descent method.
3. The pseudo-twin network-based network pruning method according to claim 1, further comprising:
and S4, reserving parameters left by the pseudo twin network, and adjusting the parameters of the target network by using a random gradient descent method.
CN201910400920.5A 2019-05-15 2019-05-15 Network pruning method based on pseudo-twin network Active CN110097177B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910400920.5A CN110097177B (en) 2019-05-15 2019-05-15 Network pruning method based on pseudo-twin network


Publications (2)

Publication Number Publication Date
CN110097177A CN110097177A (en) 2019-08-06
CN110097177B true CN110097177B (en) 2022-11-29

Family

ID=67448052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910400920.5A Active CN110097177B (en) 2019-05-15 2019-05-15 Network pruning method based on pseudo-twin network

Country Status (1)

Country Link
CN (1) CN110097177B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210150313A1 (en) * 2019-11-15 2021-05-20 Samsung Electronics Co., Ltd. Electronic device and method for inference binary and ternary neural networks

Families Citing this family (6)

Publication number Priority date Publication date Assignee Title
CN110909667B (en) * 2019-11-20 2022-05-10 北京化工大学 Lightweight design method for multi-angle SAR target recognition network
CN111091144B (en) * 2019-11-27 2023-06-27 云南电网有限责任公司电力科学研究院 Image feature point matching method and device based on depth pseudo-twin network
CN111008693B (en) * 2019-11-29 2024-01-26 小米汽车科技有限公司 Network model construction method, system and medium based on data compression
CN111695699B (en) * 2020-06-12 2023-09-08 北京百度网讯科技有限公司 Method, apparatus, electronic device, and readable storage medium for model distillation
CN112348167B (en) * 2020-10-20 2022-10-11 华东交通大学 Knowledge distillation-based ore sorting method and computer-readable storage medium
CN113724261A (en) * 2021-08-11 2021-11-30 电子科技大学 Fast image composition method based on convolutional neural network

Citations (2)

Publication number Priority date Publication date Assignee Title
CN108334934A (en) * 2017-06-07 2018-07-27 北京深鉴智能科技有限公司 Convolutional neural networks compression method based on beta pruning and distillation
CN109543559A (en) * 2018-10-31 2019-03-29 东南大学 Method for tracking target and system based on twin network and movement selection mechanism

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10740676B2 (en) * 2016-05-19 2020-08-11 Nec Corporation Passive pruning of filters in a convolutional neural network




Similar Documents

Publication Publication Date Title
CN110097177B (en) Network pruning method based on pseudo-twin network
WO2021042828A1 (en) Neural network model compression method and apparatus, and storage medium and chip
CN111696101A (en) Light-weight solanaceae disease identification method based on SE-Inception
CN111695513B (en) Facial expression recognition method based on depth residual error network
CN113159173A (en) Convolutional neural network model compression method combining pruning and knowledge distillation
Yue et al. Face recognition based on histogram equalization and convolution neural network
CN112418397B (en) Image classification method based on lightweight convolutional neural network
Singh et al. Acceleration of deep convolutional neural networks using adaptive filter pruning
CN115829027A (en) Comparative learning-based federated learning sparse training method and system
CN112597919A (en) Real-time medicine box detection method based on YOLOv3 pruning network and embedded development board
Fan et al. HFPQ: deep neural network compression by hardware-friendly pruning-quantization
CN114742997A (en) Full convolution neural network density peak pruning method for image segmentation
CN114882278A (en) Tire pattern classification method and device based on attention mechanism and transfer learning
Doan Large-scale insect pest image classification
Qi et al. Learning low resource consumption cnn through pruning and quantization
CN110807497A (en) Handwritten data classification method and system based on deep dynamic network
CN117671271A (en) Model training method, image segmentation method, device, equipment and medium
CN112561054A (en) Neural network filter pruning method based on batch characteristic heat map
CN116957010A (en) Model reasoning method and device for convolutional neural network
CN115100509B (en) Image identification method and system based on multi-branch block-level attention enhancement network
CN116310335A (en) Method for segmenting pterygium focus area based on Vision Transformer
CN115620064A (en) Point cloud down-sampling classification method and system based on convolutional neural network
CN115063374A (en) Model training method, face image quality scoring method, electronic device and storage medium
CN115100694A (en) Fingerprint quick retrieval method based on self-supervision neural network
CN114494284A (en) Scene analysis model and method based on explicit supervision area relation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant