CN110689113A - Deep neural network compression method based on brain consensus initiative - Google Patents

Deep neural network compression method based on brain consensus initiative

Info

Publication number
CN110689113A
CN110689113A (application CN201910885350.3A)
Authority
CN
China
Prior art keywords
channel
channels
neural network
layer
deep neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910885350.3A
Other languages
Chinese (zh)
Inventor
申世博
李荣鹏
张宏纲
赵志峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201910885350.3A
Publication of CN110689113A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections


Abstract

The invention provides a deep neural network compression method based on brain consensus initiative. In each forward pass of neural network training, it screens a subset of important channels layer by layer in the convolutional layers and sets the activation values of the other channels to zero. During back propagation of the error, the gradients of the convolution kernels that generate these unimportant channels are therefore zero, so those kernels are neither updated nor trained. Meanwhile, the update of the channel utility is embedded in the back propagation of the error, and the link between channel utility and error is strengthened through the "consensus initiative" method. Each training iteration selectively trains only the convolution kernels corresponding to the effective channels, so that when training finishes, the channels with high channel utility are retained, realizing channel pruning and deep neural network compression. The method greatly simplifies the general flow of existing deep neural network compression methods and is highly efficient.

Description

Deep neural network compression method based on brain consensus initiative
Technical Field
The invention relates to the field of artificial intelligence and neural network computing, in particular to a deep neural network compression method based on brain consensus initiative.
Background
Over the years, the development of deep neural networks has driven a great revolution in the field of artificial intelligence. It is generally accepted that the performance of a deep neural network depends on its depth. However, a deep neural network tends to incur large computation and storage overheads. To make deep neural networks applicable to low-power devices such as mobile phones, their complexity must be reduced. Among the many model compression algorithms, channel pruning is a compression algorithm designed specifically for the convolutional layers of deep neural networks.
Channel pruning is a model compression algorithm that prunes the channels of the convolutional layers of a deep neural network. Through different strategies or methods, the channels that best express the input images are screened out, and the remaining channels are cut away, compressing the deep neural network model. A general channel pruning algorithm comprises three basic steps: train a redundant neural network; prune it according to some rule; and retrain the pruned neural network to recover model performance. This pipeline is quite redundant, and current channel pruning algorithms focus on the significance or importance of each channel in isolation, ignoring the inherent links between channels.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a deep neural network compression method based on brain consensus initiative: a model compression method that performs deep neural network training and pruning simultaneously. Through consensus initiative, the channels with the best cooperativity and strongest expressiveness are selected from all channels of a layer, and the remaining channels are pruned, realizing network compression.
The invention is realized by the following technical scheme: a deep neural network compression method based on brain consensus initiative specifically comprises the following steps:
(1) In each forward pass of deep neural network training, for the channels of each layer, arrange the channels from high to low according to the initialized channel utility value $u_k^l$; according to the set pruning rate, retain the activation values of the channels whose utility values rank above the pruning threshold, and set the channel activation values of the remaining channels of the layer to zero. Here $u_k^l$ is a long-term evaluation, maintained throughout deep neural network training, of how important the k-th channel of the l-th layer is to the error of the deep neural network, where l is the layer index and k is the channel index within the layer.
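To make step (1) concrete, the following minimal NumPy sketch shows one way such utility-based channel selection could be implemented; the function and variable names are ours, not the patent's, and a single sample of shape (C, H, W) is assumed:

```python
import numpy as np

def select_channels(z, u, prune_rate):
    """Zero out the activations of low-utility channels (step 1, sketched).

    z: activations of one layer, shape (C, H, W)
    u: channel utility values, shape (C,)
    prune_rate: fraction of channels whose activations are zeroed
    """
    C = u.shape[0]
    n_keep = int(np.ceil(C * (1.0 - prune_rate)))
    keep = np.argsort(u)[::-1][:n_keep]     # indices of the highest-utility channels
    mask = np.zeros(C, dtype=z.dtype)
    mask[keep] = 1.0
    return z * mask[:, None, None], mask    # broadcast the mask over H and W
```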
(2) Determine the normalized significance evaluation $\hat{s}_k^l$ in the back-propagation process of deep neural network training, specifically:
(2.1) During back propagation, multiply every activation value of a channel by its gradient, accumulate the products, and average them to determine the significance evaluation $s_k^l$ of each channel:
$$s_k^l = \frac{1}{M}\sum_{m=1}^{M} \frac{\partial J}{\partial z_{k,m}^{l}}\, z_{k,m}^{l} \qquad (1)$$
where J is the error function of the network, $z_{k,m}^{l}$ is the m-th activation value of the k-th channel of the l-th layer, and M is the number of activation values in one channel of the l-th layer.
(2.2) Normalize the channel significance evaluation $s_k^l$ by the L2 norm to obtain the normalized significance evaluation $\hat{s}_k^l$:
$$\hat{s}_k^l = \frac{\left|s_k^l\right|}{\left\|s^l\right\|_2} \qquad (2)$$
where $\hat{s}_k^l$ ranges from 0 to 1.
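Equations (1) and (2) can be sketched under the same assumptions (per-channel activations and gradients of shape (C, H, W); names are ours; a small epsilon guards against division by zero):

```python
import numpy as np

def channel_significance(z, dJ_dz, eps=1e-12):
    """Equations (1)-(2): per-channel significance and its L2-normalized form.

    z, dJ_dz: activations and their gradients, each of shape (C, H, W)
    returns s of shape (C,) and s_hat of shape (C,) with values in [0, 1]
    """
    C = z.shape[0]
    s = (dJ_dz * z).reshape(C, -1).mean(axis=1)     # eq. (1): mean of gradient*activation
    s_hat = np.abs(s) / (np.linalg.norm(s) + eps)   # eq. (2): L2 normalization
    return s, s_hat
```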
(3) Through the consensus initiative algorithm, fuse the normalized significance evaluations of different channels, taking the interaction between different channels into account.
(3.1) Compute the correlation $R_{i,j}^l$ between two channels by multiplying their normalized significance evaluations $\hat{s}_i^l$ and $\hat{s}_j^l$ and averaging the product over the iterations in which both participate:
$$R_{i,j}^l = \frac{1}{n_{i,j}^l}\sum_{t=1}^{n_{i,j}^l} \hat{s}_i^l(t)\,\hat{s}_j^l(t) \qquad (3)$$
where $R_{i,j}^l$ is the correlation between the i-th and j-th channels of the l-th layer, ranging from 0 to 1, and $n_{i,j}^l$ is the number of deep neural network training iterations in which the two channels have participated.
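A hedged sketch of the running-average update behind equation (3), assuming the per-pair counter only advances when both channels were selected in the current iteration (as the counter update in step (5.3) of the example below suggests); all names are ours:

```python
import numpy as np

def update_correlation(R, n, s_hat, mask):
    """Equation (3): running average of the product of normalized significances.

    R: (C, C) float correlation matrix; n: (C, C) float per-pair iteration counters
    s_hat: (C,) normalized significances of this iteration
    mask: (C,) 1.0 for channels selected this iteration, 0.0 otherwise
    """
    active = np.outer(mask, mask).astype(bool)   # pairs where both channels participated
    prod = np.outer(s_hat, s_hat)
    n[active] += 1
    # incremental mean on active pairs: R <- R + (prod - R) / n
    R[active] += (prod[active] - R[active]) / n[active]
    return R, n
```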
(3.2) Multiply the normalized significance evaluations $\hat{s}_j^l$ of the other channels of the same layer by the correlations $R_{k,j}^l$ computed in (3.1) and sum, obtaining the fused significance evaluation $\theta_k^l$:
$$\theta_k^l = \sum_{j=1}^{C_l} R_{k,j}^l\, \hat{s}_j^l \qquad (4)$$
where $C_l$ is the number of channels of the l-th layer.
(3.3) Incorporate $\theta_k^l$ into the initial channel utility value $u_k^l$ of step 1 through a moving-average strategy:
$$u_k^l(n) = \lambda\, u_k^l(n-1) + (1-\lambda)\, \theta_k^l(n) \qquad (5)$$
where λ is an attenuation factor ranging from 0 to 1, and n is the number of iterations in which the channel has participated in deep neural network training.
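Equations (4) and (5) reduce to a matrix-vector product followed by an exponential moving average; a minimal sketch (names ours):

```python
import numpy as np

def update_utility(u, R, s_hat, lam=0.8):
    """Equations (4)-(5): fuse significances across the layer, then update utility.

    u: (C,) channel utilities; R: (C, C) correlations; s_hat: (C,) normalized significances
    lam: attenuation factor in (0, 1)
    """
    theta = R @ s_hat                      # eq. (4): correlation-weighted sum over channels
    return lam * u + (1.0 - lam) * theta   # eq. (5): exponential moving average
```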
(4) Perform the above steps cyclically, updating the channel utility values $u_k^l$ of all channels, until the deep neural network converges.
(5) After the deep neural network converges, arrange the channels layer by layer according to the channel utility value $u_k^l$; according to the preset pruning rate, prune the channels whose utility values fall below the pruning threshold, together with the convolution kernels that generate them, realizing model compression and acceleration.
Compared with the prior art, the invention has the following beneficial effects. In the deep neural network training process, channels with strong expressive power for the input images are selectively identified and trained, and the learning and pruning processes of the deep neural network are combined, which greatly simplifies the flow of traditional neural network pruning algorithms and improves the efficiency of the compression algorithm. By introducing the consensus initiative phenomenon observed among neurons in the brain, the internal relations among neurons in the same layer of the neural network are taken into account, so the pruned neural network retains high accuracy and outperforms existing algorithms. The compression method is simple to implement, highly efficient, and yields a compressed model of high accuracy.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
As shown in fig. 1, the deep neural network compression method based on brain consensus initiative of the present invention specifically comprises the following steps:
(1) The channel utility $u_k^l$ is a long-term evaluation, maintained during deep neural network training, of how important the k-th channel of the l-th layer is to the error of the deep neural network, where l is the layer index and k is the channel index within the layer. Channels with high channel utility values are important to the neural network model; pruning them would strongly affect the training error and degrade model performance. Therefore, in each forward pass of deep neural network training, for the channels of each layer, arrange the channels from high to low according to the initialized channel utility value $u_k^l$; according to the set pruning rate, retain the activation values of the channels whose utility values rank above the pruning threshold, and set the channel activation values of the remaining channels of the layer to zero. The pruning rate is the proportion of pruned channels among all channels; it ranges from 0 to 1 and is chosen by weighing the performance loss of the deep neural network against the compression gain.
(2) Obtain the normalized significance evaluation $\hat{s}_k^l$ in the back-propagation process of deep neural network training, specifically:
(2.1) During back propagation, multiply every activation value of a channel by its gradient, accumulate the products, and average them to determine the significance evaluation $s_k^l$ of each channel:
$$s_k^l = \frac{1}{M}\sum_{m=1}^{M} \frac{\partial J}{\partial z_{k,m}^{l}}\, z_{k,m}^{l} \qquad (1)$$
where J is the error function of the network, $z_{k,m}^{l}$ is the m-th activation value of the k-th channel of the l-th layer, and M is the number of activation values in one channel of the l-th layer.
(2.2) Normalize the channel significance evaluation $s_k^l$ by the L2 norm to obtain the normalized significance evaluation $\hat{s}_k^l$:
$$\hat{s}_k^l = \frac{\left|s_k^l\right|}{\left\|s^l\right\|_2} \qquad (2)$$
where $\hat{s}_k^l$ ranges from 0 to 1.
(3) Through the consensus initiative algorithm, fuse the normalized significance evaluations of different channels and take the interaction between different channels into account; this achieves a cooperative selection of effective channels and improves the accuracy of the compressed neural network.
(3.1) Compute the correlation $R_{i,j}^l$ between two channels by multiplying their normalized significance evaluations $\hat{s}_i^l$ and $\hat{s}_j^l$ and averaging the product over the iterations in which both participate:
$$R_{i,j}^l = \frac{1}{n_{i,j}^l}\sum_{t=1}^{n_{i,j}^l} \hat{s}_i^l(t)\,\hat{s}_j^l(t) \qquad (3)$$
where $R_{i,j}^l$ is the correlation between the i-th and j-th channels of the l-th layer, ranging from 0 to 1, and $n_{i,j}^l$ is the number of deep neural network training iterations in which the two channels have participated.
(3.2) Multiply the normalized significance evaluations $\hat{s}_j^l$ of the other channels of the same layer by the correlations $R_{k,j}^l$ computed in (3.1) and sum, obtaining the fused significance evaluation $\theta_k^l$:
$$\theta_k^l = \sum_{j=1}^{C_l} R_{k,j}^l\, \hat{s}_j^l \qquad (4)$$
The fused significance evaluation $\theta_k^l$ takes into account the influence of the other channels of the same layer on the current channel and is the core of the consensus initiative algorithm.
(3.3) Incorporate $\theta_k^l$ into the initial channel utility value $u_k^l$ of step 1 through a moving-average strategy:
$$u_k^l(n) = \lambda\, u_k^l(n-1) + (1-\lambda)\, \theta_k^l(n) \qquad (5)$$
where λ is an attenuation factor ranging from 0 to 1, and n is the number of iterations in which the channel has participated in the deep neural network. The effect of the attenuation factor is that every channel utility value decays continuously as the number of iterations grows. If, during an update such as (3.3), the increase contributed by the last term of equation (5) is smaller than the decay caused by the attenuation factor, the channel's utility falls, and the channel may not participate in training during the next iteration (its activation values are set to zero in step 1), achieving the effect of channel "screening".
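As a small numeric illustration of this screening effect (values chosen arbitrarily, not from the patent):

```python
lam, u, theta = 0.8, 0.50, 0.10
u_next = lam * u + (1 - lam) * theta
# u_next == 0.42 < 0.50: the gain (0.02) is smaller than the decay (0.10),
# so the channel's utility falls and it may drop out of the selection in step 1.
```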
(4) Perform the above steps cyclically, updating the channel utility values $u_k^l$ of all channels and thereby continuously selecting the effective channels, until the deep neural network converges.
(5) After the deep neural network converges, arrange the channels layer by layer according to the channel utility value $u_k^l$; according to the preset pruning rate, prune the channels whose utility values fall below the pruning threshold, together with the convolution kernels that generate them, realizing model compression and acceleration. During training, the method continuously computes and updates the channel utility value of each channel; that is, the criterion on which network pruning depends is obtained during neural network training itself. Network pruning can therefore be carried out directly when training finishes, which greatly simplifies the flow of general pruning methods and makes the method highly efficient.
Examples
An example of the method is given below. The network to be compressed is VGG-16, which comprises 13 convolutional layers with channel counts [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512].
1. Given: an input data set or input picture $z_0$; the pruning rate of each layer $\{p_l \le 0.5,\ 1 \le l \le 13\}$; the initialized model $\{\mathrm{conv}_l,\ 1 \le l \le 13\}$; the attenuation constant $\lambda \leftarrow 0.8$; and the maximum number of training iterations $I_{max}$. Since the method aims at compressing the convolutional-layer parameters of a deep neural network, the notation "conv" denotes only convolutional layers.
2. Initialize the iteration counter of neural network training $i \leftarrow 0$, the channel utility values of each layer $\{u^l \leftarrow 0,\ 1 \le l \le 13\}$, and the correlation matrix of each layer $\{R^l \leftarrow 0,\ 1 \le l \le 13\}$.
3. While the iteration number $i$ is less than the maximum number of iterations $I_{max}$, the method trains the neural network. One forward pass of the neural network executes the following steps layer by layer:
(3.1) Compute the activation values of the output channels of each layer: $z^l \leftarrow \mathrm{conv}_l(z^{l-1})$.
(3.2) Initialize a binary mask $m^l \leftarrow 0$; the mask indicates which channels are selected.
(3.3) Sort $u^l$ from high to low. Among all output channels of the current layer ($C_l$ denotes their number), retain the activation values of the $C_l(1-p_l) = 0.5\,C_l$ channels with the highest channel utility; concretely, set the mask values at the positions of these channels to 1.
(3.4) Multiply the channel mask and the channel activation values channel by channel, $z^l \leftarrow z^l \cdot m^l$, and feed the result to the next layer. A consolidated sketch of steps (3.1)-(3.4) follows.
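A hypothetical PyTorch sketch of the masked forward pass; the layer list, the ReLU nonlinearity, and the omission of VGG's pooling and classifier layers are our simplifications, and all names are ours:

```python
import math
import torch

def forward_with_selection(convs, utils, z, prune_rates):
    """Steps (3.1)-(3.4): a forward pass keeping only the highest-utility channels.

    convs: list of torch.nn.Conv2d layers; utils: list of (C_l,) utility tensors
    z: input batch of shape (N, C_0, H, W); prune_rates: per-layer pruning rates p_l
    """
    masks = []
    for conv, u, p in zip(convs, utils, prune_rates):
        z = torch.relu(conv(z))                  # (3.1) output activations of this layer
        C = u.numel()
        n_keep = math.ceil(C * (1.0 - p))        # e.g. 0.5*C at pruning rate 0.5
        keep = torch.topk(u, n_keep).indices     # (3.3) highest-utility channels
        m = torch.zeros(C, device=z.device)      # (3.2) binary mask
        m[keep] = 1.0
        z = z * m.view(1, -1, 1, 1)              # (3.4) zero unselected channels, pass on
        masks.append(m)
    return z, masks
```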
4. Compute the error J of the final neural network output.
5. Perform the back-propagation pass of the neural network, executing the following steps layer by layer.
(5.1) Compute the channel gradients $\partial J / \partial z^l$ of each layer's channels.
(5.2) Calculate and normalize the significance evaluations described by equations (1) and (2).
(5.3) Update the counter appearing in equation (3): if both channels of a pair were selected in this iteration, $n_{i,j}^l \leftarrow n_{i,j}^l + 1$; otherwise it remains unchanged.
(5.4) Update the correlation matrix $R^l$ according to equation (3).
(5.5) Update the fused significance evaluation of the channels using equation (4): $\theta^l \leftarrow R^l \hat{s}^l$.
(5.6) Update the channel utility described in equation (5): $u^l \leftarrow \lambda u^l + (1-\lambda)\theta^l$. A consolidated sketch of steps (5.2)-(5.6) follows.
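The per-layer bookkeeping can be sketched by reusing the helper functions above (channel_significance, update_correlation, update_utility); per-sample shapes and all names remain our assumptions:

```python
import numpy as np

def backward_bookkeeping(z_list, g_list, masks, R_list, n_list, u_list, lam=0.8):
    """Steps (5.2)-(5.6) for every layer, reusing the helpers sketched earlier.

    z_list, g_list: per-layer activations and gradients dJ/dz, each of shape (C_l, H, W)
    masks: per-layer binary selection masks from the forward pass
    """
    for l, (z, g, m) in enumerate(zip(z_list, g_list, masks)):
        _, s_hat = channel_significance(z, g)                                      # eqs. (1)-(2)
        R_list[l], n_list[l] = update_correlation(R_list[l], n_list[l], s_hat, m)  # eq. (3)
        u_list[l] = update_utility(u_list[l], R_list[l], s_hat, lam)               # eqs. (4)-(5)
    return R_list, n_list, u_list
```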
6. When the maximum number of training steps is reached or the neural network converges, prune, layer by layer according to the channel utility $u^l$ of each layer, the lower-ranked half of the channels (the pruning rate of each layer is 0.5) together with the convolution kernels that generate them. Copy the remaining parameters into a more compact model, completing the joint training and pruning of the neural network.
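The final channel selection amounts to ranking the converged utilities; a minimal sketch (names ours):

```python
import numpy as np

def kept_channels(u, prune_rate=0.5):
    """Step 6: indices of the channels retained after training, ranked by utility."""
    n_keep = u.shape[0] - int(u.shape[0] * prune_rate)
    return np.sort(np.argsort(u)[::-1][:n_keep])

# The compact model keeps, for each retained channel, the convolution kernel that
# produces it and the matching input slices of the next layer's kernels.
```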
The table below shows the accuracy achieved by the method at different pruning (compression) rates, compared with other methods. When the floating-point operations (FLOPs) are compressed by about 35%, the method still achieves 93.78% accuracy, exceeding common norm-based pruning methods; when the compression rate reaches 49.6%, the method still maintains 93.68% accuracy, also exceeding the structured Bayesian pruning method, which achieves only 92.50%; when the compression rate reaches 75.2%, the compressed neural network still reaches 92.72% recognition accuracy, a loss of only 1.28%. The compression method can therefore directly convert a redundant, oversized neural network into a compact neural network with rich expressive power.
Comparison of accuracy of different methods
[Table rendered as an image in the original; the accuracy figures at each compression rate are quoted in the paragraph above.]

Claims (1)

1. A deep neural network compression method based on brain consensus initiative is characterized by comprising the following steps:
(1) In each forward pass of deep neural network training, for the channels of each layer, arrange the channels from high to low according to the initialized channel utility value $u_k^l$; according to the set pruning rate, retain the activation values of the channels whose utility values rank above the pruning threshold, and set the channel activation values of the remaining channels of the layer to zero. Here $u_k^l$ is a long-term evaluation, maintained during deep neural network training, of how important the k-th channel of the l-th layer is to the error of the deep neural network, where l is the layer index and k is the channel index within the layer.
(2) Determine the normalized significance evaluation $\hat{s}_k^l$ in the back-propagation process of deep neural network training, specifically:
(2.1) During back propagation, multiply every activation value of a channel by its gradient, accumulate the products, and average them to determine the significance evaluation $s_k^l$ of each channel:
$$s_k^l = \frac{1}{M}\sum_{m=1}^{M} \frac{\partial J}{\partial z_{k,m}^{l}}\, z_{k,m}^{l} \qquad (1)$$
where J is the error function of the network, $z_{k,m}^{l}$ is the m-th activation value of the k-th channel of the l-th layer, and M is the number of activation values in one channel of the l-th layer.
(2.2) Normalize the channel significance evaluation $s_k^l$ by the L2 norm to obtain the normalized significance evaluation $\hat{s}_k^l$:
$$\hat{s}_k^l = \frac{\left|s_k^l\right|}{\left\|s^l\right\|_2} \qquad (2)$$
where $\hat{s}_k^l$ ranges from 0 to 1.
(3) Through the consensus initiative algorithm, fuse the normalized significance evaluations of different channels, taking the interaction between different channels into account.
(3.1) Compute the correlation $R_{i,j}^l$ between two channels by multiplying their normalized significance evaluations $\hat{s}_i^l$ and $\hat{s}_j^l$ and averaging the product over the iterations in which both participate:
$$R_{i,j}^l = \frac{1}{n_{i,j}^l}\sum_{t=1}^{n_{i,j}^l} \hat{s}_i^l(t)\,\hat{s}_j^l(t) \qquad (3)$$
where $R_{i,j}^l$ is the correlation between the i-th and j-th channels of the l-th layer, ranging from 0 to 1, and $n_{i,j}^l$ is the number of deep neural network training iterations in which the two channels have participated.
(3.2) Multiply the normalized significance evaluations $\hat{s}_j^l$ of the other channels of the same layer by the correlations $R_{k,j}^l$ computed in (3.1) and sum, obtaining the fused significance evaluation $\theta_k^l$:
$$\theta_k^l = \sum_{j=1}^{C_l} R_{k,j}^l\, \hat{s}_j^l \qquad (4)$$
(3.3) Incorporate $\theta_k^l$ into the initial channel utility value $u_k^l$ of step 1 through a moving-average strategy:
$$u_k^l(n) = \lambda\, u_k^l(n-1) + (1-\lambda)\, \theta_k^l(n) \qquad (5)$$
where λ is an attenuation factor ranging from 0 to 1, and n is the number of iterations in which the channel has participated in deep neural network training.
(4) Perform the above steps cyclically, updating the channel utility values $u_k^l$ of all channels, until the deep neural network converges.
(5) After the deep neural network converges, arrange the channels layer by layer according to the channel utility value $u_k^l$; according to the preset pruning rate, prune the channels whose utility values fall below the pruning threshold, together with the convolution kernels that generate them, realizing model compression and acceleration.
CN201910885350.3A 2019-09-19 2019-09-19 Deep neural network compression method based on brain consensus initiative Pending CN110689113A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910885350.3A CN110689113A (en) 2019-09-19 2019-09-19 Deep neural network compression method based on brain consensus initiative


Publications (1)

Publication Number Publication Date
CN110689113A (en) 2020-01-14

Family

ID=69109619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910885350.3A Pending CN110689113A (en) 2019-09-19 2019-09-19 Deep neural network compression method based on brain consensus initiative

Country Status (1)

Country Link
CN (1) CN110689113A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021164752A1 (en) * 2020-02-21 2021-08-26 华为技术有限公司 Neural network channel parameter searching method, and related apparatus
WO2022022625A1 (en) * 2020-07-29 2022-02-03 北京智行者科技有限公司 Acceleration method and device for deep learning model
CN111931914A (en) * 2020-08-10 2020-11-13 北京计算机技术及应用研究所 Convolutional neural network channel pruning method based on model fine tuning
WO2022178908A1 (en) * 2021-02-26 2022-09-01 中国科学院深圳先进技术研究院 Neural network pruning method and apparatus, and storage medium
CN113283473A (en) * 2021-04-20 2021-08-20 中国海洋大学 Rapid underwater target identification method based on CNN feature mapping pruning
CN113283473B (en) * 2021-04-20 2023-10-13 中国海洋大学 CNN feature mapping pruning-based rapid underwater target identification method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200114)