CN111401523A - Deep learning network model compression method based on network layer pruning - Google Patents
- Publication number: CN111401523A
- Application number: CN202010177912.1A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/045—Combinations of networks (G—Physics; G06—Computing; G06N—Computing arrangements based on specific computational models; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology)
- G06N3/08—Learning methods (G—Physics; G06—Computing; G06N—Computing arrangements based on specific computational models; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks)
Abstract
The invention discloses a deep learning network model compression method based on network layer pruning, which comprises the following steps: for a trained convolutional neural network, import the trained weights and perform sparsity training on the BN (batch normalization) layers, each of which has two trainable parameters, Gamma and Beta; the sparsification is completed over multiple iterations. Obtain the Gamma parameters of each BN layer of the network, set a global channel pruning ratio, compute the Gamma threshold corresponding to that ratio, and set all Gamma parameters below the threshold to zero. Set the number I of shortcut network layers to be cut, compute the POZ (percent of zeros) from the Gamma of BN layer_2 in each shortcut, and delete the I shortcut structures with the largest POZ. Delete the convolution channels associated with the zeroed Gamma parameters of the remaining BN layers, and save the pruned network structure and parameters.
Description
Technical Field
The invention relates to the technical field of deep learning convolutional neural network model compression and acceleration, and in particular to a deep learning network model compression method based on network layer pruning.
Background
Compared with traditional image processing algorithms, deep convolutional neural networks achieve markedly higher accuracy. However, as the number of network layers grows, the model becomes increasingly complex; when deployed to edge computing devices, network inference is slow and cannot meet real-time requirements.
Current pruning-based inference acceleration algorithms fall into unstructured and structured pruning. Unstructured pruning does not change the network structure: the weights of the deep convolutional neural network are evaluated and nodes with small weights are removed, but the pruned network is sparse, so acceleration is only achievable on specially designed hardware. Structured pruning operates at the level of convolution kernels (filters), so it does change the network structure and yields a measurable speedup on different platforms in actual use. However, when the network is very deep, the speedup obtainable from channel pruning alone is limited, because the hardware IO caused by each network layer's inputs and outputs dominates the computation.
Disclosure of Invention
To address the problems in the prior art, the invention discloses a deep learning network model compression method based on network layer pruning:
for the convolutional neural network that has completed training, import the trained weights and perform sparsity training on the BN layers;
each BN layer has two trainable parameters, Gamma and Beta;
the Gamma coefficients of the BN layers are constrained during training so that they approach 0, and the sparsification is completed over multiple iterations;
obtain the Gamma parameters of each BN layer of the network, set a global channel pruning ratio, compute the Gamma threshold corresponding to that ratio, and set all Gamma parameters below the threshold to zero;
set the number I of shortcut network layers to be cut, and compute the POZ of the Gamma parameters of BN layer_2 in each shortcut structure;
sort the shortcut structures by the POZ of their BN layer_2 in descending order, and delete the shortcut structures corresponding to the first I POZ values;
delete the convolution channels associated with the zeroed Gamma parameters of the other BN layers;
and save the structure and parameters of the pruned network, performing fine-tuning training on the saved network if necessary.
Further, the POZ of the Gamma parameters is calculated as

POZ = (1/M) * Σ_{i=1}^{M} f(Gamma_i)

where M represents the Gamma coefficient dimension of the BN layer, f(x) = 1 when x = 0, and f(x) = 0 when x is non-zero.
By adopting the above technical scheme, the deep learning network model compression method based on network layer pruning reduces the inter-layer IO time during model inference and improves computation speed, while having little impact on inference accuracy.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments described in the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a convolutional neural network with the shortcut structure to be cut;
FIG. 2 is a flowchart of the deep learning network model compression method of the present invention.
Detailed Description
In order to make the technical solutions and advantages of the present invention clearer, the following describes the technical solutions in the embodiments of the present invention clearly and completely with reference to the drawings in the embodiments of the present invention:
as shown in fig. 1, for a network layer with a shortcut structure, the importance of the network layer is measured, and the corresponding network layer with the shortcut structure is deleted through an importance comparison algorithm. The general convolutional neural network is an input layer, a hidden layer and an output layer, most structures in the hidden layer are a convolutional layer-BN layer-an activation function layer, a shortcut structure in the network layer is shown in FIG. 1, namely an input _0, a convolutional layer _1, a BN layer _1, an activation layer _1, a convolutional layer _2, a BN layer _2, an activation layer _2, an ADD layer, and an ADD layer characteristic diagram is obtained by adding the activation layer _2 and the input _0, and pruning of the network layer is carried out on the shortcut structure in the network.
As shown in fig. 2, a deep learning network model compression method based on network layer pruning specifically includes the following steps:
the training of the deep learning network is completed aiming at the network task of the user, the trained weight is introduced to carry out the sparse training aiming at the BN layer of the convolutional neural network,
the BN layer has two trainable parameters which are Gamma and Beta respectively,
namely, the Gamma coefficient in the BN layer is restrained in the training process, so that the Gamma coefficient approaches 0. And completing the thinning process through multiple iterations.
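One such sparsification step can be sketched in plain Python: an L1 subgradient term shrinks each Gamma coefficient toward zero on every iteration. The learning rate `lr` and penalty strength `lam` are assumed hyperparameters, not values given in the patent:

```python
def sign(x):
    """Sign of x: -1, 0 or 1."""
    return (x > 0) - (x < 0)

def l1_sparsity_step(gammas, lr=0.1, lam=0.01):
    """One sparsity update, gamma <- gamma - lr * lam * sign(gamma),
    applied in addition to the normal gradient step (omitted here)."""
    return [g - lr * lam * sign(g) for g in gammas]

gammas = [0.5, -0.3, 0.0005, -0.0002]
for _ in range(100):  # the sparsification completes over multiple iterations
    gammas = l1_sparsity_step(gammas)
```

After many iterations, coefficients that started near zero stay pinned within one step size of zero, while larger coefficients shrink steadily, which is what makes the later thresholding effective.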
Obtain the Gamma parameters of each BN layer of the network, set a global channel pruning ratio, compute the Gamma threshold corresponding to that ratio, and set all Gamma parameters below the threshold to zero.
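A sketch of this global-threshold step in plain Python. The Gamma values appear here as one flat list; in practice they would be gathered from every BN layer of the network:

```python
def global_gamma_threshold(all_gammas, prune_ratio):
    """Sort |Gamma| across the whole network and take the value at the
    prune_ratio quantile as the global pruning threshold."""
    mags = sorted(abs(g) for g in all_gammas)
    k = int(len(mags) * prune_ratio)
    return mags[k] if k < len(mags) else float("inf")

def zero_below(gammas, threshold):
    """Set every Gamma whose magnitude is below the threshold to zero."""
    return [0.0 if abs(g) < threshold else g for g in gammas]

gammas = [0.9, 0.05, 0.5, 0.01, 0.3, 0.001, 0.7, 0.02]
thr = global_gamma_threshold(gammas, 0.5)  # prune the smallest 50%
pruned = zero_below(gammas, thr)
```

Because the threshold is computed over all layers at once, heavily sparsified layers lose more channels than layers whose Gammas stayed large.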
Set the number I of shortcut network layers to be cut, and compute the POZ (percent of zeros) of the Gamma parameters of BN layer_2 in each shortcut structure:

POZ = (1/M) * Σ_{i=1}^{M} f(Gamma_i)

where M represents the Gamma coefficient dimension of the BN layer, f(x) = 1 when x = 0, and f(x) = 0 when x is non-zero.
Sort the shortcut structures by the POZ of their BN layer_2 in descending order and delete the I shortcut structures with the largest POZ, thereby achieving pruning at the network layer level.
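The POZ computation and the ranking of shortcut structures can be sketched as follows (pure Python; the per-shortcut BN layer_2 Gamma vectors are illustrative values, not data from the patent):

```python
def poz(gammas):
    """Percent of zeros: POZ = (1/M) * sum_i f(gamma_i),
    where M is the Gamma dimension and f(x) = 1 if x == 0, else 0."""
    return sum(1 for g in gammas if g == 0) / len(gammas)

def shortcuts_to_prune(bn2_gammas_per_shortcut, num_to_cut):
    """Sort shortcut structures by the POZ of their BN layer_2 (descending)
    and return the indices of the I structures to delete."""
    ranked = sorted(range(len(bn2_gammas_per_shortcut)),
                    key=lambda i: -poz(bn2_gammas_per_shortcut[i]))
    return ranked[:num_to_cut]

bn2_gammas = [[0.0, 0.0, 0.4],   # POZ = 2/3
              [0.2, 0.5, 0.1],   # POZ = 0
              [0.0, 0.0, 0.0],   # POZ = 1
              [0.0, 0.3, 0.6]]   # POZ = 1/3
cut = shortcuts_to_prune(bn2_gammas, num_to_cut=2)
```

A high POZ means most output channels of that shortcut's BN layer_2 were zeroed by the sparsity training, so the whole shortcut structure contributes little and is the best candidate for deletion.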
Delete the convolution channels associated with the zeroed Gamma parameters of the other BN layers, save the structure and parameters of the pruned network, and perform fine-tuning training on the saved network if necessary.
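Finally, deleting the convolution channels tied to zeroed Gammas can be sketched on plain nested lists. Each inner list stands in for one output-channel filter; a real implementation would also have to adjust the input channels of the following layer:

```python
def prune_conv_channels(filters, gammas):
    """Keep only the output-channel filters whose BN Gamma is non-zero."""
    return [f for f, g in zip(filters, gammas) if g != 0]

filters = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # 3 illustrative filters
kept = prune_conv_channels(filters, gammas=[0.5, 0.0, 0.2])
```

The surviving network is then saved and, if accuracy dropped, fine-tuned for a few epochs.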
The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention, based on its technical solutions and inventive concept, shall be covered by the scope of protection of the present invention.
Claims (2)
1. A deep learning network model compression method based on network layer pruning is characterized by comprising the following steps:
for the convolutional neural network that has completed training, importing the trained weights and performing sparsity training on the BN layers;
each BN layer has two trainable parameters, Gamma and Beta;
the Gamma coefficients of the BN layers are constrained during training so that they approach 0, and the sparsification is completed over multiple iterations;
obtaining the Gamma parameters of each BN layer of the network, setting a global channel pruning ratio, computing the Gamma threshold corresponding to that ratio, and setting all Gamma parameters below the threshold to zero;
setting the number I of shortcut network layers to be cut, and computing the POZ of the Gamma parameters of BN layer_2 in each shortcut structure;
sorting the shortcut structures by the POZ of their BN layer_2 in descending order, and deleting the shortcut structures corresponding to the first I POZ values;
deleting the convolution channels associated with the zeroed Gamma parameters of the other BN layers;
and saving the structure and parameters of the pruned network, and performing fine-tuning training on the saved network if necessary.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010177912.1A CN111401523A (en) | 2020-03-13 | 2020-03-13 | Deep learning network model compression method based on network layer pruning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111401523A true CN111401523A (en) | 2020-07-10 |
Family
ID=71428721
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010177912.1A Pending CN111401523A (en) | 2020-03-13 | 2020-03-13 | Deep learning network model compression method based on network layer pruning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111401523A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111882810A (en) * | 2020-07-31 | 2020-11-03 | 广州市微智联科技有限公司 | Fire identification and early warning method and system |
CN111882810B (en) * | 2020-07-31 | 2022-07-01 | 广州市微智联科技有限公司 | Fire identification and early warning method and system |
CN111931914A (en) * | 2020-08-10 | 2020-11-13 | 北京计算机技术及应用研究所 | Convolutional neural network channel pruning method based on model fine tuning |
CN112132005A (en) * | 2020-09-21 | 2020-12-25 | 福州大学 | Face detection method based on cluster analysis and model compression |
WO2022160856A1 (en) * | 2021-01-27 | 2022-08-04 | 歌尔股份有限公司 | Classification network, and method and apparatus for implementing same |
CN114627342A (en) * | 2022-03-03 | 2022-06-14 | 北京百度网讯科技有限公司 | Training method, device and equipment of image recognition model based on sparsity |
CN114841931A (en) * | 2022-04-18 | 2022-08-02 | 西南交通大学 | Real-time sleeper defect detection method based on pruning algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
RJ01 | Rejection of invention patent application after publication ||
Application publication date: 20200710 |