CN116957044B - Automatic compression method of convolutional neural network model
- Publication number: CN116957044B
- Application number: CN202311034626.XA
- Authority: CN (China)
- Priority date: 2023-08-17
- Legal status: Active
Classifications
- G06N3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
Abstract
The invention discloses an automatic compression method for a convolutional neural network model. Set selection conditions and limiting conditions are used to determine the convolution operators and batch normalization operators in the convolutional neural network model that can be pruned, and a dedicated pruning mode is applied to the convolution operators and the batch normalization operators respectively to complete pruning, so that the model is pruned automatically once the initial parameters and the like are set, which effectively reduces the time and labor required by the model pruning process. In addition, to further improve the pruning effect, the channel sizes of convolution operators and batch normalization operators located in the same structural layer are synchronized after pruning.
Description
Technical Field
The invention belongs to the technical field of deep learning model compression, and particularly relates to an automatic compression method of a convolutional neural network model.
Background
With the development of artificial intelligence technology, deep learning models have been widely used in related technical fields such as computer vision, speech recognition and natural language processing. However, deep learning models often have a relatively complex structure, and their training and use typically require a large amount of memory and computing resources, which limits their application in certain scenarios. Model pruning relieves the computational and memory pressure at its source by reducing the number of operands involved in the computation, and is therefore becoming a key technique in deep learning model applications. Specifically, model pruning of a convolutional neural network model reduces the complexity of the model by deleting unnecessary neurons, nodes or connections, which lowers the amount of computation, makes the model more compact, further reduces the storage space and calculation load, and improves the operating efficiency of the model. However, existing model structures vary widely and are complex; pruning different convolutional neural networks requires knowledge of both the structure and the pruning method, which costs considerable time and manpower. Therefore, automatic pruning of convolutional neural network models is an indispensable technique in deep learning applications.
Disclosure of Invention
In view of the above, the invention provides an automatic compression method of a convolutional neural network model, which realizes automatic pruning operation of specific convolutional operators and batch normalization operators in the convolutional neural network model.
The invention provides an automatic compression method of a convolutional neural network model, which comprises the following steps:
step 1, acquiring a network structure of a convolutional neural network model to be pruned, wherein the network structure comprises a structural layer, a data flow direction, an operator and an operator weight channel;
step 2, acquiring all convolution operators and batch normalization operators from the network structure, automatically selecting operators according to selection conditions to construct a to-be-determined operator set, and removing operators meeting any limiting condition in the to-be-determined operator set to form a to-be-pruned operator set; ending the flow if the operator set to be pruned is empty, otherwise executing the step 3;
step 3, for the batch normalization operators in the operator set to be pruned, executing step 4 after model sparsification is carried out on the batch normalization operators; step 5 is executed for the convolution operator;
step 4, taking the minimum value of the maximum learning parameters γ of all batch normalization operators in the convolutional neural network model to be pruned as a first threshold upper limit; arranging the learning parameters of the batch normalization operators in the operator set to be pruned in ascending order to form a first parameter list, the position of the first threshold upper limit in the first parameter list being i; obtaining the maximum pruning rate Loc_BN according to the formula (i+1)/len1 × 100%, where len1 is the length of the first parameter list; setting a pruning rate smaller than Loc_BN, and deleting, along the data flow direction, the operator weight channels of batch normalization operators whose learning parameter values are smaller than the first threshold upper limit; then executing step 6;
step 5, taking the minimum value of the maximum L1 norms of all convolution operators in the convolutional neural network model to be pruned as a second threshold upper limit; arranging the L1 norms of the convolution operators in the operator set to be pruned in ascending order to form a second parameter list, the position of the second threshold upper limit in the second parameter list being j; obtaining the maximum pruning rate Loc_Conv according to the formula (j+1)/len2 × 100%, where len2 is the length of the second parameter list; setting a pruning rate smaller than Loc_Conv, and deleting, along the data flow direction, the operator weight channels of convolution operators whose L1 norm values are smaller than the second threshold upper limit;
step 6, if all operators in the operator set to be pruned have been processed, training the current convolutional neural network model with set training parameters; if the model scale obtained by training does not meet the preset value, executing step 1, otherwise outputting the convolutional neural network model and ending the process; if the operators in the operator set to be pruned have not all been processed, executing step 3.
Further, the selection condition in the step 2 is to select a two-dimensional convolution operator with equal length and width convolution kernels and a one-dimensional or two-dimensional batch normalization operator.
Further, the limiting conditions in the step 2 include:
the method comprises the following steps that firstly, convolution operators of N operator weight channels are arranged in the same structural layer, the weights of N-1 operator weight channels corresponding to the convolution operators are the same, and the convolution operators of three continuous operators of the same type are connected backwards in the same structural layer; a second condition is a convolution operator or a batch normalization operator connected with the full connection layer; and thirdly, a convolution operator or a batch normalization operator contained in a residual structure layer, wherein the convolution operator or the batch normalization operator is provided with an Add operator, and a certain input of the Add operator is identical to the output of an operator of a non-adjacent structure layer of an upper layer.
Further, consecutive connection refers to serial or parallel connection.
Further, the step 6 further includes:
for a convolution operator and a batch normalization operator located in the same structural layer, acquiring the number of coincident channels between the operator weight channels retained by the two operators before pruning, taking the coincident channels as the final operator weight channels of both operators, and deleting the non-coincident channels in the convolution operator and the batch normalization operator respectively to finish pruning; if the number of coincident channels is 0, obtaining a new pruning rate Loc according to the formula Loc = A × [log2 C] × K / S (%), where A is the total number of convolution operators in the model, C is the number of convolution kernels, K is the size of the convolution kernel, and S is the size of the model parameters; setting a pruning rate smaller than Loc, deleting, along the data flow direction, the operator weight channels of convolution operators whose L1 norm values are smaller than the second threshold upper limit, and taking the larger of the operator weight channel numbers of the pruned convolution operator and batch normalization operator as the final operator weight channel number of both; retaining the operator weight channel positions of the operator with the larger value, and replacing the original weight values of each structural layer with the current weight values.
Advantageous effects
According to the invention, the convolution operators and batch normalization operators in the convolutional neural network model that can be pruned are determined by the set selection conditions and limiting conditions, and a dedicated pruning mode is applied to the convolution operators and the batch normalization operators respectively to complete pruning, so that the model is pruned automatically once the initial parameters and the like are set, which effectively reduces the time and labor required by the model pruning process. In addition, to further improve the pruning effect, the channel sizes of convolution operators and batch normalization operators located in the same structural layer are synchronized after pruning.
Detailed Description
The present invention will be described in detail with reference to the following examples.
The invention provides an automatic compression method of a convolutional neural network model, which specifically comprises the following steps:
Step 1, identifying the network structure of the convolutional neural network model to be pruned, wherein the network structure comprises the structural layers, the data flow direction, the operators and the operator weight channels, and a number is set for each of them.
The structural layers refer to the layers contained in the convolutional neural network model, such as the input layer, convolution layer, pooling layer, fully connected layer and output layer. The data flow direction refers to the transmission direction of data in the convolutional neural network model, for example input data is transmitted from the input layer to the convolution layer. The operators refer to the mathematical calculation units contained in each layer of the convolutional neural network model, such as convolution operators and batch normalization operators. The operator weight channel refers, for a convolution operator, to the output channel of the convolution kernel; for a batch normalization operator, whose weight is one-dimensional, it is the dimension of the weight. For example, the convolution kernels of a convolution operator are stored in a weight of shape [27,12,3,3], the four values being the output channels, the input channels, the length of the convolution kernel and the width of the convolution kernel, and the operator weight channel is the output channel.
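The shape convention above can be checked directly. The snippet below is a minimal illustration only, assuming a PyTorch model; the invention does not prescribe a framework, so the module names here are examples rather than part of the method:

```python
import torch.nn as nn

# A convolution operator whose kernels are stored as [output channels, input channels, kH, kW],
# e.g. [27, 12, 3, 3]; the operator weight channel is the output-channel dimension.
conv = nn.Conv2d(in_channels=12, out_channels=27, kernel_size=3)
print(conv.weight.shape)     # torch.Size([27, 12, 3, 3])
print(conv.weight.shape[0])  # 27 operator weight channels

# The weight (gamma) of a batch normalization operator is one-dimensional,
# so its operator weight channel count is simply the length of that weight.
bn = nn.BatchNorm2d(num_features=27)
print(bn.weight.shape)       # torch.Size([27])
print(bn.weight.shape[0])    # 27 operator weight channels
```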
The convolutional neural network (Convolutional Neural Network, CNN) model is a deep learning model, mainly used for processing image data, and its basic structure generally includes: the convolution layer, the pooling layer and the full connection layer form a basic structure of the convolution neural network model by stacking the three structural layers. In addition, other layers, such as batch normalization layers and activation layers, may also be added to the convolutional neural network model, depending on the particular problem.
Step 2, acquiring all convolution operators and batch normalization operators from the network structure obtained in step 1, and automatically selecting operators according to the selection conditions, namely selecting two-dimensional convolution operators whose convolution kernels have equal length and width and one-dimensional or two-dimensional batch normalization operators, to construct the set of operators to be determined; removing the operators that meet any limiting condition from the set of operators to be determined to form the set of operators to be pruned; if the set of operators to be pruned is empty, the flow ends.
The limiting conditions include: condition one, convolution operators with N operator weight channels located in the same structural layer whose corresponding N−1 operator weight channels have identical weights, and convolution operators in the same structural layer that are followed backwards by three consecutive operators of the same type, where consecutive refers to serial or parallel connection; condition two, a convolution operator or batch normalization operator connected to a fully connected layer; condition three, a convolution operator or batch normalization operator contained in a residual structure layer, where the residual structure layer has an Add operator and one of the inputs of the Add operator is identical to the output of an operator in a non-adjacent upper structural layer.
And step 3, for operators in the operator set to be pruned, if the operators are batch normalization operators, executing the step 4 after model sparsification is carried out on the batch normalization operators, and if the operators are convolution operators, executing the step 5.
Step 4, acquiring all batch normalization operators in the convolutional neural network model to be pruned, recording the maximum learning parameter γ of each batch normalization operator as BNMax_n, where n is the number of the batch normalization operator, and taking the minimum value of BNMax_n as the threshold upper limit L_BN of the batch normalization operators in the convolutional neural network model to be pruned; acquiring the learning parameters γ of all batch normalization operators in the set of operators to be pruned, arranging these learning parameters in ascending order to form the first parameter list, determining the position i of the threshold upper limit L_BN in the first parameter list, and calculating the maximum pruning rate Loc_BN according to the following formula:
Loc_BN = (i + 1) / len1 × 100%    (1)
where len1 is the length of the first parameter list.
A pruning rate smaller than Loc_BN is set; along the data flow direction from input to output of the convolutional neural network model to be pruned, the operator weight channels of batch normalization operators whose learning parameter values are smaller than the threshold upper limit are deleted, and the arrangement positions, before pruning, of the retained operator weight channels are recorded.
Specifically, if the dimension of the batch normalization operator is [27], there are 27 permutation positions, and in order to determine the operator weight channel actually deleted in the pruning process, the 27 permutation positions need to be recorded.
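The threshold and maximum pruning rate of step 4 can be sketched as follows. This is only an illustration under the assumption that the γ values have already been collected into plain Python lists, and it reads "the position of the threshold upper limit in the list" as the insertion position returned by a binary search; the function and variable names are not from the patent:

```python
import bisect

def bn_pruning_bound(all_bn_gammas, prunable_bn_gammas):
    """all_bn_gammas: one list of gamma values per BN operator in the whole model.
    prunable_bn_gammas: flat list of gamma values of the BN operators to be pruned."""
    # First threshold upper limit: minimum over the per-operator maxima of gamma.
    l_bn = min(max(gammas) for gammas in all_bn_gammas)
    # First parameter list: gammas of prunable operators in ascending order.
    param_list = sorted(prunable_bn_gammas)
    i = bisect.bisect_left(param_list, l_bn)    # position of the threshold in the list
    loc_bn = (i + 1) / len(param_list) * 100    # formula (1): Loc_BN = (i+1)/len1 * 100%
    return l_bn, loc_bn
```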
Step 5, acquiring all convolution operators in the convolutional neural network model to be pruned, recording the maximum L1 norm of each convolution operator, and taking the minimum of these maximum L1 norms as the threshold upper limit L_Conv of the convolution operators in the convolutional neural network model to be pruned; acquiring the L1 norms of all convolution operators in the set of operators to be pruned, arranging them in ascending order to form the second parameter list, determining the position j of the threshold upper limit L_Conv in the second parameter list, and calculating the maximum pruning rate Loc_Conv of the convolution operators according to the following formula:
Loc_Conv = (j + 1) / len2 × 100%    (2)
where len2 is the length of the second parameter list.
A pruning rate smaller than Loc_Conv is set; along the data flow direction from input to output of the convolutional neural network model to be pruned, the operator weight channels of convolution operators whose L1 norm values are smaller than the threshold upper limit L_Conv are deleted, and the arrangement positions of the operator weight channels before pruning are recorded.
Specifically, the convolution kernel size of each convolution operator may differ. If the L1 norm is the sum over 12 convolution kernels of size 3×3, then the convolution operator weight [27,12,3,3] has 27 arrangement positions, and in order to determine which operator weight channels are actually deleted during pruning, these 27 arrangement positions need to be recorded.
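A compact sketch of step 5 under the same assumptions as before (PyTorch weights, binary-search reading of "position"); the per-output-channel L1 norm is the sum of absolute values over the input-channel and kernel dimensions:

```python
import bisect
import torch.nn as nn

def conv_l1_norms(conv: nn.Conv2d):
    """Per output-channel L1 norm of a convolution weight of shape [out, in, kH, kW]."""
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

def conv_pruning_bound(all_convs, prunable_convs):
    # Second threshold upper limit: minimum over the per-operator maximum L1 norms.
    l_conv = min(conv_l1_norms(c).max().item() for c in all_convs)
    # Second parameter list: L1 norms of the prunable convolutions in ascending order.
    param_list = sorted(v.item() for c in prunable_convs for v in conv_l1_norms(c))
    j = bisect.bisect_left(param_list, l_conv)
    loc_conv = (j + 1) / len(param_list) * 100   # formula (2): Loc_Conv = (j+1)/len2 * 100%
    return l_conv, loc_conv
```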
Step 6, for a convolution operator and a batch normalization operator located in the same structural layer, comparing the operator weight channels retained by the two operators before pruning and counting the number of coincident channels, taking the coincident channels as the final operator weight channels of both operators, and deleting the non-coincident channels in the convolution operator and the batch normalization operator respectively to finish the pruning operation;
If the number of coincident channels is 0, the pruning rate Loc of the convolution operator is calculated with the following formula:
Loc = A × [log2 C] × K / S (%)    (3)
where A is the total number of convolution operators in the model, C is the number of convolution kernels, K is the size of the convolution kernel, and S is the size of the model parameters (in MB, with data stored as FP32). A pruning rate smaller than Loc is set; along the data flow direction from input to output of the convolutional neural network model to be pruned, the operator weight channels of convolution operators whose L1 norm values are smaller than the threshold upper limit L_Conv are deleted, and the larger of the operator weight channel numbers of the pruned convolution operator and batch normalization operator is taken as the final operator weight channel number of both;
and (3) reserving operator weight channel positions with larger values, replacing the original weight values with weight values reserved by each structural layer, updating channels of convolution operators and batch normalization operators, and storing a convolution neural network model after model pruning.
Step 7, setting training parameters and training the convolutional neural network model obtained in step 6 so that it reaches the expected processing precision; if the model scale obtained by training does not meet the expectation, step 1 is executed, otherwise the convolutional neural network model is output and the process ends.
Examples
The automatic compression method for the convolutional neural network model provided by the invention realizes pruning operation on the convolutional neural network model, and the specific process is as follows:
s1, identifying a convolutional neural network model.
The convolutional neural network model to be pruned is loaded and visualized to identify its network structure, including the structural layers such as residual structures, target recognition detectors, image segmentation detectors and fully connected layers, the operators such as convolution operators, batch normalization operators and activation function operators, and the operator weights of these operators. The model is then reconstructed and the flow of input and output data through the model is analyzed to obtain the data flow direction. Finally, each structural layer and each operator is numbered.
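As one possible realization of S1 (a sketch only; the patent does not name a framework or API, so the PyTorch module traversal and the record layout below are assumptions), the operators can be enumerated, numbered and annotated with their weight channel counts as follows. Extracting the data flow direction would additionally require graph tracing (for example with a tool such as torch.fx), which is omitted here:

```python
import torch.nn as nn

def identify_network_structure(model: nn.Module):
    """Assign a number to every operator and record its type and operator weight channel count."""
    structure = []
    for idx, (name, module) in enumerate(model.named_modules()):
        if isinstance(module, (nn.Conv2d, nn.BatchNorm1d, nn.BatchNorm2d)):
            channels = module.weight.shape[0]   # output channel count / gamma length
        else:
            channels = None                     # pooling, activation functions, etc.
        structure.append({"number": idx,
                          "name": name,
                          "type": type(module).__name__,
                          "weight_channels": channels})
    return structure
```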
S2, determining an operator which can be pruned.
S2.1, the numbers of all convolution operators and batch normalization operators obtained in step 1 are acquired, and the convolution operators and batch normalization operators meeting the following conditions are selected to establish the set of operators to be determined: the condition for convolution operators is a two-dimensional convolution operator whose convolution kernel has equal length and width, and the condition for batch normalization operators is a one-dimensional or two-dimensional batch normalization operator.
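A rough sketch of the selection conditions of S2.1, again assuming PyTorch operator classes purely for illustration (the function and variable names are not from the patent):

```python
import torch.nn as nn

def build_candidate_set(model: nn.Module):
    """Select two-dimensional convolutions with square kernels and 1D/2D batch normalization operators."""
    candidates = {}
    for name, module in model.named_modules():
        is_square_conv2d = (isinstance(module, nn.Conv2d)
                            and module.kernel_size[0] == module.kernel_size[1])
        is_bn = isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d))
        if is_square_conv2d or is_bn:
            candidates[name] = module
    return candidates
```

Operators meeting any limiting condition of S2.2 would then be removed from this candidate set to obtain the set of operators to be pruned.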
S2.2, because operators meeting the limiting conditions cannot be pruned, operators meeting any limiting condition are deleted from the set of operators to be determined, and the remaining operators form the set of operators to be pruned. The limiting conditions include:
Inter-layer connection limitation: the operator belongs to a residual structure layer, the residual structure layer has an Add operator whose output is 1 Tensor but whose inputs number 2 or more Tensors, and one of the inputs of the Add operator is identical to the output of an operator in a non-adjacent upper structural layer;
Detector limitation (target recognition detectors and image segmentation detectors of some models): multiple convolution operators in the same structural layer have N operator weight channels of which the corresponding N−1 operator weight channels have equal values. For example, if the weight of convolution operator 1 is [10,20,30,40], the weight of convolution operator 2 is [10,50,30,40] and the weight of convolution operator 3 is [20,10,30,40], then convolution operators 1 and 2 have equal values on the corresponding N−1 operator weight channels, whereas convolution operators 1 and 3 do not;
Convolution operator limitation: convolution operators in the same structural layer that are followed, along the data flow direction, by three consecutive operators of the same type;
Fully connected layer limitation: operators connected to a fully connected layer forward or backward along the data flow direction.
S3, determining a pruning method.
Different operators in the set of operators to be pruned use different pruning methods, so the pruning parameters may also differ. If the prunable operator is a batch normalization operator, it is sparsified and pruned, and S4 is executed; if the prunable operator is a convolution operator, S6 is executed; if the set of operators to be pruned is empty, the flow ends.
S4, thinning the models of the batch normalization operators.
The calculation formula of the batch normalization operator is as follows:
y = γ · (Z_i − μ_B) / sqrt(σ_B² + ε) + β    (4)
where Z_i is the current input, belonging to a set of m input data B = {Z_1, Z_2, ..., Z_m}; μ_B is the mean of the m input data; σ_B² is their variance; ε is a user-defined constant close to 0; and γ and β are learnable parameters stored in the weights.
Sparsification of γ is realized by adding an L1 regular constraint on γ to the loss function; the sparsification formula is as follows:
L = Σ_(x,y) l(f(x, W), y) + λ · Σ_γ g(γ)    (5)
where x and y respectively denote the input and output, W denotes the weights, the first term l(·) is the normal training loss of the model, L is the loss after sparsification, g(γ) = |γ|, and λ is a parameter balancing the two terms; λ decreases as training proceeds so that the impact of the L1 constraint on the operator is gradually reduced:
λ = sr × epochs / (epoch × 100)    (6)
where sr is the sparsity ratio with value range [0,1], epochs is the total number of epochs of sparse training, and epoch is the current training epoch.
The sparsity ratio sr of the batch normalization operators and the training parameters are set to sparsify the model; after sparse training is completed, the sparsified model is obtained, where the training parameters include the training batch size, the number of epochs, the learning rate, the optimizer and the like; S5 is then executed.
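A minimal sparse-training loop consistent with formulas (5) and (6) might look as follows; the loader, criterion and the choice to penalize only the γ of the operators to be pruned are assumptions of this sketch, not requirements spelled out in the text:

```python
def sparse_train(model, loader, optimizer, criterion, prunable_bn, sr, total_epochs):
    """Add the L1 term of formula (5) on the BN learning parameters gamma,
    with lambda scheduled by formula (6)."""
    for epoch in range(1, total_epochs + 1):
        lam = sr * total_epochs / (epoch * 100)        # formula (6)
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)              # normal training loss l(.)
            l1 = sum(bn.weight.abs().sum() for bn in prunable_bn)
            (loss + lam * l1).backward()               # L = l + lambda * sum(|gamma|)
            optimizer.step()
    return model
```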
S5, calculating the threshold upper limit L_BN of the batch normalization operators: L_BN = min(list{BN1max, BN2max, ..., BNnmax}), where BNnmax is the maximum learning parameter γ of the n-th batch normalization operator and min(list) is the minimum value of the list of maxima; the learning parameters γ of all batch normalization operators to be pruned are arranged in ascending order to form the first parameter list List_BN, the position of L_BN in List_BN is found, and the maximum pruning rate Loc_BN is calculated according to formula (1); a pruning rate is set, the weight channels of batch normalization operators whose learning parameter values are smaller than the threshold upper limit are deleted along the data flow direction, and the positions, before pruning, of the retained operator weight channels are recorded; S7 is then performed.
S6, the convolution operators are pruned with the L1-norm pruning method: the operators are ordered according to the norm calculation results, and the convolution kernel channels to be pruned are determined by the set pruning rate. The L1 norm is calculated as follows:
Sa_p = Σ_(q=1..P) ||x_(p,q)||_1    (7)
where x_(p,q) is the q-th convolution kernel in column p of the current convolution operator Sa and P is the total number of convolution kernels in that column; the sum of the absolute values of all elements of the P convolution kernels in column p gives the norm of that column, the norms of all columns of the current convolution operator Sa are obtained in the same way and ordered in ascending order to form the norm list Sa = {Sa1, Sa2, Sa3, ..., Saj}, where j is the number of columns (output channels) of the current convolution operator.
The threshold upper limit of the convolution operators is L_Conv: L_Conv = min(List{S1max, S2max, ..., Skmax}), where Skmax is the maximum value in the norm list of the k-th convolution operator after its L1 norms are calculated and min(List) is the minimum value of the list of maxima; the norm lists of all convolution operators to be pruned are merged into one list and ordered in ascending order, the position of L_Conv in this list is found, and the corresponding pruning rate Loc_Conv is calculated; then, according to the pruning rate, the convolution operator weight channels whose L1 norm values are smaller than the threshold upper limit L_Conv are pruned along the direction from model input to output.
S7, for convolution operators and batch normalization operators belonging to the same structural layer, the operator weight channels are synchronized after pruning, and the larger of the two channel numbers is taken as the final operator weight channel number; the weights retained by each layer of the model replace the weights before pruning, the weights of the convolution operators and batch normalization operators are updated, and finally the pruned model is saved.
S8, fine-tuning the model to finish model pruning.
The pruned model is imported, the training parameters are set again, and fine-tuning training is performed so that the model reaches the expected accuracy; if the size of the trained model (i.e. the data volume of the model, in MB) does not meet the expected requirement, S1 is executed to continue pruning the model until a model meeting the expectation is obtained.
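Since the stopping criterion is the stored data volume in MB, a simple size check could drive the outer loop back to S1. The sketch assumes FP32 storage at 4 bytes per parameter and 1 MB = 1024×1024 bytes, and the helper name in the comment is hypothetical:

```python
def model_size_mb(model) -> float:
    """Approximate model scale as FP32 storage: 4 bytes per parameter."""
    n_params = sum(p.numel() for p in model.parameters())
    return n_params * 4 / (1024 ** 2)

# Pseudo outer loop: keep returning to S1 until the expected size is reached.
# while model_size_mb(model) > expected_mb:
#     model = run_pruning_pipeline(model)   # S1-S8; hypothetical helper name
```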
In summary, the above embodiments are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (4)
1. An automatic compression method of a convolutional neural network model is characterized by comprising the following steps:
step 1, loading a convolutional neural network model to be pruned, and performing visualization processing on the model to identify a network structure of the model, so as to obtain the network structure of the convolutional neural network model to be pruned, wherein the network structure comprises a structural layer, a data flow direction, an operator and an operator weight channel;
step 2, acquiring all convolution operators and batch normalization operators from the network structure, automatically selecting operators according to selection conditions to construct a to-be-determined operator set, and removing operators meeting any limiting condition in the to-be-determined operator set to form a to-be-pruned operator set; ending the flow if the operator set to be pruned is empty, otherwise executing the step 3;
step 3, for the batch normalization operators in the operator set to be pruned, executing step 4 after model sparsification is carried out on the batch normalization operators; step 5 is executed for the convolution operator;
step 4, taking the minimum value of the maximum learning parameters γ of all batch normalization operators in the convolutional neural network model to be pruned as a first threshold upper limit, forming a first parameter list by arranging the learning parameters of the batch normalization operators in the operator set to be pruned in ascending order, wherein the position of the first threshold upper limit in the first parameter list is i, and obtaining the maximum pruning rate Loc_BN according to the formula (i+1)/len1 × 100%, where len1 is the length of the first parameter list; setting a pruning rate smaller than Loc_BN, and deleting, along the data flow direction, the operator weight channels of batch normalization operators whose learning parameter values are smaller than the first threshold upper limit; executing step 6;
step 5, taking the minimum value of the maximum L1 norms of all convolution operators in the convolutional neural network model to be pruned as a second threshold upper limit, forming a second parameter list by arranging the L1 norms of the convolution operators in the operator set to be pruned in ascending order, wherein the position of the second threshold upper limit in the second parameter list is j, and obtaining the maximum pruning rate Loc_Conv according to the formula (j+1)/len2 × 100%, where len2 is the length of the second parameter list; setting a pruning rate smaller than Loc_Conv, and deleting, along the data flow direction, the operator weight channels of convolution operators whose L1 norm values are smaller than the second threshold upper limit;
step 6, if all operators in the operator set to be pruned have been processed, training the current convolutional neural network model with set training parameters; if the model scale obtained by training does not meet the preset value, executing step 1, otherwise outputting the convolutional neural network model and ending the process; if the operators in the operator set to be pruned have not all been processed, executing step 3;
the limiting conditions in the step 2 include:
the method comprises the following steps that firstly, convolution operators of N operator weight channels are arranged in the same structural layer, the weights of N-1 operator weight channels corresponding to the convolution operators are the same, and the convolution operators of three continuous operators of the same type are connected backwards in the same structural layer; a second condition is a convolution operator or a batch normalization operator connected with the full connection layer; and thirdly, a convolution operator or a batch normalization operator contained in a residual structure layer, wherein the convolution operator or the batch normalization operator is provided with an Add operator, and a certain input of the Add operator is identical to the output of an operator of a non-adjacent structure layer of an upper layer.
2. The automatic compression method according to claim 1, wherein the selection condition in the step 2 is to select a two-dimensional convolution operator having equal length and width convolution kernels and a one-dimensional or two-dimensional batch normalization operator.
3. The automatic compression method according to claim 1, wherein consecutive connection refers to serial or parallel connection.
4. The automatic compression method according to claim 1, wherein the step 6 further comprises:
for a convolution operator and a batch normalization operator located in the same structural layer, acquiring the number of coincident channels between the operator weight channels of the two operators before pruning, taking the coincident channels as the final operator weight channels of both operators, and deleting the non-coincident channels in the convolution operator and the batch normalization operator respectively to finish pruning; if the number of coincident channels is 0, obtaining a new pruning rate Loc according to the formula Loc = A × [log2 C] × K / S (%), wherein A is the total number of convolution operators in the model, C is the number of convolution kernels, K is the size of the convolution kernel, and S is the size of the model parameters; setting a pruning rate smaller than Loc, deleting, along the data flow direction, the operator weight channels of convolution operators whose L1 norm values are smaller than the second threshold upper limit, and taking the larger of the operator weight channel numbers of the pruned convolution operator and batch normalization operator as the final operator weight channel number of both; and retaining the operator weight channel positions of the operator with the larger value, and replacing the original weight values of each structural layer with the current weight values.