CN113361697A - Convolution network model compression method, system and storage medium - Google Patents


Info

Publication number
CN113361697A
CN113361697A
Authority
CN
China
Prior art keywords
network model
convolutional network
converged
layer
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110797243.2A
Other languages
Chinese (zh)
Inventor
李卫东 (Li Weidong)
刘平涛 (Liu Pingtao)
罗博文 (Luo Bowen)
张招 (Zhang Zhao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Lexway Technology Development Co ltd
Shenzhen Siyue Innovation Co ltd
Original Assignee
Wuhan Lexway Technology Development Co ltd
Shenzhen Siyue Innovation Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Lexway Technology Development Co ltd, Shenzhen Siyue Innovation Co ltd filed Critical Wuhan Lexway Technology Development Co ltd
Priority: CN202110797243.2A
Publication: CN113361697A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the invention disclose a convolutional network model compression method, a compression system, and a storage medium.

Description

Convolution network model compression method, system and storage medium
Technical Field
The embodiments of the invention relate to the technical field of data processing, and in particular to a convolutional network model compression method, a compression system, and a storage medium.
Background
A convolutional network model generally faces three problems in real-world deployment: (1) storage: the excellent performance of a convolutional network model depends on millions of trainable parameters, and these parameters and the network structure information must be stored on disk and loaded into memory for inference — a model pre-trained on ImageNet needs more than 300 MB, a great burden for embedded devices; (2) run-time memory usage: during inference, the intermediate activations/responses of the model can require even more memory than the model parameters themselves, which is no problem for high-performance GPUs but a large burden for many applications with low computational power; (3) computation: convolution on high-resolution images is computationally intensive, and a large convolutional network model may take several minutes to process a single image on an embedded device, making it impractical for real applications.
Disclosure of Invention
To this end, embodiments of the present invention provide a method, a system, and a storage medium for compressing a convolutional network model, so as to solve at least one of the problems in the foregoing background art.
In order to achieve the above object, in a first aspect, an embodiment of the present invention provides a convolutional network model compression method, including:
acquiring image data, making a training data set, building a convolution network model by using a deep learning framework, and executing the following steps on the convolution network model:
performing channel sparse regularization training on the convolutional network model by using a training data set until the convolutional network model is converged;
pruning the channel of the converged convolutional network model by using the scaling factor of the BN layer;
fine-tuning the pruned convolutional network model, and judging whether the fine-tuned convolutional network model has converged;
if the fine-tuned convolutional network model has converged, saving the model parameters of the fine-tuned convolutional network model to obtain a compressed convolutional network model; and if the fine-tuned convolutional network model has not converged, repeating the above steps.
Further, acquiring image data and making a training data set, and building a convolution network model by using a deep learning framework comprises the following steps:
acquiring image data according to an application scene and making a training data set;
calculating the mean and standard deviation of the training data set;
carrying out normalization processing on the training data set according to the mean value and the standard deviation to obtain a preprocessed training data set;
configuring the channel number of a convolution network model according to the category number contained in the application scene;
the channel sparse regularization training of the convolutional network model by utilizing the training data set comprises the following steps:
and performing channel sparse regularization training on the convolutional network model by utilizing the preprocessed training data set.
Further, the channel sparse regularization training of the convolutional network model by using the preprocessed training data set comprises:
and inputting the preprocessed training data set into a convolution network model, and performing channel sparse regularization training on the convolution network model to obtain an output value of the convolution network model, updated weight parameters and a scaling factor of the BN layer.
Further, the number of scaling factors is the same as the number of BN layers.
Further, pruning the channel of the converged convolutional network model by using the scaling factor of the BN layer includes:
when a convolution network model is built, a BN layer is inserted after a convolution layer of the convolution network model, and a scaling factor and a translation parameter of the BN layer are obtained by training the convolution network model;
and pruning the channel of the converged convolutional network model by using the scaling factor and the translation parameter of the BN layer.
Further, pruning the channels of the converged convolutional network model comprises:
sorting the absolute values of the scaling factors of the BN layer corresponding to each channel of the converged convolutional network model according to a sorting rule;
intercepting the scaling factors of the BN layers at the corresponding positions after sorting as the global threshold values of all layers of the converged convolutional network model;
judging whether the scaling factor of the BN layer corresponding to each channel of the converged convolutional network model is smaller than a global threshold value or not;
and if so, cutting off a channel of the converged convolutional network model corresponding to the scaling factor of the BN layer smaller than the global threshold value.
In a second aspect, an embodiment of the present invention provides a convolutional network model compression system, where the system includes: a processor and a memory;
the memory is used for storing one or more program instructions;
a processor for executing one or more program instructions to perform any of the method steps of the above method for compressing a convolutional network model.
In a third aspect, embodiments of the present invention provide a computer storage medium having one or more program instructions embodied therein for execution by a convolutional network model compression system to perform any one of the method steps of a convolutional network model compression method as described above.
The embodiment of the invention has the following advantages:
the convolution network model compression method provided by the embodiment of the invention firstly carries out channel sparse regularization training on a convolution network model to make the convolution network model converge, then adopts the scaling factor of a BN layer to prune the channel of the converged convolution network model, then carries out fine tuning on the pruned convolution network model and judges whether the converged convolution network model is converged, and if the converged convolution network model is obtained, the compressed convolution network model is obtained, so that the size of the convolution network model and the occupation of a memory when the convolution network model operates are effectively reduced, the operation number is reduced while the precision is not influenced, the model compression and reasoning acceleration can be realized by using the traditional hardware and a deep learning software package, and other special hardware accelerators are not needed.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
The structures, ratios, and sizes shown in this specification are provided only to accompany the disclosed content, so as to be understood and read by those skilled in the art, and are not used to limit the conditions under which the invention can be implemented; any structural modification, change of proportion, or adjustment of size that does not affect the effects achievable by the invention shall still fall within the scope that the technical content disclosed by the invention can cover.
Fig. 1 is a schematic flow chart of a convolution network model compression method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another convolution network model compression method according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating a method for pruning channels of a converged convolutional network model according to an embodiment of the present invention;
fig. 4 is a schematic flowchart of a further method for pruning channels of a converged convolutional network model according to an embodiment of the present invention;
fig. 5 is a block diagram of a convolution network model compression system according to an embodiment of the present invention.
Detailed Description
The present invention is described below through particular embodiments; other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. The described embodiments are merely a part of the embodiments of the invention, and the invention is not limited to the particular embodiments disclosed. All other embodiments obtained by a person of ordinary skill in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
As shown in fig. 1, the present embodiment provides a convolution network model compression method, including:
s101, collecting image data, making a training data set, building a convolution network model by using a deep learning framework, and executing steps S102 to S105 on the convolution network model:
s102, performing channel sparse regularization training on the convolutional network model by using a training data set until the convolutional network model is converged;
s103, pruning the channel of the converged convolutional network model by using the scaling factor of the BN layer;
s104, fine tuning the pruned convolution network model, and judging whether the fine tuned convolution network model is converged;
s105, if the finely tuned convolutional network model is converged, storing model parameters of the finely tuned convolutional network model to obtain a compressed convolutional network model; if the fine-tuned convolutional network model does not converge, the above steps S102 to S105 are repeated.
Specifically, the deep learning framework may be the PyTorch deep learning framework, and the convolutional network model built with it may be, for example, an image classification model; this embodiment does not specifically limit the model type.
The convolutional network model compression method provided by this embodiment performs channel sparse regularization training on a convolutional network model until it converges, prunes the channels of the converged model using the scaling factors of its BN layers, then fine-tunes the pruned model and judges whether the fine-tuned model has converged; if it has, the compressed convolutional network model is obtained. This effectively reduces the size of the model and its run-time memory footprint, and reduces the number of operations without affecting accuracy, so that model compression and inference acceleration can be realized with conventional hardware and deep learning software packages, without any special hardware accelerator.
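The steps S102–S105 above can be sketched as a toy loop. This is a hedged illustration, not the patent's implementation: a "model" here is just a vector of BN scaling factors, and the helpers `train_channel_sparse`, `prune_by_bn_gamma`, and the `importance` vector are invented stand-ins.

```python
import numpy as np

def train_channel_sparse(gammas, importance, lam=0.05, lr=0.5, steps=50):
    # S102 (toy): each step follows the gradient of a task term pulling each
    # gamma toward its "useful" value, plus the subgradient of the L1
    # sparsity penalty lam * |gamma|, which shrinks useless gammas to ~0.
    for _ in range(steps):
        gammas = gammas - lr * ((gammas - importance) + lam * np.sign(gammas))
    return gammas

def prune_by_bn_gamma(gammas, prune_ratio):
    # S103: global threshold = the |gamma| value at the prune_ratio position
    # of the sorted absolute scaling factors.
    ranked = np.sort(np.abs(gammas))
    threshold = ranked[int(prune_ratio * len(ranked))]
    keep = np.abs(gammas) >= threshold
    return gammas[keep], keep

# Ten channels: the first five matter for the task, the last five do not.
importance = np.array([1.0, 0.9, 1.1, 0.8, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
gammas = train_channel_sparse(np.ones(10), importance)   # S102
pruned, keep = prune_by_bn_gamma(gammas, 0.5)            # S103
# S104 would now fine-tune the pruned model; in this toy, exactly the five
# useful channels survive.
```

In the toy, sparse training drives the five unimportant scaling factors close to zero while the useful ones stay large, so the global threshold cleanly separates them — the same separation the real method relies on.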
Further, as shown in fig. 2, acquiring image data and making a training data set, and building a convolutional network model by using a deep learning framework includes:
s201, acquiring image data according to an application scene and making a training data set;
in this step, an application scenario is an actual life scenario in which the convolutional network model is specifically applied, and categories included in the application scenario are, for example, different categories of fruits such as apples and bananas in one scenario, which is not specifically limited in this embodiment. And each class in the application scenario corresponds to one channel of the convolutional network model.
S202, calculating a mean value and a standard deviation of a training data set;
s203, carrying out normalization processing on the training data set according to the mean value and the standard deviation to obtain a preprocessed training data set;
s204, configuring the channel number of the convolution network model according to the category number contained in the application scene;
the channel sparse regularization training of the convolutional network model by utilizing the training data set comprises the following steps:
and performing channel sparse regularization training on the convolutional network model by utilizing the preprocessed training data set.
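Steps S202–S203 can be sketched in NumPy as follows; the array shapes and variable names are illustrative assumptions, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
# A stand-in training set: 100 images, 3 channels, 32 x 32 pixels (N x C x H x W)
images = rng.uniform(0.0, 255.0, size=(100, 3, 32, 32))

# S202: per-channel mean and standard deviation over all images and pixels
mean = images.mean(axis=(0, 2, 3))   # shape (3,)
std = images.std(axis=(0, 2, 3))

# S203: normalize each channel to zero mean and unit variance
normalized = (images - mean[None, :, None, None]) / std[None, :, None, None]
```

After this preprocessing, `normalized.mean(axis=(0, 2, 3))` is approximately zero and the per-channel standard deviation is approximately one, which is the preprocessed training data set fed to the sparse regularization training.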
Further, the channel sparse regularization training of the convolutional network model by using the preprocessed training data set comprises:
and inputting the preprocessed training data set into a convolution network model, and performing channel sparse regularization training on the convolution network model to obtain an output value of the convolution network model, updated weight parameters and a scaling factor of the BN layer.
In this embodiment, a scaling factor γ is introduced for each channel and multiplied with that channel's output. The network weights and the scaling factors are then trained jointly; finally, the channels with small scaling factors are pruned directly and the pruned network is fine-tuned. The objective used for channel sparse regularization training of the convolutional network model is:

L = Σ_(x,y) l(f(x, W), y) + λ Σ_(γ∈Γ) g(γ)    (1)

where x is an input value, y the corresponding target output, W the weight parameters, l(·) the task loss, γ a scaling factor of a BN layer, Γ the set of all such scaling factors, λ a balance factor, and g(·) a penalty term on the BN scaling factors γ. L1 regularization is chosen, i.e. g(γ) = |γ|, the L1 norm. Subgradient descent serves as the optimization method for the non-smooth (non-differentiable) L1 penalty; alternatively, a smooth-L1 regularization term replaces the penalty so that subgradient steps at the non-smooth point are avoided as far as possible.
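The penalty terms just described can be illustrated with a toy NumPy computation: a task loss plus λ times the L1 penalty on the BN scaling factors, with the smooth-L1 alternative alongside. The concrete numbers are invented for the example.

```python
import numpy as np

def l1_penalty(gammas):
    # g(gamma) = |gamma|: non-smooth at 0, optimized by subgradient descent
    return np.abs(gammas).sum()

def smooth_l1_penalty(gammas, delta=1.0):
    # Quadratic near zero, linear in the tails: differentiable everywhere,
    # so plain gradient descent can replace subgradient descent.
    a = np.abs(gammas)
    return np.where(a < delta, 0.5 * a**2 / delta, a - 0.5 * delta).sum()

gammas = np.array([0.8, -0.05, 0.01, 1.2])   # illustrative BN scaling factors
lam = 0.01                                   # balance factor lambda
task_loss = 0.37                             # stand-in for the data term
total = task_loss + lam * l1_penalty(gammas)  # the objective's overall value
```

During training, the L1 term contributes `lam * sign(gamma)` to each scaling factor's (sub)gradient, which is what steadily shrinks unimportant channels' factors toward zero.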
Further, the number of scaling factors is the same as the number of BN layers.
Specifically, in the formula (1), the number of the scaling factors γ is the same as the number of BN layers.
Further, as shown in fig. 3, pruning the channels of the converged convolutional network model by using the scaling factor of the BN layer includes:
s301, when a convolution network model is built, inserting a BN layer into the convolution layer of the convolution network model, and training the convolution network model to obtain a scaling factor and a translation parameter of the BN layer;
and S302, pruning the channel of the converged convolution network model by using the scaling factor and the translation parameter of the BN layer.
In this embodiment, the formulas used by the BN layer when pruning the channels of the converged convolutional network model are:

ẑ = (z_input − μ_B) / √(σ_B² + ε)    (2)
z_output = γ · ẑ + β    (3)

where z_input is the input value of the BN layer, z_output the output value of the BN layer, γ the scaling factor of the BN layer, β the translation parameter, μ_B the mini-batch mean, σ_B² the mini-batch variance, and ε a small nonzero constant.
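The BN transform described here can be sketched in NumPy as follows (variable names are illustrative). Note how a channel whose γ is zero carries no information from its input, which is exactly why near-zero scaling factors mark prunable channels.

```python
import numpy as np

def bn_forward(z_input, gamma, beta, eps=1e-5):
    # Normalize with the mini-batch mean and variance, then apply the
    # per-channel scale (gamma) and shift (beta).
    mu = z_input.mean(axis=0)
    var = z_input.var(axis=0)
    z_hat = (z_input - mu) / np.sqrt(var + eps)
    return gamma * z_hat + beta

batch = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])   # 3 samples, 2 channels
out = bn_forward(batch,
                 gamma=np.array([1.0, 0.0]),
                 beta=np.array([0.0, 0.5]))
# Channel 1 is standardized; channel 2 (gamma = 0) collapses to the constant
# beta = 0.5 regardless of its input.
```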
In particular, pruning a channel essentially prunes all input and output connections associated with that channel, directly yielding a narrow network without requiring any special sparse computation package. The scaling factors of the BN layers play the role of channel selectors: because the regularization term on the scaling factors is jointly optimized with the weight loss function through formula (1), the network can automatically identify and cut off unimportant channels with almost no effect on its generalization performance.
A BN layer is inserted after each convolutional layer of the convolutional network model, which introduces the BN layer's scaling factor and translation parameter. Reusing the BN layer's own scaling factor as the network-slimming scaling factor therefore brings no additional cost to the network.
Further, as shown in fig. 4, pruning the channels of the converged convolutional network model includes:
s401, sorting the absolute values of the scaling factors of the BN layers corresponding to each channel of the converged convolutional network model according to a sorting rule;
s402, intercepting the scaling factors of the BN layers at the corresponding positions after sorting as the global threshold values of all layers of the converged convolutional network model;
s403, judging whether the scaling factor of the BN layer corresponding to each channel of the converged convolutional network model is smaller than a global threshold value or not;
and S404, if so, clipping a channel of the converged convolutional network model corresponding to the scaling factor of the BN layer smaller than the global threshold.
Specifically, after the regularization term on the BN scaling factors is introduced, many scaling factors in the model tend toward 0. The channels whose BN scaling factors are close to 0 are then pruned. For example, suppose the feature map after a convolution has dimensions h × w × c, where h and w are its height and width and c is the number of channels; feeding it into the BN layer yields the normalized feature map, and each of the c channel maps corresponds to one set of scaling and shift parameters of the BN layer. Pruning the channels with small BN scaling factors therefore amounts to directly removing the convolution kernels that produce the corresponding channel maps.
The criterion for a "small" BN scaling factor depends on a global threshold set for all layers of the entire convolutional network model, which may be defined as a proportion of all BN scaling-factor values. Suppose 70% of the channels in the convolutional network model need to be pruned: the absolute values of the BN scaling factors are first sorted according to a sorting rule, for example from small to large (this embodiment does not specifically limit the rule), and then, according to a preset interception rule, the scaling factor at the 70% position of the sorted list is intercepted as the global threshold. In this embodiment, the interception rule is to take the scaling factor at the 70% position of the ascending list as the global threshold; another preset interception rule may be adopted according to actual requirements, and this embodiment is not specifically limited. In this way, a compact convolutional network model with few parameters, a small run-time memory footprint, and low computation cost is obtained.
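The global-threshold selection just described can be sketched as follows, assuming a 70% prune ratio; the helper name is an invention for the example.

```python
import numpy as np

def channels_to_prune(gammas, prune_ratio=0.7):
    # Sort the absolute BN scaling factors from small to large, take the
    # value at the prune_ratio position as the global threshold, and mark
    # every channel below it for removal.
    ranked = np.sort(np.abs(gammas))
    threshold = ranked[int(prune_ratio * len(ranked))]
    return np.abs(gammas) < threshold, threshold

gammas = np.array([0.9, 0.01, 0.4, 0.002, 0.03,
                   0.0, 1.3, 0.05, 0.7, 0.008])
mask, thr = channels_to_prune(gammas, prune_ratio=0.7)
# mask marks 7 of these 10 channels for pruning; the largest factors survive.
```

Because the threshold is global across all layers, layers whose channels are mostly unimportant lose many channels, while layers full of large scaling factors are left nearly intact.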
In addition, combining this method with other model compression methods (quantization, low-rank decomposition) can further raise the compression ratio; combining it with other optimization and acceleration methods (TensorRT, etc.) can raise the inference speed; and if the loss of inference accuracy is too large, knowledge distillation can be combined to recover the lost accuracy effectively, yielding a thin, compact network with almost no loss of accuracy.
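As one possible form of the knowledge-distillation step mentioned above — an assumption for illustration, not specified by the patent — a common choice is the KL divergence between temperature-softened teacher and student class probabilities:

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, T=4.0):
    # KL(teacher || student) on temperature-softened probabilities,
    # scaled by T^2 as is conventional in distillation.
    p = softmax(teacher_logits, T)   # soft targets from the large model
    q = softmax(student_logits, T)   # pruned model's predictions
    return float(np.sum(p * np.log(p / q))) * T * T

teacher = np.array([4.0, 1.0, -2.0])
student = np.array([3.5, 1.5, -1.0])
loss = distill_loss(teacher, student)   # small positive value
```

The pruned network would be fine-tuned on a weighted sum of this distillation loss and the ordinary task loss, pulling its soft predictions back toward those of the uncompressed model.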
In a second aspect, as shown in fig. 5, an embodiment of the present invention provides a convolutional network model compression system, which includes: a processor 501 and a memory 502;
memory 502 is used to store one or more program instructions;
a processor 501 for executing one or more program instructions for performing any of the above method steps of the convolutional network model compression method.
In a third aspect, embodiments of the present invention provide a computer storage medium having one or more program instructions embodied therein for execution by a convolutional network model compression system to perform any one of the method steps of a convolutional network model compression method as described above.
In an embodiment of the invention, the processor may be an integrated circuit chip having signal processing capability. The Processor may be a general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware component.
The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The processor reads the information in the storage medium and completes the steps of the method in combination with the hardware.
The storage medium may be a memory, for example, which may be volatile memory or nonvolatile memory, or which may include both volatile and nonvolatile memory.
The nonvolatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory.
The volatile Memory may be a Random Access Memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that the functionality described in the present invention may be implemented in a combination of hardware and software in one or more of the examples described above. When software is applied, the corresponding functionality may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (8)

1. A method of compressing a convolutional network model, comprising:
acquiring image data, making a training data set, building a convolution network model by using a deep learning framework, and executing the following steps on the convolution network model:
performing channel sparse regularization training on the convolutional network model by using the training data set until the convolutional network model is converged;
pruning the channel of the converged convolutional network model by using the scaling factor of the BN layer;
fine-tuning the pruned convolutional network model, and judging whether the fine-tuned convolutional network model has converged;
if the fine-tuned convolutional network model has converged, saving the model parameters of the fine-tuned convolutional network model to obtain a compressed convolutional network model; and if the fine-tuned convolutional network model has not converged, repeating the above steps.
2. The method for compressing the convolutional network model according to claim 1, wherein the acquiring image data and making a training data set, and the building of the convolutional network model by using the deep learning framework comprises the following steps:
acquiring image data according to an application scene and making a training data set;
calculating a mean and a standard deviation of the training data set;
performing normalization processing on the training data set according to the mean value and the standard deviation to obtain a preprocessed training data set;
configuring the channel number of a convolution network model according to the category number contained in the application scene;
the training of the sparse regularization channel of the convolutional network model by using the training data set comprises:
and performing channel sparse regularization training on the convolutional network model by using the preprocessed training data set.
3. The method of compressing a convolutional network model as claimed in claim 2, wherein the training of the convolutional network model with the preprocessed training data set for channel sparsity regularization comprises:
and inputting the preprocessed training data set into the convolution network model, and performing channel sparse regularization training on the convolution network model to obtain an output value of the convolution network model, updated weight parameters and a scaling factor of the BN layer.
4. The convolutional network model compression method of claim 3, wherein the number of the scaling factors is the same as the number of BN layers.
5. The convolutional network model compression method of claim 1, wherein pruning the channels of the converged convolutional network model by using the scaling factor of the BN layer comprises:
when a convolutional network model is built, inserting a BN layer behind a convolutional layer of the convolutional network model, and training the convolutional network model to obtain a scaling factor and a translation parameter of the BN layer;
and pruning the channel of the converged convolutional network model by using the scaling factor and the translation parameter of the BN layer.
6. The convolutional network model compression method of claim 5, wherein pruning the channels of the converged convolutional network model comprises:
sorting the absolute values of the BN-layer scaling factors corresponding to the channels of the converged convolutional network model according to a sorting rule;
taking the BN-layer scaling factor at the corresponding position in the sorted sequence as the global threshold for all layers of the converged convolutional network model;
judging whether the BN-layer scaling factor corresponding to each channel of the converged convolutional network model is smaller than the global threshold; and
if so, pruning the channels of the converged convolutional network model whose BN-layer scaling factors are smaller than the global threshold.
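The steps of claim 6 amount to collecting the absolute scaling factors of every channel across all BN layers, sorting them, reading a single global threshold off the sorted sequence, and marking channels below that threshold for removal. A hedged sketch follows; the `prune_ratio` parameter is an illustrative assumption, since the claim does not fix how the threshold position is chosen:

```python
import numpy as np

def global_prune_mask(gammas, prune_ratio=0.5):
    """Sort the absolute BN scaling factors of every channel in the converged
    model, take the value at the chosen position as a single global threshold,
    and return per-layer keep/prune masks (True = keep the channel)."""
    all_abs = np.sort(np.concatenate([np.abs(g) for g in gammas]))
    threshold = all_abs[int(len(all_abs) * prune_ratio)]
    masks = [np.abs(g) >= threshold for g in gammas]
    return masks, threshold

# Two hypothetical BN layers with three channels each.
layer1 = np.array([0.8, 0.01, 0.3])
layer2 = np.array([0.02, 0.9, 0.05])
masks, thr = global_prune_mask([layer1, layer2], prune_ratio=0.5)
```

Channels whose mask entry is False would then be cut from both the BN layer and the preceding convolutional layer, after which the slimmed network is typically fine-tuned.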
7. A convolutional network model compression system, the system comprising: a processor and a memory;
the memory is configured to store one or more program instructions;
the processor is configured to execute the one or more program instructions to perform the method of any one of claims 1-6.
8. A computer storage medium, comprising one or more program instructions for execution by a convolutional network model compression system to perform the method of any one of claims 1-6.
CN202110797243.2A 2021-07-14 2021-07-14 Convolution network model compression method, system and storage medium Pending CN113361697A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110797243.2A CN113361697A (en) 2021-07-14 2021-07-14 Convolution network model compression method, system and storage medium

Publications (1)

Publication Number Publication Date
CN113361697A true CN113361697A (en) 2021-09-07

Family

ID=77539485

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116185307A (en) * 2023-04-24 2023-05-30 之江实验室 Storage method and device of model data, storage medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111199282A (en) * 2019-12-31 2020-05-26 的卢技术有限公司 Pruning method and device for convolutional neural network model
CN112101547A (en) * 2020-09-14 2020-12-18 中国科学院上海微系统与信息技术研究所 Pruning method and device for network model, electronic equipment and storage medium
JP2021022050A (en) * 2019-07-25 2021-02-18 国立大学法人 和歌山大学 Neural network compression method, neural network compression device, computer program, and method of producing compressed neural network data
CN112990420A (en) * 2019-12-02 2021-06-18 北京华航无线电测量研究所 Pruning method for convolutional neural network model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YIN Wenfeng; LIANG Lingyan; PENG Huimin; CAO Qichun; ZHAO Jian; DONG Gang; ZHAO Yaqian; ZHAO Kun: "Research Progress of Convolutional Neural Network Compression and Acceleration Techniques", Computer Systems & Applications, no. 09 *
BAI Shilei; YIN Kexin; ZHU Jianqi: "Traffic Sign Detection Algorithm Based on Lightweight YOLOv3", Computer and Modernization, no. 09 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination