CN116992944A - Image processing method and device based on learnable importance criterion pruning - Google Patents


Publication number
CN116992944A
CN116992944A (application CN202311257199.1A)
Authority
CN
China
Prior art keywords
importance
image processing
search space
pruning
processing model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311257199.1A
Other languages
Chinese (zh)
Other versions
CN116992944B (en)
Inventor
李超
陈启运
刁博宇
宫禄齐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202311257199.1A priority Critical patent/CN116992944B/en
Publication of CN116992944A publication Critical patent/CN116992944A/en
Application granted granted Critical
Publication of CN116992944B publication Critical patent/CN116992944B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image processing method and device based on learnable importance-criterion pruning, belonging to the field of artificial-intelligence model compression and inference acceleration. The method comprises the following steps: acquiring an image processing model to be pruned; adding an importance-criterion search space to the forward propagation of the image processing model to be pruned, wherein the importance-criterion search space is used to calculate the portion of each convolutional layer in the model that needs to be pruned; back-propagating based on the importance-criterion search space and updating the parameters of the search space; and fine-tuning the model pruned with the finally selected criteria.

Description

Image processing method and device based on learnable importance-criterion pruning
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to an image processing method and device based on learnable importance-criterion pruning, directed at research on model compression and inference acceleration.
Background
In recent years, with the development of deep learning, convolutional neural networks (Convolutional Neural Networks, CNN) have performed surprisingly well in visual object detection, segmentation, recognition, and classification tasks. A convolutional neural network comprises a feature extractor consisting of convolutional layers and sub-sampling layers: filters extract local features from the input image by convolution, and the sampling layers then produce feature-mapping layers. Compared with traditional neural networks, CNNs simplify network parameters through local receptive fields, weight sharing, and pooling layers; however, as networks deepen, their parameter counts keep growing: for example, GoogLeNet has 6.7M parameters, AlexNet 60M, and VGG16 138M. The main acceleration approaches are: first, pruning, which deletes unimportant parameters in the network, such as weight pruning, channel pruning, convolution-kernel pruning, and neuron pruning; pruning can preserve accuracy while reducing the parameter count, but the time and computation it consumes are large. Second, quantization, which converts floating-point operations into fixed-point operations; quantization can shrink the model with little loss of precision, but its accuracy is unstable and its generality poor. Third, knowledge distillation, in which a large network guides the training of a small network; distillation training is simple, but designing the transfer scheme is difficult. To reduce the size of network models in image processing and to lower the computational complexity of image processing, network compression represented by model pruning is now widely used in image processing tasks.
Convolution-kernel pruning can be classified into hard pruning, in which a convolution kernel deleted in one round does not participate in the next round's iteration, and soft pruning, in which a deleted convolution kernel still participates in the next round's iteration. However, both share a common drawback: different layers of the network adopt the same importance criterion, which cannot fully account for the different feature distributions of different layers. How to select an appropriate importance criterion therefore deserves attention.
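As a minimal illustration of the hard/soft distinction (a sketch, not the patent's implementation; the filter values and index sets are hypothetical), hard pruning permanently removes a filter, while soft pruning only zeroes it so it can still be updated in the next round:

```python
def hard_prune(filters, prune_idx):
    """Hard pruning: deleted filters are dropped and do not rejoin later iterations."""
    return [w for i, w in enumerate(filters) if i not in prune_idx]

def soft_prune(filters, prune_idx):
    """Soft pruning: pruned filters are zeroed in place and may still be
    updated (and possibly revived) in the next training iteration."""
    return [0.0 if i in prune_idx else w for i, w in enumerate(filters)]
```

With `filters = [0.9, 0.1, 0.5]` and `prune_idx = {1}`, hard pruning yields a two-filter layer, while soft pruning keeps three slots with the second zeroed.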
On the one hand, neuron importance can be measured by weight-based criteria: a feature's output is the product of the input and the weights, and the smaller the weight, the smaller its contribution to the output. By sorting the weight magnitudes and removing connections below a preset value, the pruned network is obtained. The l1-norm and l2-norm can be used to judge the importance of a convolution kernel; the Geometric Median (GM) approach prunes redundant convolution kernels, those considered too similar to the other kernels, rather than those with relatively low weights. On the other hand, the neurons with the least effect on the loss can be found from the loss function: performing a Taylor expansion of the activation value, computing the product of each weight of a convolution kernel with its gradient, and summing to obtain an average; or squaring the expansion of the weights. The activation value may also be used as a metric: most neurons whose activations approach 0 are redundant, and removing them can greatly reduce the size and operation count of the model without affecting its precision. APoZ estimates each layer's redundancy and clipping space from the proportion of zero activations; clipping can also be based on information entropy. However, there is still relatively little research on whether the model as a whole should use a single criterion, and further study is required.
Disclosure of Invention
Aiming at the defects of the prior art, the embodiments of the application provide an image processing method and device based on learnable importance-criterion pruning: a selectable importance-criterion search space is added to traditional soft pruning, so that each layer can select an appropriate criterion. In learnable pruning, each layer "learns" to select the most suitable criterion, so that the network obtains a better pruning result and the run-time memory of image processing tasks is reduced.
According to a first aspect of the embodiments of the present application, there is provided an image processing method based on learnable importance-criterion pruning, including:
acquiring an image processing model to be pruned;
adding an importance-criterion search space to the forward propagation of the image processing model to be pruned, wherein the importance-criterion search space is used to calculate the portion of each convolutional layer in the image processing model to be pruned that needs to be pruned;
back-propagating based on the importance-criterion search space and updating the parameters of the importance-criterion search space;
fine-tuning the image processing model pruned based on the finally selected criteria;
and deploying the fine-tuned image processing model on a terminal so as to process images through the fine-tuned image processing model.
Further, the image processing model to be pruned comprises L convolutional layers, and the convolution-kernel information of each of the L layers is labeled.
Further, in the importance-criterion search space:
the criterion for the layer-l convolutional layer is selected by calculating the probability p_i^l of choosing the i-th importance criterion:

    p_i^l = exp(α_i^l) / Σ_{j=1}^{S} exp(α_j^l)

where O^l denotes the importance-criterion search space of the layer-l convolutional layer, α_i^l is the importance-criterion hyperparameter, i ∈ {1, 2, ..., S}, and S denotes the number of importance criteria;
and the probability p is processed with the Gumbel-Softmax discrete-distribution sampling method:

    g_i = -log(-log(u_i)),  u_i ~ U(0, 1)
    p̂_i^l = exp((log p_i^l + g_i) / τ) / Σ_{j=1}^{S} exp((log p_j^l + g_j) / τ)

where u_i is a sample from the uniform distribution on (0, 1), g_i denotes independently distributed random Gumbel noise, and τ is the softmax temperature coefficient: as τ goes from large to small, p̂^l changes from an almost uniform distribution to a fixed choice of one element.
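A minimal numeric sketch of this sampling scheme (pure Python with assumed values for α; not the patent's code): the softmax over the hyperparameters α gives the selection probabilities, Gumbel noise g_i = -log(-log(u_i)) perturbs the log-probabilities, and a low temperature τ sharpens the result toward one-hot:

```python
import math
import random

def criterion_probs(alpha):
    """Softmax over a layer's criterion hyperparameters alpha_i^l."""
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]

def gumbel_noise(n, rng):
    """Independent Gumbel(0, 1) noise: g = -log(-log(u)), u ~ U(0, 1)."""
    return [-math.log(-math.log(rng.random())) for _ in range(n)]

def gumbel_softmax(probs, noise, tau):
    """Relaxed one-hot sample: softmax((log p_i + g_i) / tau)."""
    logits = [(math.log(p) + g) / tau for p, g in zip(probs, noise)]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
p = criterion_probs([0.2, 1.0, 0.5, 0.1, 0.3])  # S = 5 criteria (assumed alphas)
g = gumbel_noise(len(p), rng)
soft = gumbel_softmax(p, g, tau=5.0)   # high temperature: nearly uniform
hard = gumbel_softmax(p, g, tau=0.1)   # low temperature: nearly one-hot
```

Lowering τ while holding the same noise fixed only sharpens the same winner, which mirrors the linear 5 → 0.1 temperature schedule described later in the embodiment.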
Further, the number of importance criteria is S = 5, and the 5 importance criteria are respectively: (1) the l1-norm criterion, (2) the l2-norm criterion, (3) the geometric-median criterion, (4) a criterion that estimates, via a Taylor expansion of the change of the loss function before and after pruning, the loss change caused by removing part of the feature maps, and (5) the APoZ activation-value criterion.
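The five criteria can be sketched as per-filter scoring functions (NumPy, with assumed array shapes; the Taylor and APoZ forms below are common simplifications, not necessarily the patent's exact formulas):

```python
import numpy as np

def l1_score(filters):                 # (1) l1-norm of each filter
    return np.abs(filters).reshape(len(filters), -1).sum(axis=1)

def l2_score(filters):                 # (2) l2-norm of each filter
    return np.sqrt((filters.reshape(len(filters), -1) ** 2).sum(axis=1))

def gm_score(filters):                 # (3) geometric-median style: total distance
    flat = filters.reshape(len(filters), -1)   # to the other filters; the most
    dist = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=2)
    return dist.sum(axis=1)            # "replaceable" (smallest sum) goes first

def taylor_score(filters, grads):      # (4) first-order Taylor term |sum(w * dL/dw)|
    prod = (filters * grads).reshape(len(filters), -1)
    return np.abs(prod.sum(axis=1))

def apoz_score(activations):           # (5) 1 - APoZ: fraction of non-zero outputs
    flat = activations.reshape(activations.shape[0], activations.shape[1], -1)
    return 1.0 - (flat == 0).mean(axis=(0, 2))
```

In every case a lower score marks a filter as a better pruning candidate, so the same thresholding logic can be shared across criteria.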
Further, after the importance-criterion search space is added to the forward propagation of the image processing model to be pruned, the output feature maps produced by the different importance criteria selected from the search space are aligned.
Further, the output feature maps of the different importance criteria selected from the importance-criterion search space are aligned as:

    F_out^l = Σ_{i=1}^{S} p̂_i^l · Align(F_i^l)

where F_out^l is the aligned feature map, p̂_i^l is the probability of selecting the i-th importance criterion, F_i^l is the feature map output after pruning with the i-th criterion, and the Align operation sums the feature maps that keep the same channel locations.
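A small sketch of the alignment (NumPy, assuming channel-first feature maps; the helper name is hypothetical): each candidate criterion keeps a different channel subset, pruned channels are treated as zero at their original positions, and the candidates are summed with weights p̂_i:

```python
import numpy as np

def aligned_output(feature_map, keep_sets, probs):
    """Weighted sum of candidate pruned outputs, aligned by channel index.

    feature_map: (C, H, W) full layer output
    keep_sets[i]: channel indices kept under criterion i
    probs[i]: selection probability of criterion i
    """
    out = np.zeros_like(feature_map)
    for p, keep in zip(probs, keep_sets):
        mask = np.zeros(feature_map.shape[0])
        mask[list(keep)] = 1.0             # pruned channels contribute zero
        out += p * feature_map * mask[:, None, None]
    return out
```

Channels retained by several criteria are accumulated at the same location, which is exactly what the Align summation requires.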
Further, back-propagating based on the importance-criterion search space and updating its parameters includes:
for an L-layer image processing model, training the importance-criterion search space O by minimizing the criterion loss L_s;
after the search-space training is completed, calculating the structural parameters α* by minimizing the loss on the validation set of the dataset:

    α* = argmin_α L_val(α),  L_val(α) = L_s(α) + λ · L_c(α)

where O* is the adjusted criterion search space, α is the set of criterion parameters, L_s is the loss of the criterion search space, L_c is the computational penalty of the pruned network, and λ is a weight balancing L_s and L_c.
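A toy sketch of this selection objective (the loss functions below are hypothetical stand-ins; λ = 2 as in the embodiment): among candidate structural parameters, the one minimizing L_s(α) + λ·L_c(α) is kept:

```python
def validation_objective(l_s, l_c, lam=2.0):
    """L_val(alpha) = L_s(alpha) + lambda * L_c(alpha)."""
    return lambda alpha: l_s(alpha) + lam * l_c(alpha)

def best_alpha(candidates, l_s, l_c, lam=2.0):
    """alpha* = argmin over the candidates of the validation objective."""
    obj = validation_objective(l_s, l_c, lam)
    return min(candidates, key=obj)

# Hypothetical toy losses: the criterion term prefers alpha near 2,
# while the computation term penalizes large alpha.
l_s = lambda a: (a - 2.0) ** 2
l_c = lambda a: a
alpha_star = best_alpha([0.0, 1.0, 2.0, 3.0], l_s, l_c)
```

The weight λ trades pruned-network quality against its computational cost; a larger λ pushes the search toward cheaper structures.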
According to a second aspect of the embodiments of the present application, there is provided an image processing apparatus based on learnable importance-criterion pruning, including:
an acquisition module, used for acquiring an image processing model to be pruned;
an adding module, used for adding an importance-criterion search space to the forward propagation of the image processing model to be pruned, wherein the importance-criterion search space is used to calculate the portion of each convolutional layer in the image processing model to be pruned that needs to be pruned;
an updating module, used for back-propagating based on the importance-criterion search space and updating the parameters of the importance-criterion search space;
a fine-tuning module, used for fine-tuning the image processing model pruned based on the finally selected criteria;
a deployment module, used for deploying the fine-tuned image processing model to a terminal so as to perform image processing through the fine-tuned image processing model.
According to a third aspect of an embodiment of the present application, there is provided an electronic apparatus including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of the first aspect.
According to a fourth aspect of embodiments of the present application there is provided a computer readable storage medium having stored thereon computer instructions which when executed by a processor perform the steps of the method according to the first aspect.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
according to the embodiment, the distribution difference problem in the deep learning network layer is considered, different pruning judgment standards are selected for different layers, and optimization of model pruning is realized by combining the advantages of the different judgment standards; according to the application, the overall interlayer cutting result is considered, and the synergy of each layer of the network to the final result is considered not only in a greedy independent cutting mode; according to the application, the accuracy of the network can be maximally reserved in the process of pruning the network by selecting the optimal pruning judgment standard. The method provided by the application can better eliminate unnecessary values in the weight tensor in various deep learning networks, reduce the number of connections between the neural network layers and reduce the parameters involved in calculation, thereby reducing the operation times. In particular, tasks involving large images, such as semantic segmentation or object detection, during image processing, intermediate representations may consume a significant amount of memory, far beyond the network itself. Pruning can reduce the storage space of the model, reduce the intermediate calculation generation diagram and reduce the memory required during operation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1 is a structural overview diagram of an image processing method based on learnable importance-criterion pruning, according to an exemplary embodiment.
Fig. 2 is a flowchart of an image processing method based on learnable importance-criterion pruning, according to an exemplary embodiment.
Fig. 3 is a flowchart of the importance-criterion search space of an image processing method based on learnable importance-criterion pruning, according to an exemplary embodiment.
Fig. 4 is a block diagram of an image processing apparatus based on learnable importance-criterion pruning, according to an exemplary embodiment.
Fig. 5 is a schematic diagram of an electronic device, according to an example embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the application. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
As shown in fig. 1 and fig. 2, the image processing method based on learnable importance-criterion pruning provided by the application may include the following steps:
step S11: acquiring an image processing model to be pruned;
step S12: adding an importance-criterion search space to the forward propagation of the image processing model to be pruned, wherein the importance-criterion search space is used to calculate the portion of each convolutional layer in the image processing model to be pruned that needs to be pruned;
step S13: back-propagating based on the importance-criterion search space and updating the parameters of the importance-criterion search space;
step S14: fine-tuning the image processing model pruned based on the finally selected criteria;
step S15: deploying the fine-tuned image processing model on a terminal so as to process images through the fine-tuned model.
According to the above embodiment, the problem of distribution differences across the layers of a deep network is considered: different pruning criteria are selected for different layers, and model pruning is optimized by combining the advantages of the different criteria. The application considers the overall inter-layer pruning result, accounting for the joint contribution of every layer of the network to the final result rather than pruning each layer greedily and independently. By selecting the optimal pruning criterion, the accuracy of the network can be preserved to the greatest extent during pruning. The method can better eliminate unnecessary values in the weight tensors of various deep learning networks, reduce the number of connections between network layers, and reduce the parameters involved in computation, thereby reducing the number of operations. In particular, in image processing tasks involving large images, such as semantic segmentation or object detection, the intermediate representations may consume a large amount of memory, far beyond the network itself. Pruning can reduce the storage space of the model, shrink the intermediate computation graph, and reduce the memory required at run time.
In the implementation of step S11, an image processing model to be pruned is obtained;
specifically, a pre-trained model with parameters that achieves state-of-the-art performance in image processing is obtained.
In specific implementations, pre-trained network models such as VGG16, ResNet32, ResNet56, and ResNet50 can be adopted for classification tasks, and YOLO-series pre-trained models for object detection tasks. The pre-trained model with parameters comprises L convolutional layers, and the convolution-kernel information of each of the L layers is labeled. All filter sets W in the image processing model to be pruned are acquired; during pruning, each layer's filters are scored according to that layer's criterion, and the filter set W is divided into a retained set W_keep and a pruned set W_prune.
In specific implementations, the training dataset D used to verify the pruning effect and to fine-tune the model on image processing tasks may include:
classification datasets: the CIFAR-10, CIFAR-100, and ILSVRC-2012 datasets. CIFAR-10 comprises 50,000 training images and 10,000 test images: 60,000 32×32 color images divided into 10 categories. CIFAR-100 has 100 classes, with the same total number of images as CIFAR-10. ILSVRC-2012 contains 1.28 million training images and 50,000 validation images in 1,000 classes;
semantic segmentation and object detection datasets: the VOC 2012 and COCO datasets. VOC 2012 contains 17,125 pictures, of which 11,540 are for the detection task. The COCO dataset is another common large-scale object detection dataset containing 80 categories; its data are divided into three parts, training, validation, and test, containing 118,287, 5,000, and 40,670 pictures respectively.
In the implementation of step S12, an importance-criterion search space is added to the forward propagation of the image processing model to be pruned, where the importance-criterion search space is used to calculate the portion of each convolutional layer in the image processing model to be pruned that needs to be pruned;
specifically, in the importance-criterion search space:
the criterion for the layer-l convolutional layer is selected by calculating the probability p_i^l of choosing the i-th importance criterion:

    p_i^l = exp(α_i^l) / Σ_{j=1}^{S} exp(α_j^l)

where O^l denotes the importance-criterion search space of the layer-l convolutional layer, α_i^l is the importance-criterion hyperparameter, i ∈ {1, 2, ..., S}, and S denotes the number of importance criteria;
and the probability p is processed with the Gumbel-Softmax discrete-distribution sampling method:

    g_i = -log(-log(u_i)),  u_i ~ U(0, 1)
    p̂_i^l = exp((log p_i^l + g_i) / τ) / Σ_{j=1}^{S} exp((log p_j^l + g_j) / τ)

where u_i is a sample from the uniform distribution on (0, 1) and g_i denotes independently distributed random Gumbel noise; this yields a distribution approaching one-hot, while the noise keeps the sampling probabilities consistent with the original distribution. τ is the softmax temperature coefficient: as τ goes from large to small, p̂^l changes from an almost uniform distribution to a fixed choice of one element. In a specific implementation, the temperature coefficient τ is set to decrease linearly from 5 to 0.1 so as to gradually approach the true discrete distribution.
Sampling from a discrete distribution is not differentiable. Gumbel-Softmax estimates samples from the categorical distribution with a continuous distribution; after the Gumbel-Softmax reparameterization, gradients can be back-propagated to α, achieving the purpose of learning.
Here the number of importance criteria is S = 5, and the 5 importance criteria are respectively: (1) the l1-norm criterion, (2) the l2-norm criterion, (3) the geometric-median criterion, (4) a criterion that estimates, via a Taylor expansion of the change of the loss function before and after pruning, the loss change caused by removing part of the feature maps, and (5) the APoZ activation-value criterion.
The forward propagation of the importance-criterion search space can be expressed as:

    W_keep^l = Prune(W^l, O^l, n)

where Prune denotes the pruning function: given W^l, it returns W_keep^l, where W_keep^l denotes the filter set retained in layer l under criterion s, W^l denotes the layer-l filter set, W_prune^l the removed filter set, O^l the criterion search space, and n the compression ratio.
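The Prune step can be sketched as follows (NumPy; a hypothetical stand-in, not the patent's code, that scores filters with a given criterion and removes the lowest-scoring fraction n):

```python
import numpy as np

def prune_layer(filters, score_fn, n):
    """Split a layer's filters into kept / pruned index sets:
    the fraction n with the lowest importance scores is removed."""
    scores = np.asarray(score_fn(filters))
    n_prune = int(round(n * len(scores)))
    order = np.argsort(scores, kind="stable")  # ascending: least important first
    pruned = set(order[:n_prune].tolist())
    kept = set(order[n_prune:].tolist())
    return kept, pruned

# Example: l1-norm scoring over four hypothetical 2-weight filters.
l1 = lambda f: np.abs(f).reshape(len(f), -1).sum(axis=1)
filters = np.array([[3.0, 1.0], [0.1, 0.0], [2.0, 2.0], [0.2, 0.1]])
kept, pruned = prune_layer(filters, l1, n=0.5)
```

Because the criterion is passed in as a function, each layer can plug in whichever of the five criteria its search space selected.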
As shown in fig. 3, according to the importance criteria with their different probabilities, the filters with smaller importance scores under each criterion are pruned, yielding pruned layers with different results. To comprehensively account for the sharing of the results of the different criteria during training, the output feature map is defined as the aligned weighted sum of the feature maps of the pruned layers:

    F_out^l = Σ_{i=1}^{S} p̂_i^l · Align(F_i^l)

where F_out^l is the aligned feature map, p̂_i^l is the probability of selecting the i-th importance criterion, F_i^l is the feature map output after pruning with the i-th criterion, and the Align operation sums the feature maps that keep the same channel locations.
In the implementation of step S13, back propagation is performed based on the importance-criterion search space and the parameters of the search space are updated;
specifically, for an L-layer image processing model, the criterion parameter set is evaluated to find suitable structural parameters α* that guide the selection of the optimal criterion. The importance-criterion search space O is trained by minimizing the criterion loss L_s, and the structural parameters α* are calculated by minimizing the loss L_val on the validation set:

    α* = argmin_α L_val(α),  L_val(α) = L_s(α) + λ · L_c(α)

where O* is the adjusted criterion search space, α is the set of criterion parameters, L_s is the loss of the criterion search space, L_c is the computational penalty of the pruned network, and λ is a weight balancing L_s and L_c; λ is set to 2.
The optimal structural parameters α* are obtained by minimizing the network loss over the training set. Adam is employed as the optimizer in one embodiment, with a constant learning rate of 0.001 and a weight decay of 0.001. On CIFAR, training runs for 600 epochs with a batch size of 256. On ILSVRC-2012, training runs for 35 epochs with a batch size of 256, and the weights of the pre-trained model are not updated during training, to reduce overfitting.
Finally, according to the S parameters α of the trained importance-criterion space, the criterion with the maximum probability is selected for each layer; the finally selected per-layer criteria form a criterion set C suited to each layer. The filters of each network layer are ranked according to the criterion set C, the lower-ranked filters are removed according to the compression rate n, and the final pruning is finished.
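This final selection step can be sketched as follows (pure Python; the per-layer α values are hypothetical): each layer adopts the criterion with the highest trained probability, then its filters are ranked under that criterion and the bottom fraction is cut:

```python
def final_criteria(alpha_per_layer):
    """Index of the max-probability criterion for each layer."""
    return [max(range(len(a)), key=a.__getitem__) for a in alpha_per_layer]

def cut_by_rank(scores, n):
    """Remove the n-fraction of filters with the lowest scores; return kept indices."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    n_prune = int(round(n * len(scores)))
    return sorted(order[n_prune:])

# Hypothetical trained parameters for a 3-layer model with S = 5 criteria.
alpha = [[0.1, 0.7, 0.1, 0.05, 0.05],
         [0.6, 0.1, 0.1, 0.1, 0.1],
         [0.1, 0.1, 0.1, 0.1, 0.6]]
chosen = final_criteria(alpha)                # one criterion index per layer
kept = cut_by_rank([4.0, 0.1, 4.0, 0.3], n=0.5)
```

Unlike the relaxed weighted sum used during training, this step is a hard argmax: each layer commits to a single criterion before the final pruning and fine-tuning.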
In the specific implementation of step S14, the image processing model pruned based on the finally selected criteria is fine-tuned. Specifically, the network after final pruning suffers some precision loss and needs fine-tuning: the learning rate is reduced and the pruned network is trained again on the dataset D; in the specific embodiment the reduced learning rate is set to 1/10 of the initial value.
In the implementation of step S15, the fine-tuned image processing model is deployed on a terminal so as to perform image processing through the fine-tuned model;
in this embodiment, the image processing model after pruning and fine-tuning is deployed to a terminal to perform image classification.
Corresponding to the foregoing embodiments of the image processing method based on learnable importance-criterion pruning, the application also provides embodiments of an image processing apparatus based on learnable importance-criterion pruning.
Fig. 4 is a block diagram of an image processing apparatus based on learnable importance-criterion pruning, according to an exemplary embodiment. Referring to fig. 4, the apparatus may include:
an acquisition module 21, configured to acquire an image processing model to be pruned;
an adding module 22, configured to add an importance-criterion search space to the forward propagation of the image processing model to be pruned, where the importance-criterion search space is used to calculate the portion of each convolutional layer in the image processing model to be pruned that needs to be pruned;
an updating module 23, configured to back-propagate based on the importance-criterion search space and update the parameters of the importance-criterion search space;
a fine-tuning module 24, configured to fine-tune the image processing model pruned based on the finally selected criteria;
a deployment module 25, configured to deploy the fine-tuned image processing model to a terminal, so as to perform image processing through the fine-tuned model.
The specific manner in which the various modules perform their operations in the apparatus of the above embodiment has been described in detail in the embodiments of the method and will not be detailed here.
For the device embodiments, since they essentially correspond to the method embodiments, reference is made to the description of the method embodiments for the relevant points. The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present application. Those of ordinary skill in the art can understand and implement the present application without inventive effort.
Correspondingly, the application also provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the image processing method based on learnable importance-criterion pruning described above. Fig. 5 shows a hardware structure diagram of an arbitrary device with data processing capability on which the image processing method based on learnable importance-criterion pruning according to the embodiment of the present application is located; in addition to the processor, memory, DMA controller, magnetic disk, and non-volatile memory shown in fig. 5, such a device generally also includes other hardware according to its actual function, which is not described here again.
Correspondingly, the application also provides a computer readable storage medium having computer instructions stored thereon, wherein the instructions, when executed by a processor, implement the image processing method based on learnable importance criterion pruning. The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any of the data processing enabled devices described in any of the previous embodiments. The computer readable storage medium may also be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a Flash memory card (Flash Card) provided on the device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of any device having data processing capability. The computer readable storage medium is used for storing the computer program and other programs and data required by the device, and may also be used for temporarily storing data that has been output or is to be output.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof.

Claims (10)

1. An image processing method based on learnable importance criterion pruning, characterized by comprising the following steps:
acquiring an image processing model to be pruned;
adding an importance criterion search space in the forward propagation of the image processing model to be pruned, wherein the importance criterion search space is used for calculating the portion of each convolutional layer in the image processing model to be pruned that needs to be pruned;
back-propagating based on the importance criterion search space and updating the parameters of the importance criterion search space;
fine-tuning the image processing model pruned based on the final criterion; and
deploying the fine-tuned image processing model on a terminal, so as to process images through the fine-tuned image processing model.
2. The method according to claim 1, wherein the image processing model to be pruned comprises L convolutional layers, and the convolution kernel information of the L layers is labeled.
3. The method of claim 1, wherein in the importance criteria search space:
the probability $p_i^l$ of selecting the $i$-th importance criterion for the $l$-th convolutional layer is computed as:

$$p_i^l = \frac{\exp(\alpha_i^l)}{\sum_{s=1}^{S}\exp(\alpha_s^l)}$$

wherein $\alpha^l = \{\alpha_1^l, \dots, \alpha_S^l\}$ denotes the importance criterion hyperparameters of the search space of the $l$-th convolutional layer, $i \in \{1, \dots, S\}$, and $S$ represents the number of importance criteria;
and the probability $p$ is processed by the Gumbel-Softmax discrete distribution sampling method:

$$\hat{p}_i^l = \frac{\exp\!\left((\log p_i^l + g_i)/\tau\right)}{\sum_{s=1}^{S}\exp\!\left((\log p_s^l + g_s)/\tau\right)}, \qquad g_i = -\log(-\log(u_i))$$

wherein $u_i \sim U(0,1)$ is a sample from the uniform distribution on $[0,1]$, $g_i$ denotes independently distributed Gumbel random noise, and $\tau$ is the softmax temperature; as $\tau$ changes from large to small, the sampling changes from a nearly uniform distribution toward selecting one fixed element.
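The Gumbel-Softmax sampling described in claim 3 can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation; the function name and the example values of the hyperparameters α are hypothetical.

```python
import numpy as np

def gumbel_softmax_probs(alpha, tau, rng):
    """Sample relaxed selection probabilities over S importance criteria.

    alpha : unnormalized criterion hyperparameters for one conv layer, shape (S,)
    tau   : softmax temperature; small tau pushes the sample toward one-hot
    """
    # p_i = softmax(alpha): probability of selecting the i-th criterion
    p = np.exp(alpha - alpha.max())
    p /= p.sum()
    # g_i = -log(-log(u_i)), u_i ~ U(0, 1): i.i.d. Gumbel noise
    u = rng.uniform(size=alpha.shape)
    g = -np.log(-np.log(u))
    # relaxed sample: softmax((log p + g) / tau)
    z = (np.log(p) + g) / tau
    z = np.exp(z - z.max())
    return z / z.sum()

rng = np.random.default_rng(0)
alpha = np.array([1.0, 0.5, 0.0, -0.5, -1.0])          # S = 5 criteria
soft = gumbel_softmax_probs(alpha, tau=5.0, rng=rng)    # high tau: spread out
hard = gumbel_softmax_probs(alpha, tau=0.01, rng=rng)   # low tau: near one-hot
```

Lowering `tau` during search anneals the selection from a soft mixture of criteria to a single fixed criterion per layer, which is the behavior the claim describes.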
4. The method according to claim 3, wherein the number of importance criteria S = 5, and the 5 importance criteria are respectively: (1) the $\ell_1$-norm criterion; (2) the $\ell_2$-norm criterion; (3) the geometric median criterion; (4) a Taylor criterion that uses a Taylor approximation to expand the change of the loss function before and after pruning, so as to estimate the loss change caused by removing part of the feature maps; and (5) the APoZ activation value criterion.
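Four of the five criteria named in this claim can be sketched numerically as follows (the Taylor criterion is omitted because it requires gradients of the loss). This is a NumPy illustration under the commonly used definitions of these criteria; the function names and example data are hypothetical, not from the patent.

```python
import numpy as np

def l1_importance(filters):
    # (1) l1-norm: sum of absolute kernel weights per filter
    # filters: (n_filters, k*k*c_in) flattened conv kernels
    return np.abs(filters).sum(axis=1)

def l2_importance(filters):
    # (2) l2-norm of each filter's weights
    return np.sqrt((filters ** 2).sum(axis=1))

def geometric_median_importance(filters):
    # (3) FPGM-style: a filter close to all the others is redundant,
    # so importance = total distance to the other filters
    d = np.linalg.norm(filters[:, None, :] - filters[None, :, :], axis=2)
    return d.sum(axis=1)

def apoz_importance(activations):
    # (5) APoZ: 1 minus the average percentage of zeros in the
    # post-ReLU feature map; activations: (n_samples, n_filters)
    return 1.0 - (activations == 0).mean(axis=0)

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 9))                       # 4 filters, 3x3x1 kernels
acts = np.maximum(rng.normal(size=(16, 4)), 0.0)  # post-ReLU activations
```

Each function returns one score per filter; filters with the lowest scores are the pruning candidates under that criterion.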
5. The method according to claim 1, wherein after the importance criterion search space is added in the forward propagation of the image processing model to be pruned, the output feature maps of different importance criteria are selected in the importance criterion search space for alignment.
6. The method according to claim 5, wherein the output feature maps of the different importance criteria are selected in the importance criterion search space and aligned as:

$$\bar{F}^l = \sum_{i=1}^{S} p_i^l \, F_i^l$$

wherein $\bar{F}^l$ is the aligned feature map, $p_i^l$ is the probability of selecting the $i$-th importance criterion, and $F_i^l$ is the feature map output after pruning with the $i$-th criterion; the summation adds together the feature maps that keep the same locations.
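The alignment in claim 6 amounts to a probability-weighted sum of the per-criterion pruned feature maps. A minimal sketch, assuming each criterion is represented by a 0/1 channel keep-mask (the names and shapes are illustrative, not from the patent):

```python
import numpy as np

def align_feature_maps(feature_map, masks, probs):
    """Probability-weighted sum of the feature maps kept by each criterion.

    feature_map : (C, H, W) output of one conv layer
    masks       : (S, C) 0/1 keep-masks, one per importance criterion
    probs       : (S,) selection probabilities p_i
    Channels pruned by a criterion are zeroed rather than removed, so
    summing keeps channels at the same location aligned across criteria.
    """
    pruned = masks[:, :, None, None] * feature_map[None]      # (S, C, H, W)
    return (probs[:, None, None, None] * pruned).sum(axis=0)  # (C, H, W)

F = np.ones((3, 2, 2))                # 3 channels of a 2x2 feature map
masks = np.array([[1, 1, 0],
                  [1, 0, 1]], dtype=float)  # S = 2 criteria
p = np.array([0.75, 0.25])
aligned = align_feature_maps(F, masks, p)
```

In this toy case channel 0 is kept by both criteria (weight 1.0), channel 1 only by the first (weight 0.75), and channel 2 only by the second (weight 0.25), so the aligned map blends the criteria in proportion to their selection probabilities.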
7. The method of claim 1, wherein back-propagating based on the importance criterion search space and updating the parameters of the importance criterion search space comprises:
for an $L$-layer image processing model, training the importance criterion search space by minimizing the criterion loss:

$$\min_{w} \; L_{\mathrm{train}}(w, \alpha)$$

after the importance criterion search space training is completed, calculating the structural parameters $\alpha^*$ by minimizing the loss on the validation set of the dataset:

$$\alpha^* = \arg\min_{\alpha} \; L_{\mathrm{val}}(w^*, \alpha) + \lambda \, L_{\mathrm{cost}}(\alpha)$$

wherein $w^*$ denotes the trained weights of the adjusted criterion search space, $\alpha$ is the set of criterion hyperparameters, $L_{\mathrm{val}}$ is the loss of the criterion search space on the validation set, $L_{\mathrm{cost}}$ is the computational penalty of the pruned network, and $\lambda$ is a weight balancing $L_{\mathrm{val}}$ and $L_{\mathrm{cost}}$.
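The combined objective of claim 7, validation loss plus a weighted computational penalty, can be illustrated with a toy selection over candidate hyperparameter settings. The patent describes gradient-based training of the hyperparameters; this sketch substitutes a simple enumeration purely to show how the weight λ trades accuracy against cost, and all names and numbers are hypothetical.

```python
import numpy as np

def search_objective(val_loss, cost, lam):
    """L_val + lambda * L_cost: validation loss plus a computational
    penalty on the pruned network, traded off by lambda."""
    return val_loss + lam * cost

def structural_params(candidates, val_losses, costs, lam):
    # Pick the candidate hyperparameter setting minimizing the combined
    # objective (enumeration stands in for gradient-based search).
    scores = [search_objective(v, c, lam) for v, c in zip(val_losses, costs)]
    return candidates[int(np.argmin(scores))]

candidates = ["alpha_a", "alpha_b", "alpha_c"]
val_losses = [0.2, 0.3, 0.5]   # more pruning -> higher validation loss
costs = [1.0, 0.6, 0.2]        # more pruning -> lower computational cost
best_accuracy = structural_params(candidates, val_losses, costs, lam=0.0)
best_cheap = structural_params(candidates, val_losses, costs, lam=10.0)
```

With λ = 0 the least-pruned, most accurate setting wins; with a large λ the penalty dominates and the cheapest network is selected, which is exactly the balance the claim's weight λ controls.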
8. An image processing apparatus based on learnable importance criterion pruning, comprising:
an acquisition module, used for acquiring an image processing model to be pruned;
an adding module, used for adding an importance criterion search space in the forward propagation of the image processing model to be pruned, wherein the importance criterion search space is used for calculating the portion of each convolutional layer in the image processing model to be pruned that needs to be pruned;
an updating module, used for back-propagating based on the importance criterion search space and updating the parameters of the importance criterion search space;
a fine-tuning module, used for fine-tuning the image processing model pruned based on the final criterion; and
a deployment module, used for deploying the fine-tuned image processing model on a terminal, so as to process images through the fine-tuned image processing model.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any of claims 1-7.
CN202311257199.1A 2023-09-27 2023-09-27 Image processing method and device based on leavable importance judging standard pruning Active CN116992944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311257199.1A CN116992944B (en) 2023-09-27 2023-09-27 Image processing method and device based on leavable importance judging standard pruning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311257199.1A CN116992944B (en) 2023-09-27 2023-09-27 Image processing method and device based on leavable importance judging standard pruning

Publications (2)

Publication Number Publication Date
CN116992944A true CN116992944A (en) 2023-11-03
CN116992944B CN116992944B (en) 2023-12-19

Family

ID=88525208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311257199.1A Active CN116992944B (en) 2023-09-27 2023-09-27 Image processing method and device based on leavable importance judging standard pruning

Country Status (1)

Country Link
CN (1) CN116992944B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3876166A2 (en) * 2020-10-23 2021-09-08 Beijing Baidu Netcom Science And Technology Co. Ltd. Method and apparatus for determining network model pruning strategy, device and storage medium
CN114170512A (en) * 2021-12-08 2022-03-11 西安电子科技大学 Remote sensing SAR target detection method based on combination of network pruning and parameter quantification
CN114330713A (en) * 2022-01-11 2022-04-12 平安科技(深圳)有限公司 Convolutional neural network model pruning method and device, electronic equipment and storage medium
CN114358257A (en) * 2022-02-21 2022-04-15 Oppo广东移动通信有限公司 Neural network pruning method and device, readable medium and electronic equipment
CN114581868A (en) * 2022-03-04 2022-06-03 京东鲲鹏(江苏)科技有限公司 Image analysis method and device based on model channel pruning
WO2023024407A1 (en) * 2021-08-24 2023-03-02 平安科技(深圳)有限公司 Model pruning method and apparatus based on adjacent convolutions, and storage medium
WO2023159760A1 (en) * 2022-02-22 2023-08-31 平安科技(深圳)有限公司 Convolutional neural network model pruning method and apparatus, electronic device, and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3876166A2 (en) * 2020-10-23 2021-09-08 Beijing Baidu Netcom Science And Technology Co. Ltd. Method and apparatus for determining network model pruning strategy, device and storage medium
WO2023024407A1 (en) * 2021-08-24 2023-03-02 平安科技(深圳)有限公司 Model pruning method and apparatus based on adjacent convolutions, and storage medium
CN114170512A (en) * 2021-12-08 2022-03-11 西安电子科技大学 Remote sensing SAR target detection method based on combination of network pruning and parameter quantification
CN114330713A (en) * 2022-01-11 2022-04-12 平安科技(深圳)有限公司 Convolutional neural network model pruning method and device, electronic equipment and storage medium
WO2023134086A1 (en) * 2022-01-11 2023-07-20 平安科技(深圳)有限公司 Convolutional neural network model pruning method and apparatus, and electronic device and storage medium
CN114358257A (en) * 2022-02-21 2022-04-15 Oppo广东移动通信有限公司 Neural network pruning method and device, readable medium and electronic equipment
WO2023159760A1 (en) * 2022-02-22 2023-08-31 平安科技(深圳)有限公司 Convolutional neural network model pruning method and apparatus, electronic device, and storage medium
CN114581868A (en) * 2022-03-04 2022-06-03 京东鲲鹏(江苏)科技有限公司 Image analysis method and device based on model channel pruning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
卢海伟; 袁晓彤: "Dynamic Structured Network Pruning Based on Layer-Fused Feature Coefficients", Pattern Recognition and Artificial Intelligence, no. 11 *
尹文枫; 梁玲燕; 彭慧民; 曹其春; 赵健; 董刚; 赵雅倩; 赵坤: "Research Progress on Compression and Acceleration Techniques for Convolutional Neural Networks", Computer Systems & Applications, no. 09 *
徐嘉荟: "Research on Neural Network Compression Techniques Based on Model Pruning", Information & Communications, no. 12 *
马治楠; 韩云杰; 彭琳钰; 周进凡; 林付春; 刘宇红: "Pruning Optimization Based on Deep Convolutional Neural Networks", Application of Electronic Technique, no. 12 *

Also Published As

Publication number Publication date
CN116992944B (en) 2023-12-19

Similar Documents

Publication Publication Date Title
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
CN107480261B (en) Fine-grained face image fast retrieval method based on deep learning
CN106228185B (en) A kind of general image classifying and identifying system neural network based and method
CN111191583B (en) Space target recognition system and method based on convolutional neural network
CN110929610B (en) Plant disease identification method and system based on CNN model and transfer learning
CN110728224A (en) Remote sensing image classification method based on attention mechanism depth Contourlet network
CN111723915B (en) Target detection method based on deep convolutional neural network
CN114037844A (en) Global rank perception neural network model compression method based on filter characteristic diagram
CN111696101A (en) Light-weight solanaceae disease identification method based on SE-Inception
CN113657421B (en) Convolutional neural network compression method and device, and image classification method and device
CN109740734B (en) Image classification method of convolutional neural network by optimizing spatial arrangement of neurons
CN111178525A (en) Pruning-based convolutional neural network compression method, system and medium
CN113837376B (en) Neural network pruning method based on dynamic coding convolution kernel fusion
CN111694977A (en) Vehicle image retrieval method based on data enhancement
CN115035418A (en) Remote sensing image semantic segmentation method and system based on improved deep LabV3+ network
CN114819061A (en) Sparse SAR target classification method and device based on transfer learning
CN110096976A (en) Human behavior micro-Doppler classification method based on sparse migration network
CN114819143A (en) Model compression method suitable for communication network field maintenance
CN112132062B (en) Remote sensing image classification method based on pruning compression neural network
CN114065831A (en) Hyperspectral image classification method based on multi-scale random depth residual error network
CN117217282A (en) Structured pruning method for deep pedestrian search model
CN112651499A (en) Structural model pruning method based on ant colony optimization algorithm and interlayer information
CN117034060A (en) AE-RCNN-based flood classification intelligent forecasting method
CN116992944B (en) Image processing method and device based on leavable importance judging standard pruning
CN111783688A (en) Remote sensing image scene classification method based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant