CN115577765A - Network model pruning method, electronic device and storage medium - Google Patents

Network model pruning method, electronic device and storage medium Download PDF

Info

Publication number
CN115577765A
CN115577765A (Application No. CN202211105427.9A)
Authority
CN
China
Prior art keywords
pruned
channel
network model
pruning
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211105427.9A
Other languages
Chinese (zh)
Inventor
丁维浩 (Ding Weihao)
童虎庆 (Tong Huqing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Midea Group Co Ltd
Midea Group Shanghai Co Ltd
Original Assignee
Midea Group Co Ltd
Midea Group Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Midea Group Co Ltd, Midea Group Shanghai Co Ltd filed Critical Midea Group Co Ltd
Priority to CN202211105427.9A priority Critical patent/CN115577765A/en
Publication of CN115577765A publication Critical patent/CN115577765A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the field of computer technology and provides a network model pruning method, an electronic device, and a storage medium. The network model pruning method comprises the following steps: acquiring channel sets to be pruned corresponding to different preset pruning strategies in a network model to be pruned, wherein the different channels to be pruned in the channel sets represent unimportant channels obtained by using the corresponding preset pruning strategies; determining a target channel to be pruned based on the channel sets to be pruned; and pruning the network model to be pruned based on the target channel to be pruned. With this method, the CNN can be made lightweight efficiently when computer hardware resources and real-time performance are limited, and the precision of the pruned network model is effectively improved.

Description

Network model pruning method, electronic device and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a network model pruning method, an electronic device, and a storage medium.
Background
In recent years, with the development of computer technology, deep convolutional neural networks (CNNs) have been widely used in fields such as unmanned driving, image classification, and object recognition because of their information extraction capability. As a CNN grows deeper and wider, its information extraction capability becomes stronger, but it also places higher demands on computer hardware resources and real-time performance. Therefore, how to make CNNs lightweight when computer hardware resources and real-time performance are limited has become a research focus.
Disclosure of Invention
The present application is directed to solving at least one of the technical problems occurring in the related art. Therefore, the network model pruning method provided by the application achieves the purpose of efficiently lightening the CNN when the hardware resources and real-time performance of a computer are limited, and meanwhile effectively improves the precision of the pruned network model.
The application also provides an electronic device.
The present application also provides a non-transitory computer readable storage medium.
The present application also proposes a computer program product.
According to the network model pruning method in the embodiment of the first aspect of the application, the method comprises the following steps:
acquiring channel sets to be pruned corresponding to different preset pruning strategies in a network model to be pruned, wherein different channels to be pruned in the channel sets to be pruned represent unimportant channels acquired by using the corresponding preset pruning strategies;
determining a target channel to be pruned based on the different channels to be pruned;
and pruning the network model to be pruned based on the target channel to be pruned.
According to the network model pruning method, channel sets to be pruned corresponding to different preset pruning strategies in the network model to be pruned are obtained, and the target channel to be pruned is determined based on these channel sets. Because the channels in each channel set represent the unimportant channels identified by the corresponding preset pruning strategy, the more preset pruning strategies are used, the richer and more comprehensive the candidate channels to be pruned become, and the more accurate the determined target channel to be pruned is. Pruning the network model to be pruned based on the target channel to be pruned therefore achieves the aim of efficiently making the CNN lightweight when computer hardware resources and real-time performance are limited, while effectively improving the precision of the pruned network model.
According to an embodiment of the present application, the obtaining of the channel set to be pruned corresponding to the different preset pruning strategies in the network model to be pruned includes:
obtaining a target scaling factor of a BN layer in the network model to be pruned and a target weight parameter of a convolution kernel of a convolution layer;
determining a first channel to be pruned matched with the target scaling factor and a second channel to be pruned matched with the target weight parameter;
and acquiring the channel set to be pruned containing the first channel to be pruned and the second channel to be pruned.
According to an embodiment of the present application, the determining a first channel to be pruned that matches the target scaling factor and a second channel to be pruned that matches the target weight parameter includes:
determining a first channel clipping policy based on the target scaling factor, an input of the BN layer, and a bias of the BN layer;
determining the first channel to be pruned based on the first channel pruning strategy and a preset channel pruning threshold;
determining a second channel clipping strategy based on the input of the convolution kernel and the target weight parameter;
and determining the second channel to be pruned based on the second channel pruning strategy and the preset channel pruning threshold.
According to an embodiment of the present application, the determining a target channel to be pruned based on the set of channels to be pruned includes:
and determining a target channel to be pruned based on the intersection channel of the first channel to be pruned and the second channel to be pruned.
According to an embodiment of the present application, the determining a target channel to be pruned based on the set of channels to be pruned further includes:
respectively sequencing the plurality of first channels to be pruned and the plurality of second channels to be pruned according to importance;
and determining a target channel to be pruned based on the union of the results obtained by the sorting.
According to an embodiment of the present application, the obtaining a target scaling factor of a BN layer and a target weight parameter of a convolution kernel of a convolutional layer in a network model to be pruned includes:
training the initial neural network model by using the sample data set, and determining a corresponding to-be-sparse-trained network model when training is stopped;
acquiring an initial scaling factor of a BN layer in the network model to be sparsely trained and an initial weight parameter of a convolution kernel of a convolution layer;
determining a first loss function of the BN layer and a second loss function of the convolution kernel based on a preset regularization constraint, the initial scaling factor and the initial weight parameter;
and carrying out sparse training on the network model to be sparsely trained based on the first loss function and the second loss function, and obtaining a target scaling factor of a BN layer in the network model to be pruned and a target weight parameter of a convolution kernel of a convolution layer.
According to an embodiment of the present application, after the pruning is performed on the network model to be pruned based on the first channel to be pruned and the second channel to be pruned, the method further includes:
and (4) finely adjusting the network model obtained by pruning by using a knowledge distillation mode.
According to an embodiment of the present application, after the fine-tuning the pruning-resultant network model, the method further includes:
matching the model precision of the network model obtained by fine tuning with the preset model precision;
and determining a target network model based on the matching success result of the model precision and the preset model precision.
According to an embodiment of the present application, after the matching of the model precision of the network model obtained by fine tuning and the preset model precision, the method further includes:
acquiring new channel sets to be pruned corresponding to different preset pruning strategies in the network model obtained by fine tuning based on the matching failure result of the model precision and the preset model precision;
and determining a new target channel to be pruned based on the new channel set to be pruned so as to prune the network model obtained by fine tuning.
One or more technical solutions in the embodiments of the present application have at least one of the following technical effects: channel sets to be pruned corresponding to different preset pruning strategies in the network model to be pruned are first obtained, and the target channel to be pruned is then determined based on these channel sets, wherein the different channels to be pruned in the channel sets represent unimportant channels obtained by using the corresponding preset pruning strategies. Thus, the more preset pruning strategies are used, the richer and more comprehensive the candidate channels become and the more accurate the determined target channel to be pruned is; pruning the network model to be pruned based on the target channel to be pruned achieves the aim of efficiently making the CNN lightweight when computer hardware resources and real-time performance are limited, while effectively improving the precision of the pruned network model.
Further, when a target scaling factor of the BN layer in the network model to be pruned and a target weight parameter of a convolution kernel of a convolutional layer are obtained, the channel set to be pruned is determined based on a first channel to be pruned matched with the target scaling factor and a second channel to be pruned matched with the target weight parameter. The network structure of the original algorithm does not need to be changed, so the method has strong universality and achieves a remarkable pruning effect in scenarios such as object detection and image classification. By combining the BN-layer scaling factor and the convolution-kernel weight parameter to determine the first and second channels to be pruned, the low channel-pruning accuracy caused by existing pruning algorithms that rely only on the BN-layer scaling factor is overcome; the precision of the pruned network model is improved while the computation, parameter count, and model volume at deployment are greatly reduced, and the lightweight effect of the pruned network model is also improved.
Furthermore, a mode of performing conventional training on the initial CNN model and then adding preset regularization constraints to perform sparse training is adopted to obtain a target scaling factor of a BN layer in the network model to be pruned and target weight parameters of convolution kernels of convolution layers, so that the reliability and stability of determining the target scaling factor and the target weight parameters are improved by combining model training and adding regularization constraints, and a sufficient basis is provided for the accuracy of subsequent model pruning.
And further, determining a first channel cutting strategy and a second channel cutting strategy based on the target scaling factor and the target weight parameter, and determining a first channel to be pruned and a second channel to be pruned based on the first channel cutting strategy, the second channel cutting strategy and a preset channel cutting threshold value, so that the accuracy and reliability of determining the non-important channel are improved by combining the two-channel strategy pruning scheme, and a foundation is laid for the accuracy of subsequent channel cutting.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or related technologies of the present application, the drawings needed to be used in the description of the embodiments or related technologies are briefly introduced below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flow chart of a network model pruning method provided in an embodiment of the present application;
FIG. 2 is a graph of model size versus mean average precision (mAP) for different pruning methods provided by embodiments of the present application;
FIG. 3 is a graph of parameter count versus mean average precision (mAP) for different pruning methods provided by embodiments of the present application;
FIG. 4 is a graph of performance (GFLOPS) versus mean average precision (mAP) for different pruning methods provided by embodiments of the present application;
fig. 5 is a schematic structural diagram of a network model pruning device provided in an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions in the present application will be clearly and completely described below with reference to the drawings in the present application, and it is obvious that the described embodiments are some, but not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In recent years, with the rapid development of CNNs, CNNs have also come to play an increasingly important role in industrial, security, and everyday-life scenarios, because they exhibit excellent information extraction capabilities in fields such as unmanned driving, image classification, and object recognition. Meanwhile, the rapid progress of computer hardware has made it possible to build deeper and wider CNNs, and in environments where hardware resources in edge devices are sufficient, a more complex CNN often means better and more accurate information extraction. However, in production and real-life scenarios where hardware resources in edge devices are limited and real-time requirements are high, deeper and wider CNNs cannot deliver the required information extraction performance under those constraints; in actual production deployment they need sufficient hardware resources to ensure real-time performance, which increases hardware cost.
At present, solutions to the above problems generally fall into two schemes: lightweight structures and model compression. A lightweight structure replaces the convolutional layer with a lighter one, such as Ghost convolution or depthwise separable convolution; model compression compresses the CNN with algorithms such as pruning, distillation, and quantization. Although both schemes have a lightweight effect on the CNN, a lightweight structure has a large influence on model precision, and model speed is not necessarily improved in edge devices because of memory-access cost. Among the pruning algorithms used for model compression, many criteria for channel selection exist, but these criteria are independent of one another, and pruning algorithms such as SFP (Soft Filter Pruning) and FPGM (Filter Pruning via Geometric Median) differ greatly in how well they remove network redundancy, which can lead to a poor lightweight effect, serious precision loss, and similar problems. For example, in the related art, lightweighting is achieved by pruning the CNN with a pruning algorithm that judges whether a channel is important based on the scaling factor of a Batch Normalization (BN) layer and clips the channel when it is judged unimportant.
According to the inventors' research, although existing pruning algorithms can reduce the computation, parameter count, and model volume at deployment, they judge channel importance using only the scaling factor of the BN layer, so the accuracy of that judgment is not high; as a result, the lightweight effect on the CNN is poor and the precision loss after lightweighting is large.
Based on this, the application provides a network model pruning method, an electronic device, and a storage medium. The network model pruning method is suitable for various deep learning algorithms containing a deep convolutional network, including but not limited to object detection, image classification, and semantic segmentation networks, and can also be applied to scenarios such as intelligent security, smart home, and smart factory. The execution body of the network model pruning method may be an edge device, which may be a router, a routing switch, an integrated access device, a multiplexer, or various metropolitan-area-network and wide-area-network access devices; it may also be a terminal device, such as a personal computer (PC), a portable device, a notebook computer, a smartphone, a tablet computer, or a portable wearable device. The application does not limit the specific form of the edge device or the terminal device.
It should be noted that the following method embodiments take an execution subject as an example of an edge device, and the execution subject of the following method embodiments may be a part or all of the edge device.
Fig. 1 is a schematic flow chart of a network model pruning method provided in an embodiment of the present application, and as shown in fig. 1, the network model pruning method includes the following steps:
and 110, acquiring channel sets to be pruned corresponding to different preset pruning strategies in the network model to be pruned, wherein the different channels to be pruned in the channel sets to be pruned represent unimportant channels acquired by using the corresponding preset pruning strategies.
The network model to be pruned may be a CNN that has been sparsely trained to convergence. A preset pruning strategy may be a preset strategy for determining unimportant channels in the network model to be pruned; the number of preset pruning strategies is at least two, and they may include, but are not limited to, pruning strategies based on mathematical methods such as Soft Filter Pruning (SFP) and Filter Pruning via Geometric Median (FPGM), as well as a pruning strategy that prunes according to weights in the network model to be pruned. Further, the pruning strategy that prunes according to weights may be a strategy aimed at improving the output of important channels, or a strategy aimed at pruning weights with smaller outputs or weights with lower influence on precision.
Specifically, when the edge device obtains the channel sets to be pruned corresponding to different preset pruning strategies in the network model to be pruned, it may first select at least two preset pruning strategies required for this round of pruning from the available preset pruning strategies, and then obtain the channel sets to be pruned based on the selected strategies. For example, when the at least two preset pruning strategies are the SFP pruning strategy, the FPGM pruning strategy, and a pruning strategy based on the BN-layer scaling factor in the network model to be pruned, the channel sets to be pruned include the channels to be pruned obtained with the SFP method, those obtained with the FPGM method, and those obtained with the BN-layer scaling factor.
It is understood that the processes of obtaining the channels to be pruned using the SFP and FPGM methods may refer to the existing SFP and FPGM pruning processes, and the process of obtaining the channels to be pruned using the BN-layer scaling factor may refer to the existing pruning process that uses only the BN-layer scaling factor as the criterion. These are not described in detail here.
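As a non-limiting illustration of this step, the following Python sketch gathers, per preset pruning strategy, the set of channels deemed unimportant; the strategy functions, channel identifiers, and model interface are hypothetical stand-ins and not part of the original disclosure.

```python
from typing import Callable, Dict, Set

Channel = str  # hypothetical identifier, e.g. "conv1:ch2" = channel 2 of layer conv1

def collect_channel_sets(model,
                         strategies: Dict[str, Callable[[object], Set[Channel]]]
                         ) -> Dict[str, Set[Channel]]:
    """For each preset pruning strategy, obtain the set of channels it deems unimportant."""
    return {name: strategy(model) for name, strategy in strategies.items()}

# Dummy strategies standing in for SFP, FPGM, and the BN-scaling-factor criterion.
strategies = {
    "sfp":  lambda m: {"conv1:ch2", "conv1:ch5", "conv2:ch1"},
    "fpgm": lambda m: {"conv1:ch2", "conv2:ch1", "conv2:ch7"},
    "bn":   lambda m: {"conv1:ch2", "conv1:ch5", "conv2:ch7"},
}
channel_sets = collect_channel_sets(model=None, strategies=strategies)
```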
Step 120: determining a target channel to be pruned based on the channel set to be pruned.
Specifically, the edge device performs channel analysis based on the channel set to be pruned, and judges whether the channels to be pruned, which are correspondingly determined based on different preset pruning strategies in the channel set to be pruned, are the same in channel number and channel identification, if the channels to be pruned, which are correspondingly determined based on each preset pruning strategy, are the same in channel number and channel identification, the channel to be pruned, which is correspondingly determined by any preset pruning strategy in the network model to be pruned, is determined as a target channel to be pruned; on the contrary, if the channel numbers of the channels to be pruned, which are determined correspondingly to each preset pruning strategy, are different and/or the channel identifications are different, the channel set to be pruned can be sent to the user terminal, the target channels to be pruned are determined based on the information fed back by the user terminal, and the information fed back by the user terminal can be generated in a manner that the target channels to be pruned are manually selected from the channel set to be pruned based on the user corresponding to the user terminal.
Step 130: pruning the network model to be pruned based on the target channel to be pruned.
Specifically, when the edge device prunes the network model to be pruned, it may remove the target channels to be pruned from the network model, reconnect the network across the removed channels, and ensure that the channels other than the target channels to be pruned remain in the network model.
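As a non-limiting illustration of removing a pruned channel and reconnecting the network, the following PyTorch-style sketch rebuilds a simple Conv-BN-Conv chain, keeping only the retained output channels of the first convolution; the helper name and the assumption of this simple chain are hypothetical.

```python
import torch
import torch.nn as nn

def prune_conv_pair(conv1: nn.Conv2d, bn: nn.BatchNorm2d, conv2: nn.Conv2d,
                    keep: list) -> tuple:
    """Keep only output channels of conv1 listed in `keep`, removing the matching
    BN channels and conv2 input channels so the pruned network stays connected."""
    idx = torch.tensor(keep, dtype=torch.long)
    new_conv1 = nn.Conv2d(conv1.in_channels, len(keep), conv1.kernel_size,
                          conv1.stride, conv1.padding, bias=conv1.bias is not None)
    new_conv1.weight.data = conv1.weight.data[idx].clone()      # slice output channels
    if conv1.bias is not None:
        new_conv1.bias.data = conv1.bias.data[idx].clone()
    new_bn = nn.BatchNorm2d(len(keep))
    for name in ("weight", "bias", "running_mean", "running_var"):
        getattr(new_bn, name).data = getattr(bn, name).data[idx].clone()
    new_conv2 = nn.Conv2d(len(keep), conv2.out_channels, conv2.kernel_size,
                          conv2.stride, conv2.padding, bias=conv2.bias is not None)
    new_conv2.weight.data = conv2.weight.data[:, idx].clone()   # slice input channels
    if conv2.bias is not None:
        new_conv2.bias.data = conv2.bias.data.clone()
    return new_conv1, new_bn, new_conv2
```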
According to the network model pruning method, channel sets to be pruned corresponding to different preset pruning strategies in the network model to be pruned are obtained, and the target channel to be pruned is determined based on these channel sets. Because the channels in each channel set represent the unimportant channels identified by the corresponding preset pruning strategy, the more preset pruning strategies are used, the richer and more comprehensive the candidate channels to be pruned become, and the more accurate the determined target channel to be pruned is. Pruning the network model to be pruned based on the target channel to be pruned therefore achieves the aim of efficiently making the CNN lightweight when computer hardware resources and real-time performance are limited, while effectively improving the precision of the pruned network model.
It can be understood that, considering that pruning can remove weights with smaller outputs or weights with lower influence on precision, and that both the convolution-kernel weights of the convolutional layers in the CNN and the scaling factors of the BN layers can be used to judge channel importance, a BN-layer channel pruning strategy and a convolutional-layer channel pruning strategy can be preset to determine the channel set to be pruned. That is, when the different preset pruning strategies include a BN-layer channel pruning strategy and a convolutional-layer channel pruning strategy, the implementation process of step 110 may include:
firstly, acquiring a target scaling factor of a BN layer in a network model to be pruned and a target weight parameter of a convolution kernel of a convolution layer; further determining a first channel to be pruned matched with the target scaling factor and a second channel to be pruned matched with the target weight parameter; and then, acquiring a channel set to be pruned containing a first channel to be pruned and a second channel to be pruned.
Wherein, the target scaling factor can characterize that all scaling factors of the BN layer are sparse, and a part of the sparse scaling factors approach 0; the target weight parameters may characterize that all weight parameters of the convolution kernel are sparse, and some of the sparse weight parameters also approach 0.
Specifically, to obtain the network model to be pruned, a model containing a BN layer with a target scaling factor and a convolutional layer with a target weight parameter may be selected from existing network models; alternatively, a model containing a BN layer and a convolutional layer may be trained until it satisfies the target scaling factor and the target weight parameter. The manner of obtaining the network model to be pruned is not specifically limited here. Once the network model to be pruned is obtained, the target scaling factor of its BN layer and the target weight parameter of the convolution kernel of its convolutional layer can naturally also be obtained.
It can be understood that, because the network model to be pruned is a model trained sparsely to converge, part of the scaling factors of the sparsity of the BN layer in the network model to be pruned approaches 0, and part of the weight parameters of the sparsity of the convolution kernel of the convolution layer also approaches 0, it can be determined whether the channel of the BN layer is an important channel based on the target scaling factor of the BN layer, and it is determined that the unimportant channel of the BN layer is the first channel to be pruned, and it is also determined whether the channel of the convolution kernel is an important channel based on the target weight parameters of the convolution kernel, and it is determined that the unimportant channel of the convolution kernel is the second channel to be pruned. At this time, it may be determined that the channel set to be pruned corresponding to different preset pruning strategies in the network model to be pruned includes the first channel to be pruned and the second channel to be pruned.
According to the network model pruning method, when the target scaling factor of the BN layer in the network model to be pruned and the target weight parameter of the convolution kernel of the convolutional layer are obtained, the channel set to be pruned is determined based on a first channel to be pruned matched with the target scaling factor and a second channel to be pruned matched with the target weight parameter. The network structure of the original algorithm does not need to be changed, so the method has strong universality and achieves a remarkable pruning effect in scenarios such as object detection and image classification. By combining the BN-layer scaling factor and the convolution-kernel weight parameter to determine the first and second channels to be pruned, the low channel-pruning accuracy caused by existing pruning algorithms that rely only on the BN-layer scaling factor is overcome; the precision of the pruned network model is improved while the computation, parameter count, and model volume at deployment are greatly reduced, and the lightweight effect of the pruned network model is also improved.
It can be understood that, considering that the network model containing the BN layer and the convolutional layer may be a CNN, the target scaling factor and the target weight parameter may be obtained by performing conventional training on the CNN, followed by performing sparse training. Based on this, the target scaling factor of the BN layer and the target weight parameter of the convolution kernel of the convolutional layer in the network model to be pruned are obtained, and the specific implementation process may include:
firstly, training an initial neural network model by using a sample data set, and determining a corresponding to-be-sparse training network model when training is stopped; then obtaining an initial scaling factor of a BN layer in the network model to be sparsely trained and an initial weight parameter of a convolution kernel of the convolution layer; further determining a first loss function of the BN layer and a second loss function of the convolution kernel based on a preset regularization constraint, an initial scaling factor and an initial weight parameter; then, sparse training is carried out on the network model to be sparsely trained on the basis of the first loss function and the second loss function, and a target scaling factor of a BN layer in the network model to be pruned and a target weight parameter of a convolution kernel of a convolution layer are obtained.
The sample data set may be an image data set, a text data set, a voice data set, or a video data set, according to the application scenario of the CNN model; the type of the sample data set is not specifically limited in this application. In addition, the sample data set may be an existing data set or may be obtained by collecting data for the application scenario; the manner of obtaining the sample data set is likewise not specifically limited. Specifically, the initial CNN model is trained using the sample data set, the loss function of the intermediate CNN model obtained after training is calculated, and whether the precision of the intermediate CNN model reaches a preset precision threshold is determined based on the loss function. If it does, training stops, and the intermediate CNN model at stopping time is determined to be the network model to be sparsely trained; otherwise, the intermediate CNN model with updated parameters is trained again until its precision reaches the preset precision threshold, at which point training stops and the model at stopping time is determined to be the network model to be sparsely trained. This process is conventional training of the initial CNN model, and training the initial CNN model into the network model to be sparsely trained is also training it to convergence. When the initial CNN model has been trained to convergence, the scaling factor of the BN layer can be taken as the initial scaling factor γ of the network model to be sparsely trained, and the weight parameter of the convolution kernel of the convolutional layer as its initial weight parameter W; the initial scaling factor γ is a one-dimensional vector, and the initial weight parameter W is a multi-dimensional matrix.
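As a non-limiting illustration of this conventional training stage, the following Python sketch trains until a preset precision threshold is reached; `evaluate_precision`, the epoch budget, and the other interfaces are assumed, hypothetical names rather than parts of the original disclosure.

```python
def train_until_threshold(model, loader, loss_fn, optimizer,
                          evaluate_precision, threshold, max_epochs=300):
    """Conventional pre-sparsity training: stop once the intermediate model's
    precision reaches the preset threshold; the stopped model is the network
    model to be sparsely trained."""
    for epoch in range(max_epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        if evaluate_precision(model) >= threshold:
            break  # model at stopping time = network model to be sparsely trained
    return model
```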
Further, a first loss function of the BN layer is determined based on the preset regularization constraint and the initial scaling factor, L1 = Σl' + λ·sign(γ); a second loss function L2 of the convolution kernel is determined in the same way based on the preset regularization constraint and the initial weight parameter W. Then, sparse training is performed on the network model to be sparsely trained based on the first loss function and the second loss function. The aim of sparse training is to sparsify the initial scaling factor and the initial weight parameter to which the preset regularization constraint has been added, so that the scaling factors of the BN layer become sparse with some approaching 0 and the weight parameters of the convolution kernel become sparse with some approaching 0, while training the network model to be sparsely trained to convergence; this yields the target scaling factor γ̂ of the BN layer in the network model to be pruned and the target weight parameter Ŵ of the convolution kernel of the convolutional layer.
Further, the specific manner of sparse training may refer to existing sparse-training practice and is not repeated here; the preset regularization constraint may be an L1 regularization constraint, the target scaling factor γ̂ is, like γ, a one-dimensional vector, and the target weight parameter Ŵ is, like W, a multi-dimensional matrix.
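As a non-limiting illustration of the sparse-training step, the following PyTorch-style sketch adds an L1 (sign) subgradient term to the gradients of the BN scaling factors and the convolution-kernel weights after the ordinary backward pass, in the spirit of L1 = Σl' + λ·sign(γ); the coefficient value and the choice of modules are assumptions.

```python
import torch
import torch.nn as nn

def add_l1_subgradient(model: nn.Module, lam: float = 1e-4) -> None:
    """After loss.backward(), add the subgradient lam * sign(.) of an L1 penalty
    to BN scaling factors (gamma) and convolution-kernel weights (W)."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d) and m.weight.grad is not None:
            m.weight.grad.add_(lam * torch.sign(m.weight.data))   # sparsify gamma
        elif isinstance(m, nn.Conv2d) and m.weight.grad is not None:
            m.weight.grad.add_(lam * torch.sign(m.weight.data))   # sparsify W

# Usage inside a training step (sketch):
#   loss = criterion(model(x), y); loss.backward()
#   add_l1_subgradient(model); optimizer.step()
```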
According to the network model pruning method, the target scaling factor of the BN layer in the network model to be pruned and the target weight parameter of the convolution kernel of the convolution layer are obtained by performing conventional training on the initial CNN model and then adding the preset regularization constraint for sparse training, so that the reliability and stability of determining the target scaling factor and the target weight parameter are improved by combining model training and adding the regularization constraint, and a sufficient basis is provided for the accuracy of subsequent model pruning.
It can be understood that, since sparse training may cause more scaling factors in the BN layer to approach 0 and more weight parameters in the convolution kernel to approach 0, in order to constrain the scaling factors and the weight parameters, the first channel to be pruned and the second channel to be pruned may be determined by determining different clipping criteria. Based on this, a first channel to be pruned matching the target scaling factor and a second channel to be pruned matching the target weight parameter are determined, and the specific implementation process may include:
firstly, determining a first channel cutting strategy based on a target scaling factor, the input of a BN layer and the bias of the BN layer; determining a first channel to be pruned based on the first channel pruning strategy and a preset channel pruning threshold; determining a second channel clipping strategy based on the input of the convolution kernel and the target weight parameter; and determining a second channel to be pruned based on the second channel pruning strategy and a preset channel pruning threshold.
The preset channel clipping threshold is a percentage greater than 0 and less than 1, and the size of the preset channel clipping threshold can be adjusted, that is, the size of the preset channel clipping threshold in the current clipping process can be adjusted based on the lightweight degree of the network model after the last clipping. Moreover, the processes of determining the first channel to be pruned and determining the second channel to be pruned may be performed sequentially or simultaneously, and are not limited specifically herein.
Specifically, based on the target scaling factor, the input of the BN layer, and the bias of the BN layer, the first channel clipping strategy may be determined as:

z_out = γ̂ · z_in + β

where z_out is the output of the BN layer in the network model to be pruned, z_in is the input of the BN layer, β is the bias of the BN layer, and γ̂ is the target scaling factor of the BN layer. The first channel clipping strategy indicates that, for a certain channel, when the target scaling factor γ̂ approaches 0, the output is approximately independent of the input, so the channel is relatively unimportant. According to the first channel clipping strategy, at least one relatively unimportant channel can be determined; after sorting these channels from small to large, the first Q of them are determined to be the first channels to be pruned, where Q is the product of the total number of relatively unimportant channels and the preset channel clipping threshold.

Similarly, based on the second channel clipping strategy and the preset channel clipping threshold, the second channel to be pruned may be determined from:

B = Σi Σj (A_ij · Ŵ_ij)

where B is the output of the convolution kernel of the convolutional layer in the network model to be pruned, A is the input of the convolution kernel, A_ij is the input element in row i and column j of A, and Ŵ_ij is the target weight parameter in row i and column j of Ŵ. The second channel clipping strategy indicates that, for a convolution kernel, when the target weight parameters in the kernel approach 0, the output B of the kernel for an arbitrary input A also approaches 0, so the channels of that convolution kernel are relatively unimportant; after sorting the relatively unimportant channels from small to large, the first M of them are determined to be the second channels to be pruned, where M is the product of the total number of relatively unimportant channels in the convolution kernel and the preset channel clipping threshold.
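As a non-limiting illustration of applying the preset channel clipping threshold, the following Python sketch sorts candidate channels by their criterion value from small to large and keeps the first Q = total × threshold of them; the score tensor and function name are hypothetical.

```python
import torch

def select_channels_to_prune(scores: torch.Tensor, ratio: float) -> list:
    """scores: per-channel criterion value (e.g. |gamma_hat| or a kernel-weight norm);
    ratio: preset channel clipping threshold, a percentage in (0, 1)."""
    order = torch.argsort(scores)    # ascending: least important channels first
    q = int(len(scores) * ratio)     # Q = total number of candidates * threshold
    return order[:q].tolist()        # indices of channels to be pruned
```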
It can be understood that the first channel clipping strategy and the second channel clipping strategy determined above are together referred to in this application as the two-channel strategy pruning scheme, which is a preset pruning strategy with regularization added.
According to the network model pruning method, the first channel clipping strategy and the second channel clipping strategy are determined based on the target scaling factor and the target weight parameter, and the first channel to be pruned and the second channel to be pruned are then determined based on the first channel clipping strategy, the second channel clipping strategy, and the preset channel clipping threshold. Combining the two-channel strategy pruning scheme in this way improves the accuracy and reliability of identifying unimportant channels and lays a foundation for the accuracy of subsequent channel clipping.
It can be understood that when the number of the channels of the first channel to be pruned and the second channel to be pruned determined based on the two-channel strategy pruning scheme is different and/or the channel identifiers are different, the channel to be finally pruned can be determined in a channel fusion manner. Based on this, the specific implementation process of step 120 may include:
and determining a target channel to be pruned based on the intersection channel of the first channel to be pruned and the second channel to be pruned.
Specifically, for the case that the number of the first channels to be pruned is m, and the number of the second channels to be pruned is n, the intersection channel of the m first channels to be pruned and the n second channels to be pruned may be determined first, and whether the intersection channel is empty or not may be determined, if the intersection channel is not empty, the intersection channel may be directly determined as the target channel to be pruned, the target channel to be pruned is a channel identified by the same channel in the m first channels to be pruned and the n second channels to be pruned, and m and n are positive integers greater than or equal to 1, respectively; on the contrary, if the intersection channel is empty, the first channel to be pruned and the second channel to be pruned may be sent to the user terminal, and the target channel to be pruned may be determined based on information fed back by the user terminal, and the information fed back by the user terminal may be generated in a manner that the user corresponding to the user terminal artificially selects the target channel to be pruned from the first channel to be pruned and the second channel to be pruned.
It can be understood that, for the case in which the intersection of the m first channels to be pruned and the n second channels to be pruned is not empty and contains p channels, p is a positive integer greater than 0. If p = 1, that single intersection channel is directly determined as the target channel to be pruned; if p > 1, the p intersection channels are sorted from large to small by importance, the first [p/h] of them are selected, and the selected channels are determined as the target channels to be pruned, where h is a positive integer greater than or equal to 2 and [·] is the rounding symbol.
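As a non-limiting illustration of this intersection-based fusion, including the selection of the first [p/h] intersection channels, the following Python sketch may be used; the `importance` mapping and the fallback behaviour for an empty intersection are hypothetical assumptions.

```python
def fuse_by_intersection(first: set, second: set, importance: dict, h: int = 2) -> list:
    """first/second: channels selected by the two clipping strategies;
    importance: per-channel importance score; h >= 2 as described above."""
    inter = first & second                      # intersection channels (p of them)
    if not inter:
        return []                               # empty intersection: fall back to the union scheme
    if len(inter) == 1:
        return list(inter)                      # p = 1: prune that single channel
    ranked = sorted(inter, key=lambda c: importance[c], reverse=True)  # large -> small
    return ranked[: max(1, len(ranked) // h)]   # first [p/h] channels as targets
```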
It can be understood that, in actual processing, in addition to determining the first and second channels to be pruned based on the first and second channel clipping strategies, a pruning strategy without an added regularization constraint may also be used to determine a third channel to be pruned; such a strategy may be an existing one without regularization, such as the SFP or FPGM pruning strategy. In that case, the target channel to be pruned can be determined from the intersection of the first, second, and third channels to be pruned.
According to the network model pruning method, the target channel to be pruned is determined by aiming at the intersection channel of the first channel to be pruned and the second channel to be pruned, which are determined by the first channel pruning strategy and the second channel pruning strategy, so that the aim of pruning redundant parameters in the network model to be pruned is fulfilled on the premise of ensuring the precision of the network model after pruning, the size, the calculated amount and the parameter amount of the network model after pruning are greatly reduced, the inference speed of the network model after pruning is improved, the real-time performance can be further improved in hardware with the same computing power and memory, the hardware requirement of an algorithm deployment platform can be reduced under the requirement of the same real-time performance, and the hardware cost of algorithm deployment is reduced.
It can be understood that when the number of the channels of the first channel to be pruned and the second channel to be pruned determined based on the two-channel strategy pruning scheme are different and/or the channel identifiers are different, the channel to be finally pruned can be determined in another channel fusion mode. Based on this, the specific implementation process of step 120 may include:
respectively sequencing the plurality of first channels to be pruned and the plurality of second channels to be pruned according to importance; and determining a target channel to be pruned based on the union of the results obtained by the sorting.
Specifically, after the m first channels to be pruned are sorted from large to small by importance, the first m/2 of them are selected; similarly, after the n second channels to be pruned are sorted from large to small by importance, the first n/2 of them are selected; the union of the first m/2 first channels and the first n/2 second channels is then determined as the target channels to be pruned. Further, when m or n is odd, the first [m/2] first channels or the first [n/2] second channels are selected, where [·] is the rounding symbol.
That is, the target channel to be pruned can be determined by directly taking the union of the two sets of channels to be pruned after ranking them by importance. It can be understood that, instead of the first m/2 and first n/2 channels, the first m/l first channels and the first n/l second channels may be selected for the union operation, where l ≥ 2 and the value of l is not otherwise limited.
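As a non-limiting illustration of this union-based fusion, the following Python sketch ranks each candidate list by importance in the order described above, keeps the first len/l entries of each, and returns their union; the `importance` mapping is a hypothetical input.

```python
def fuse_by_union(first: list, second: list, importance: dict, l: int = 2) -> set:
    """Rank each candidate list from large to small by importance, keep the first
    len/l entries of each, and take the union as the target channels to be pruned."""
    def top(chans):
        ranked = sorted(chans, key=lambda c: importance[c], reverse=True)
        return ranked[: len(ranked) // l]   # first [m/l] or [n/l] channels
    return set(top(first)) | set(top(second))
```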
It can be understood that, for the case that the intersection channel of the m first channels to be pruned and the n second channels to be pruned is empty, the target channels to be pruned may also be determined in a manner that the m first channels to be pruned and the n second channels to be pruned are sorted first and then merged according to the importance. And will not be described in detail herein.
According to the network model pruning method, the target channels to be pruned are determined in a mode that the plurality of first channels to be pruned and the plurality of second channels to be pruned are sequenced firstly and then collected according to the importance, so that the aim of screening the target channels to be pruned with the lowest importance for the primarily determined channel set to be pruned is fulfilled, and the precision of model pruning and the convergence speed of the model after pruning are further improved.
It is understood that, in order to ensure the accuracy of the pruned network model, the pruned network model may be fine-tuned to converge. Based on this, after step 130, the network model pruning method may further include:
and (4) finely adjusting the network model obtained by pruning by using a knowledge distillation mode.
Specifically, fine-tuning the pruned network model by knowledge distillation means taking the network model to be sparsely trained as the teacher network and the pruned network model as the student network, and then fine-tuning the student network by distillation. The specific process of knowledge distillation is the same as in existing knowledge distillation methods and is not detailed here.
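As a non-limiting illustration of distillation-based fine-tuning, the following PyTorch-style sketch computes a standard distillation loss between the teacher (the network model to be sparsely trained) and the student (the pruned model); the temperature T and weight alpha are assumed hyperparameters, not values from the original disclosure.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft loss: KL divergence between temperature-softened teacher and student
    distributions; hard loss: ordinary cross-entropy against the labels."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```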
According to the network model pruning method, the precision of the network model after pruning is improved in a mode of carrying out fine adjustment on the network model obtained through pruning in a knowledge distillation mode, and therefore reliable guarantee is provided for greatly improving the lightweight effect of the network model after pruning.
It can be understood that, since the purpose of pruning and fine tuning is to obtain a target network model meeting the actual requirement of light weight, the precision of the network model obtained through pruning and fine tuning can be judged. Based on this, after the network model obtained by pruning is subjected to fine tuning, the network model pruning method may further include:
matching the model precision of the network model obtained by fine tuning with the preset model precision; and determining a target network model based on the successful matching result of the model precision and the preset model precision.
The preset model precision can be set manually according to the actual lightweight requirement of the network model to be pruned.
Specifically, the target network model is a lightweight, small-scale, high-precision network model, and the preset model precision can be set manually according to actual requirements. For example, if a large network model to be pruned needs to be pruned down to a small model, a high preset channel clipping threshold is usually not set at once; instead, to ensure the convergence speed and precision of the model, a small preset channel clipping threshold is set for one round of clipping, the effect is checked, and the threshold for the next round is chosen according to the current clipping and fine-tuning effect, until the network model to be pruned is pruned to the target network model. Conversely, for a smaller network model to be pruned, a larger preset channel clipping threshold can be set, provided the convergence speed and precision of the model are ensured. On this basis, it can be judged whether the network model obtained after pruning and fine-tuning reaches the preset model precision; if its model precision is determined to reach the preset model precision, the matching is deemed successful, the pruning operation is stopped, and the network model at stopping time is determined to be the target network model.
According to the network model pruning method, the target network model when matching is successful is determined in a mode of matching the model precision of the network model obtained through pruning and fine tuning with the preset model precision, and the reliability and the accuracy of obtaining the lightweight network model are improved.
It can be understood that if the accuracy of the network model obtained by the pruning and the fine tuning does not meet the preset light weight requirement, the target network model can be determined by means of the pruning and the fine tuning again. Based on this, the network model pruning method may further include:
firstly, acquiring a new channel set to be pruned corresponding to different preset pruning strategies in a network model obtained by fine tuning based on a matching failure result of model precision and preset model precision; and further determining a new target channel to be pruned based on the new channel set to be pruned so as to prune the network model obtained by fine tuning.
Specifically, if it is determined that the model precision of the network model obtained by pruning and fine-tuning does not reach the preset model precision, that network model can be taken as a new network model to be pruned, and the method returns to step 110 to obtain a new channel set to be pruned; a new target channel to be pruned is determined based on the new channel set, and the new network model to be pruned is pruned based on it. This repeats until the model precision of the network model obtained by pruning and fine-tuning successfully matches the preset model precision.
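As a non-limiting illustration of the overall prune / fine-tune / precision-check loop, the following Python sketch repeats the steps above until the preset model precision is matched; every callable passed in is a hypothetical stand-in for a step of the method described in the foregoing embodiments.

```python
def prune_until_target(model, collect, fuse, prune, distill, evaluate,
                       target_precision, max_rounds=10):
    """collect/fuse/prune/distill/evaluate are hypothetical stand-ins for
    steps 110-130, the knowledge-distillation fine-tune, and precision matching."""
    for _ in range(max_rounds):
        targets = fuse(collect(model))                   # steps 110 and 120
        student = prune(model, targets)                  # step 130
        model = distill(teacher=model, student=student)  # fine-tune by distillation
        if evaluate(model) >= target_precision:          # precision matching
            break                                        # target network model reached
    return model
```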
It can be understood that, when the network model to be pruned is the YOLOv5n network model applied to a character detection scenario, the network model pruning method provided in the present application may be used to prune YOLOv5n: first obtain the channel sets to be pruned corresponding to different preset pruning strategies in YOLOv5n, then determine the target channel to be pruned based on those channel sets, and then prune YOLOv5n based on the target channel to be pruned; both the process of obtaining the channel sets and the process of determining the target channel may refer to the foregoing embodiments. Further, YOLOv5n was pruned with the method of the present application (BN + CONV), with existing pruning using only the scaling factor of the BN layer, and with pruning using only the convolution (CONV) criterion, giving the effect graphs shown in FIG. 2, FIG. 3, and FIG. 4 and the data results shown in Table 1. FIG. 2 plots model size against mean average precision (mAP) for the different pruning methods, FIG. 3 plots parameter count against mAP, and FIG. 4 plots GFLOPS against mAP, where GFLOPS (giga floating-point operations per second, i.e., one billion floating-point operations per second) can be used as a performance parameter of the central processing unit. YOLOv5n₀ denotes YOLOv5n pruned with an existing method using only the BN-layer scaling factor or only convolution (CONV), and YOLOv5n₁ denotes YOLOv5n pruned with the method of the present application.
As can be seen from fig. 2 to fig. 4 and table 1, when the method of the present application is used to prune YOLOv5n, the computation is reduced by 47%, the model size is reduced by 71%, the mean average precision (mAP) drops by only 2.8%, and the inference time on a single-core central processing unit is reduced by 40%, where the inference time is the average over 100 single-core inference runs. Therefore, compared with the existing method that prunes using only the scaling factor of the BN layer, the method of the present application produces a smaller lightweight model and consumes less time, while still greatly reducing the computation demanded of the central processing unit.
TABLE 1
[Table 1 is reproduced as an image in the original publication (Figure BDA0003841609190000161); it reports the model size, parameter count, GFLOPs, mAP and single-core inference time of the pruning methods compared above.]
The network model pruning apparatus provided by the present application is described below; the network model pruning apparatus described below and the network model pruning method described above may be referred to correspondingly.
Fig. 5 illustrates a schematic structural diagram of a network model pruning apparatus, and as shown in fig. 5, the network model pruning apparatus 500 includes:
an obtaining module 510, configured to obtain channel sets to be pruned corresponding to different preset pruning strategies in a network model to be pruned, where the different channels to be pruned in the channel sets to be pruned represent unimportant channels obtained by using the corresponding preset pruning strategies;
a determining module 520, configured to determine a target channel to be pruned based on the channel set to be pruned;
and the pruning module 530 is configured to prune the network model to be pruned based on the target channel to be pruned.
It can be understood that the obtaining module 510 may be specifically configured to obtain a target scaling factor of a BN layer in a network model to be pruned and a target weight parameter of a convolution kernel of a convolutional layer; determining a first channel to be pruned matched with the target scaling factor and a second channel to be pruned matched with the target weight parameter; and acquiring a channel set to be pruned containing a first channel to be pruned and a second channel to be pruned.
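As an illustrative sketch of what the obtaining module 510 extracts, the following PyTorch snippet collects, for every BN layer, the absolute scaling factors and, for every convolutional layer, a per-output-channel norm of the kernel weights. Using the L1 norm for the convolution kernels is an assumption for illustration; the embodiments only specify a target weight parameter of the convolution kernel.

```python
import torch.nn as nn

def collect_pruning_statistics(model: nn.Module):
    """Gather the per-channel statistics that the two preset pruning
    strategies operate on: BN scaling factors and conv-kernel norms."""
    bn_scales, conv_norms = {}, {}
    for name, module in model.named_modules():
        if isinstance(module, nn.BatchNorm2d):
            # |gamma| per channel: importance under the BN-layer strategy.
            bn_scales[name] = module.weight.detach().abs()
        elif isinstance(module, nn.Conv2d):
            # L1 norm of each output channel's kernel (assumed norm):
            # importance under the convolutional-layer strategy.
            w = module.weight.detach()  # shape: (out_ch, in_ch, kH, kW)
            conv_norms[name] = w.abs().sum(dim=(1, 2, 3))
    return bn_scales, conv_norms
```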
It can be understood that the obtaining module 510 may be further configured to use the sample data set to train the initial neural network model, and determine a corresponding to-be-sparse-trained network model when training is stopped; acquiring an initial scaling factor of a BN layer in a network model to be sparsely trained and an initial weight parameter of a convolution kernel of a convolution layer; determining a first loss function of the BN layer and a second loss function of the convolution kernel based on a preset regularization constraint, an initial scaling factor and an initial weight parameter; and performing sparse training on the network model to be sparsely trained based on the first loss function and the second loss function to obtain a target scaling factor of a BN layer in the network model to be pruned and a target weight parameter of a convolution kernel of the convolution layer.
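A minimal sketch of the sparse-training step follows: an L1 regularization term on the BN scaling factors (standing in for the first loss function) and on the convolution weights (the second loss function) is added to the task loss, so that unimportant channels are driven toward zero. The L1 form and the coefficients `lambda_bn` and `lambda_conv` are assumptions; the embodiments only refer to a preset regularization constraint.

```python
import torch.nn as nn

def sparsity_loss(model: nn.Module, lambda_bn=1e-4, lambda_conv=1e-5):
    """Regularization terms added to the task loss during sparse training."""
    bn_term = sum(m.weight.abs().sum()
                  for m in model.modules() if isinstance(m, nn.BatchNorm2d))
    conv_term = sum(m.weight.abs().sum()
                    for m in model.modules() if isinstance(m, nn.Conv2d))
    return lambda_bn * bn_term + lambda_conv * conv_term

# Inside the training loop (sketch):
#   loss = task_loss(outputs, targets) + sparsity_loss(model)
#   loss.backward()
#   optimizer.step()
```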
It is to be understood that the determining module 520 may be specifically configured to determine the first channel clipping strategy based on the target scaling factor, the input of the BN layer, and the bias of the BN layer; determining a first channel to be pruned based on a first channel pruning strategy and a preset channel pruning threshold; determining a second channel clipping strategy based on the input of the convolution kernel and the target weight parameter; and determining a second channel to be pruned based on the second channel pruning strategy and a preset channel pruning threshold.
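The two channel pruning strategies can then be applied by comparing each channel's importance score against the preset channel pruning threshold. In the sketch below the threshold is derived as a quantile of the scores; the embodiments only state that a preset channel pruning threshold is used, so the quantile choice and the `prune_ratio` parameter are assumptions.

```python
import torch

def channels_below_threshold(scores: torch.Tensor, prune_ratio: float = 0.5):
    """Return the indices of channels whose importance score falls below
    a global threshold (here: a quantile of the score distribution)."""
    threshold = torch.quantile(scores, prune_ratio)
    return torch.nonzero(scores < threshold).flatten().tolist()

# first_channels_to_prune  = channels_below_threshold(bn_scales[layer_name])
# second_channels_to_prune = channels_below_threshold(conv_norms[layer_name])
```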
It can be understood that the determining module 520 may be further configured to determine the target channel to be pruned based on an intersection channel of the first channel to be pruned and the second channel to be pruned.
It can be understood that the determining module 520 may be further configured to sort the plurality of first channels to be pruned and the plurality of second channels to be pruned respectively according to the importance; and determining a target channel to be pruned based on the union of the results obtained by the sorting.
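The two ways of combining the candidate sets described above reduce to simple set operations: the intersection keeps only channels that both strategies agree are unimportant (conservative), while the union prunes every channel flagged by either strategy (aggressive). The `mode` parameter is a hypothetical name introduced for illustration.

```python
def select_target_channels(first, second, mode="intersection"):
    """Combine the first and second candidate channel index lists into
    the target channels to be pruned."""
    if mode == "intersection":
        return sorted(set(first) & set(second))
    return sorted(set(first) | set(second))  # union of the two candidate sets
```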
It is understood that the pruning module 530 may be further configured to perform fine tuning on the network model obtained by pruning by using a knowledge distillation method.
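As an illustrative sketch of the knowledge-distillation fine-tuning, the pruned network (student) can be trained to match the softened outputs of the unpruned network (teacher) in addition to the hard labels. The logit-distillation form, the temperature `T` and the weight `alpha` are assumed hyperparameters; the embodiments do not fix the exact distillation loss.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      T: float = 4.0, alpha: float = 0.7):
    """Knowledge-distillation loss for fine-tuning the pruned model."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)  # temperature-softened term
    hard = F.cross_entropy(student_logits, targets)   # hard-label term
    return alpha * soft + (1 - alpha) * hard
```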
It can be understood that the pruning module 530 may be specifically configured to match the model precision of the network model obtained by the fine tuning with the preset model precision; and determining a target network model based on the successful matching result of the model precision and the preset model precision.
It can be understood that the pruning module 530 may be further configured to obtain, based on a matching failure result of the model precision and the preset model precision, new channel sets to be pruned corresponding to different preset pruning strategies in the network model obtained through fine tuning; and determining a new target channel to be pruned based on the new channel set to be pruned so as to prune the network model obtained by fine tuning.
According to the network model pruning apparatus disclosed in the embodiments of the present application, channel sets to be pruned corresponding to different preset pruning strategies in the network model to be pruned are first obtained, and the target channel to be pruned is determined based on the channel sets to be pruned. Because the channels to be pruned in each channel set represent the unimportant channels identified by the corresponding preset pruning strategy, the greater the number of preset pruning strategies, the richer and more comprehensive the candidate channels to be pruned, and the more accurate the determined target channel to be pruned. Pruning the network model to be pruned based on the target channel to be pruned can therefore achieve efficient lightweighting of a CNN when computer hardware resources and real-time performance are limited, while effectively improving the precision of the pruned network model.
Fig. 6 illustrates a physical structure diagram of an electronic device, and as shown in fig. 6, the electronic device 600 may include: a processor (processor) 610, a communication Interface (Communications Interface) 620, a memory (memory) 630 and a communication bus 640, wherein the processor 610, the communication Interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may call logic instructions in the memory 630 to perform the following method:
acquiring channel sets to be pruned corresponding to different preset pruning strategies in a network model to be pruned, wherein the different channels to be pruned in the channel sets to be pruned represent unimportant channels acquired by using the corresponding preset pruning strategies;
determining a target channel to be pruned based on the channel set to be pruned;
and pruning the network model to be pruned based on the target channel to be pruned.
In addition, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the portions thereof that contribute to the related art in essence, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program codes, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, embodiments of the present application disclose a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions, which when executed by a computer, enable the computer to perform the methods provided by the above-mentioned method embodiments, for example, including:
acquiring channel sets to be pruned corresponding to different preset pruning strategies in a network model to be pruned, wherein the different channels to be pruned in the channel sets to be pruned represent unimportant channels acquired by using the corresponding preset pruning strategies;
determining a target channel to be pruned based on the channel set to be pruned;
and pruning the network model to be pruned based on the target channel to be pruned.
In another aspect, the present application further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the network model pruning method provided in the foregoing embodiments, for example, including:
acquiring channel sets to be pruned corresponding to different preset pruning strategies in a network model to be pruned, wherein the different channels to be pruned in the channel sets to be pruned represent unimportant channels acquired by using the corresponding preset pruning strategies;
determining a target channel to be pruned based on the channel set to be pruned;
and pruning the network model to be pruned based on the target channel to be pruned.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions substantially or contributing to the related art may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that the above embodiments are only for illustrating the present application, and do not limit the present application. Although the present application has been described in detail with reference to the embodiments, those skilled in the art should understand that various combinations, modifications and equivalents may be made to the technical solutions of the present application without departing from the spirit and scope of the technical solutions of the present application, and the technical solutions of the present application should be covered by the claims of the present application.

Claims (12)

1. A network model pruning method, comprising:
acquiring channel sets to be pruned corresponding to different preset pruning strategies in a network model to be pruned, wherein the different channels to be pruned in the channel sets to be pruned represent unimportant channels acquired by using the corresponding preset pruning strategies;
determining a target channel to be pruned based on the channel set to be pruned;
and pruning the network model to be pruned based on the target channel to be pruned.
2. The network model pruning method according to claim 1, wherein the different preset pruning strategies include a BN layer channel pruning strategy and a convolutional layer channel pruning strategy, and the obtaining of channel sets to be pruned corresponding to the different preset pruning strategies in the network model to be pruned includes:
obtaining a target scaling factor of a BN layer in a network model to be pruned and a target weight parameter of a convolution kernel of a convolution layer;
determining a first channel to be pruned matched with the target scaling factor and a second channel to be pruned matched with the target weight parameter;
and acquiring the channel set to be pruned containing the first channel to be pruned and the second channel to be pruned.
3. The method of claim 2, wherein the determining a first channel to be pruned that matches the target scaling factor and a second channel to be pruned that matches the target weight parameter comprises:
determining a first channel clipping policy based on the target scaling factor, an input of the BN layer, and a bias of the BN layer;
determining the first channel to be pruned based on the first channel pruning strategy and a preset channel pruning threshold;
determining a second channel clipping strategy based on the input of the convolution kernel and the target weight parameter;
and determining the second channel to be pruned based on the second channel pruning strategy and the preset channel pruning threshold.
4. The network model pruning method of claim 2, wherein the determining a target channel to be pruned based on the set of channels to be pruned comprises:
and determining a target channel to be pruned based on the intersection channel of the first channel to be pruned and the second channel to be pruned.
5. The network model pruning method of claim 2, wherein the determining a target channel to be pruned based on the set of channels to be pruned, further comprises:
respectively sequencing the plurality of first channels to be pruned and the plurality of second channels to be pruned according to importance;
and determining a target channel to be pruned based on the union of the results obtained by the sorting.
6. The network model pruning method according to claim 2, wherein the obtaining of the target scaling factor of the BN layer and the target weight parameter of the convolution kernel of the convolutional layer in the network model to be pruned includes:
training the initial neural network model by using the sample data set, and determining a corresponding to-be-sparse-trained network model when training is stopped;
acquiring an initial scaling factor of a BN layer in the network model to be sparsely trained and an initial weight parameter of a convolution kernel of a convolution layer;
determining a first loss function of the BN layer and a second loss function of the convolution kernel based on a preset regularization constraint, the initial scaling factor and the initial weight parameter;
and carrying out sparse training on the network model to be sparsely trained based on the first loss function and the second loss function, and obtaining a target scaling factor of a BN layer in the network model to be pruned and a target weight parameter of a convolution kernel of a convolution layer.
7. The network model pruning method according to any one of claims 2 to 6, wherein after the pruning of the network model to be pruned based on the first channel to be pruned and the second channel to be pruned, the method further comprises:
and (4) finely adjusting the network model obtained by pruning by using a knowledge distillation mode.
8. The network model pruning method of claim 7, wherein after the fine-tuning of the pruned network model, the method further comprises:
matching the model precision of the network model obtained by fine tuning with the preset model precision;
and determining a target network model based on the matching success result of the model precision and the preset model precision.
9. The network model pruning method according to claim 8, wherein after the matching of the model accuracy of the fine-tuned network model and the preset model accuracy, the method further comprises:
acquiring new channel sets to be pruned corresponding to different preset pruning strategies in the network model obtained by fine tuning based on the matching failure result of the model precision and the preset model precision;
and determining a new target channel to be pruned based on the new channel set to be pruned so as to prune the network model obtained by fine tuning.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the network model pruning method according to any one of claims 1 to 9 when executing the program.
11. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the network model pruning method according to any one of claims 1 to 9.
12. A computer program product comprising a computer program, characterized in that the computer program realizes the network model pruning method according to any of claims 1 to 9 when executed by a processor.
CN202211105427.9A 2022-09-09 2022-09-09 Network model pruning method, electronic device and storage medium Pending CN115577765A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211105427.9A CN115577765A (en) 2022-09-09 2022-09-09 Network model pruning method, electronic device and storage medium


Publications (1)

Publication Number Publication Date
CN115577765A true CN115577765A (en) 2023-01-06

Family

ID=84581664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211105427.9A Pending CN115577765A (en) 2022-09-09 2022-09-09 Network model pruning method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN115577765A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110874631A (en) * 2020-01-20 2020-03-10 浙江大学 Convolutional neural network pruning method based on feature map sparsification
CN111931914A (en) * 2020-08-10 2020-11-13 北京计算机技术及应用研究所 Convolutional neural network channel pruning method based on model fine tuning
CN113128355A (en) * 2021-03-29 2021-07-16 南京航空航天大学 Unmanned aerial vehicle image real-time target detection method based on channel pruning
CN113159173A (en) * 2021-04-20 2021-07-23 北京邮电大学 Convolutional neural network model compression method combining pruning and knowledge distillation
CN113240090A (en) * 2021-05-27 2021-08-10 北京达佳互联信息技术有限公司 Image processing model generation method, image processing device and electronic equipment
US20210312293A1 (en) * 2020-10-23 2021-10-07 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for determining network model pruning strategy, device and storage medium
CN113627389A (en) * 2021-08-30 2021-11-09 京东方科技集团股份有限公司 Target detection optimization method and device
CN114120205A (en) * 2021-12-02 2022-03-01 云南电网有限责任公司信息中心 Target detection and image recognition method for safety belt fastening of distribution network operators
CN114444676A (en) * 2020-11-04 2022-05-06 顺丰科技有限公司 Model channel pruning method and device, computer equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination