CN116992941A - Convolutional neural network pruning method and device based on feature similarity and feature compensation - Google Patents

Convolutional neural network pruning method and device based on feature similarity and feature compensation Download PDF

Info

Publication number
CN116992941A
Authority
CN
China
Prior art keywords
convolution kernels
convolution
model
layer
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311034754.4A
Other languages
Chinese (zh)
Inventor
唐斌
王强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN202311034754.4A priority Critical patent/CN116992941A/en
Publication of CN116992941A publication Critical patent/CN116992941A/en
Pending legal-status Critical Current

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a convolutional neural network pruning method and device based on feature similarity and feature compensation. The method comprises the following steps: adjusting the input images in a data set to a fixed size, standardizing the pixel values of the images, and augmenting the training data through image enhancement techniques; initializing a network structure, setting model parameters, and training the model with the training data; for the trained model, obtaining the similarity between convolution kernels, performing cluster analysis according to the similarity, grouping the convolution kernels of each layer into similar groups, selecting a retained convolution kernel from each similar group, and generating a new network structure; copying the retained convolution kernel parameters into the new network structure, and compensating the weight parameters of the pruned convolution kernels in each similar group into the retained convolution kernels by means of parameter superposition; performing model accuracy recovery training with the original data set, and saving the model parameters and the network structure. The pruning method can maintain the model accuracy.

Description

Convolutional neural network pruning method and device based on feature similarity and feature compensation
Technical Field
The invention relates to the field of computer vision and model compression, in particular to a convolutional neural network pruning method and device based on feature similarity and feature compensation.
Background
As machine computing power continues to increase, convolutional neural networks (CNNs) have made significant progress in computer vision tasks such as image recognition, object detection, and image segmentation. However, the performance of a CNN model is closely related to its complexity. Achieving the best results on various computer vision tasks usually requires deeper and wider networks, which leads to a large number of floating-point operations and parameters. Computing tasks with real-time and privacy requirements need model inference to be performed directly on edge devices, yet resource-constrained edge devices often struggle to deploy large-scale network models. Reducing the floating-point computation and parameter count of a model is therefore of great significance for edge deployment.
To overcome the difficulty of deployment on edge devices, network pruning is adopted in many application scenarios. Given the huge floating-point computation and parameter count, the weight parameters or network structures that have little influence on model accuracy can be pruned. Network pruning schemes are divided into unstructured pruning and structured pruning. The drawback of unstructured pruning is that it only sparsifies the weight matrix, so compression and acceleration are hard to obtain without dedicated hardware. Structured pruning compresses the network by removing structural elements such as convolution kernels or convolution layers, and the original convolutional structure of the model is preserved after pruning. Structured pruning schemes mainly set an importance index for each convolution kernel or channel together with a global threshold, and prune the convolution kernels whose importance falls below the threshold. Compared with the original model, the pruned model is significantly smaller, which alleviates, to a certain extent, the difficulty of deploying network models directly on edge devices.
Most structured pruning schemes focus on the design of convolution kernel or channel importance indexes and ignore the similarity of the features produced by different convolution kernels or channels. During inference of a convolutional neural network, the output feature maps of different convolution kernels can be similar, and pruning convolution kernels with similar behavior can reduce the loss of model accuracy while compressing the model. In addition, for a conventionally trained model, the pruned network parameters still contain feature information that contributes positively to model accuracy; during pruning, this feature information should be transferred to the retained network structure by means of feature compensation. Therefore, it is necessary for structured pruning to consider both feature similarity and feature compensation.
Disclosure of Invention
Purpose of the invention: in view of the shortcomings of the prior art, the invention provides a convolutional neural network pruning method and device based on feature similarity and feature compensation, which can reduce the loss of model accuracy while reducing the floating-point computation and parameter count of the model, so that the model can be deployed directly on edge devices, improving the real-time performance of inference and protecting privacy.
The technical scheme is as follows: in order to achieve the above object, the present invention has the following technical scheme:
a convolutional neural network pruning method based on feature similarity and feature compensation comprises the following steps:
adjusting the input images in the data set to a fixed size, standardizing the pixel values of the images, and augmenting the training data of the original data set through image enhancement techniques;
initializing a network structure, setting model parameters, and training the model with the training data;
for the trained model, obtaining the similarity between convolution kernels, performing cluster analysis according to the similarity, grouping the convolution kernels of each layer into similar groups, selecting a retained convolution kernel from each similar group, and generating a new network structure;
copying the retained convolution kernel parameters into the new network structure, and compensating the weight parameters of the pruned convolution kernels in each similar group into the retained convolution kernels by means of parameter superposition;
performing model accuracy recovery training with the original data set, and saving the model parameters and the network structure.
According to an embodiment of the present invention, obtaining the similarity between convolution kernels comprises: stretching the three-dimensional tensor of each convolution kernel into one-dimensional tensor, and calculating cosine similarity between the one-dimensional tensors so as to measure the similarity between the convolution kernels.
According to an embodiment of the present invention, the tensor stretching process is expressed as:

$F_i^j \in \mathbb{R}^{n_i \times h_i \times w_i} \rightarrow \hat{F}_i^j \in \mathbb{R}^{n_i \cdot h_i \cdot w_i}$

where $F_i^j$ is the j-th convolution kernel in the i-th layer, consisting of $n_i$ two-dimensional tensors of size $h_i \times w_i$, and $\hat{F}_i^j$ is the resulting one-dimensional tensor; $n_i$ denotes the number of input feature maps of the i-th layer, $n_{i+1}$ denotes the number of output feature maps of the i-th layer, and $h_i$ and $w_i$ denote the height and width of each two-dimensional tensor, respectively.
According to an embodiment of the present invention, grouping the convolution kernels of each layer into similar groups includes: for a neural network with L layers of convolution kernels, grouping the convolution kernels of each layer with the k-means clustering algorithm, where the number of groups is determined by the pruning rate $p_i$ set for each layer; the number of groups for the i-th layer convolution kernels is $k_i = p_i \times n_{i+1}$, where $1 \le i \le L$ and $n_{i+1}$ denotes the number of output feature maps of the i-th layer.
According to an embodiment of the present invention, selecting the retained convolution kernel in each similar group includes: calculating the sum of the absolute values of the weight parameters of each convolution kernel using the one-dimensional stretched tensor, and selecting, in each similar group, the convolution kernel with the largest sum of absolute weight values as the retained convolution kernel.
According to an embodiment of the present invention, generating a new network structure includes: setting an index for each convolution kernel of each layer, storing the indexes of the retained convolution kernels in a mask array, and generating the new network structure according to the mask array, wherein the mask array is expressed as:

$mask_i[j] = \begin{cases} 1, & j \in I \\ 0, & j \in U \end{cases}$

where $mask_i$ denotes the mask array of the i-th layer, 1 indicates that the j-th convolution kernel in the i-th layer is retained, 0 indicates that it is pruned, $I$ denotes the set of retained convolution kernels, and $U$ denotes the set of pruned convolution kernels.
According to an embodiment of the invention, the parameter superposition is performed as follows: the weight parameters of the pruned convolution kernels in each group are shared into the retained convolution kernel, where sharing means adding the tensors at corresponding positions of the convolution kernels in the group to obtain the parameter-shared convolution kernel.
A convolutional neural network pruning device based on feature similarity and feature compensation, comprising:
the data preprocessing module, configured to adjust the input images in the data set to a fixed size, standardize the pixel values of the images, and augment the training data of the original data set through image enhancement techniques;
the model training module, configured to initialize a network structure, set model parameters, and train the model with the training data;
the network pruning module, configured to obtain, for the trained model, the similarity between convolution kernels, perform cluster analysis according to the similarity, group the convolution kernels of each layer into similar groups, select a retained convolution kernel from each similar group, and generate a new network structure;
the parameter compensation module, configured to copy the retained convolution kernel parameters into the new network structure, and compensate the weight parameters of the pruned convolution kernels in each similar group into the retained convolution kernels by means of parameter superposition;
and the accuracy recovery module, configured to perform model accuracy recovery training with the original data set and save the model parameters and the network structure.
The present invention also provides a computer device comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, which when executed by the processors implement the steps of the convolutional neural network pruning method based on feature similarity and feature compensation as described above.
The present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a convolutional neural network pruning method based on feature similarity and feature compensation as described above.
Compared with the prior art, the invention has the following advantages and beneficial effects. First, considering the similarity of convolution kernels and the importance of redundant parameters, the convolution kernels with similar outputs in each layer are grouped by clustering, and one convolution kernel is retained per group, thereby compressing the network. Second, addressing the problem that most network pruning methods do not fully consider the influence of the pruned structure's parameters on network performance, the invention compensates the parameters of the pruned convolution kernels in each group into the retained convolution kernels by superposition during pruning, so that the network retains more information and its performance improves. With the method and device, the loss of model accuracy can be reduced while reducing the floating-point computation and parameter count of the model, so that the model can be deployed directly on edge devices, improving the real-time performance of inference and protecting privacy.
Drawings
FIG. 1 is a flow chart of a convolutional neural network pruning method of the present invention;
FIG. 2 is a schematic diagram of a network structure generated prior to model training;
FIG. 3 is a schematic diagram of a network pruning flow;
FIG. 4 is a schematic diagram of model accuracy recovery training after pruning.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
The network pruning method compresses a trained model: before pruning, the network structure and the data set must be determined and the model must be trained; after pruning, accuracy recovery training is performed on the pruned model using the original data set. Referring to FIG. 1, the convolutional neural network pruning method based on feature similarity and feature compensation provided by the invention comprises the following steps.
Step (1): adjust the input images in the data set to a fixed size, standardize the pixel values of the images, and augment the training data of the original data set through image enhancement techniques.
The input images are resized to a fixed size to facilitate feeding them to the network. In the embodiment of the invention, the input size is fixed at 3×32×32 for the VGG network structure and at 3×224×224 for the ResNet network structure.
Image enhancement techniques such as rotation, flipping, and cropping increase the diversity of the data set, thereby enlarging the training data of the network and improving its generalization ability.
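By way of illustration only, the preprocessing and augmentation described above might be sketched in PyTorch as follows for the 3×32×32 case; the use of CIFAR-10 and the particular normalization statistics are assumptions of this sketch, not specified by the patent.

```python
# Hypothetical preprocessing sketch (not the patent's reference implementation).
import torchvision.transforms as T
from torchvision.datasets import CIFAR10

train_transform = T.Compose([
    T.Resize((32, 32)),                    # adjust input images to a fixed size
    T.RandomCrop(32, padding=4),           # cropping
    T.RandomHorizontalFlip(),              # flipping
    T.RandomRotation(15),                  # rotation
    T.ToTensor(),
    # standardize pixel values (commonly used CIFAR-10 statistics, assumed here)
    T.Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.2470, 0.2435, 0.2616)),
])

train_set = CIFAR10(root="./data", train=True, download=True,
                    transform=train_transform)
```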
Step (2): initialize the network structure, set the model parameters, and train the model with the training data.
For model training, different network structures are initialized according to different requirements; FIG. 2 shows an example of a generated network structure. In the embodiment of the invention, the optimization algorithm is SGD (stochastic gradient descent), the initial learning rate is 0.1, the learning rate decay factor is 0.1, the momentum is 0.9, the weight decay is 5e-4, the batch size is 128, and the learning rate is decayed once every 30 epochs. The model is trained with the augmented training data.
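A minimal training-loop sketch with the hyperparameters listed above is given below; it reuses `train_set` from the preceding preprocessing sketch, and the choice of VGG-16 and the total number of epochs are assumptions not fixed by the patent.

```python
# Hypothetical training sketch using the stated hyperparameters.
import torch
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader

model = torchvision.models.vgg16_bn(num_classes=10)   # assumed architecture
loader = DataLoader(train_set, batch_size=128, shuffle=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(90):                                # epoch count assumed
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()                                   # decay lr by 0.1 every 30 epochs
```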
Step (3): for the trained model, obtain the similarity between convolution kernels, perform cluster analysis according to the similarity, group the convolution kernels of each layer into similar groups, select a retained convolution kernel from each similar group, and generate a new network structure.
Unlike setting a single overall pruning rate, the method sets the pruning rate per layer, so that the degree of compression can be chosen freely according to the redundancy of each layer. In this way, the earlier convolutional layers, which extract more feature information, are not compared in importance against the later convolutional layers, which contain more redundant information; the importance of each convolution kernel is evaluated only within its own layer.
For the trained model, the convolution kernels are obtained by traversing the modules of the model layer by layer. To conveniently calculate the cosine similarity between different convolution kernels, each three-dimensional convolution kernel tensor is stretched into a one-dimensional tensor. The tensor stretching can be represented by formula (1):

$F_i^j \in \mathbb{R}^{n_i \times h_i \times w_i} \rightarrow \hat{F}_i^j \in \mathbb{R}^{n_i \cdot h_i \cdot w_i} \quad (1)$

where $F_i^j$ is the j-th convolution kernel in the i-th layer, consisting of $n_i$ two-dimensional tensors of size $h_i \times w_i$, and $\hat{F}_i^j$ is the resulting one-dimensional tensor; $n_i$ denotes the number of input feature maps of the i-th layer, $n_{i+1}$ denotes the number of output feature maps of the i-th layer, and $h_i$ and $w_i$ denote the height and width of each two-dimensional tensor, respectively.
After tensor stretching, the cosine similarity between one-dimensional tensors can be represented by formula (2):

$sim(\hat{F}_i^m, \hat{F}_i^n) = \dfrac{\hat{F}_i^m \cdot \hat{F}_i^n}{\lVert \hat{F}_i^m \rVert \, \lVert \hat{F}_i^n \rVert} \quad (2)$

where $\hat{F}_i^m$ and $\hat{F}_i^n$ denote the stretched m-th and n-th convolution kernels in the i-th layer, respectively.
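The stretching of formula (1) and the similarity of formula (2) could be sketched as follows for a single convolutional layer; the helper name is a hypothetical choice of this sketch.

```python
# Hypothetical sketch of formulas (1)-(2): flatten each 3-D kernel and compute
# pairwise cosine similarity between the flattened kernels of one layer.
import torch
import torch.nn.functional as F

def kernel_similarity_matrix(conv_weight: torch.Tensor) -> torch.Tensor:
    # conv_weight has shape (n_{i+1}, n_i, h_i, w_i)
    flat = conv_weight.reshape(conv_weight.size(0), -1)   # formula (1): stretch to 1-D
    flat = F.normalize(flat, dim=1)                       # unit-length rows
    return flat @ flat.t()                                # formula (2): cosine similarities
```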
The convolution kernels of each layer are grouped into similar groups using the k-means clustering algorithm. For a network with L convolutional layers, the number of groups in each layer is determined by the pruning rate $p_i$ ($1 \le i \le L$) set for that layer, and the number of groups for the i-th layer convolution kernels can be represented by formula (3):

$k_i = p_i \times n_{i+1} \quad (3)$
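A possible sketch of the per-layer grouping of formula (3) follows; applying k-means to L2-normalized vectors so that Euclidean distance tracks cosine similarity is an assumption of this sketch, since the patent does not state how the similarity is fed to k-means.

```python
# Hypothetical sketch of formula (3): cluster one layer's kernels into
# k_i = p_i * n_{i+1} similar groups with k-means.
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def group_kernels(conv_weight, prune_rate):
    n_out = conv_weight.size(0)                        # n_{i+1}
    k = max(1, int(round(prune_rate * n_out)))         # k_i = p_i * n_{i+1}
    flat = conv_weight.detach().cpu().reshape(n_out, -1).numpy()
    flat = normalize(flat)                             # unit norm, cosine-like distance
    return KMeans(n_clusters=k, n_init=10).fit_predict(flat)   # group label per kernel
```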
After the above operation, the similar groups of the convolution kernels in each convolutional layer are obtained. Since the convolution kernels within a group are similar, their output feature maps are also similar, and the next layer produces similar results when convolving these similar feature maps; therefore only one convolution kernel is retained per similar group, which fully accounts for feature similarity. During model inference, larger weight parameters have a larger influence on the inference result, so for each similar group the importance of each convolution kernel is measured by the sum of the absolute values of its weight parameters, and only the convolution kernel with the largest sum is retained in each group. The sum of the absolute values of the weight parameters of a convolution kernel can be represented by formula (4):

$S_i^j = \sum \lvert \hat{F}_i^j \rvert \quad (4)$

where $S_i^j$ denotes the sum of the absolute values of the weight parameters of the j-th convolution kernel in the i-th layer.
After the retained convolution kernel in each group is determined, the pruned network structure is also determined. To facilitate copying the parameters of the retained convolution kernels from the original model into the new network structure, and to perform feature compensation that makes use of the redundant parameters, a mask array is set for each layer to mark the indexes of the retained convolution kernels in the original model. The mask array can be represented by formula (5):

$mask_i[j] = \begin{cases} 1, & j \in I \\ 0, & j \in U \end{cases} \quad (5)$

where $mask_i$ denotes the mask array of the i-th layer, 1 indicates that the j-th convolution kernel in the i-th layer is retained, 0 indicates that it is pruned, $I$ denotes the set of retained convolution kernels, and $U$ denotes the set of pruned convolution kernels.
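Formulas (4) and (5) could be sketched as below: within each cluster the kernel with the largest L1 sum is kept and recorded in a 0/1 mask (the helper name is hypothetical).

```python
# Hypothetical sketch of formulas (4)-(5): keep the kernel with the largest
# sum of absolute weights in each group and build the layer's mask array.
import numpy as np

def build_mask(conv_weight, labels):
    n_out = conv_weight.size(0)
    # formula (4): sum of absolute weight values per kernel
    l1 = conv_weight.detach().abs().reshape(n_out, -1).sum(dim=1).cpu().numpy()
    mask = np.zeros(n_out, dtype=np.int64)              # formula (5)
    for g in np.unique(labels):
        members = np.where(labels == g)[0]
        mask[members[l1[members].argmax()]] = 1          # retain the largest-L1 kernel
    return mask                                          # 1 = retained, 0 = pruned
```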
Step (4): copy the retained convolution kernel parameters into the new network structure, and compensate the weight parameters of the pruned convolution kernels in each similar group into the retained convolution kernels by means of parameter superposition.
After the mask array is obtained, the new network structure is generated according to the mask array, and the parameters of the retained convolution kernels are copied into it. After copying, feature compensation is performed according to the grouping and the retained convolution kernels. The method of the present invention takes the weight parameters of the pruned convolution kernels into account: for a conventionally trained model, judging the importance of a convolution kernel directly by a single index has poor interpretability, and the accuracy before recovery training is poor. Because the convolution kernels of the same group behave similarly and their output feature maps are similar, the method shares the weight parameters of each group's pruned convolution kernels into the retained convolution kernel by superposition, thereby realizing feature compensation. The parameter superposition works as follows: the weight parameters of the pruned convolution kernels in each group are shared into the retained convolution kernel, where sharing means adding the tensors at corresponding positions of the convolution kernels in the group to obtain the parameter-shared convolution kernel.
For example, suppose that in the i-th layer the convolution kernels $F_i^{q_1}$, $F_i^{q_2}$ and $F_i^{q_3}$ are grouped as similar and $F_i^{q_1}$ has the largest sum of absolute weight values; then the weight parameters of $F_i^{q_2}$ and $F_i^{q_3}$ are shared into $F_i^{q_1}$ by superposition. The convolution kernel superposition can be represented by formula (6):

$\tilde{F}_i^{q_1} = F_i^{q_1} \oplus F_i^{q_2} \oplus F_i^{q_3} \quad (6)$

where $q_1, q_2, q_3 \in [1, n_{i+1}]$, $\tilde{F}_i^{q_1}$ denotes the convolution kernel after parameter sharing, and $\oplus$ denotes the superposition of the $n_i$ two-dimensional tensors at corresponding positions of the kernels, i.e. the one-dimensional tensors at corresponding positions within each two-dimensional tensor are added. The kernels $F_i^{q_2}$ and $F_i^{q_3}$ are pruned in the newly generated network.
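A sketch of the superposition of formula (6) is given below: the weights of the pruned kernels in a group are added onto the retained kernel, and only the retained rows are kept for the new layer (the function name is hypothetical).

```python
# Hypothetical sketch of formula (6): feature compensation by superposing the
# pruned kernels of each group onto the retained kernel of that group.
import numpy as np
import torch

def compensate(conv_weight, labels, mask):
    w = conv_weight.detach()
    new_weight = w.clone()
    for g in np.unique(labels):
        members = np.where(labels == g)[0]
        kept = members[mask[members] == 1][0]            # retained kernel of this group
        for j in members:
            if j != kept:
                new_weight[kept] += w[j]                 # element-wise superposition
    keep_rows = torch.from_numpy(mask).bool()
    return new_weight[keep_rows]                         # rows for the pruned layer
```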
The description of step (3) and step (4) above covers the core technical points of network pruning and feature compensation of the present invention; the flow is shown in FIG. 3. After clustering into groups, the maximum of the sums of absolute weight values of the convolution kernels in each group is computed and used to determine the convolution kernel to be retained in each similar group; the sums of absolute weight values can be computed from the stretched one-dimensional tensors. While the retained convolution kernels are being determined, an index can be assigned to each convolution kernel, and the indexes of the retained convolution kernels are recorded in the mask array. A new network structure is generated from the mask array, and the weight parameters of the retained convolution kernels are copied into it. In the new network structure, according to the grouping result and the indexes of the retained convolution kernels, the weight parameters of the pruned convolution kernels in each group are superimposed onto the retained convolution kernel, realizing feature compensation. In FIG. 3, for the i-th convolutional layer, convolution kernels of different groups are drawn with different backgrounds; the convolution kernels pruned within a group are marked in gray and, since they are pruned and have no effect on the (i+1)-th layer output feature maps, are drawn with dotted lines.
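The new structure can then be materialized by creating a smaller convolutional layer whose output channels are the retained kernels and copying the compensated weights into it. The sketch below is an illustrative assumption; handling of biases, batch normalization, and the next layer's input channels is omitted.

```python
# Hypothetical sketch: build the pruned layer from the compensated weights
# returned by compensate() above.
import torch
import torch.nn as nn

def build_pruned_conv(old_conv: nn.Conv2d, kept_weight: torch.Tensor) -> nn.Conv2d:
    new_conv = nn.Conv2d(old_conv.in_channels, kept_weight.size(0),
                         kernel_size=old_conv.kernel_size,
                         stride=old_conv.stride,
                         padding=old_conv.padding,
                         bias=old_conv.bias is not None)
    with torch.no_grad():
        new_conv.weight.copy_(kept_weight)   # retained + compensated kernels
    return new_conv
```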
Step (5): perform model accuracy recovery training with the original data set, and save the model parameters and the network structure.
FIG. 4 is a schematic diagram of model accuracy recovery training after pruning. After network pruning, the pruned network structure is obtained, with the pruned parts drawn as dotted lines. For different models, a fine-grained learning rate is specified, the model accuracy recovery training is performed with the original data set, and finally the model parameters and the network structure are saved for deployment and use.
Addressing the difficulty of balancing pruning strength against model accuracy in convolutional neural networks, the invention considers that similar output feature maps exist in the convolution results of each layer, and that similar feature maps, used as inputs to the next convolutional layer, yield similar results. The invention prunes the convolution kernels corresponding to similar output feature maps, removing redundant parameters. In ordinary network pruning the weight parameters of the pruned convolution kernels are simply discarded, even though these parameters often contribute positively to model accuracy; the invention compensates them back into the network by superposition. Furthermore, the deeper the layer, the more redundant its parameters; the pruning ratio can therefore be reduced for shallower layers and increased for deeper layers so as to retain the effective information.
Based on the same inventive concept as the method, the invention also provides a convolutional neural network pruning device based on feature similarity and feature compensation, which comprises the following components:
the data preprocessing module, configured to adjust the input images in the data set to a fixed size, standardize the pixel values of the images, and augment the training data of the original data set through image enhancement techniques;
the model training module, configured to initialize a network structure, set model parameters, and train the model with the training data;
the network pruning module, configured to obtain, for the trained model, the similarity between convolution kernels, perform cluster analysis according to the similarity, group the convolution kernels of each layer into similar groups, select a retained convolution kernel from each similar group, and generate a new network structure;
the parameter compensation module, configured to copy the retained convolution kernel parameters into the new network structure, and compensate the weight parameters of the pruned convolution kernels in each similar group into the retained convolution kernels by means of parameter superposition;
and the accuracy recovery module, configured to perform model accuracy recovery training with the original data set and save the model parameters and the network structure.
It should be understood that the convolutional neural network pruning device in the embodiment of the present invention may implement all the technical solutions in the above method embodiments, and the functions of each functional module may be specifically implemented according to the methods in the above method embodiments, and the specific implementation process may refer to the relevant descriptions in the above embodiments, which are not repeated herein.
The present invention also provides a computer device comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, which when executed by the processors implement the steps of the convolutional neural network pruning method based on feature similarity and feature compensation as described above.
The present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a convolutional neural network pruning method based on feature similarity and feature compensation as described above.
It will be appreciated by those skilled in the art that embodiments of the invention may be provided as a method, apparatus, computer device, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The invention is described with reference to flow charts of methods according to embodiments of the invention. It will be understood that each flow in the flowchart, and combinations of flows in the flowchart, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows.
The above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (10)

1. A convolutional neural network pruning method based on feature similarity and feature compensation, characterized by comprising the following steps:
adjusting the input images in the data set to a fixed size, standardizing the pixel values of the images, and augmenting the training data of the original data set through image enhancement techniques;
initializing a network structure, setting model parameters, and training the model with the training data;
for the trained model, obtaining the similarity between convolution kernels, performing cluster analysis according to the similarity, grouping the convolution kernels of each layer into similar groups, selecting a retained convolution kernel from each similar group, and generating a new network structure;
copying the retained convolution kernel parameters into the new network structure, and compensating the weight parameters of the pruned convolution kernels in each similar group into the retained convolution kernels by means of parameter superposition;
performing model accuracy recovery training with the original data set, and saving the model parameters and the network structure.
2. The method of claim 1, wherein obtaining the similarity between convolution kernels comprises: stretching the three-dimensional tensor of each convolution kernel into one-dimensional tensor, and calculating cosine similarity between the one-dimensional tensors so as to measure the similarity between the convolution kernels.
3. The method of claim 2, wherein the tensor stretching process is expressed as:

$F_i^j \in \mathbb{R}^{n_i \times h_i \times w_i} \rightarrow \hat{F}_i^j \in \mathbb{R}^{n_i \cdot h_i \cdot w_i}$

where $F_i^j$ is the j-th convolution kernel in the i-th layer, consisting of $n_i$ two-dimensional tensors of size $h_i \times w_i$, and $\hat{F}_i^j$ is the resulting one-dimensional tensor; $n_i$ denotes the number of input feature maps of the i-th layer, $n_{i+1}$ denotes the number of output feature maps of the i-th layer, and $h_i$ and $w_i$ denote the height and width of each two-dimensional tensor, respectively.
4. The method of claim 1, wherein grouping the convolution kernels of each layer into similar groups comprises: for a neural network with L layers of convolution kernels, grouping the convolution kernels of each layer with the k-means clustering algorithm, wherein the number of groups is determined by the pruning rate $p_i$ set for each layer, and the number of groups for the i-th layer convolution kernels is $k_i = p_i \times n_{i+1}$, where $1 \le i \le L$ and $n_{i+1}$ denotes the number of output feature maps of the i-th layer.
5. The method of claim 2, wherein selecting the retained convolution kernel in each similar group comprises: calculating the sum of the absolute values of the weight parameters of each convolution kernel using the one-dimensional stretched tensor, and selecting, in each similar group, the convolution kernel with the largest sum of absolute weight values as the retained convolution kernel.
6. The method of claim 5, wherein generating a new network structure comprises: setting an index for each convolution kernel of each layer, storing the indexes of the retained convolution kernels in a mask array, and generating the new network structure according to the mask array, wherein the mask array is expressed as:

$mask_i[j] = \begin{cases} 1, & j \in I \\ 0, & j \in U \end{cases}$

where $mask_i$ denotes the mask array of the i-th layer, 1 indicates that the j-th convolution kernel in the i-th layer is retained, 0 indicates that it is pruned, $I$ denotes the set of retained convolution kernels, and $U$ denotes the set of pruned convolution kernels.
7. The method according to claim 1, wherein the parameter superposition is performed as follows: the weight parameters of the pruned convolution kernels in each group are shared into the retained convolution kernel, where sharing means adding the tensors at corresponding positions of the convolution kernels in the group to obtain the parameter-shared convolution kernel.
8. A convolutional neural network pruning device based on feature similarity and feature compensation, comprising:
the data preprocessing module, configured to adjust the input images in the data set to a fixed size, standardize the pixel values of the images, and augment the training data of the original data set through image enhancement techniques;
the model training module, configured to initialize a network structure, set model parameters, and train the model with the training data;
the network pruning module, configured to obtain, for the trained model, the similarity between convolution kernels, perform cluster analysis according to the similarity, group the convolution kernels of each layer into similar groups, select a retained convolution kernel from each similar group, and generate a new network structure;
the parameter compensation module, configured to copy the retained convolution kernel parameters into the new network structure, and compensate the weight parameters of the pruned convolution kernels in each similar group into the retained convolution kernels by means of parameter superposition;
and the accuracy recovery module, configured to perform model accuracy recovery training with the original data set and save the model parameters and the network structure.
9. A computer device, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, which when executed by the processors implement the steps of the feature similarity and feature compensation based convolutional neural network pruning method of any one of claims 1-7.
10. A computer storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the convolutional neural network pruning method based on feature similarity and feature compensation as claimed in any one of claims 1-7.
CN202311034754.4A 2023-08-17 2023-08-17 Convolutional neural network pruning method and device based on feature similarity and feature compensation Pending CN116992941A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311034754.4A CN116992941A (en) 2023-08-17 2023-08-17 Convolutional neural network pruning method and device based on feature similarity and feature compensation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311034754.4A CN116992941A (en) 2023-08-17 2023-08-17 Convolutional neural network pruning method and device based on feature similarity and feature compensation

Publications (1)

Publication Number Publication Date
CN116992941A true CN116992941A (en) 2023-11-03

Family

ID=88521225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311034754.4A Pending CN116992941A (en) 2023-08-17 2023-08-17 Convolutional neural network pruning method and device based on feature similarity and feature compensation

Country Status (1)

Country Link
CN (1) CN116992941A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649568A (en) * 2024-01-30 2024-03-05 之江实验室 Network compression method and device for image classification convolutional neural network
CN117649568B (en) * 2024-01-30 2024-05-03 之江实验室 Network compression method and device for image classification convolutional neural network

Similar Documents

Publication Publication Date Title
He et al. Asymptotic soft filter pruning for deep convolutional neural networks
Wu et al. Quantized convolutional neural networks for mobile devices
CN112613581B (en) Image recognition method, system, computer equipment and storage medium
CN111461322B (en) Deep neural network model compression method
CN112116001B (en) Image recognition method, image recognition device and computer-readable storage medium
CN112949678A (en) Method, system, equipment and storage medium for generating confrontation sample of deep learning model
CN113222138A (en) Convolutional neural network compression method combining layer pruning and channel pruning
US20200364538A1 (en) Method of performing, by electronic device, convolution operation at certain layer in neural network, and electronic device therefor
CN112529146B (en) Neural network model training method and device
CN112861602B (en) Face living body recognition model compression and transplantation method based on depth separable convolution
CN116992941A (en) Convolutional neural network pruning method and device based on feature similarity and feature compensation
US20220036189A1 (en) Methods, systems, and media for random semi-structured row-wise pruning in neural networks
CN110689113A (en) Deep neural network compression method based on brain consensus initiative
CN112288087A (en) Neural network pruning method and device, electronic equipment and storage medium
CN113435520A (en) Neural network training method, device, equipment and computer readable storage medium
CN116188878A (en) Image classification method, device and storage medium based on neural network structure fine adjustment
CN114882278A (en) Tire pattern classification method and device based on attention mechanism and transfer learning
Chan et al. Chebyshev pooling: An alternative layer for the pooling of CNNs-based classifier
JP6935868B2 (en) Image recognition device, image recognition method, and program
CN112132062B (en) Remote sensing image classification method based on pruning compression neural network
CN112613604A (en) Neural network quantification method and device
CN109711543B (en) Reconfigurable deep belief network implementation system
CN113554104B (en) Image classification method based on deep learning model
CN113033804B (en) Convolution neural network compression method for remote sensing image
CN114723043A (en) Convolutional neural network convolutional kernel pruning method based on hypergraph model spectral clustering

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination