CN113850385A - Coarse and fine granularity combined neural network pruning method

Coarse and fine granularity combined neural network pruning method

Info

Publication number
CN113850385A
CN113850385A (application CN202111187212.1A)
Authority
CN
China
Prior art keywords
pruning
convolution kernel
mode
layer
model
Prior art date
Legal status
Pending
Application number
CN202111187212.1A
Other languages
Chinese (zh)
Inventor
姜宏旭
朱雨婷
李波
张永华
东东
胡宗琦
从容子
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202111187212.1A
Publication of CN113850385A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Abstract

The invention discloses a coarse-and-fine-granularity combined neural network pruning method, which comprises the following steps: performing group-sparsity training on the screened candidate filters and pruning the candidate filters falling below a threshold after a set number of rounds; ranking convolution-kernel importance layer by layer and obtaining the convolution kernels to be pruned layer by layer according to a predefined pruning rate; performing weight-level regularized compression on the convolution kernels while dynamically generating a pattern set satisfying a pre-constructed pattern discriminant function; matching each convolution kernel to its optimal pattern in the pattern set; performing convolution-kernel pruning and pattern pruning; setting the parameters to be pruned to zero and hard-pruning the model; and retraining and fine-tuning the hard-pruned model with knowledge distillation to obtain the final pruned model. The invention fully exploits the advantages of structured and unstructured pruning, improves model storage and inference efficiency, and offers high hardware friendliness.

Description

Coarse and fine granularity combined neural network pruning method
Technical Field
The invention relates to the technical field of embedded AI (artificial intelligence), and in particular to a coarse-and-fine-granularity combined neural network pruning method.
Background
With the proliferation of embedded devices, performing deep neural network inference under their tight computation and memory budgets is very challenging. Pruning is currently a widely used model-compression method: it judges parameter importance through an effective criterion and removes unimportant parameters to reduce model redundancy. Existing pruning methods comprise mainly structured and unstructured pruning. Unstructured pruning is fine-grained and preserves accuracy well but is hardware-unfriendly, while structured pruning is coarse-grained and hardware-efficient but incurs larger accuracy loss.
For unstructured pruning, simply setting weights to zero neither shrinks the model nor reduces inference time; the model must be sparsely encoded with indices and paired with a dedicated sparse matrix multiplication strategy. Clipping weights at arbitrary positions has received less attention recently: although it yields a higher net parameter compression ratio, the indexing overhead is not negligible and sparse multiplication is markedly less efficient than dense multiplication, so arbitrary-position weight clipping no longer offers an advantage in either parameter compression ratio or inference time. Recent research proposes pattern pruning: a fixed number of weights is pruned in each convolution kernel and the remaining weights are concentrated in a specific region to form a kernel pattern, yielding a sparsity type intermediate between unstructured and structured pruning with better hardware adaptability. However, current schemes apply research on convolution-kernel parameter importance directly to pattern selection and pick patterns greedily, so not every convolution kernel can be matched with its optimal pattern, and they lack comparative experiments on the effect of convolution-importance strategies (such as forcing the central element not to be pruned). Another approach mathematically derives the best pattern for fixed model parameters but ignores how the parameters vary during training. Moreover, a single round of pattern generation costs far more time than a single round of training, so round-by-round pattern generation is not advisable.
Structured pruning divides, by pruning dimension, mainly into filter pruning and convolution-kernel pruning. These schemes propose a set of importance criteria and clip the insignificant filters or convolution kernels. Because such pruning shrinks the weight matrices, it directly improves inference speed. However, most existing methods combining structured and unstructured pruning simply stack different pruning schemes in series, lacking analysis of how pruning order and different pruning modes affect one another.
Therefore, how to provide a coarse-and-fine-granularity combined pruning method that fully exploits the advantages of structured and unstructured pruning and improves model storage and inference efficiency is a problem urgently needing solution by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a coarse-and-fine-granularity combined neural network pruning method, which fully exploits the advantages of structured and unstructured pruning, improves model storage and inference efficiency, and offers high hardware friendliness.
In order to achieve the purpose, the invention adopts the following technical scheme:
a thickness-granularity combined neural network pruning method comprises the following steps:
selecting candidate filters needing pruning in an original model layer by layer according to the two norms, carrying out group thinning training on the candidate filters, and pruning the candidate filters smaller than a threshold value after a certain number of rounds;
sequencing the importance of the convolution kernels layer by layer according to the two norms, and obtaining the convolution kernels to be pruned layer by layer according to a pre-defined pruning rate;
performing regularization compression on the convolution kernel by taking the weight as a unit, and dynamically generating a mode set meeting a pre-constructed mode discrimination function in the compression process;
matching each convolution kernel to an optimal pattern of the convolution kernel in the pattern set;
carrying out convolution kernel pruning and mode pruning;
setting parameters to be pruned to zero, and carrying out hard pruning on the model;
and (4) retraining and fine-tuning the model after the hard pruning by combining a knowledge distillation method to obtain a final model after the pruning.
Preferably, in the above coarse-and-fine-granularity combined neural network pruning method, the pattern set is generated as follows:
traversing each convolution kernel of the model and selecting the 4 positions with the largest absolute weights to form that convolution kernel's optimal pattern;
globally counting the occurrence frequency of each optimal pattern and selecting the k most frequent patterns as the pattern set generated in the current round;
and judging the weight adjacency, shape count and current model convergence of the pattern set with a pattern discriminant function to evaluate whether the pattern set generated in the current round can serve as the final pattern set.
Preferably, in the above coarse-and-fine-granularity combined neural network pruning method, the condition for judging weight adjacency in the pattern set is that the non-zero elements within a single pattern are all adjacent, determined as follows:
traversing each non-zero element position (i, j) of the pattern and judging whether the four adjacent positions (i+1, j), (i-1, j), (i, j+1) and (i, j-1) lie inside the convolution kernel; if a position lies inside the kernel, judging whether it is non-zero; if some element has no non-zero neighbor, the pattern elements are not all adjacent, and if no such case is found after the traversal completes, all pattern elements are adjacent.
Preferably, in the above coarse-and-fine-granularity combined neural network pruning method, the condition for judging the number of pattern shapes is that the number of shapes in the pattern set is at most k/2, determined as follows:
numbering the convolution-kernel weight indices, where the j-th weight of the i-th row receives the number i × kernel_size + j;
grouping the corresponding pattern shapes by recording positions in a dictionary whose keys concatenate the numbers of the non-zero weights from small to large;
and traversing each pattern, querying the number corresponding to its shape, and using a set to record which numbers have been seen; if a number is absent, incrementing the shape count by 1 and adding the number to the set.
Preferably, in the above coarse-and-fine-granularity combined neural network pruning method, the condition for judging current model convergence is that the current gradient mean is smaller than a threshold q, determined as follows:
obtaining the gradient of each position through the tensor's grad attribute and computing the mean with the mean function.
Preferably, in the above coarse-and-fine-granularity combined neural network pruning method, pruning the candidate filters falling below the threshold comprises:
reshaping the convolution matrix and the mask matrix to the same shape, computing the L2 norm of each column, and recording the columns below the threshold 0.1m as filters to be pruned;
traversing the model layer by layer and storing the input and output channels to be pruned for each layer, where for the l-th layer the output channels to discard are the column indices of that layer's pruned filters and the input channels are the row indices of the (l-1)-th layer's pruned filters;
and discarding the rows and columns to be pruned with the index_select function.
Preferably, in the above coarse-and-fine-granularity combined neural network pruning method, the convolution kernels to be pruned are obtained as follows:
reshaping the convolution matrix and computing the L2 norm of each convolution kernel one by one;
sorting all convolution kernels of every filter of each layer by their L2 norms;
and finding the k least important convolution kernels according to the predefined pruning rate and setting their masks to 0 as the convolution kernels to be pruned.
Preferably, in the above coarse-and-fine-granularity combined neural network pruning method, matching each convolution kernel to its optimal pattern in the pattern set comprises:
traversing the final pattern set one pattern at a time, computing the sum of the convolution kernel's absolute weights on each pattern, and selecting the index with the maximum sum as the kernel's pattern; after assignment completes, the pattern shape of the convolution kernel is not changed.
Preferably, in the above coarse-and-fine-granularity combined neural network pruning method, regularization training is performed on the weights that may be pruned in the filter-pruning, convolution-kernel-pruning and pattern-pruning stages respectively, with cross entropy used to characterize the difference between the model's predicted values and true values.
Preferably, in the above coarse-and-fine-granularity combined neural network pruning method, during the staged regularization training of the weights that may be pruned, stochastic gradient descent (SGD) is adopted as the iterative method to optimize the differentiable objective function, the gradient of the loss function is computed on mini-batches to iteratively update the weights and bias terms, and compression pruning is performed with the MultiStepLR learning-rate decay strategy.
According to the above technical scheme, compared with the prior art, the invention discloses a coarse-and-fine-granularity combined neural network pruning method in which a combined pruning method consisting of filter pruning, convolution-kernel pruning and pattern pruning compresses the model: the positions of the weights to be pruned are determined by the structured and unstructured pruning schemes and a mask layer is generated, while training with a penalty term gradually shrinks the weights to be discarded. Actual pruning is then performed by hard pruning, multiplying the original model parameters by the mask layer and setting the discarded parameters to zero. The pruned model is retrained and fine-tuned to recover accuracy: through knowledge distillation, the pre-pruning model serves as the teacher, and the difference between the predictions before and after pruning is used as the loss function.
Starting from the strengths and weaknesses of different pruning methods, the invention applies structured and unstructured pruning to the model, performing filter, convolution-kernel and pattern pruning in order of granularity; the pruning methods are not independent but are carried out cooperatively. The final model offers higher hardware friendliness while its compression ratio is slightly better than current hybrid-pruning results. Meanwhile, the invention judges through the pattern discriminant function whether the pattern set generated in the current round can be finally adopted, and experiments on multiple pattern sets demonstrate the effectiveness of this evaluation function, ensuring that the model's accuracy after pattern pruning is higher than that of current pattern-pruning methods.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only embodiments of the present invention, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flow chart of a coarse-and-fine-granularity combined neural network pruning method provided by the invention;
FIG. 2 is a schematic diagram of pattern generation and pruning provided by the present invention;
FIG. 3 is a diagram illustrating a comparison of filter pruning, convolution kernel pruning and pattern pruning in accordance with the present invention;
FIG. 4 is a schematic diagram of filter pruning according to the present invention;
FIG. 5 is a schematic diagram of convolution kernel pruning according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the embodiment of the invention discloses a coarse-and-fine-granularity combined neural network pruning method, which comprises the following steps:
s1, filter pruning:
and selecting candidate filters needing pruning in the original model layer by layer according to the two norms, carrying out group thinning training on the candidate filters, and pruning the candidate filters smaller than a threshold value after a certain number of rounds.
S2, convolution kernel pruning:
and sequencing the importance of the convolution kernels layer by layer according to the two norms, and obtaining the convolution kernels to be pruned layer by layer according to a pre-defined pruning rate.
S3, generating a pattern set:
and performing regularization compression on the convolution kernel by taking the weight as a unit, and dynamically generating a mode set meeting a pre-constructed mode discrimination function in the compression process.
S4, assigning a pattern to each convolution kernel:
Each convolution kernel is matched to its optimal pattern in the pattern set.
And S5, performing convolution-kernel pruning and pattern pruning.
S6, hard pruning: setting the parameters to be pruned to zero and hard-pruning the model.
S7, fine tuning and retraining:
and (4) retraining and fine-tuning the model after the hard pruning by combining a knowledge distillation method to obtain a final model after the pruning.
The above steps are further described below:
s1, filter pruning
The process is shown in fig. 4. Because the pruning granularity is large, regularization training of too many parameters may over-prune the model, making accuracy hard to recover; therefore candidate filters that may be pruned are found first, group-sparsity training is performed on them, and the filters falling below the threshold are actually pruned after a set number of rounds. Specifically, the convolution matrix is reshaped to (input_channel × kernel_size × kernel_size, output_channel) and the per-column L2 norms are sorted, as shown in formula (1). The least important p% of filters, whose norms fall below the threshold m, have their columns masked with 0.
$\|W_{:,f}\|_2=\sqrt{\sum_{i}W_{i,f}^{2}} \qquad (1)$, where $W_{:,f}$ is the column of the reshaped convolution matrix corresponding to filter f.
Because the convolution layers remaining after filter pruning are still dense matrices, subsequent steps are unaffected, and filter pruning shrinks the convolution matrices, which speeds up subsequent compression while removing unimportant filters in advance. Specifically, both the convolution matrix and the mask matrix are reshaped to (input_channel × kernel_size × kernel_size, output_channel), the L2 norm of each column is computed as above, and columns below the threshold 0.1m are recorded as filters to be pruned. The neural network is traversed layer by layer, storing the input and output channels to be pruned for each layer: for the l-th layer, the output channels to discard are the column indices of that layer's pruned filters (except for the last convolutional layer), and the input channels are the row indices of the (l-1)-th layer's pruned filters (except for the first convolutional layer). The rows and columns to be pruned are discarded with the index_select function.
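A minimal PyTorch-style sketch of this step is given below; the function names and the handling of the 0.1m threshold are illustrative assumptions, not code from the patent.

```python
import torch

def filters_to_keep(weight: torch.Tensor, m: float) -> torch.Tensor:
    """Return indices of output channels whose column L2 norm >= 0.1*m.

    weight: conv weight of shape (out_ch, in_ch, k, k); reshaping it to
    (in_ch*k*k, out_ch) puts one filter per column, as in the patent.
    """
    cols = weight.reshape(weight.shape[0], -1).t()
    norms = torch.linalg.vector_norm(cols, ord=2, dim=0)  # per-filter L2 norm
    return torch.nonzero(norms >= 0.1 * m).flatten()

def hard_prune_filters(weight: torch.Tensor, keep_out: torch.Tensor,
                       keep_in: torch.Tensor) -> torch.Tensor:
    """Drop pruned rows/columns with index_select, shrinking the matrix."""
    return weight.index_select(0, keep_out).index_select(1, keep_in)
```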
S2 convolution kernel pruning
A schematic diagram of convolution-kernel pruning is shown in fig. 5. Specifically, after the convolution matrix is reshaped, the L2 norm of each convolution kernel is computed one by one; the importance of the k-th convolution kernel of the f-th filter of each layer is given by formula (2). The parameters are sorted within each layer, the least important convolution kernels are found according to the predefined pruning rate, and their masks are set to 0.
$I_{f,k}=\left\|Z_{f,k,:,:}\right\|_2=\sqrt{\sum_{i}\sum_{j}Z_{f,k,i,j}^{2}} \qquad (2)$
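A short sketch of this selection under the same notation; applying a single pruning rate per layer is an illustrative assumption.

```python
import torch

def kernel_prune_mask(weight: torch.Tensor, prune_rate: float) -> torch.Tensor:
    """Zero-mask the least important convolution kernels of one layer.

    weight: (out_ch, in_ch, k, k); importance = per-kernel L2 norm (formula 2).
    Returns a 0-1 mask broadcastable over the weight tensor.
    """
    norms = weight.reshape(weight.shape[0], weight.shape[1], -1).norm(p=2, dim=2)
    n_prune = int(norms.numel() * prune_rate)
    _, idx = torch.topk(norms.flatten(), n_prune, largest=False)
    mask = torch.ones(norms.numel())
    mask[idx] = 0.0
    return mask.reshape(norms.shape)[:, :, None, None]
```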
S3, pattern set generation
The embodiment of the invention sets the model parameters as follows: the neural network has N convolutional layers, K input channels and F output channels, and the weight and bias of the n-th layer are denoted $W_n$ and $b_n$ respectively. For the f-th output channel and the k-th input channel of each layer, the corresponding convolution kernel is $Z_{f,k,:,:}$, and the weight in row i, column j of that kernel is $Z_{f,k,i,j}$.
Pattern pruning, as a form of unstructured pruning, has relatively good regularity and can be optimized by a subsequent compiler, reducing neural network inference time. Pattern-set generation divides into static generation and dynamic generation: static generation produces the patterns before pruning, while dynamic generation produces them during the pruning process.
For static pattern generation, the traditional scheme selects the maximum-weight pattern containing the central element, or selects it by mathematical derivation from convolution gradients. The theoretical derivation is rigorous, but because structured pruning and pattern pruning proceed simultaneously, the model weights must not change much after the redundant filters are removed; the model therefore has to be trained in advance to a well-distributed state, which greatly limits pruning speed.
For dynamic pattern generation, the traditional method builds a dynamic pattern pool by gradually generating candidate patterns while clipping unimportant kernels. Although the related concepts have been proposed, the ideas were never formalized, and since pattern generation is very time consuming, the patterns are in practice pre-generated in the code.
The embodiment of the invention combines the static and dynamic pattern-set generation methods and constructs a pattern discriminant function to judge whether the pattern set generated in the current round can be adopted; the core of the problem is how to judge whether a pattern set is a good one.
Previous studies on the importance of convolution-kernel element values have mainly found the following:
(1) The central weight in a 3 × 3 kernel is crucial.
(2) The important weights within a convolution kernel tend to be adjacent.
(3) The larger the absolute value of an element, the more important it tends to be.
It should be noted that element importance within a convolution kernel and pattern-generation criteria differ somewhat: importance is judged per individual convolution kernel, whereas pattern generation groups convolution kernels into patterns by a clustering-like idea, so some convolution kernels cannot be matched with their own best patterns. The diversity of the convolution kernels is therefore also an important criterion for measuring the quality of a pattern set.
Therefore, as shown in fig. 2, the invention adopts a new pattern-set generation method in which each pattern has 4 non-zero elements and the remaining elements are zero. Pattern generation is performed on all 3 × 3 convolution kernels in the neural network every r rounds, with the number of generated patterns set to k; considering hardware friendliness and practical effect, the hyper-parameter k is limited to 2, 4, 8 or 12.
The method specifically comprises the following steps:
S31, traversing each convolution kernel of the model and selecting the 4 positions with the largest absolute weights to form that convolution kernel's optimal pattern;
S32, globally counting the occurrence frequency of each optimal pattern and selecting the k most frequent patterns as the pattern set generated in the current round (a code sketch follows this list);
and S33, judging the weight adjacency, shape count and current model convergence of the pattern set with the pattern discriminant function to evaluate whether the pattern set generated in the current round can serve as the final pattern set.
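A condensed sketch of steps S31-S32, assuming each pattern is represented as a frozenset of flattened positions 0-8 in a 3 × 3 kernel (the helper name and data layout are illustrative):

```python
import torch
from collections import Counter

def generate_pattern_set(kernels: torch.Tensor, k: int):
    """kernels: (num_kernels, 3, 3). Returns the k most frequent
    4-element patterns across all kernels."""
    counter = Counter()
    top4 = kernels.abs().reshape(-1, 9).topk(4, dim=1).indices
    for row in top4:                                  # S31: per-kernel optimal pattern
        counter[frozenset(row.tolist())] += 1
    return [p for p, _ in counter.most_common(k)]     # S32: k most frequent patterns
```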
The embodiment of the invention provides three judgment conditions; the pattern-set generation process ends if and only if all of them are satisfied. The conditions are as follows:
(1) The non-zero elements within a single pattern are all adjacent.
(2) The number of pattern shapes is at most k/2.
(3) The current gradient mean is less than the threshold q.
The specific judgment process is as follows:
(1) For the judgment of element adjacency, each non-zero element position (i, j) of the pattern is traversed, and it is judged whether the four adjacent positions (i+1, j), (i-1, j), (i, j+1) and (i, j-1) lie inside the convolution kernel; if a position lies inside the kernel, it is judged whether that position is non-zero. If some element has no non-zero neighbor, the pattern elements are not all adjacent; if no such case is found after the traversal completes, all pattern elements are adjacent.
(2) For the judgment of the number of pattern shapes, adjacency is checked first, so non-adjacent pattern combinations need not be considered. There are $C_9^4 = 126$ possible patterns in total, of which only 36 are fully adjacent, so the invention decides by enumeration. The convolution-kernel weight indices are numbered in advance: the j-th weight of the i-th row receives the number i × kernel_size + j. The corresponding pattern shapes are grouped by recording positions in a dictionary whose keys concatenate the numbers of the non-zero weights from small to large. Since the kernels processed by the method are 3 × 3, the largest number is 9, so concatenating the numbers causes no ambiguity; for example, positions 1, 3, 5 and 9 correspond to the key '1359'. If the kernel size grows, another encoding must be considered. The dictionary's value is the pattern-shape number. Each pattern is traversed, the number corresponding to its shape is queried, and a set records which numbers have been seen; if a number is absent, the shape count is incremented by 1 and the number is added to the set.
(3) For the judgment of the gradient mean, the gradient of each position is obtained through the tensor's grad attribute and the mean is computed with the mean function.
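A sketch of the three checks under the representation used above (patterns as frozensets of flat positions in a 3 × 3 kernel); the key-to-shape mapping and reading the gradient criterion as a mean of absolute gradients are assumptions:

```python
import torch

def all_adjacent(pattern) -> bool:
    """Condition (1): every non-zero position has a non-zero 4-neighbor."""
    cells = {(p // 3, p % 3) for p in pattern}
    return all(any((i + di, j + dj) in cells
                   for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))
               for i, j in cells)

# Precomputed mapping from position key (e.g. '1359') to a shape id,
# enumerated over the 36 fully adjacent 4-element patterns (assumed given).
SHAPE_ID: dict = {}

def shape_count(patterns) -> int:
    """Condition (2): count distinct pattern shapes (must be <= k/2)."""
    seen = set()
    for pat in patterns:
        key = ''.join(str(p) for p in sorted(pat))
        seen.add(SHAPE_ID.get(key, key))  # fall back to the raw key
    return len(seen)

def converged(weight: torch.Tensor, q: float) -> bool:
    """Condition (3): mean absolute gradient below the threshold q."""
    return weight.grad is not None and weight.grad.abs().mean().item() < q
```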
S4, assigning a pattern to each convolution kernel:
Let the pattern set be S; the optimization objective is shown in equation (3). Because the patterns of the convolution kernels are independent of one another, the pattern set is traversed one pattern at a time, the sum of the convolution kernel's absolute weights on each pattern is computed, and the index with the maximum sum is selected as that kernel's pattern. After assignment completes, the pattern shape of the convolution kernel no longer changes.
$s_{f,k}^{*}=\arg\max_{s\in S}\sum_{(i,j)\in s}\left|Z_{f,k,i,j}\right| \qquad (3)$
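A direct sketch of this assignment (equation (3)), reusing the frozenset pattern representation from the sketches above:

```python
import torch

def assign_pattern(kernel: torch.Tensor, patterns):
    """Match a 3x3 kernel to the pattern retaining the largest total
    absolute weight. Returns (pattern index, 0-1 mask of shape (3, 3))."""
    flat = kernel.abs().flatten()
    scores = torch.tensor([flat[list(pat)].sum() for pat in patterns])
    best = int(scores.argmax())
    mask = torch.zeros(9)
    mask[list(patterns[best])] = 1.0
    return best, mask.reshape(3, 3)
```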
S5, convolution-kernel pruning and pattern pruning
The invention performs regularization training, in stages, on the weights that may be pruned. For each pruning method, a 0-1 mask matrix controls whether a weight is retained: a pruned weight's mask is 0 and a retained weight's is 1. The mask matrices of filter, convolution-kernel and pattern pruning for $W_k$ are $I_{k\_f}$, $I_{k\_k}$ and $I_{k\_p}$ respectively, each with the same shape as the convolution matrix. With the regularization coefficients of the filter, convolution-kernel and pattern pruning stages set to $\lambda_f$, $\lambda_k$ and $\lambda_p$ respectively, the loss function is shown in equation (4).
$L = L_{CE} + \lambda_f\sum_k\left\|W_k\odot(1-I_{k\_f})\right\|_2 + \lambda_k\sum_k\left\|W_k\odot(1-I_{k\_k})\right\|_2 + \lambda_p\sum_k\left\|W_k\odot(1-I_{k\_p})\right\|_2 \qquad (4)$
Cross entropy is adopted to characterize the difference between the model's predicted values and true values. Let the number of output classes of the neural network be n, the distribution of true values output by the network be y_true, and the predicted values be y_pred, as shown in formula (5).
$L_{CE}=-\sum_{i=1}^{n} y\_true_{i}\,\log(y\_pred_{i}) \qquad (5)$
For the optimizer, the invention uses stochastic gradient descent (SGD) as the iterative method for optimizing the differentiable objective function, iteratively updating the weights and bias terms from the gradient of the loss function computed on mini-batches, and performs compression pruning with the MultiStepLR learning-rate decay strategy.
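A minimal PyTorch setup matching this description; the learning rate, momentum, milestones and epoch count are illustrative assumptions, not values from the patent.

```python
import torch

model = torch.nn.Conv2d(3, 16, 3)   # stand-in for the network being compressed
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
# MultiStepLR decays the learning rate by gamma at each milestone epoch.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[80, 120], gamma=0.1)

for epoch in range(160):
    # ... forward pass, loss of equation (4), backward(), optimizer.step() ...
    scheduler.step()
```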
When the compression stage starts, the mask matrices are initialized to all ones (so their negations are all zeros), and the loss function is the same as in the ordinary training stage. The values of the three mask matrices are then changed in stages, which controls the compression granularity of the model. The three pruning methods are compared schematically in fig. 3: filter pruning has the largest granularity, convolution-kernel pruning is intermediate, and pattern pruning has the smallest. Since pruning should be a coarse-to-fine process, the model is first sparsified and filter-pruned, and then pruned at convolution-kernel and pattern granularity.
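A sketch of the penalty terms of equation (4) as reconstructed above; the exact norm and the complement-mask convention are assumptions consistent with that reading.

```python
import torch

def staged_penalty(weights, masks_f, masks_k, masks_p,
                   lf: float, lk: float, lp: float) -> torch.Tensor:
    """Push the weights whose mask is 0 (candidates for pruning) toward zero.

    weights and masks_*: lists of per-layer tensors with matching shapes.
    """
    pen = torch.zeros(())
    for w, mf, mk, mp in zip(weights, masks_f, masks_k, masks_p):
        pen = pen + lf * torch.norm(w * (1 - mf), p=2)
        pen = pen + lk * torch.norm(w * (1 - mk), p=2)
        pen = pen + lp * torch.norm(w * (1 - mp), p=2)
    return pen
```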
S6 hard pruning
Hard pruning performs the actual intra-kernel clipping of the model. The parameters to be clipped are by now very close to 0; each layer's weight parameters are traversed and the model parameters are updated by multiplying the convolution matrix by the mask matrices, so that the weights to be clipped become 0. The updated output is $W_k'$, as shown in formula (6).
$W_k' = W_k \odot I_{k\_k} \odot I_{k\_p} \qquad (6)$
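Equation (6) as a one-step sketch:

```python
import torch

@torch.no_grad()
def hard_prune(weight: torch.Tensor, kernel_mask: torch.Tensor,
               pattern_mask: torch.Tensor) -> torch.Tensor:
    """Zero the pruned positions by elementwise masking (equation 6)."""
    return weight.mul_(kernel_mask).mul_(pattern_mask)
```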
S7, fine-tuning and retraining
In the embodiment of the invention, the model is retrained after hard pruning, and the generated model is further fine-tuned with a knowledge-distillation method. Vanilla knowledge distillation is a classic accuracy-recovery method in the field of image classification: a large teacher model is distilled into a small student model in a prescribed manner without large accuracy loss. Here, the distillation loss function forces the original network and the pruned network to produce similar logit outputs, as shown in equation (7). In the invention, the balancing coefficient is fixed at α = 0.4 and the temperature at T = 4.
$L_{KD}=\alpha\,L_{CE}\big(y,\sigma(z_s)\big)+(1-\alpha)\,T^{2}\,KL\big(\sigma(z_t/T)\,\|\,\sigma(z_s/T)\big) \qquad (7)$, where $z_s$ and $z_t$ are the logits of the pruned (student) and original (teacher) networks and $\sigma$ denotes softmax.
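A sketch of this loss in PyTorch, following the standard vanilla-KD formulation assumed in the reconstruction of equation (7) above:

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
                 labels: torch.Tensor, alpha: float = 0.4, T: float = 4.0):
    """Hard-label cross entropy + T^2-scaled KL between softened logits."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction='batchmean') * (T * T)
    return alpha * hard + (1 - alpha) * soft
```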
In order to further demonstrate the superiority of the pruning method, the invention carries out the following experimental verification:
The cifar10 and cifar100 datasets are used, vgg16, resnet32 and resnet50 are selected as the networks to be pruned, and the experiments are carried out on a general-purpose GPU platform. The processing of each module and the preliminary results are as follows.
Experiment one: multiple groups of comparison experiments are performed to study whether the conclusions drawn from research on convolution-kernel parameter importance apply to model pruning; experimental and control groups are set up, and pattern-set quality is measured by the accuracy of the pruned model. The specific experimental design and results are shown in Table 1.
Table 1. Comparison of pattern-pruning methods under different pattern-selection strategies
[Table 1 appears as an image in the original publication; its per-group accuracy figures are not recoverable from this text.]
Experiment two: single-pruning tests with pattern counts of 2, 4, 8 and 12 are carried out, and the accuracy under the same pattern count is compared with other existing results; the results show that the invention outperforms the existing work.
Experiment three: for the several indices of the pattern evaluation function, the invention verifies their correctness during compression, i.e., a well-trained model gradually comes to satisfy the evaluation function's judgment conditions. With the number of compression rounds on the horizontal axis, trend curves of the gradient, the number of adjacent-element patterns and the number of pattern shapes are plotted to verify the feasibility and accuracy of the evaluation conditions.
Experiment four: with the pattern count limited to 4, structured and unstructured pruning are applied to the model, the parameter compression ratio and accuracy are compared, and other hybrid-pruning results are taken as comparison objects. Observation of the current results shows that the invention offers higher hardware friendliness while being slightly superior to current research.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A coarse-and-fine-granularity combined neural network pruning method, characterized by comprising the following steps:
selecting, layer by layer according to the L2 norm, the candidate filters to be pruned in the original model, performing group-sparsity training on the candidate filters, and pruning the candidate filters falling below a threshold after a set number of rounds;
ranking convolution-kernel importance layer by layer according to the L2 norm, and obtaining the convolution kernels to be pruned layer by layer according to a predefined pruning rate;
performing weight-level regularized compression on the convolution kernels, and dynamically generating, during compression, a pattern set satisfying a pre-constructed pattern discriminant function;
matching each convolution kernel to its optimal pattern in the pattern set;
performing convolution-kernel pruning and pattern pruning;
setting the parameters to be pruned to zero and hard-pruning the model;
and retraining and fine-tuning the hard-pruned model with knowledge distillation to obtain the final pruned model.
2. The coarse-and-fine-granularity combined neural network pruning method of claim 1, wherein the pattern set is generated as follows:
traversing each convolution kernel of the model and selecting the 4 positions with the largest absolute weights to form that convolution kernel's optimal pattern;
globally counting the occurrence frequency of each optimal pattern and selecting the k most frequent patterns as the pattern set generated in the current round;
and judging the weight adjacency, shape count and current model convergence of the pattern set with a pattern discriminant function to evaluate whether the pattern set generated in the current round can serve as the final pattern set.
3. The coarse-and-fine-granularity combined neural network pruning method of claim 2, wherein the condition for judging weight adjacency in the pattern set is that the non-zero elements within a single pattern are all adjacent, determined as follows:
traversing each non-zero element position (i, j) of the pattern and judging whether the four adjacent positions (i+1, j), (i-1, j), (i, j+1) and (i, j-1) lie inside the convolution kernel; if a position lies inside the kernel, judging whether it is non-zero; if some element has no non-zero neighbor, the pattern elements are not all adjacent, and if no such case is found after the traversal completes, all pattern elements are adjacent.
4. The coarse-and-fine-granularity combined neural network pruning method of claim 2, wherein the condition for judging the number of pattern shapes is that the number of shapes in the pattern set is at most k/2, determined as follows:
numbering the convolution-kernel weight indices, where the j-th weight of the i-th row receives the number i × kernel_size + j;
grouping the corresponding pattern shapes by recording positions in a dictionary whose keys concatenate the numbers of the non-zero weights from small to large;
and traversing each pattern, querying the number corresponding to its shape, and using a set to record which numbers have been seen; if a number is absent, incrementing the shape count by 1 and adding the number to the set.
5. The coarse-and-fine-granularity combined neural network pruning method of claim 2, wherein the condition for judging current model convergence is that the current gradient mean is smaller than a threshold q, determined as follows:
obtaining the gradient of each position through the tensor's grad attribute and computing the mean with the mean function.
6. The coarse-and-fine-granularity combined neural network pruning method of claim 1, wherein pruning the candidate filters falling below the threshold comprises:
reshaping the convolution matrix and the mask matrix to the same shape, computing the L2 norm of each column, and recording the columns below the threshold 0.1m as filters to be pruned;
traversing the model layer by layer and storing the input and output channels to be pruned for each layer, where for the l-th layer the output channels to discard are the column indices of that layer's pruned filters and the input channels are the row indices of the (l-1)-th layer's pruned filters;
and discarding the rows and columns to be pruned with the index_select function.
7. The coarse-and-fine-granularity combined neural network pruning method of claim 1, wherein the convolution kernels to be pruned are obtained as follows:
reshaping the convolution matrix and computing the L2 norm of each convolution kernel one by one;
sorting all convolution kernels of every filter of each layer by their L2 norms;
and finding the k least important convolution kernels according to the predefined pruning rate and setting their masks to 0 as the convolution kernels to be pruned.
8. The coarse-and-fine-granularity combined neural network pruning method of claim 1, wherein matching each convolution kernel to its optimal pattern in the pattern set comprises:
traversing the final pattern set one pattern at a time, computing the sum of the convolution kernel's absolute weights on each pattern, and selecting the index with the maximum sum as the kernel's pattern; after assignment completes, the pattern shape of the convolution kernel is not changed.
9. The coarse-and-fine-granularity combined neural network pruning method of claim 1, wherein regularization training is performed on the weights that may be pruned in the filter-pruning, convolution-kernel-pruning and pattern-pruning stages respectively, with cross entropy used to characterize the difference between the model's predicted values and true values.
10. The coarse-and-fine-granularity combined neural network pruning method of claim 9, wherein during the staged regularization training of the weights that may be pruned, stochastic gradient descent (SGD) is adopted as the iterative method to optimize the differentiable objective function, the gradient of the loss function is computed on mini-batches to iteratively update the weights and bias terms, and compression pruning is performed with the MultiStepLR learning-rate decay strategy.
CN202111187212.1A 2021-10-12 2021-10-12 Coarse and fine granularity combined neural network pruning method Pending CN113850385A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111187212.1A CN113850385A (en) 2021-10-12 2021-10-12 Coarse and fine granularity combined neural network pruning method

Publications (1)

Publication Number Publication Date
CN113850385A (en) 2021-12-28

Family

ID=78977981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111187212.1A Pending CN113850385A (en) 2021-10-12 2021-10-12 Coarse and fine granularity combined neural network pruning method

Country Status (1)

Country Link
CN (1) CN113850385A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114580636A (en) * 2022-05-06 2022-06-03 江苏省现代企业信息化应用支撑软件工程技术研发中心 Neural network lightweight deployment method based on three-target joint optimization
CN114580636B (en) * 2022-05-06 2022-09-16 江苏省现代企业信息化应用支撑软件工程技术研发中心 Neural network lightweight deployment method based on three-target joint optimization
CN114937186A (en) * 2022-06-14 2022-08-23 厦门大学 Neural network data-free quantification method based on heterogeneous generated data
CN114937186B (en) * 2022-06-14 2024-06-07 厦门大学 Neural network data-free quantization method based on heterogeneous generated data
CN115496207A (en) * 2022-11-08 2022-12-20 荣耀终端有限公司 Neural network model compression method, device and system
CN115496207B (en) * 2022-11-08 2023-09-26 荣耀终端有限公司 Neural network model compression method, device and system
CN117689001A (en) * 2024-02-02 2024-03-12 中科方寸知微(南京)科技有限公司 Neural network multi-granularity pruning compression method and system based on zero data search
CN117689001B (en) * 2024-02-02 2024-05-07 中科方寸知微(南京)科技有限公司 Neural network multi-granularity pruning compression method and system based on zero data search
CN117910536A (en) * 2024-03-19 2024-04-19 浪潮电子信息产业股份有限公司 Text generation method, and model gradient pruning method, device, equipment and medium thereof
CN117910536B (en) * 2024-03-19 2024-06-07 浪潮电子信息产业股份有限公司 Text generation method, and model gradient pruning method, device, equipment and medium thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination