CN113837378A - Convolutional neural network compression method based on agent model and gradient optimization - Google Patents

Convolutional neural network compression method based on agent model and gradient optimization

Info

Publication number
CN113837378A
Authority
CN
China
Prior art keywords
network
model
pruning
micro
accuracy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111039434.9A
Other languages
Chinese (zh)
Inventor
刘德荣
李佳鑫
王永华
赵博
饶煊
吴球业
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202111039434.9A priority Critical patent/CN113837378A/en
Publication of CN113837378A publication Critical patent/CN113837378A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Aiming at the limitations of the prior art, the invention provides a convolutional neural network compression method based on a proxy model and gradient optimization. The method uses the proxy model to predict the network accuracy of a candidate structure, so the network of each structure does not need to be trained, which greatly reduces the search time. Secondly, the method uses a differentiable structure parameter to generate the pruning rates and predicts the network accuracy directly through the proxy model, thereby establishing a direct relation between network accuracy and structure parameter, so the pruning rates can be trained directly and rapid, automatic pruning is realized. The method takes global information into consideration, so an optimal sub-network structure can be found. After the target network is pruned by the method, its parameter count and computation are effectively compressed while its accuracy drops only slightly or remains almost unchanged; the pruned network is hardware-friendly and can be deployed on many platforms.

Description

Convolutional neural network compression method based on agent model and gradient optimization
Technical Field
The invention relates to the technical field of deep learning, in particular to pruning techniques for convolutional neural networks in model compression, and more particularly to a convolutional neural network compression method based on a proxy model and gradient optimization.
Background
In recent years, deep neural networks have achieved great success in fields such as image classification, object detection, and semantic segmentation. However, huge network models are difficult to deploy on most resource-limited devices, which greatly limits the application of artificial intelligence. Therefore, many model compression methods have been proposed.
Neural network pruning is a common model compression method. For example, Chinese patent application publication No. CN111931914A, published on 2020.11.13, discloses a convolutional neural network channel pruning method based on model fine-tuning. Neural network pruning removes redundant parameters from a neural network by some criterion, so as to reduce the size and computation of the model while ensuring that the accuracy of the pruned network decreases only slightly.
In the prior art, some methods rely on manually designed pruning rules and pruning rates, where the pruning rate of each layer can only be determined through layer-by-layer experiments, which costs a great deal of time; moreover, because each layer is tested and pruned independently, the network is not treated as a whole and the interdependence among layers is ignored, so the resulting structure is not optimal. In some automatic pruning methods, the search space is discrete: gradient optimization is possible only after a continuous relaxation of the discrete space, or heuristic search is performed with evolutionary algorithms, simulated annealing, and the like, so the resulting structure may not be optimal, or a large time cost is needed to find the optimal structure. In addition, many existing methods train each candidate structure for a certain number of rounds in order to select a better structure during the search, which also takes a lot of time for training and evaluation. Thus, the prior art has certain limitations.
Disclosure of Invention
Aiming at the limitation of the prior art, the invention provides a convolutional neural network compression method based on a proxy model and gradient optimization, and the technical scheme adopted by the invention is as follows:
a convolutional neural network compression method based on a proxy model and gradient optimization comprises the following stages:
an initialization stage:
acquiring a proxy model, a target network to be compressed and a target compression rate of the target network, and initializing a differentiable structure parameter and an experience pool; entering the model preheating stage after initialization is completed;
a model preheating stage:
performing the following steps in each round of the model preheating stage: randomly generating a structure vector composed of the pruning rate of each layer of the target network according to the target compression rate, obtaining the network accuracy evaluation value corresponding to the structure vector, and adding the randomly generated structure vector and its network accuracy evaluation value into the experience pool as a sample;
after the number of samples in the experience pool exceeds a preset batch size, the following step is also performed in each round of the model preheating stage: extracting a batch of samples from the experience pool to train the proxy model; when the number of executed rounds exceeds the preset number of model preheating rounds, the model preheating stage ends and the search stage begins;
a searching stage:
when the number of executed rounds has not reached a preset maximum number of running rounds, performing the following steps in each round of the search stage: generating a structure vector from the differentiable structure parameter, and inputting the generated structure vector into the proxy model to obtain the corresponding network accuracy predicted value and floating-point operation count (FLOPs); optimizing and updating the differentiable structure parameter according to the network accuracy predicted value and the FLOPs;
when the number of executed rounds has not reached the preset number of search rounds, the following steps are also performed in each round of the search stage: obtaining the network accuracy evaluation value corresponding to the structure vector generated by the differentiable structure parameter, and adding that structure vector and its network accuracy evaluation value into the experience pool as a sample; extracting a batch of samples from the experience pool to train the proxy model;
when the number of executed rounds reaches the maximum number of running rounds, ending the searching stage and entering a network pruning stage;
pruning and recovering stages:
generating a structure vector from the differentiable structure parameter; pruning the target network by deleting filters according to the generated structure vector; and performing recovery training on the accuracy of the sub-network obtained by pruning;
wherein the maximum number of running rounds is greater than the number of search rounds and the number of model preheating rounds.
Compared with the prior art, the invention uses the proxy model to predict the network accuracy of a candidate structure, so the network of each structure does not need to be trained, which greatly reduces the search time. Secondly, the method uses a differentiable structure parameter to generate the pruning rates and predicts the network accuracy directly through the proxy model, thereby establishing a direct relation between network accuracy and structure parameter, so the pruning rates can be trained directly and rapid, automatic pruning is realized. The method takes global information into consideration, so an optimal sub-network structure can be found. After the target network is pruned by the method, its parameter count and computation are effectively compressed while its accuracy drops only slightly or remains almost unchanged; the pruned network is hardware-friendly and can be deployed on many platforms.
Preferably, the network accuracy evaluation value is obtained by:
pruning the target network by setting the outputs of the corresponding filter channels to 0 according to the structure vector, and evaluating the pruning result to obtain the network accuracy evaluation value of the structure vector.
Preferably, in the model preheating stage a Gaussian distribution N(μ, σ²) is used to randomly generate the structure vector a = (a1, a2, ..., an), where μ = ratio_target, σ = 1, and ratio_target is the target compression rate of the target network, i.e. the pruning rate of each layer of the target network.
As a preferred scheme, the proxy model is a multilayer perceptron trained by a stochastic gradient descent method, and the loss function of the proxy model is the root-mean-square loss:
loss = sqrt( (1/N) * Σ_{j=1..N} (SAcc_j - Acc_j)^2 );
where SAcc_j is the network accuracy predicted value corresponding to structure vector a_j, Acc_j is the network accuracy evaluation value corresponding to a_j, and N is the batch size.
As a preferred embodiment, in the search stage the structure vector a is generated from the differentiable structure parameter A = (A1, A2, ..., An) according to the following formula:
ai = sigmoid(Ai) * (1 - min_a) + min_a, i = 1, 2, ..., n;
where min_a denotes the minimum pruning rate, min_a ∈ [0, 1].
As a preferable scheme, the differentiable structure parameter is updated by a stochastic gradient descent method, and the loss function used when updating the differentiable structure parameter is:
loss = a_loss + γ*f_loss;
wherein a_loss = -SAcc, SAcc being the network accuracy predicted value; f_loss is the constraint term on the floating-point operation count (FLOPs); and γ is a penalty coefficient.
As a preferred solution, the sub-networks obtained by pruning in the pruning and recovery phase are trained to recover accuracy using a knowledge distillation method.
The present invention also provides the following:
a convolutional neural network compression system based on proxy model and gradient optimization, comprising:
an initialization module:
the method comprises the steps of obtaining a proxy model, a target network to be compressed and a target compression ratio of the target network, and initializing a micro-structure parameter and an experience pool; entering a model preheating stage after initialization is completed;
a model preheating module:
for performing the following steps in each round of the model preheating stage: randomly generating a structure vector composed of the pruning rate of each layer of the target network according to the target compression rate, obtaining the network accuracy evaluation value corresponding to the structure vector, and adding the randomly generated structure vector and its network accuracy evaluation value into the experience pool as a sample;
after the number of samples in the experience pool exceeds a preset batch size, the following step is also performed in each round of the model preheating stage: extracting a batch of samples from the experience pool to train the proxy model; when the number of executed rounds exceeds the preset number of model preheating rounds, the model preheating stage ends and the search stage begins;
a search module:
for performing the following steps in each round of the search stage when the number of executed rounds has not reached a preset maximum number of running rounds: generating a structure vector from the differentiable structure parameter, and inputting the generated structure vector into the proxy model to obtain the corresponding network accuracy predicted value and floating-point operation count (FLOPs); and optimizing and updating the differentiable structure parameter according to the network accuracy predicted value and the FLOPs;
when the number of executed rounds has not reached the preset number of search rounds, the following steps are also performed in each round of the search stage: obtaining the network accuracy evaluation value corresponding to the structure vector generated by the differentiable structure parameter, and adding that structure vector and its network accuracy evaluation value into the experience pool as a sample; extracting a batch of samples from the experience pool to train the proxy model;
when the number of executed rounds reaches the maximum number of running rounds, ending the searching stage and entering a network pruning stage;
pruning and recovering module:
for generating a structure vector from the differentiable structure parameter; pruning the target network by deleting filters according to the generated structure vector; and performing recovery training on the accuracy of the sub-network obtained by pruning;
wherein the maximum number of running rounds is greater than the number of search rounds and the number of model preheating rounds.
A medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the aforementioned convolutional neural network compression method based on a proxy model and gradient optimization.
A computer device comprising a medium, a processor and a computer program stored in the medium and executable by the processor, the computer program when executed by the processor implementing the steps of the aforementioned convolutional neural network compression method based on a proxy model and gradient optimization.
Drawings
Fig. 1 is a schematic diagram illustrating a progressive stage of a convolutional neural network compression method based on a proxy model and gradient optimization according to embodiment 1 of the present invention;
fig. 2 is a schematic diagram of the loop logic of the convolutional neural network compression method based on a proxy model and gradient optimization according to embodiment 1 of the present invention;
fig. 3 is a schematic diagram of a training principle provided in embodiment 1 of the present invention;
fig. 4 is a fitting effect diagram of the proxy model in embodiment 2 of the present invention;
fig. 5 is a schematic diagram of the trends of the model accuracy and the proxy model prediction as the structure parameters are optimized during the search process in embodiment 2 of the present invention;
fig. 6 is a schematic diagram of a convolutional neural network compression system based on a proxy model and gradient optimization according to embodiment 3 of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the embodiments described are only some embodiments of the present application, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without any creative effort belong to the protection scope of the embodiments in the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the present application. As used in the examples of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims. In the description of the present application, it is to be understood that the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not necessarily used to describe a particular order or sequence, nor are they to be construed as indicating or implying relative importance. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
Further, in the description of the present application, "a plurality" means two or more unless otherwise specified. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. The invention is further illustrated below with reference to the figures and examples.
In order to solve the limitation of the prior art, the present embodiment provides a technical solution, and the technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.
Example 1
Referring to fig. 1 and fig. 2, a convolutional neural network compression method based on a proxy model and gradient optimization includes the following stages:
s1, initialization stage:
acquiring a proxy model S, a target network M to be compressed and a target compression rate ratio_target of the target network M, and initializing a differentiable structure parameter A and an experience pool RelayBuffer; entering the model preheating stage after initialization is completed;
s2, model preheating stage:
performing the following steps in each round of the model preheating stage: randomly generating a structure vector composed of the pruning rate of each layer of the target network according to the target compression rate, obtaining the network accuracy evaluation value corresponding to the structure vector, and adding the randomly generated structure vector and its network accuracy evaluation value into the experience pool as a sample;
after the number of samples len(RelayBuffer) in the experience pool exceeds a preset batch size batch_size, the following step is also performed in each round of the model preheating stage: extracting a batch of batch_size samples from the experience pool RelayBuffer to train the proxy model; when the number of executed rounds epoch exceeds the preset number of model preheating rounds epoch_warmup, the model preheating stage ends and the search stage begins;
s3, search stage:
when the number of executed rounds epoch has not reached a preset maximum number of running rounds max_epoch, performing the following steps in each round of the search stage: generating a structure vector from the differentiable structure parameter, and inputting the generated structure vector into the proxy model to obtain the corresponding network accuracy predicted value and floating-point operation count (FLOPs); optimizing and updating the differentiable structure parameter according to the network accuracy predicted value and the FLOPs;
when the number of executed rounds epoch has not reached the preset number of search rounds epoch_S, the following steps are also performed in each round of the search stage: obtaining the network accuracy evaluation value corresponding to the structure vector generated by the differentiable structure parameter, and adding that structure vector and its network accuracy evaluation value into the experience pool as a sample; extracting a batch of samples from the experience pool to train the proxy model;
when the number of executed rounds epoch reaches the maximum number of running rounds max _ epoch, ending the searching stage and entering a network pruning stage;
s4, pruning and recovery stage:
generating a structure vector from the differentiable structure parameter; pruning the target network by deleting filters according to the generated structure vector; and performing recovery training on the accuracy of the sub-network obtained by pruning;
wherein the maximum number of running rounds max_epoch > the number of search rounds epoch_S > the number of model preheating rounds epoch_warmup.
Compared with the prior art, the invention uses the proxy model to predict the network accuracy of a candidate structure, so the network of each structure does not need to be trained, which greatly reduces the search time. Secondly, the method uses a differentiable structure parameter to generate the pruning rates and predicts the network accuracy directly through the proxy model, thereby establishing a direct relation between network accuracy and structure parameter, so the pruning rates can be trained directly and rapid, automatic pruning is realized. The method takes global information into consideration, so an optimal sub-network structure can be found. After the target network is pruned by the method, its parameter count and computation are effectively compressed while its accuracy drops only slightly or remains almost unchanged; the pruned network is hardware-friendly and can be deployed on many platforms.
Specifically, the invention is mainly applicable to neural networks containing convolutional layers, such as convolutional neural networks like VGG, ResNet, MobileNet and the like. The scheme prunes convolutional layers in the convolutional neural network, and achieves the purpose of compressing the network by pruning channels or filters in the convolutional layers.
The role of the proxy model is to quickly evaluate the searched sub-networks in the search stage, which saves a great deal of time and cost compared with directly training and evaluating each sub-network. Second, it establishes a mapping between network structure and accuracy, so that the structure vector can be optimized directly with a stochastic gradient descent method and a better structure can be found.
The input of the proxy model is a structure vector, which consists of the pruning rate of each layer of the network. Evaluating the accuracy of a network generally involves two metrics: Top1 and Top5. Top1 is the accuracy with which the first-ranked category is the true category, and represents the real performance of the network; Top5 is the accuracy with which the true category appears among the top five ranked categories. In an alternative embodiment, the network accuracy predicted value output by the proxy model is a prediction of the Top1 accuracy of the sub-network obtained by pruning the target network according to the input structure vector.
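For reference, these two accuracy metrics can be computed as follows. This is a minimal PyTorch sketch for illustration only; the function name is not from the patent:

```python
import torch

def topk_accuracy(logits: torch.Tensor, labels: torch.Tensor, k: int = 1) -> float:
    """Fraction of samples whose true category is among the top-k predictions."""
    topk = logits.topk(k, dim=1).indices            # (batch, k) predicted classes
    hit = (topk == labels.unsqueeze(1)).any(dim=1)  # true class among the top k?
    return hit.float().mean().item()

# topk_accuracy(logits, labels, k=1) gives Top1; k=5 gives Top5.
```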
In particular, it should be noted that the pruned networks are not trained in the model preheating stage but evaluated directly, so the accuracies obtained are low. What matters at this stage, however, is the relative performance of the different structures.
As a preferred embodiment, the network accuracy evaluation value is obtained by:
pruning the target network by setting the outputs of the corresponding filter channels to 0 according to the structure vector, and evaluating the pruning result to obtain the network accuracy evaluation value of the structure vector.
Specifically, the step of pruning the target network in the embodiment of the present invention is as follows:
calculating the L1 norm of each filter in each layer of the target network as the filter's importance; ranking the filters by importance, the lower-ranked filters being considered unimportant; calculating the number k of filters to be cut in each layer from the structure vector a and the layer's number of output channels out_channels, and selecting the k lowest-ranked filters; and pruning the selected k filters.
In the pruning performed to obtain the network accuracy evaluation value of a structure vector, the invention does not actually delete the filters; instead, the outputs of the corresponding channels are set to 0, which removes their influence on the output, so the channel behaves as if it were cut. This achieves an effect equivalent to deleting the filters while preserving the original structure of the target network, making it convenient to prune a different network structure in each round, and is thus more flexible and efficient.
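The masking described above can be sketched as follows in PyTorch. One interpretive assumption: consistent with O' = O * ai later in the text, each entry ai of the structure vector is treated here as the fraction of filters retained in layer i, so the number of filters to cut is k = O - round(O * ai); all names are illustrative:

```python
import torch

def channel_mask(conv_weight: torch.Tensor, a_i: float) -> torch.Tensor:
    """0/1 mask over output channels that zeroes the least important filters.

    conv_weight: weights of one conv layer, shape (out_channels, in_channels, kh, kw).
    a_i: entry of the structure vector for this layer (fraction of filters kept).
    """
    out_channels = conv_weight.shape[0]
    # The L1 norm of each filter serves as its importance score.
    importance = conv_weight.abs().sum(dim=(1, 2, 3))
    # Number of filters to cut for this layer.
    k = out_channels - int(round(out_channels * a_i))
    mask = torch.ones(out_channels, device=conv_weight.device)
    if k > 0:
        # Indices of the k least important filters.
        prune_idx = torch.topk(importance, k, largest=False).indices
        mask[prune_idx] = 0.0
    return mask  # multiply the layer's output by this mask, channel-wise
```

Multiplying each output feature map by this mask emulates deleting the filters while keeping the network structure intact, which is what allows a different structure vector to be evaluated in every round.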
As a preferred embodiment, in the model preheating stage a Gaussian distribution N(μ, σ²) is used to randomly generate the structure vector a = (a1, a2, ..., an), where μ = ratio_target, σ = 1, and ratio_target is the target compression rate of the target network, i.e. the pruning rate of each layer of the target network.
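A minimal sketch of this sampling step (PyTorch assumed). Since N(ratio_target, 1) can produce values outside a valid pruning-rate range, some clipping is needed in practice; the clipping bounds below are an assumption, as the patent does not state how out-of-range samples are handled:

```python
import torch

def sample_structure_vector(n: int, ratio_target: float) -> torch.Tensor:
    """Draw one structure vector a = (a1, ..., an) from N(ratio_target, 1)."""
    a = torch.normal(mean=ratio_target, std=1.0, size=(n,))
    # Clip to a plausible valid range (assumed, not specified in the patent).
    return a.clamp(0.0, 1.0)
```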
As a preferred embodiment, the proxy model is a multilayer perceptron (MLP), trained by a stochastic gradient descent method, with the root-mean-square loss as its loss function:
loss = sqrt( (1/N) * Σ_{j=1..N} (SAcc_j - Acc_j)^2 );
where SAcc_j is the network accuracy predicted value corresponding to structure vector a_j, Acc_j is the network accuracy evaluation value corresponding to a_j, and N is the batch size.
In particular, using the root-mean-square loss allows the proxy model to fit the accuracy of the pruned target network.
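A sketch of the proxy model and one training step on a batch drawn from the experience pool, assuming PyTorch. The 256-unit hidden layer and the optimizer settings follow embodiment 2, and the dummy batch is for illustration only:

```python
import torch
import torch.nn as nn

class ProxyModel(nn.Module):
    """MLP mapping a structure vector to a predicted network accuracy SAcc."""
    def __init__(self, n_layers: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_layers, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        return self.net(a).squeeze(-1)

def rmse_loss(sacc: torch.Tensor, acc: torch.Tensor) -> torch.Tensor:
    # Root-mean-square loss between predicted and evaluated accuracies.
    return torch.sqrt(torch.mean((sacc - acc) ** 2))

proxy = ProxyModel(n_layers=9)
opt = torch.optim.SGD(proxy.parameters(), lr=0.1, weight_decay=5e-4, momentum=0.9)

a_batch = torch.rand(128, 9)   # dummy batch of structure vectors from the pool
acc_batch = torch.rand(128)    # their network accuracy evaluation values
opt.zero_grad()
rmse_loss(proxy(a_batch), acc_batch).backward()
opt.step()
```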
As a preferred embodiment, in the search stage the structure vector a is generated from the differentiable structure parameter A = (A1, A2, ..., An) according to the following formula:
ai = sigmoid(Ai) * (1 - min_a) + min_a, i = 1, 2, ..., n;
where min_a denotes the minimum pruning rate, min_a ∈ [0, 1].
Specifically, the above formula guarantees that each ai lies in [min_a, 1] and is continuously differentiable with respect to Ai.
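As a sketch (PyTorch; min_a = 0.2 as in embodiment 2):

```python
import torch

MIN_A = 0.2  # minimum pruning rate min_a

def structure_vector(A: torch.Tensor) -> torch.Tensor:
    # ai = sigmoid(Ai) * (1 - min_a) + min_a keeps every ai in (min_a, 1)
    # and is continuously differentiable with respect to Ai.
    return torch.sigmoid(A) * (1.0 - MIN_A) + MIN_A
```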
The invention aims to find the structure parameter that makes the accuracy of the pruned target network highest. Since the proxy model predicts this accuracy, maximizing the accuracy amounts to maximizing the network accuracy predicted value SAcc output by the proxy model. Therefore, the loss for updating the structure parameter A can be designed as:
a_loss = -SAcc;
Because different devices accept different model sizes, in order to compress the network to a target size the invention adds a constraint on the size of the network model when updating the structure parameters, with the number of floating-point operations (FLOPs) used as the index of model size. The FLOPs are calculated as follows, where the FLOPs of each convolutional layer are:
flops_conv=k*k*I*O*W*H;
wherein k is the kernel_size of the convolutional layer, I is the number of input channels, O is the number of output channels, and W and H are the width and height of the output feature map;
and the FLOPs of a fully connected layer are calculated as:
flops_fc=I*O;
wherein, I is the number of input channels, and O is the number of output channels.
The FLOPs of all layers are calculated separately and summed to obtain the FLOPs of the whole network; the constraint function on FLOPs is then obtained as follows (the formula is given only as an image in the original publication; it penalizes the deviation of the pruned network's FLOPs from the target):
wherein max_flops is the FLOPs of the target network without pruning, and flops_target is the FLOPs of the desired pruned network.
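The per-layer formulas above translate directly into code. In this sketch the list of layer shapes is a placeholder describing the unpruned network, output channels are scaled as O' = O * ai (explained below), and, since the exact constraint function appears only as an image in the original publication, the hinge-style penalty is one plausible form rather than the patent's formula:

```python
import torch

def conv_flops(k, I, O, W, H):
    # flops_conv = k * k * I * O * W * H
    return k * k * I * O * W * H

def network_flops(layers, a: torch.Tensor) -> torch.Tensor:
    """FLOPs of the pruned network; `layers` lists (k, I, O, W, H) per conv layer."""
    total = torch.zeros(())
    for (k, I, O, W, H), a_i in zip(layers, a):
        # Pruning scales the output channels: O' = O * ai. (The matching
        # shrinkage of the next layer's input channels is omitted for brevity.)
        total = total + conv_flops(k, I, O * a_i, W, H)
    return total

def f_loss(flops: torch.Tensor, flops_target: float, max_flops: float) -> torch.Tensor:
    # Assumed hinge-style constraint: penalize only FLOPs above the target.
    return torch.clamp((flops - flops_target) / max_flops, min=0.0)
```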
Therefore, as a preferred embodiment, the differentiable structure parameter is updated by a stochastic gradient descent method, and the loss function used when updating the differentiable structure parameter is:
loss = a_loss + γ*f_loss;
wherein a_loss = -SAcc, SAcc being the network accuracy predicted value; f_loss is the constraint term on the floating-point operation count (FLOPs); and γ is a penalty coefficient.
Specifically, SAcc is the output of the proxy model for the structure vector a, so SAcc is differentiable with respect to a, i.e. with respect to the structure parameter A. From the FLOPs calculation it can be seen that the FLOPs depend on the output channels of each network layer, and the number of output channels after pruning is O' = O * ai; thus the FLOPs are also differentiable with respect to a, i.e. with respect to A. Therefore, both parts of the loss function loss are differentiable with respect to the structure parameter A, so A can be optimized with a gradient descent method according to loss.
Referring to fig. 3, when the gradient propagates backwards, the gradient computation from the loss function to the structure parameter is complete, with no truncation in the middle.
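Putting the pieces together, one search-phase update of A might look like the following, reusing ProxyModel, structure_vector, network_flops and f_loss from the sketches above; the hyper-parameters and FLOPs figures follow embodiment 2, and the layer list is a placeholder:

```python
import torch

A = torch.zeros(9, requires_grad=True)   # differentiable structure parameter
opt_A = torch.optim.SGD([A], lr=0.01, weight_decay=5e-4, momentum=0.9)
gamma = 10.0                             # penalty coefficient γ
layers = [(3, 16, 16, 32, 32)] * 9       # placeholder (k, I, O, W, H) per layer

a = structure_vector(A)                  # differentiable with respect to A
sacc = proxy(a.unsqueeze(0)).squeeze()   # network accuracy predicted value SAcc
loss = -sacc + gamma * f_loss(network_flops(layers, a), 20.41e6, 40.81e6)

opt_A.zero_grad()
loss.backward()   # gradients flow through both the proxy model and the FLOPs
opt_A.step()
```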
As a preferred solution, the sub-networks obtained by pruning in the pruning and recovery phase are trained to recover accuracy using a knowledge distillation method.
Specifically, knowledge distillation can migrate the knowledge of a teacher network (a large network) into a student network (a small network) to improve the efficiency of small network training, so that the accuracy of the small network can approach or even recover to the accuracy of the large network. Thus, knowledge distillation can be used to train the compressed network to improve the accuracy of the network.
The embodiment of the invention takes the target network as a teacher network, takes the searched sub-networks as student networks, and trains the student networks by using a knowledge distillation method, so that the accuracy of the student networks can be close to that of the teacher network, and the accuracy of the networks before pruning can be recovered.
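A standard knowledge distillation loss (Hinton et al.) can serve this purpose. In the following sketch the temperature T and mixing weight alpha are assumed values; the patent does not specify the distillation hyper-parameters:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 4.0, alpha: float = 0.9) -> torch.Tensor:
    """Soft teacher-student term plus hard cross-entropy on the true labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```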
Example 2
This embodiment further describes embodiment 1 with specific parameter settings and a more concrete example, wherein:
the target network is ResNet20, which has been trained on 305 epochs on the public data set CIFAR10, and finally the target network has a Top1 accuracy of 92.44%.
The proxy model is a multilayer perceptron (MLP), namely a neural network with the three-layer structure FC(n_layers, 256), ReLU(), FC(256, 1), where n_layers is the number of network layers to be pruned. The input of the proxy model is the structure vector, i.e. the pruning rate of each layer of the network. The target network in this embodiment is the residual network ResNet20, which has 20 network layers in total: a convolutional input layer, 9 residual blocks, and a fully connected output layer. Each residual block contains two convolutional layers and one residual connection. To keep the output channels at both ends of the residual connection consistent, only the first layer of each residual block is pruned in this embodiment, so there are 9 convolutional layers to be pruned, i.e. n_layers = 9.
The target compression rate ratio_target of the target network is 0.5; since 9 convolutional layers need to be pruned, the structure vector a has dimension 9, i.e. n = 9.
In the present embodiment, the dataset used to assess the accuracy of the network is the public dataset CIFAR10. To evaluate the accuracy of the network quickly, only a portion (1/10) of the training set is used in this embodiment, yielding the network accuracy evaluation value.
The batch size batch_size is 128.
In the present embodiment, the proxy model is optimized using the stochastic gradient descent (SGD) method, with the initial learning rate lr set to 0.1, the weight decay weight_decay set to 0.0005, and the momentum coefficient momentum set to 0.9; the learning rate is updated using the cosine annealing method.
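These settings correspond to the following PyTorch configuration, reusing the ProxyModel sketch from embodiment 1; the scheduler's T_max is an assumption tied to the round counts of this embodiment:

```python
import torch

optimizer = torch.optim.SGD(proxy.parameters(), lr=0.1,
                            weight_decay=0.0005, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=400)

# after each training round: optimizer.step(); scheduler.step()
```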
The number of model warm-up rounds, epoch _ warmup, is 250.
Specifically, the proxy model can be tested after the training of the model preheating stage is completed: all (a, Acc) pairs are extracted from the experience pool, each a is input into the proxy model to obtain the corresponding SAcc, and all SAcc and Acc values are plotted to obtain the fitting-effect graph of the proxy model. As can be seen from fig. 4, the proxy model fits the target network effectively.
When generating the structure vector a from the structure parameter A, the minimum pruning rate min_a is 0.2.
The constraint function on FLOPs is as given above (the formula appears as an image in the original publication).
In this embodiment, max_flops, the FLOPs of ResNet20 without pruning, is 40.81M; since the target compression rate ratio_target is 0.5, flops_target, the FLOPs of the desired pruned network, is 20.41M.
For the loss function used to update the structure parameter A:
loss = a_loss + γ*f_loss;
γ is the penalty coefficient of f_loss; in this embodiment, γ = 10.
In this embodiment, when the structure parameters are optimized with the stochastic gradient descent (SGD) method, the initial learning rate lr is set to 0.01, the weight decay weight_decay to 0.0005, and the momentum coefficient momentum to 0.9; the learning rate is updated with the cosine annealing method.
Referring to fig. 5, the trends of the model accuracy and the proxy model prediction as the structure parameters are optimized during the search show that the optimization of the structure parameters gradually converges.
In this embodiment, the number of search rounds epoch_S is 350, and the maximum number of running rounds max_epoch is 400.
In the pruning and recovery stage, the pruned sub-network is trained for 700 epochs in this embodiment to ensure sufficient training. The final results are given in Table 1 below:
TABLE 1 comparison of model size and accuracy before and after pruning
(Table 1 appears as an image in the original publication.)
As can be seen from Table 1, the method provided by the invention can effectively compress the target network to the specified compression rate, and ensure that the accuracy of the compressed network is almost unchanged.
Example 3
A convolutional neural network compression system based on proxy model and gradient optimization, please refer to fig. 6, which includes:
the initialization module 1:
the method comprises the steps of obtaining a proxy model, a target network to be compressed and a target compression ratio of the target network, and initializing a micro-structure parameter and an experience pool; entering a model preheating stage after initialization is completed;
model preheating module 2:
for performing the following steps in each round of the model preheating stage: randomly generating a structure vector composed of the pruning rate of each layer of the target network according to the target compression rate, obtaining the network accuracy evaluation value corresponding to the structure vector, and adding the randomly generated structure vector and its network accuracy evaluation value into the experience pool as a sample;
after the number of samples in the experience pool exceeds a preset batch size, the following step is also performed in each round of the model preheating stage: extracting a batch of samples from the experience pool to train the proxy model; when the number of executed rounds exceeds the preset number of model preheating rounds, the model preheating stage ends and the search stage begins;
the searching module 3:
for performing the following steps in each round of the search stage when the number of executed rounds has not reached a preset maximum number of running rounds: generating a structure vector from the differentiable structure parameter, and inputting the generated structure vector into the proxy model to obtain the corresponding network accuracy predicted value and floating-point operation count (FLOPs); and optimizing and updating the differentiable structure parameter according to the network accuracy predicted value and the FLOPs;
when the number of executed rounds has not reached the preset number of search rounds, the following steps are also performed in each round of the search stage: obtaining the network accuracy evaluation value corresponding to the structure vector generated by the differentiable structure parameter, and adding that structure vector and its network accuracy evaluation value into the experience pool as a sample; extracting a batch of samples from the experience pool to train the proxy model;
when the number of executed rounds reaches the maximum number of running rounds, ending the searching stage and entering a network pruning stage;
pruning and recovery module 4:
for generating a structure vector from the differentiable structure parameter; pruning the target network by deleting filters according to the generated structure vector; and performing recovery training on the accuracy of the sub-network obtained by pruning;
wherein the maximum number of running rounds is greater than the number of search rounds and the number of model preheating rounds.
Example 4
A medium having stored thereon a computer program which, when executed by a processor, implements the steps of the convolutional neural network compression method based on proxy model and gradient optimization of embodiment 1.
Example 5
A computer device comprising a medium, a processor, and a computer program stored in the medium and executable by the processor, the computer program when executed by the processor implementing the steps of the proxy model and gradient optimization based convolutional neural network compression method of embodiment 1.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. A convolutional neural network compression method based on a proxy model and gradient optimization is characterized by comprising the following stages:
an initialization stage:
acquiring a proxy model, a target network to be compressed and a target compression rate of the target network, and initializing a differentiable structure parameter and an experience pool; entering the model preheating stage after initialization is completed;
a model preheating stage:
performing the following steps in each round of the model preheating stage: randomly generating a structure vector composed of the pruning rate of each layer of the target network according to the target compression rate, obtaining the network accuracy evaluation value corresponding to the structure vector, and adding the randomly generated structure vector and its network accuracy evaluation value into the experience pool as a sample;
after the number of samples in the experience pool exceeds a preset batch size, the following step is also performed in each round of the model preheating stage: extracting a batch of samples from the experience pool to train the proxy model; when the number of executed rounds exceeds the preset number of model preheating rounds, the model preheating stage ends and the search stage begins;
a searching stage:
when the number of executed rounds has not reached a preset maximum number of running rounds, performing the following steps in each round of the search stage: generating a structure vector from the differentiable structure parameter, and inputting the generated structure vector into the proxy model to obtain the corresponding network accuracy predicted value and floating-point operation count (FLOPs); optimizing and updating the differentiable structure parameter according to the network accuracy predicted value and the FLOPs;
when the number of executed rounds has not reached the preset number of search rounds, the following steps are also performed in each round of the search stage: obtaining the network accuracy evaluation value corresponding to the structure vector generated by the differentiable structure parameter, and adding that structure vector and its network accuracy evaluation value into the experience pool as a sample; extracting a batch of samples from the experience pool to train the proxy model;
when the number of executed rounds reaches the maximum number of running rounds, ending the searching stage and entering a network pruning stage;
pruning and recovering stages:
generating a structure vector from the differentiable structure parameter; pruning the target network by deleting filters according to the generated structure vector; and performing recovery training on the accuracy of the sub-network obtained by pruning;
wherein the maximum number of running rounds is greater than the number of search rounds and the number of model preheating rounds.
2. The convolutional neural network compression method based on proxy model and gradient optimization of claim 1, wherein the network accuracy evaluation value is obtained by:
pruning the target network by setting the outputs of the corresponding filter channels to 0 according to the structure vector, and evaluating the pruning result to obtain the network accuracy evaluation value of the structure vector.
3. The convolutional neural network compression method based on proxy model and gradient optimization of claim 1, characterized in that in the model preheating stage a Gaussian distribution N(μ, σ²) is used to randomly generate the structure vector a = (a1, a2, ..., an), where μ = ratio_target, σ = 1, and ratio_target is the target compression rate of the target network, i.e. the pruning rate of each layer of the target network.
4. The convolutional neural network compression method based on the proxy model and the gradient optimization of claim 1, wherein the proxy model is a multilayer perceptron, the proxy model is trained by a stochastic gradient descent method, and the loss function of the proxy model is root-mean-square loss:
loss = sqrt( (1/N) * Σ_{j=1..N} (SAcc_j - Acc_j)^2 );
where SAcc_j is the network accuracy predicted value corresponding to structure vector a_j, Acc_j is the network accuracy evaluation value corresponding to a_j, and N is the batch size.
5. The convolutional neural network compression method based on proxy model and gradient optimization of claim 1, wherein in the search stage the structure vector a is generated from the differentiable structure parameter A = (A1, A2, ..., An) by the following formula:
ai=sigmoid(Ai)*(1-min_a)+min_a,i=1,2,...,n;
where min _ a represents the minimum pruning rate, min _ a ∈ [0,1 ].
6. The convolutional neural network compression method based on proxy model and gradient optimization of claim 1, wherein the differentiable structure parameter is updated by a stochastic gradient descent method, and the loss function used in the process of updating the differentiable structure parameter is as follows:
loss=a_loss+γ*f_loss;
wherein a_loss = -SAcc, SAcc being the network accuracy predicted value; f_loss is the constraint term on the floating-point operation count (FLOPs); and γ is a penalty coefficient.
7. The convolutional neural network compression method based on proxy model and gradient optimization of claim 1, wherein a knowledge distillation method is used to train the sub-networks obtained by pruning in the pruning and recovery stage to recover their accuracy.
8. A convolutional neural network compression system based on a proxy model and gradient optimization, comprising:
initialization module (1):
the method comprises the steps of obtaining a proxy model, a target network to be compressed and a target compression ratio of the target network, and initializing a micro-structure parameter and an experience pool; entering a model preheating stage after initialization is completed;
model preheating module (2):
for performing the following steps in each round of the model preheating stage: randomly generating a structure vector composed of the pruning rate of each layer of the target network according to the target compression rate, obtaining the network accuracy evaluation value corresponding to the structure vector, and adding the randomly generated structure vector and its network accuracy evaluation value into the experience pool as a sample;
after the number of samples in the experience pool exceeds a preset batch size, the following step is also performed in each round of the model preheating stage: extracting a batch of samples from the experience pool to train the proxy model; when the number of executed rounds exceeds the preset number of model preheating rounds, the model preheating stage ends and the search stage begins;
search module (3):
for performing the following steps in each round of the search stage when the number of executed rounds has not reached a preset maximum number of running rounds: generating a structure vector from the differentiable structure parameter, and inputting the generated structure vector into the proxy model to obtain the corresponding network accuracy predicted value and floating-point operation count (FLOPs); and optimizing and updating the differentiable structure parameter according to the network accuracy predicted value and the FLOPs;
when the number of executed rounds has not reached the preset number of search rounds, the following steps are also performed in each round of the search stage: obtaining the network accuracy evaluation value corresponding to the structure vector generated by the differentiable structure parameter, and adding that structure vector and its network accuracy evaluation value into the experience pool as a sample; extracting a batch of samples from the experience pool to train the proxy model;
when the number of executed rounds reaches the maximum number of running rounds, ending the searching stage and entering a network pruning stage;
pruning and recovery module (4):
for generating a structure vector from the differentiable structure parameter; pruning the target network by deleting filters according to the generated structure vector; and performing recovery training on the accuracy of the sub-network obtained by pruning;
wherein the maximum number of running rounds is greater than the number of search rounds and the number of model preheating rounds.
9. A medium having a computer program stored thereon, characterized in that: the computer program when executed by a processor implements the steps of the convolutional neural network compression method based on proxy model and gradient optimization of any of claims 1 to 7.
10. A computer device, characterized by: comprising a medium, a processor and a computer program stored in the medium and executable by the processor, which computer program, when executed by the processor, carries out the steps of the proxy model and gradient optimization based convolutional neural network compression method as claimed in any one of claims 1 to 7.
CN202111039434.9A 2021-09-06 2021-09-06 Convolutional neural network compression method based on agent model and gradient optimization Pending CN113837378A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111039434.9A CN113837378A (en) 2021-09-06 2021-09-06 Convolutional neural network compression method based on agent model and gradient optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111039434.9A CN113837378A (en) 2021-09-06 2021-09-06 Convolutional neural network compression method based on agent model and gradient optimization

Publications (1)

Publication Number Publication Date
CN113837378A true CN113837378A (en) 2021-12-24

Family

ID=78962250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111039434.9A Pending CN113837378A (en) 2021-09-06 2021-09-06 Convolutional neural network compression method based on agent model and gradient optimization

Country Status (1)

Country Link
CN (1) CN113837378A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998648A (en) * 2022-05-16 2022-09-02 电子科技大学 Performance prediction compression method based on gradient architecture search
WO2023248454A1 (en) * 2022-06-24 2023-12-28 日立Astemo株式会社 Computation device and computation method
CN117910536A (en) * 2024-03-19 2024-04-19 浪潮电子信息产业股份有限公司 Text generation method, and model gradient pruning method, device, equipment and medium thereof


Similar Documents

Publication Publication Date Title
CN113837378A (en) Convolutional neural network compression method based on agent model and gradient optimization
Tung et al. Clip-q: Deep network compression learning by in-parallel pruning-quantization
Gomez et al. Learning sparse networks using targeted dropout
CN110366734B (en) Optimizing neural network architecture
CN108229667B (en) Trimming based on artificial neural network classification
CN109844773B (en) Processing sequences using convolutional neural networks
CN110175628A (en) A kind of compression algorithm based on automatic search with the neural networks pruning of knowledge distillation
US20210004677A1 (en) Data compression using jointly trained encoder, decoder, and prior neural networks
CN114037844A (en) Global rank perception neural network model compression method based on filter characteristic diagram
CN110428045A (en) Depth convolutional neural networks compression method based on Tucker algorithm
CN112101547B (en) Pruning method and device for network model, electronic equipment and storage medium
CN108875752A (en) Image processing method and device, computer readable storage medium
CN111598238A (en) Compression method and device of deep learning model
CN110222171A (en) A kind of application of disaggregated model, disaggregated model training method and device
CN109885723A (en) A kind of generation method of video dynamic thumbnail, the method and device of model training
CN109919252A (en) The method for generating classifier using a small number of mark images
CN112766496B (en) Deep learning model safety guarantee compression method and device based on reinforcement learning
CN112001496A (en) Neural network structure searching method and system, electronic device and storage medium
CN113705811A (en) Model training method, device, computer program product and equipment
CN114282666A (en) Structured pruning method and device based on local sparse constraint
CN112101364A (en) Semantic segmentation method based on parameter importance incremental learning
CN111506748B (en) Method and device for managing intelligent database for face recognition
CN112507114A (en) Multi-input LSTM-CNN text classification method and system based on word attention mechanism
CN112115131A (en) Data denoising method, device and equipment and computer readable storage medium
CN108039168A (en) Acoustic model optimization method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination