CN111291862A - Method and apparatus for model compression - Google Patents

Method and apparatus for model compression

Info

Publication number
CN111291862A
CN111291862A
Authority
CN
China
Prior art keywords
model
compressed
training
layer
sample set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010038026.0A
Other languages
Chinese (zh)
Inventor
杨国亮
李放
喻丁玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi University of Science and Technology
Buddhist Tzu Chi General Hospital
Original Assignee
Buddhist Tzu Chi General Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Buddhist Tzu Chi General Hospital
Priority to CN202010038026.0A
Publication of CN111291862A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Abstract

Embodiments of the present disclosure disclose methods and apparatus for model compression. One embodiment of the method comprises: acquiring a training sample set and a test sample set; processing the model to be compressed with the convolution layer and the BN layer according to the model processing step; extracting the elasticity value and BN layer parameter of the convolution filter stored in association with the historical accuracy, and determining the convolution filter to be deleted in the model to be compressed according to the elasticity value and BN layer parameter of the convolution filter and the preset model deletion probability; and deleting the convolution filter to be deleted and the BN layer channel corresponding to the convolution filter to be deleted in the model to be compressed to obtain the compressed model. This embodiment facilitates compression of deep learning models.

Description

Method and apparatus for model compression
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to a method and a device for model compression.
Background
With the development of artificial intelligence, deep learning makes breakthrough progress in various fields. The deep learning model may be deployed on different types of processing devices, and the processing devices may implement specific functions, such as image classification, object detection, and the like, through the deployed deep learning model.
In the related art, the deep learning model usually needs to occupy a large amount of storage space and consume a large amount of computation, so that the computation performance of the deep learning model is not high. Therefore, in the related art, the deep learning model needs to be compressed.
Disclosure of Invention
Embodiments of the present disclosure propose methods and apparatus for model compression.
In a first aspect, an embodiment of the present disclosure provides a method for model compression, the method including: acquiring a training sample set and a test sample set, wherein the training samples in the training sample set include training data and first labeled data corresponding to the training data, and the test samples in the test sample set include test data and second labeled data corresponding to the test data; processing a model to be compressed having a convolutional layer and a BN (Batch Normalization) layer according to the following model processing steps: training the model to be compressed with the training data of the training samples in the training sample set and the first labeled data corresponding to the training data respectively as the input and the expected output of the model to be compressed, to obtain a trained model; extracting the convolutional layer parameters of the convolutional layer and the BN layer parameters of the BN layer of the trained model, and calculating the elasticity values of the convolution filters in the convolutional layer using the extracted convolutional layer parameters and a preset elasticity value calculation formula; taking the test data of the test samples in the test sample set and the second labeled data corresponding to the test data respectively as the input and the expected output of the trained model, to obtain the actual output corresponding to the test data of each test sample; determining the test accuracy currently corresponding to the test sample set according to the actual outputs and the expected outputs corresponding to the test samples in the test sample set; in response to determining that the determined test accuracy is higher than a pre-stored historical accuracy, replacing the historical accuracy with the determined test accuracy, and storing the determined test accuracy, the BN layer parameters of the BN layer, and the calculated elasticity values of the convolution filters in the convolutional layer in association with one another; incrementing a first count value by one and taking the trained model as the model to be compressed; in response to determining that the first count value is less than a preset first count threshold, continuing to perform the model processing steps; extracting the elasticity values of the convolution filters and the BN layer parameters stored in association with the historical accuracy, and determining the convolution filters to be deleted in the model to be compressed according to the elasticity values of the convolution filters, the BN layer parameters, and a preset model deletion probability; and deleting the convolution filters to be deleted and the BN layer channels corresponding to the convolution filters to be deleted in the model to be compressed, to obtain a compressed model.
In some embodiments, training data of training samples in the training sample set and first labeled data corresponding to the training data are respectively used as input and expected output of a model to be compressed, and training is performed to obtain a trained model, including: dividing the training sample set into at least one sub-training sample set; traversing each sub-training sample set, and respectively taking training data of training samples in the currently accessed sub-training sample set and first labeled data corresponding to the training data as input and expected output of the model to be compressed so as to train the model to be compressed; in response to the traversal being completed, training results in a trained model.
In some embodiments, the method further comprises: the compressed model is used as a model to be finely adjusted, and the following model fine adjustment steps are executed on the model to be finely adjusted: training data of training samples in the training sample set and first labeling data corresponding to the training data respectively as input and expected output of the model to be finely adjusted to obtain a trained model; adding one to the second count value; in response to the fact that the second counting value is smaller than a preset second counting threshold value, taking the trained model as a model to be subjected to fine tuning, and continuing to execute the model fine tuning step; and in response to determining that the second count value is greater than or equal to a second preset count threshold, taking the trained model as a fine-tuned model.
In some embodiments, the predetermined elasticity value calculation formula includes:

$$S_i^k = \frac{1}{|W_i^k|} \sum_{w_j \in W_i^k} \left| \frac{\partial L}{\partial w_j} \cdot \frac{w_j}{L(W)} \right|$$

where $k$ is the $k$th convolutional layer of the model to be compressed, $i$ is the $i$th filter in the $k$th convolutional layer of the model to be compressed, $S_i^k$ is the elasticity value of the $i$th filter in the $k$th convolutional layer, $W_i^k$ is the set of parameters of the $i$th filter in the $k$th convolutional layer, $|W_i^k|$ is the total number of parameters of the $i$th filter in the $k$th convolutional layer, $w_j$ is the $j$th parameter of the $i$th filter in the $k$th convolutional layer, $L$ is the loss function of the model to be compressed, $L(W)$ is the corresponding loss value, $\partial L/\partial w_j$ is the partial derivative of the loss function of the model to be compressed with respect to the $j$th parameter, $\sum$ is the summation operator, and $|\cdot|$ is the absolute-value operator.
In a second aspect, an embodiment of the present disclosure provides an apparatus for model compression, the apparatus including: a data acquisition unit configured to acquire a training sample set and a test sample set, wherein the training samples in the training sample set include training data and first labeled data corresponding to the training data, and the test samples in the test sample set include test data and second labeled data corresponding to the test data; a model processing unit configured to process a model to be compressed having a convolutional layer and a BN layer according to the following model processing steps: training the model to be compressed with the training data of the training samples in the training sample set and the first labeled data corresponding to the training data respectively as the input and the expected output of the model to be compressed, to obtain a trained model; extracting the convolutional layer parameters of the convolutional layer and the BN layer parameters of the BN layer of the trained model, and calculating the elasticity values of the convolution filters in the convolutional layer using the extracted convolutional layer parameters and a preset elasticity value calculation formula; taking the test data of the test samples in the test sample set and the second labeled data corresponding to the test data respectively as the input and the expected output of the trained model, to obtain the actual output corresponding to the test data of each test sample; determining the test accuracy currently corresponding to the test sample set according to the actual outputs and the expected outputs corresponding to the test samples in the test sample set; in response to determining that the determined test accuracy is higher than a pre-stored historical accuracy, replacing the historical accuracy with the determined test accuracy, and storing the determined test accuracy, the BN layer parameters of the BN layer, and the calculated elasticity values of the convolution filters in the convolutional layer in association with one another; incrementing a first count value by one and taking the trained model as the model to be compressed; and in response to determining that the first count value is less than a preset first count threshold, continuing to perform the model processing steps; a content determination unit configured to extract the elasticity values of the convolution filters and the BN layer parameters stored in association with the historical accuracy, and determine the convolution filters to be deleted in the model to be compressed according to the elasticity values of the convolution filters, the BN layer parameters, and a preset model deletion probability; and a model compression unit configured to delete the convolution filters to be deleted and the BN layer channels corresponding to the convolution filters to be deleted in the model to be compressed, to obtain a compressed model.
In some embodiments, in the model processing unit, training data of training samples in the training sample set and first labeled data corresponding to the training data are respectively used as input and expected output of the model to be compressed, and training to obtain the trained model includes: dividing the training sample set into at least one sub-training sample set; traversing each sub-training sample set, and respectively taking training data of training samples in the currently accessed sub-training sample set and first labeled data corresponding to the training data as input and expected output of the model to be compressed so as to train the model to be compressed; in response to the traversal being completed, training results in a trained model.
In some embodiments, the apparatus further comprises a model fine tuning unit configured to: taking the compressed model as a model to be finely adjusted, and performing the following model fine adjustment steps on the model to be finely adjusted: training data of training samples in the training sample set and first labeling data corresponding to the training data respectively as input and expected output of the model to be finely adjusted to obtain a trained model; adding one to the second count value; in response to the fact that the second counting value is smaller than a preset second counting threshold value, taking the trained model as a model to be subjected to fine tuning, and continuing to execute the model fine tuning step; and in response to determining that the second count value is greater than or equal to a second preset count threshold, taking the trained model as a fine-tuned model.
In some embodiments, the predetermined elasticity value calculation formula includes:

$$S_i^k = \frac{1}{|W_i^k|} \sum_{w_j \in W_i^k} \left| \frac{\partial L}{\partial w_j} \cdot \frac{w_j}{L(W)} \right|$$

where $k$ is the $k$th convolutional layer of the model to be compressed, $i$ is the $i$th filter in the $k$th convolutional layer of the model to be compressed, $S_i^k$ is the elasticity value of the $i$th filter in the $k$th convolutional layer, $W_i^k$ is the set of parameters of the $i$th filter in the $k$th convolutional layer, $|W_i^k|$ is the total number of parameters of the $i$th filter in the $k$th convolutional layer, $w_j$ is the $j$th parameter of the $i$th filter in the $k$th convolutional layer, $L$ is the loss function of the model to be compressed, $L(W)$ is the corresponding loss value, $\partial L/\partial w_j$ is the partial derivative of the loss function of the model to be compressed with respect to the $j$th parameter, $\sum$ is the summation operator, and $|\cdot|$ is the absolute-value operator.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage device having one or more programs stored thereon, which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any implementation of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, which when executed by a processor implements the method as described in any of the implementations of the first aspect.
According to the method and apparatus for model compression provided by the embodiments of the present disclosure, the model processing steps are performed multiple times on the model to be compressed using the training sample set and the test sample set, and the parameters of the model to be compressed corresponding to the highest test accuracy are taken as the analysis object, which can improve the accuracy of the data analysis and thus the accuracy of model compression.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow diagram for one embodiment of a method for model compression according to the present disclosure;
FIG. 2 is a schematic block diagram of one embodiment of an apparatus for model compression according to the present disclosure;
FIG. 3 is a schematic block diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. Those skilled in the art will also appreciate that, although the terms "first," "second," etc. may be used herein to describe various annotation data, count values, count thresholds, etc., these annotation data, count values, count thresholds, etc. should not be limited by these terms. These terms are used only to distinguish one annotation datum, count value, count threshold from other annotation data, count values, count thresholds.
FIG. 1 illustrates a flow 100 of one embodiment of a method for model compression according to the present disclosure. The method for model compression comprises the following steps:
step 101, a training sample set and a testing sample set are obtained.
The training samples in the training sample set comprise training data and first marking data corresponding to the training data, and the test samples in the test sample set comprise test data and second marking data corresponding to the test data.
In this embodiment, the execution subject of the method for model compression may be a server. The server may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server.
In this embodiment, the executing entity may obtain a training sample set and a testing sample set. It should be noted that the training sample set and the testing sample set may be directly stored locally, or may be stored in other electronic devices communicatively connected to the execution subject. When the training sample set and the test sample set are stored locally, the executing agent may directly extract the locally stored training sample set and test sample set for processing. When the training sample set and the test sample set are stored in other electronic equipment in communication connection with the execution subject, the execution subject may acquire the training sample set and the test sample set for processing through a wired connection manner or a wireless connection manner.
In practice, in order to verify the actual effectiveness of the method provided by the present embodiment, a standard data set widely used in the art is usually selected as the training sample set and the test sample set. As an example, the CIFAR-10 data set, the NWPU-RESISC45 data set, or the like may be selected. The CIFAR-10 data set consists of 60000 color images of size 32 × 32, divided into 10 classes with 6000 images per class. The NWPU-RESISC45 data set is a Remote Sensing Image Scene Classification (RESISC) data set created by Northwestern Polytechnical University (NWPU). It contains 31500 images of size 256 × 256, covering 45 scene classes, such as airports and baseball fields, with 700 images per class. In practice, one part, e.g. 20%, of the selected standard data set may be used as the training sample set, and another part, e.g. 80%, as the test sample set. In addition, since the images in the NWPU-RESISC45 data set are large, the images in the data set may be cropped in advance to increase the data processing speed.
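The 20%/80% split described above can be sketched in a few lines. This is a hypothetical illustration using small random stand-in arrays in place of a real data set such as CIFAR-10; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((100, 32, 32, 3))   # stand-in for 100 color images
labels = rng.integers(0, 10, size=100)  # stand-in class labels (10 classes)

perm = rng.permutation(len(images))     # shuffle before splitting
n_train = int(0.2 * len(images))        # 20% as the training sample set
train_idx, test_idx = perm[:n_train], perm[n_train:]

train_images, train_labels = images[train_idx], labels[train_idx]
test_images, test_labels = images[test_idx], labels[test_idx]
print(train_images.shape, test_images.shape)  # (20, 32, 32, 3) (80, 32, 32, 3)
```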
Step 102, processing the model to be compressed with the convolution layer and the BN layer according to the following model processing steps.
Wherein, the model processing step may include:
and 1021, training the training data of the training samples in the training sample set and the first labeled data corresponding to the training data respectively as the input and the expected output of the model to be compressed to obtain the trained model.
Here, for each training sample in the training sample set, the executing entity may use the training data of the training sample as the input of the model to be compressed, and use the first labeled data corresponding to the training data as the expected output of the model to be compressed, so as to train the model to be compressed. After the model has been trained using all the training samples in the training sample set, the trained model is obtained.
In some optional implementation manners of this embodiment, the training data of the training samples in the training sample set and the first labeled data corresponding to the training data are respectively used as the input and the expected output of the model to be compressed, and the training to obtain the trained model may include:
first, a training sample set is divided into at least one sub-training sample set.
In practice, the number of training samples in each sub-training sample set obtained by division is the same.
Then, traversing each sub-training sample set, and respectively taking the training data of the training samples in the currently accessed sub-training sample set and the first labeled data corresponding to the training data as the input and the expected output of the model to be compressed so as to train the model to be compressed.
And finally, responding to the completion of traversal, and training to obtain a trained model.
In this implementation, when a model is trained using a GPU (Graphics Processing Unit), reading the training sample data from files during training reduces GPU utilization, whereas importing the training sample data directly into memory keeps GPU utilization relatively high. However, because the training sample set is usually large, it cannot be imported into memory all at once. Therefore, the training sample set may be divided into a plurality of sub-training sample sets, which are then imported into memory in batches. This improves GPU utilization and helps improve data processing efficiency.
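The division into equally sized sub-training sample sets described above can be sketched as follows. This is a minimal, framework-free illustration; the function and variable names are illustrative, and it assumes (as the text does) that the sub-sets come out the same size.

```python
import numpy as np

def sub_training_sets(data, labels, n_subsets):
    """Split a training sample set into equally sized sub-training sample
    sets so that each can be imported into memory (and onto the GPU) in
    turn. Assumes len(data) is divisible by n_subsets."""
    size = len(data) // n_subsets
    for s in range(n_subsets):
        lo, hi = s * size, (s + 1) * size
        yield data[lo:hi], labels[lo:hi]

data = np.arange(12).reshape(12, 1)   # stand-in training data
labels = np.arange(12)                # stand-in first labeled data
for sub_data, sub_labels in sub_training_sets(data, labels, 3):
    # each sub-set would be imported into memory and used for one round
    # of training of the model to be compressed
    print(sub_data.shape)  # (4, 1) for each of the 3 sub-sets
```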
And 1022, extracting the convolutional layer parameters of the convolutional layer of the trained model and the BN layer parameters of the BN layer, and calculating to obtain the elastic value of the convolutional filter in the convolutional layer by using the extracted convolutional layer parameters and a preset elastic value calculation formula.
Here, after obtaining the trained model, the execution body may directly extract all convolutional layer parameters of the convolutional layer of the trained model and all BN layer parameters of the BN layer.
It should be noted that the model to be compressed generally has a plurality of convolution layers and a plurality of BN layers. There are typically multiple convolution filters in a convolutional layer, and the number of output channels of the convolutional layer is the same as the number of convolution filters. Each output channel of the convolutional layer may have a corresponding BN layer channel in the BN layer. Each convolution filter in a convolutional layer has parameters, and the parameters of each convolution filter in a convolutional layer may be referred to as convolutional layer parameters for that convolutional layer. Each BN layer channel in the BN layer has parameters, and the parameters of each BN layer channel in the BN layer may be referred to as BN layer parameters. In addition, convolutional layer parameters typically include a weight parameter and a bias parameter for each convolutional filter. In practice, only the weight parameters are usually considered when analyzing convolutional layers.
Here, in order to explain the relationship between these concepts more concretely, consider the following example. Suppose the model to be compressed has one convolutional layer whose input data has dimensions [32, 3, 32, 32]: the input of the convolutional layer is then 32 RGB images of size 32 × 32, where 3 is the number of input channels (corresponding to the R, G, and B channels of the images) and the first 32 is the number of images processed at a time. Further, if the output data of the convolutional layer has dimensions [32, 64, 32, 32] and the convolution kernel of each convolution filter in the convolutional layer is 3 × 3, then the convolutional layer has 64 output channels, i.e. 64 convolution filters, and each convolution filter has parameters of dimensions [3, 3, 3]; considering only the weight parameters, the convolutional layer includes 64 × 3 × 3 × 3 convolutional layer parameters. In addition, each output channel of the convolutional layer corresponds to one BN layer channel in the BN layer.
By way of further example, if the model to be compressed has a convolutional layer with 3 input channels, 64 output channels (i.e., the convolutional layer has 64 convolutional filters), and the size of the convolutional kernel for each convolutional filter is 3 × 3, then the dimension of the convolutional layer parameters for the convolutional layer is (64 × 3 × 3 × 3). Further, if it is determined that 10 convolutional filters of the convolutional layer need to be deleted, the dimension of the convolutional layer parameter of the convolutional layer becomes (54 × 3 × 3 × 3). At this time, the BN layer also needs to delete the corresponding 10 BN layer channels, and to keep 54 of the original 64 BN layer channels.
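The shape bookkeeping in this example can be checked with a short sketch. Assuming NumPy arrays for the layer's weight tensor and per-channel BN parameters (array names are illustrative), deleting 10 of the 64 convolution filters together with the corresponding BN layer channels:

```python
import numpy as np

# Weight tensor of a convolutional layer with 64 filters, 3 input
# channels and 3x3 kernels, plus the matching per-channel BN parameters.
conv_weight = np.zeros((64, 3, 3, 3))
bn_gamma = np.ones(64)
bn_beta = np.zeros(64)

to_delete = set(range(10))  # indices of the 10 filters to be deleted
keep = [i for i in range(conv_weight.shape[0]) if i not in to_delete]

pruned_weight = conv_weight[keep]  # convolution filters removed
pruned_gamma = bn_gamma[keep]      # corresponding BN channels removed
pruned_beta = bn_beta[keep]
print(pruned_weight.shape, pruned_gamma.shape)  # (54, 3, 3, 3) (54,)
```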
In addition, the data processing procedure for each BN layer channel is as follows:

$$\hat{z} = \frac{Z_{in} - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad Z_{out} = \gamma \hat{z} + \beta \tag{1}$$

In formula (1), $Z_{in}$ is the input data of the BN layer channel, $Z_{out}$ is the output data of the BN layer channel, $\hat{z}$ is an intermediate variable, $\mu_B$ is the mean of the batch of data processed by the convolutional layer at a time, $\sigma_B$ is the standard deviation of that batch, $\epsilon$ is a preset constant close to zero, and $\gamma$ and $\beta$ are two learnable parameters.
As can be seen from the above description, each BN layer channel generally has two BN layer parameters, i.e., parameter γ and parameter β.
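The per-channel transform of formula (1) can be sketched directly. A minimal NumPy sketch (the function name is illustrative): normalize one channel's batch of inputs, then scale and shift with γ and β.

```python
import numpy as np

def bn_channel(z_in, gamma, beta, eps=1e-5):
    """Formula (1): normalise one BN-layer channel over a batch, then
    scale and shift with the learnable parameters gamma and beta."""
    mu = z_in.mean()                  # batch mean mu_B
    var = z_in.var()                  # batch variance sigma_B^2
    z_hat = (z_in - mu) / np.sqrt(var + eps)
    return gamma * z_hat + beta

z = np.array([1.0, 2.0, 3.0, 4.0])
out = bn_channel(z, gamma=2.0, beta=0.5)
print(out.mean())  # ~0.5: after normalisation, the mean is shifted to beta
```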
The predetermined elasticity value calculation formula may be a predetermined calculation formula for calculating the importance of each convolution filter in the convolution layer.
Alternatively, the above elasticity value calculation formula may be formula (2):

$$S_i^k = \sum_{w_j \in W_i^k} \bigl| L(W) - L(W \mid w_j = 0) \bigr| \approx \sum_{w_j \in W_i^k} \left| \frac{\partial L}{\partial w_j} \cdot w_j \right| \tag{2}$$

In formula (2), $k$ is the $k$th convolutional layer of the model to be compressed, $i$ is the $i$th filter in the $k$th convolutional layer of the model to be compressed, $S_i^k$ is the elasticity value of the $i$th filter in the $k$th convolutional layer, $W_i^k$ is the set of parameters of the $i$th filter in the $k$th convolutional layer, $w_j$ is the $j$th parameter of the $i$th filter, $L(W \mid w_j = 0)$ is the loss value obtained when the $j$th parameter of the $i$th filter is deleted, $L$ is the loss function of the model to be compressed, $\partial L/\partial w_j$ is the partial derivative of the loss function with respect to the $j$th parameter, $\sum$ is the summation operator, and $|\cdot|$ is the absolute-value operator.
It should be noted that, in general, the greater the change in the loss value caused by deleting a certain convolution filter in a convolutional layer, the higher the importance of that convolution filter, that is, the less suitable it is for deletion.
In this embodiment, after extracting the convolutional layer parameters, the execution body may calculate the elastic values of the convolutional filters in the convolutional layers by using the extracted convolutional layer parameters and a preset elastic value calculation formula.
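Given a filter's weights and the gradients of the loss with respect to them, the elasticity value can be computed in a few lines. A minimal NumPy sketch (the function name is illustrative): it sums $|(\partial L/\partial w_j \cdot w_j)/L(W)|$ over the filter's parameters, with an optional division by the parameter count as in the averaged variant described below as formula (3).

```python
import numpy as np

def filter_elasticity(weights, grads, loss_value, average=True):
    """Elasticity value of one convolution filter: sum over its
    parameters of |(dL/dw_j * w_j) / L(W)|, optionally averaged over
    the total number of parameters of the filter."""
    terms = np.abs(weights * grads / loss_value)
    s = terms.sum()
    return s / weights.size if average else s

# Tiny stand-in filter: 3 weight parameters and their loss gradients.
w = np.array([0.5, -1.0, 2.0])
g = np.array([0.2, 0.1, -0.05])
print(filter_elasticity(w, g, loss_value=1.0))  # averaged elasticity value
```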
In some optional implementations of the present embodiment, the preset elasticity value calculation formula may also be formula (3):

$$S_i^k = \frac{1}{|W_i^k|} \sum_{w_j \in W_i^k} \left| \frac{\partial L}{\partial w_j} \cdot \frac{w_j}{L(W)} \right| \tag{3}$$

wherein $k$ is the $k$th convolutional layer of the model to be compressed, $i$ is the $i$th filter in the $k$th convolutional layer of the model to be compressed, $S_i^k$ is the elasticity value of the $i$th filter in the $k$th convolutional layer, $W_i^k$ is the set of parameters of the $i$th filter in the $k$th convolutional layer, $|W_i^k|$ is the total number of parameters of the $i$th filter in the $k$th convolutional layer, $w_j$ is the $j$th parameter of the $i$th filter, $L$ is the loss function of the model to be compressed, $L(W)$ is the corresponding loss value, $\partial L/\partial w_j$ is the partial derivative of the loss function with respect to the $j$th parameter, $\sum$ is the summation operator, and $|\cdot|$ is the absolute-value operator.
It should be noted that formula (3) in the present implementation is derived as follows.

First, the elasticity value of the $j$th parameter in the $i$th filter in the $k$th convolutional layer of the model to be compressed is defined as the ratio of the relative change of the loss value when the parameter is deleted to the relative change of the parameter value itself:

$$e_j^k = \left| \frac{L(W) - L(W \mid w_j = 0)}{L(W)} \right| \bigg/ \left| \frac{w_j - 0}{w_j} \right| \tag{4}$$

Then, a first-order Taylor approximation of $L(W)$ gives the following relationship:

$$L(W \mid w_j = 0) \approx L(W) - \frac{\partial L}{\partial w_j} \cdot w_j \tag{5}$$

Substituting formula (5) into formula (4), the elasticity value of the $j$th parameter in the $i$th filter in the $k$th convolutional layer of the model to be compressed is:

$$e_j^k = \left| \frac{\partial L}{\partial w_j} \cdot \frac{w_j}{L(W)} \right| \tag{6}$$

Then, the elasticity value of the $i$th filter in the $k$th convolutional layer of the model to be compressed is:

$$S_i^k = \sum_{w_j \in W_i^k} \left| \frac{\partial L}{\partial w_j} \cdot \frac{w_j}{L(W)} \right| \tag{7}$$

Finally, since convolution filters in different convolutional layers have different numbers of parameters, an averaging step is added when calculating the elasticity value of each convolution filter, namely dividing by the total number of parameters $|W_i^k|$, which yields formula (3) above.
In this implementation, the above averaging step balances the importance of convolution filters across layers, enabling accurate deletion of convolution filters and helping improve the accuracy of model compression. In addition, measuring the importance of a parameter by the ratio of the proportional change in the loss value when the parameter is deleted to the proportional change in the parameter value itself gives a more stable and accurate measure of parameter importance, and hence of convolution filter importance, which further improves the accuracy of model compression.
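As a concrete illustration, the per-filter elasticity computation of formula (3) can be sketched in NumPy. This is a minimal sketch and not part of the original disclosure: it assumes the layer's weights and the corresponding loss gradients are already available as arrays, and the helper name `filter_elasticity` is hypothetical.

```python
import numpy as np

def filter_elasticity(layer_weights, layer_grads, loss_value):
    """Per-filter elasticity for one convolutional layer, following
    formula (3): e_i = (1/N_i) * sum_j |(dL/dw_j) * w_j / L(W)|.

    layer_weights, layer_grads: arrays whose first axis indexes the
    filters, e.g. shape (out_channels, in_channels, kh, kw).
    loss_value: the scalar loss L(W).
    """
    w = layer_weights.reshape(layer_weights.shape[0], -1)
    g = layer_grads.reshape(layer_grads.shape[0], -1)
    # Averaging over the N_i parameters of each filter balances filters
    # whose sizes differ between layers, as described above.
    return np.abs(g * w / loss_value).mean(axis=1)
```

For a filter with parameters (1, 2), gradients (0.5, 0.5), and loss 2.0, this yields mean(0.25, 0.5) = 0.375.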
Step 1023: respectively take the test data of the test samples in the test sample set and the second labeled data corresponding to the test data as the input and the expected output of the trained model, to obtain the actual output corresponding to the test data of each test sample.

Here, after training the model to be compressed once, the executing entity may test the trained model obtained from the training. Specifically, for each test sample in the test sample set, the executing entity may use the test data of the test sample as the input of the trained model and the second labeled data corresponding to the test data as the expected output of the trained model, so as to test the model. Each piece of test data thus has one expected output and one actual output.
Step 1024: determine the test accuracy currently corresponding to the test sample set according to the actual output and the expected output corresponding to each test sample in the test sample set.

Here, the executing entity may perform the following determination step for each test sample in the test sample set: compare the actual output corresponding to the test sample with the expected output; if the two are the same, the test is determined to be accurate, and otherwise the test is determined to be inaccurate.

In this embodiment, performing the above determination step on each test sample in the test sample set yields the number of accurate tests. The ratio of this number to the total number of test samples in the test sample set can then be used as the test accuracy corresponding to the test sample set.
Step 1025: in response to determining that the determined test accuracy is higher than the pre-stored historical accuracy, replace the historical accuracy with the determined test accuracy, and store the determined test accuracy, the BN layer parameters of the BN layer, and the calculated elasticity values of the convolution filters in the convolutional layer in an associated manner.

Wherein the pre-stored historical accuracy is 0 by default.

Here, by updating the historical accuracy in this way, the model parameters at the time of the highest test accuracy are stored, thereby capturing the relatively optimal model parameters and hence the relatively optimal model.
Step 1026: add one to the first count value, and take the trained model as the model to be compressed.

Wherein the first count value is 0 by default.
Step 1027: in response to determining that the first count value is less than a preset first count threshold, continue to perform the model processing step.

The first count threshold may be a preset value, such as 100 or 200. The executing entity repeats the model processing step while the first count value is smaller than the first count threshold, and stops once the first count value is greater than or equal to the first count threshold.
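The model processing loop of steps 1021 through 1027 can be sketched schematically as follows. This is an assumption-laden outline, not the original implementation: `train_once`, `evaluate`, and `get_snapshot` are hypothetical caller-supplied callables standing in for one training pass, the test-accuracy measurement, and extraction of the BN layer parameters and filter elasticity values, respectively.

```python
def model_processing_loop(model, train_once, evaluate, get_snapshot,
                          first_count_threshold=100):
    """Repeat the model processing step: train, test, and keep the
    snapshot associated with the highest test accuracy seen so far."""
    history_accuracy = 0.0   # the pre-stored historical accuracy defaults to 0
    best_snapshot = None
    first_count = 0          # the first count value defaults to 0
    while first_count < first_count_threshold:
        model = train_once(model)    # one pass over the training sample set
        accuracy = evaluate(model)   # test accuracy on the test sample set
        if accuracy > history_accuracy:
            # Replace the historical accuracy and store the associated
            # parameters (BN parameters and filter elasticity values).
            history_accuracy = accuracy
            best_snapshot = get_snapshot(model)
        first_count += 1
    return history_accuracy, best_snapshot
```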
Step 103: extract the elasticity values of the convolution filters and the BN layer parameters stored in association with the historical accuracy, and determine the convolution filters to be deleted in the model to be compressed according to the elasticity values of the convolution filters, the BN layer parameters, and a preset model pruning probability.
The model pruning probability generally refers to the fraction of the convolution filters in the model to be compressed that will be pruned. As an example, if there are 100 convolution filters in the model to be compressed and the model pruning probability is 20%, then 20 convolution filters in the model to be compressed will be deleted.
It is noted that the greater the elasticity value of a convolution filter, the greater the importance of that convolution filter to the model to be compressed; therefore, it is the convolution filters with low elasticity values in the model to be compressed that are typically deleted. In addition, as can be seen from equation (1) above, since the output of a convolution filter is the input of the corresponding BN layer channel, the larger the parameter γ of the corresponding BN layer channel, the more strongly the output Z_out of that BN layer channel is influenced by the value of the corresponding input Z_in; that is, the greater the influence of the convolution filter on the corresponding BN layer channel, and thus the higher the overall importance of the convolution filter to the model to be compressed.
Combining the above analysis, the present embodiment may jointly analyze the elasticity value of each convolution filter and the parameter γ value of its corresponding BN layer channel, so as to determine the convolution filters to be deleted in the model to be compressed and the BN layer channels corresponding to them. As an example, the product of the elasticity value and the corresponding parameter γ value may be calculated for each convolution filter, and the convolution filters with the smallest product values may then be determined as the convolution filters to be deleted. As another example, the convolution filters with the smallest elasticity values may be determined as first pre-deleted convolution filters, the convolution filters with the smallest corresponding parameter γ values may be determined as second pre-deleted convolution filters, and finally the convolution filters that are determined as both first and second pre-deleted convolution filters may be taken as the convolution filters to be deleted.
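The first example strategy above, ranking filters by the product of the elasticity value and the corresponding parameter γ value, might be sketched as follows. The helper name is hypothetical, and taking |γ| is an added assumption, since only the magnitude of γ reflects how strongly the channel scales its input:

```python
import numpy as np

def filters_to_delete(elasticity, gamma, prune_probability):
    """Indices of the filters to delete: the fraction `prune_probability`
    of filters with the smallest elasticity * |gamma| products."""
    score = np.asarray(elasticity, dtype=float) * np.abs(np.asarray(gamma, dtype=float))
    num_pruned = int(len(score) * prune_probability)
    # argsort ascending: the smallest products come first.
    return sorted(np.argsort(score)[:num_pruned].tolist())
```

With 100 filters and a model pruning probability of 20%, the 20 filters with the smallest products would be returned.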
Step 104: delete the convolution filters to be deleted and the BN layer channels corresponding to them in the model to be compressed, to obtain the compressed model.

Here, after determining the convolution filters to be deleted, the executing entity may directly delete them, together with their corresponding BN layer channels, from the model to be compressed, thereby obtaining the compressed model.
In this embodiment, the model processing step is performed multiple times on the model to be compressed using the training sample set and the test sample set, and the parameters of the model to be compressed that correspond to the highest test accuracy are taken as the object of analysis, which improves the accuracy of the data analysis and thus the accuracy of model compression.
In some optional implementations of this embodiment, the method for model compression may further include: taking the compressed model as the model to be fine-tuned, and performing the following model fine-tuning steps on the model to be fine-tuned. The model fine-tuning steps may include:
Step one: take the training data of the training samples in the training sample set and the first labeled data corresponding to the training data respectively as the input and the expected output of the model to be fine-tuned, and train to obtain a trained model.
Step two: add one to the second count value.

Wherein the second count value is 0 by default.
Step three: in response to determining that the second count value is less than a preset second count threshold, take the trained model as the model to be fine-tuned and continue to perform the model fine-tuning steps; in response to determining that the second count value is greater than or equal to the second count threshold, take the trained model as the fine-tuned model.
The second count threshold may be a preset value, such as 100 or 200. The executing entity repeats the model fine-tuning steps while the second count value is smaller than the second count threshold, and stops once the second count value is greater than or equal to the second count threshold; when fine-tuning stops, the model obtained by training is taken as the fine-tuned model.
It should be noted that, since the parameters in the compressed model were obtained by training the uncompressed model to be compressed with the training sample set, directly reusing those parameters as the parameters of the compressed model may reduce its accuracy. Continuing to train the compressed model with the training sample set therefore helps obtain parameters better suited to the compressed architecture, thereby improving the precision of the compressed model.
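The fine-tuning described in steps one through three reduces to repeating a training pass until the second count threshold is reached; a schematic sketch, with `train_once` a hypothetical stand-in for one pass over the training sample set:

```python
def fine_tune(compressed_model, train_once, second_count_threshold=100):
    """Run the model fine-tuning step until the second count value
    reaches the second count threshold, then return the fine-tuned model."""
    second_count = 0          # the second count value is 0 by default
    model = compressed_model  # the compressed model is the model to be fine-tuned
    while second_count < second_count_threshold:
        model = train_once(model)
        second_count += 1
    return model
```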
Tables 1-4 show experimental comparison data before and after compression of a model to be compressed using the method for model compression of the present disclosure. The training sample sets and test sample sets in Tables 1-4 are all drawn from the CIFAR-10 dataset, and the models to be compressed in Tables 1-4 are the VGG16 network, the DenseNet40 network, the ResNet164 network, and the ResNet56 network, respectively.
TABLE 1 Experimental comparison data with model to be compressed as VGG16 network
TABLE 2 Experimental comparison data for DenseNet40 as model to be compressed
TABLE 3 Experimental comparison data with ResNet164 network as model to be compressed
TABLE 4 Experimental comparison data for ResNet56 network as model to be compressed
As can be seen from the experimental comparison data in Tables 1-4, after the model to be compressed is compressed by the method for model compression of the present disclosure, the test accuracy of the compressed model differs little from that of the model before compression. Taking the data in Table 1 as an example, when the model pruning probability is 70%, the test accuracy of the compressed model is 0.25% higher than that of the model before compression; and when the model pruning probability is 80%, the test accuracy of the compressed model is only 0.16% lower than that of the model before compression.
In addition, as can be seen from the experimental comparison data in tables 1 to 4, after the model to be compressed is compressed by the method for compressing a model of the present disclosure, the parameters and the calculated amount of the compressed model are significantly reduced.
In summary, the above experimental data show that the method for model compression of the present disclosure can reduce the parameter count and computation of a model while maintaining its test accuracy, thereby saving both the storage space needed to store the model and the computing resources needed to run it.
With further reference to fig. 2, as an implementation of the method shown in fig. 1, the present disclosure provides an embodiment of an apparatus for model compression, which corresponds to the embodiment of the method shown in fig. 1, and which may be applied in various electronic devices in particular.
As shown in fig. 2, the apparatus 200 for model compression of the present embodiment includes: a data obtaining unit 201 configured to obtain a training sample set and a test sample set, where a training sample in the training sample set includes training data and first labeled data corresponding to the training data, and a test sample in the test sample set includes test data and second labeled data corresponding to the test data; a model processing unit 202 configured to process the model to be compressed having the convolutional layer and the BN layer according to the following model processing steps: taking the training data of the training samples in the training sample set and the first labeled data corresponding to the training data respectively as the input and the expected output of the model to be compressed, and training to obtain a trained model; extracting the convolutional layer parameters of the convolutional layer and the BN layer parameters of the BN layer of the trained model, and calculating the elasticity values of the convolution filters in the convolutional layer using the extracted convolutional layer parameters and a preset elasticity value calculation formula; taking the test data of the test samples in the test sample set and the second labeled data corresponding to the test data respectively as the input and the expected output of the trained model, to obtain the actual outputs corresponding to the test data of the test samples; determining the test accuracy currently corresponding to the test sample set according to the actual outputs and the expected outputs corresponding to the test samples in the test sample set; in response to determining that the determined test accuracy is higher than the pre-stored historical accuracy, replacing the historical accuracy with the determined test accuracy, and storing the determined test accuracy, the BN layer parameters of the BN layer, and the calculated elasticity values of the convolution filters in the convolutional layer in an associated manner; adding one to the first count value, and taking the trained model as the model to be compressed; and in response to determining that the first count value is less than a preset first count threshold, continuing to perform the model processing steps; a content determining unit 203 configured to extract the elasticity values of the convolution filters and the BN layer parameters stored in association with the historical accuracy, and determine the convolution filters to be deleted in the model to be compressed according to the elasticity values of the convolution filters, the BN layer parameters, and a preset model pruning probability; and a model compression unit 204 configured to delete the convolution filters to be deleted and the BN layer channels corresponding to them in the model to be compressed, to obtain the compressed model.
In some optional implementations of this embodiment, in the model processing unit 202, taking the training data of the training samples in the training sample set and the first labeled data corresponding to the training data respectively as the input and the expected output of the model to be compressed, and training to obtain the trained model, may include: first, dividing the training sample set into at least one sub-training sample set; then, traversing each sub-training sample set, and taking the training data of the training samples in the currently accessed sub-training sample set and the first labeled data corresponding to the training data respectively as the input and the expected output of the model to be compressed, so as to train the model to be compressed; and finally, in response to the traversal being completed, obtaining the trained model.
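The division into sub-training sample sets described above can be sketched as a simple slicing helper. The original does not specify how the division is performed, so fixed-size contiguous chunks are an assumption:

```python
def split_into_sub_sets(training_samples, sub_set_size):
    """Divide the training sample set into sub-training sample sets
    (mini-batches) to be traversed in turn; the last one may be smaller."""
    return [training_samples[i:i + sub_set_size]
            for i in range(0, len(training_samples), sub_set_size)]
```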
In some optional implementations of this embodiment, the apparatus may further include a model fine-tuning unit (not shown in the figure). The model fine-tuning unit may be configured to take the compressed model as the model to be fine-tuned and perform the following model fine-tuning steps on it: first, taking the training data of the training samples in the training sample set and the first labeled data corresponding to the training data respectively as the input and the expected output of the model to be fine-tuned, and training to obtain a trained model; then, adding one to the second count value; then, in response to determining that the second count value is less than a preset second count threshold, taking the trained model as the model to be fine-tuned and continuing to perform the model fine-tuning steps; and finally, in response to determining that the second count value is greater than or equal to the second count threshold, taking the trained model as the fine-tuned model.
In some optional implementations of this embodiment, the preset elasticity value calculation formula includes:

$$e_i^k = \frac{1}{N_i^k} \sum_{w_j \in W_i^k} \left| \frac{\partial L}{\partial w_j} \cdot \frac{w_j}{L(W)} \right|$$

wherein $k$ denotes the kth convolutional layer of the model to be compressed, $i$ denotes the ith filter in the kth convolutional layer, $e_i^k$ is the elasticity value of the ith filter in the kth convolutional layer, $W_i^k$ is the set of parameters of the ith filter in the kth convolutional layer, $N_i^k$ is the total number of parameters of the ith filter in the kth convolutional layer, $w_j$ is the jth parameter in the ith filter in the kth convolutional layer, $L(W)$ is the loss value corresponding to the ith filter in the kth convolutional layer, $L$ is the loss function of the model to be compressed, $\partial L / \partial w_j$ is the partial derivative of the loss function of the model to be compressed with respect to the jth parameter, $\sum$ is the summation operator, and $|\cdot|$ is the absolute value operator.
According to the apparatus provided by this embodiment of the disclosure, the model processing step is performed multiple times on the model to be compressed using the training sample set and the test sample set, and the parameters of the model to be compressed that correspond to the highest test accuracy are taken as the object of analysis, which improves the accuracy of the data analysis and thus the accuracy of model compression.
Referring now to FIG. 3, a block diagram of an electronic device (e.g., a server) 300 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device shown in fig. 3 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a Central Processing Unit (CPU), a graphics processor, etc.) 301 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data necessary for the operation of the electronic apparatus 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
Generally, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 308 including, for example, magnetic tape, hard disk, etc.; and a communication device 309. The communication means 309 may allow the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. While fig. 3 illustrates an electronic device 300 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 3 may represent one device or may represent multiple devices, as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 309, or installed from the storage means 308, or installed from the ROM 302. The computer program, when executed by the processing apparatus 301, performs the above-described functions defined in the methods of the embodiments of the present disclosure. It should be noted that the computer readable medium of the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the steps of: acquiring a training sample set and a test sample set, wherein the training samples in the training sample set comprise training data and first labeled data corresponding to the training data, and the test samples in the test sample set comprise test data and second labeled data corresponding to the test data; processing a model to be compressed having a convolutional layer and a BN layer according to the following model processing steps: taking the training data of the training samples in the training sample set and the first labeled data corresponding to the training data respectively as the input and the expected output of the model to be compressed, and training to obtain a trained model; extracting the convolutional layer parameters of the convolutional layer and the BN layer parameters of the BN layer of the trained model, and calculating the elasticity values of the convolution filters in the convolutional layer using the extracted convolutional layer parameters and a preset elasticity value calculation formula; taking the test data of the test samples in the test sample set and the second labeled data corresponding to the test data respectively as the input and the expected output of the trained model, to obtain the actual outputs corresponding to the test data of the test samples; determining the test accuracy currently corresponding to the test sample set according to the actual outputs and the expected outputs corresponding to the test samples in the test sample set; in response to determining that the determined test accuracy is higher than the pre-stored historical accuracy, replacing the historical accuracy with the determined test accuracy, and storing the determined test accuracy, the BN layer parameters of the BN layer, and the calculated elasticity values of the convolution filters in the convolutional layer in an associated manner; adding one to the first count value, and taking the trained model as the model to be compressed; in response to determining that the first count value is less than a preset first count threshold, continuing to perform the model processing steps; extracting the elasticity values of the convolution filters and the BN layer parameters stored in association with the historical accuracy, and determining the convolution filters to be deleted in the model to be compressed according to the elasticity values of the convolution filters, the BN layer parameters, and a preset model pruning probability; and deleting the convolution filters to be deleted and the BN layer channels corresponding to them in the model to be compressed, to obtain the compressed model.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a data acquisition unit, a model processing unit, a content determination unit, and a model compression unit. Where the names of these units do not in some cases constitute a limitation on the units themselves, for example, the data acquisition unit may also be described as a "unit that acquires a training sample set and a test sample set".
The foregoing description is merely a description of preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept defined above, for example, technical solutions formed by replacing the above features with (but not limited to) features with similar functions disclosed in the present disclosure.

Claims (10)

1. A method for model compression, comprising:
acquiring a training sample set and a test sample set, wherein training samples in the training sample set comprise training data and first labeled data corresponding to the training data, and test samples in the test sample set comprise test data and second labeled data corresponding to the test data;
processing a model to be compressed having a convolutional layer and a BN layer according to the following model processing steps: taking the training data of the training samples in the training sample set and the first labeled data corresponding to the training data respectively as the input and the expected output of the model to be compressed, and training to obtain a trained model; extracting the convolutional layer parameters of the convolutional layer and the BN layer parameters of the BN layer of the trained model, and calculating the elasticity values of the convolution filters in the convolutional layer using the extracted convolutional layer parameters and a preset elasticity value calculation formula; taking the test data of the test samples in the test sample set and the second labeled data corresponding to the test data respectively as the input and the expected output of the trained model, to obtain the actual outputs corresponding to the test data of the test samples; determining the test accuracy currently corresponding to the test sample set according to the actual outputs and the expected outputs corresponding to the test samples in the test sample set; in response to determining that the determined test accuracy is higher than the pre-stored historical accuracy, replacing the historical accuracy with the determined test accuracy, and storing the determined test accuracy, the BN layer parameters of the BN layer, and the calculated elasticity values of the convolution filters in the convolutional layer in an associated manner; adding one to the first count value, and taking the trained model as the model to be compressed; in response to determining that the first count value is less than a preset first count threshold, continuing to perform the model processing steps;
extracting the elasticity values and BN layer parameters of the convolution filters stored in association with the historical accuracy, and determining the convolution filters to be deleted in the model to be compressed according to the elasticity values, the BN layer parameters and a preset model deletion probability;
and deleting, in the model to be compressed, the convolution filters to be deleted and the BN layer channels corresponding to the convolution filters to be deleted, to obtain a compressed model.
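Illustratively, the selection step of this claim can be sketched as follows. This is a minimal sketch, not the claimed method itself: the claim does not specify how the elasticity values, the BN layer parameters and the deletion probability are combined, so the multiplicative importance score and all names (`filters_to_delete`, `deletion_prob`) are assumptions.

```python
import numpy as np

def filters_to_delete(elasticity, bn_gamma, deletion_prob):
    """Mark the least important fraction of convolution filters for deletion.

    elasticity and bn_gamma hold one value per convolution filter;
    deletion_prob is the preset model deletion probability (the fraction
    of filters to remove). Multiplying the two scores is an assumption.
    """
    importance = np.asarray(elasticity, dtype=float) * \
        np.abs(np.asarray(bn_gamma, dtype=float))
    n_delete = int(len(importance) * deletion_prob)
    # indices of the n_delete filters with the lowest combined importance
    return np.argsort(importance)[:n_delete].tolist()
```

For example, with elasticity values [0.9, 0.1, 0.5, 0.05], unit BN scales and a deletion probability of 0.5, filters 1 and 3 would be selected for deletion.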
2. The method according to claim 1, wherein training the model to be compressed using the training data of the training samples in the training sample set and the first labeling data corresponding to the training data as the input and the expected output, respectively, to obtain the trained model comprises:
dividing the training sample set into at least one sub-training sample set;
traversing each sub-training sample set, and using the training data of the training samples in the currently accessed sub-training sample set and the first labeling data corresponding to the training data as the input and the expected output of the model to be compressed, respectively, so as to train the model to be compressed;
and in response to determining that the traversal is completed, obtaining the trained model.
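The traversal of claim 2 amounts to ordinary mini-batch training. A minimal sketch, where `train_on` stands in as a hypothetical callable that performs one training update on a sub-training sample set:

```python
def train_in_sub_sets(model, samples, batch_size, train_on):
    # divide the training sample set into sub-training sample sets
    sub_sets = [samples[i:i + batch_size]
                for i in range(0, len(samples), batch_size)]
    # traverse each sub-training sample set and train on it
    for sub in sub_sets:
        model = train_on(model, sub)
    return model  # the trained model, once the traversal is completed
```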
3. The method of claim 1, wherein the method further comprises:
taking the compressed model as a model to be fine-tuned, and performing the following model fine-tuning steps on the model to be fine-tuned:
training the model to be fine-tuned using the training data of the training samples in the training sample set and the first labeling data corresponding to the training data as the input and the expected output, respectively, to obtain a trained model; incrementing a second count value by one; in response to determining that the second count value is less than a preset second count threshold, taking the trained model as the model to be fine-tuned and continuing to perform the model fine-tuning steps; and in response to determining that the second count value is greater than or equal to the preset second count threshold, taking the trained model as the fine-tuned model.
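The fine-tuning of claim 3 is a bounded retraining loop governed by the second count value. A minimal sketch, where `train_step` is a hypothetical callable performing one training pass:

```python
def fine_tune(model, train_step, second_count_threshold):
    second_count = 0
    # repeat until the second count value reaches the preset threshold
    while second_count < second_count_threshold:
        model = train_step(model)  # train the model to be fine-tuned
        second_count += 1
    return model  # the fine-tuned model
```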
4. The method of claim 1, wherein the preset elasticity value calculation formula comprises:

$$E_i^k = \frac{1}{N_i^k} \sum_{w_j \in W_i^k} \left| \frac{\partial L(W)}{\partial w_j} \, w_j \right|$$

wherein k is the kth convolutional layer of the model to be compressed, i is the ith filter in the kth convolutional layer of the model to be compressed, $E_i^k$ is the elasticity value of the ith filter in the kth convolutional layer of the model to be compressed, $W_i^k$ is the set of parameters of the ith filter in the kth convolutional layer of the model to be compressed, $N_i^k$ is the total number of parameters of the ith filter in the kth convolutional layer of the model to be compressed, $w_j$ is the jth parameter in the ith filter in the kth convolutional layer of the model to be compressed, L(W) is the loss value corresponding to the ith filter in the kth convolutional layer of the model to be compressed, L is the loss function of the model to be compressed, $\partial L(W)/\partial w_j$ is the partial derivative of the loss function of the model to be compressed with respect to the jth parameter, $\Sigma$ is the summation operator, and $|\cdot|$ is the absolute value operator.
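The original formula of claim 4 survives only as an image placeholder in the patent record; reconstructed from its symbol definitions, the elasticity value is the mean over a filter's parameters of |∂L/∂w_j · w_j|. A minimal sketch of that reconstruction, assuming the filter's weights and their loss gradients are available as arrays (function and argument names are illustrative):

```python
import numpy as np

def elasticity_value(filter_weights, filter_grads):
    """Elasticity value of one convolution filter: the mean over its
    parameters w_j of |dL/dw_j * w_j| (a reconstruction of claim 4)."""
    w = np.asarray(filter_weights, dtype=float).ravel()
    g = np.asarray(filter_grads, dtype=float).ravel()
    return float(np.abs(g * w).sum() / w.size)
```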
5. An apparatus for model compression, comprising:
a data obtaining unit configured to obtain a training sample set and a test sample set, wherein training samples in the training sample set include training data and first labeling data corresponding to the training data, and test samples in the test sample set include test data and second labeling data corresponding to the test data;
a model processing unit configured to process a model to be compressed having a convolutional layer and a BN layer according to the following model processing steps: training the model to be compressed using the training data of the training samples in the training sample set and the first labeling data corresponding to the training data as the input and the expected output, respectively, to obtain a trained model; extracting convolutional layer parameters of the convolutional layer and BN layer parameters of the BN layer of the trained model, and calculating an elasticity value for each convolution filter in the convolutional layer using the extracted convolutional layer parameters and a preset elasticity value calculation formula; using the test data of the test samples in the test sample set and the second labeling data corresponding to the test data as the input and the expected output of the trained model, respectively, to obtain the actual output corresponding to the test data of each test sample; determining the current test accuracy corresponding to the test sample set from the actual outputs and the expected outputs corresponding to the test samples in the test sample set; in response to determining that the determined test accuracy is higher than a pre-stored historical accuracy, replacing the historical accuracy with the determined test accuracy, and storing the determined test accuracy in association with the BN layer parameters of the BN layer and the calculated elasticity values of the convolution filters in the convolutional layer; incrementing a first count value by one, and taking the trained model as the model to be compressed; and in response to determining that the first count value is less than a preset first count threshold, continuing to perform the model processing steps;
a content determination unit configured to extract the elasticity values and BN layer parameters of the convolution filters stored in association with the historical accuracy, and to determine the convolution filters to be deleted in the model to be compressed according to the elasticity values, the BN layer parameters and a preset model deletion probability;
and a model compression unit configured to delete, in the model to be compressed, the convolution filters to be deleted and the BN layer channels corresponding to the convolution filters to be deleted, to obtain a compressed model.
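Deleting a convolution filter together with its BN layer channel can be sketched as an index selection over the layer tensors. The (out_channels, in_channels, kh, kw) weight layout and the function name are assumptions, not part of the claims:

```python
import numpy as np

def prune_layer(conv_weight, bn_gamma, bn_beta, delete_idx):
    """Remove the listed output filters from a convolution weight tensor
    and the corresponding channels of the following BN layer."""
    drop = set(delete_idx)
    keep = [i for i in range(conv_weight.shape[0]) if i not in drop]
    # keep the surviving filters and their BN scale/shift channels
    return conv_weight[keep], bn_gamma[keep], bn_beta[keep]
```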
6. The apparatus according to claim 5, wherein the model processing unit training the model to be compressed, using the training data of the training samples in the training sample set and the first labeling data corresponding to the training data as the input and the expected output, respectively, to obtain the trained model, comprises:
dividing the training sample set into at least one sub-training sample set;
traversing each sub-training sample set, and using the training data of the training samples in the currently accessed sub-training sample set and the first labeling data corresponding to the training data as the input and the expected output of the model to be compressed, respectively, so as to train the model to be compressed;
and in response to determining that the traversal is completed, obtaining the trained model.
7. The apparatus of claim 5, wherein the apparatus further comprises a model fine tuning unit configured to:
taking the compressed model as a model to be fine-tuned, and performing the following model fine-tuning steps on the model to be fine-tuned:
training the model to be fine-tuned using the training data of the training samples in the training sample set and the first labeling data corresponding to the training data as the input and the expected output, respectively, to obtain a trained model; incrementing a second count value by one; in response to determining that the second count value is less than a preset second count threshold, taking the trained model as the model to be fine-tuned and continuing to perform the model fine-tuning steps; and in response to determining that the second count value is greater than or equal to the preset second count threshold, taking the trained model as the fine-tuned model.
8. The apparatus of claim 5, wherein the preset elasticity value calculation formula comprises:

$$E_i^k = \frac{1}{N_i^k} \sum_{w_j \in W_i^k} \left| \frac{\partial L(W)}{\partial w_j} \, w_j \right|$$

wherein k is the kth convolutional layer of the model to be compressed, i is the ith filter in the kth convolutional layer of the model to be compressed, $E_i^k$ is the elasticity value of the ith filter in the kth convolutional layer of the model to be compressed, $W_i^k$ is the set of parameters of the ith filter in the kth convolutional layer of the model to be compressed, $N_i^k$ is the total number of parameters of the ith filter in the kth convolutional layer of the model to be compressed, $w_j$ is the jth parameter in the ith filter in the kth convolutional layer of the model to be compressed, L(W) is the loss value corresponding to the ith filter in the kth convolutional layer of the model to be compressed, L is the loss function of the model to be compressed, $\partial L(W)/\partial w_j$ is the partial derivative of the loss function of the model to be compressed with respect to the jth parameter, $\Sigma$ is the summation operator, and $|\cdot|$ is the absolute value operator.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-4.
10. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-4.
CN202010038026.0A 2020-01-14 2020-01-14 Method and apparatus for model compression Pending CN111291862A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010038026.0A CN111291862A (en) 2020-01-14 2020-01-14 Method and apparatus for model compression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010038026.0A CN111291862A (en) 2020-01-14 2020-01-14 Method and apparatus for model compression

Publications (1)

Publication Number Publication Date
CN111291862A true CN111291862A (en) 2020-06-16

Family

ID=71029089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010038026.0A Pending CN111291862A (en) 2020-01-14 2020-01-14 Method and apparatus for model compression

Country Status (1)

Country Link
CN (1) CN111291862A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762463A (en) * 2021-07-26 2021-12-07 华南师范大学 Model pruning method and system for raspberry pi processor


Similar Documents

Publication Publication Date Title
CN107622240B (en) Face detection method and device
CN110704751B (en) Data processing method and device, electronic equipment and storage medium
CN110213614B (en) Method and device for extracting key frame from video file
CN108510084B (en) Method and apparatus for generating information
CN110941978B (en) Face clustering method and device for unidentified personnel and storage medium
CN110738235A (en) Pulmonary tuberculosis determination method, pulmonary tuberculosis determination device, computer device, and storage medium
CN112861885A (en) Image recognition method and device, electronic equipment and storage medium
CN112786129A (en) Case data analysis method and device, electronic device and storage medium
CN109064464B (en) Method and device for detecting burrs of battery pole piece
CN111291862A (en) Method and apparatus for model compression
CN116827971B (en) Block chain-based carbon emission data storage and transmission method, device and equipment
CN113724132A (en) Image style migration processing method and device, electronic equipment and storage medium
CN111402220B (en) Method and device for acquiring information
CN111062914B (en) Method, apparatus, electronic device and computer readable medium for acquiring facial image
CN110288691B (en) Method, apparatus, electronic device and computer-readable storage medium for rendering image
CN110795993A (en) Method and device for constructing model, terminal equipment and medium
US20200342912A1 (en) System and method for generating a compression invariant motion timeline
CN111563137B (en) Analysis method and system for coincident track
CN113705363A (en) Method and system for identifying uplink signal of specific satellite
CN110084298B (en) Method and device for detecting image similarity
CN114116480A (en) Method, device, medium and equipment for determining application program test coverage rate
CN112241001A (en) Radar human body action recognition method and device, electronic equipment and storage medium
CN111949819A (en) Method and device for pushing video
CN114399355B (en) Information pushing method and device based on user conversion rate and electronic equipment
CN113239943B (en) Three-dimensional component extraction and combination method and device based on component semantic graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200616