CN110874635A

CN110874635A - Deep neural network model compression method and device

Info

Publication number: CN110874635A
Application number: CN201811015359.0A
Authority: CN
Inventors: 张渊; 谢迪; 浦世亮
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2018-08-31
Filing date: 2018-08-31
Publication date: 2020-03-10
Anticipated expiration: 2038-08-31
Also published as: CN110874635B

Abstract

The embodiment of the application provides a deep neural network model compression method and a device, wherein the deep neural network model compression method comprises the following steps: acquiring the current calculation state of a network layer in a deep neural network model to be compressed; obtaining the compression amount of a network layer through a pre-trained calculation model according to the current calculation state; compressing the network layer based on the compression amount; and determining the deep neural network model after network layer compression. Through the scheme, the output performance of the deep neural network model can be ensured.

Description

Deep neural network model compression method and device

Technical Field

The application relates to the technical field of deep learning, in particular to a deep neural network model compression method and device.

Background

A DNN (Deep Neural Network) is an emerging area of machine learning research. It analyzes data by simulating the mechanisms of the human brain, and is an intelligent model that analyzes and learns by building a simulation of the brain. At present, DNNs such as the CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), and LSTM (Long Short-Term Memory network) have been applied successfully to target detection and segmentation, behavior detection and recognition, speech recognition, and similar tasks.

As real-world recognition and detection scenes grow more complex, the demands placed on DNNs keep rising. DNN structures become more complex and the number of network layers keeps growing, and computational complexity, disk storage, and memory consumption grow substantially with it. The hardware platform that runs the DNN must therefore offer high compute capacity, large memory, and high bandwidth. Hardware platform resources are usually limited, however, so reducing the cost that a DNN imposes on hardware platform resources has become an urgent problem in the development of deep learning technology.

To reduce this cost, DNN model compression methods have been proposed. In these methods, the compression amount of each network layer is set manually, and that compression amount is used to apply structured compression, such as matrix decomposition or channel pruning, to each network layer. This reduces the calculation amount of each network layer and thereby the cost the DNN imposes on hardware platform resources. However, a manually set compression amount is easily affected by human subjectivity, and an unreasonable compression amount directly degrades the output performance of the DNN model.

Disclosure of Invention

An object of the embodiments of the present application is to provide a method and an apparatus for compressing a deep neural network model, so as to ensure output performance of the deep neural network model. The specific technical scheme is as follows:

In a first aspect, an embodiment of the present application provides a deep neural network model compression method, where the method includes:

acquiring the current calculation state of a network layer in a deep neural network model to be compressed;

obtaining the compression amount of the network layer through a pre-trained calculation model according to the current calculation state;

compressing the network layer based on the compression amount;

and determining the deep neural network model after network layer compression.

Optionally, the current computation state of the network layer includes: the current calculated amount, the compressed calculated amount and the calculated amount to be compressed of the network layer;

the acquiring the current calculation state of the network layer in the deep neural network model to be compressed includes:

acquiring a preset target calculation amount of a deep neural network model to be compressed, a current calculation amount and a compressed calculation amount of a network layer in the deep neural network model to be compressed;

and calculating the calculated amount to be compressed of the network layer according to the preset target calculated amount, the current calculated amount and the compressed calculated amount.

Optionally, after compressing the network layer based on the compression amount, the method further includes:

and, for the next network layer, returning to the step of acquiring the current calculation state of the network layer in the deep neural network model to be compressed, until all the network layers in the deep neural network model to be compressed are compressed.

Optionally, after determining the network layer compressed deep neural network model, the method further includes:

obtaining a sample set;

according to the sample set and a preset iteration cycle, adjusting network parameters of the deep neural network model after the network layer compression to obtain model precision;

and updating the model parameters of the pre-trained calculation model according to the model precision, and returning to the step of acquiring the current calculation state of the network layer in the deep neural network model to be compressed, until a first target deep neural network model is obtained whose current calculation amount reaches the preset target calculation amount and whose model precision is greater than a preset threshold.

Optionally, after obtaining the first target deep neural network model with the current calculated amount reaching the preset target calculated amount and the model precision being greater than the preset threshold, the method further includes:

and adjusting the network parameters of the first target deep neural network model according to the sample set until a second target deep neural network model with the model precision reaching the initial precision of the deep neural network model to be compressed is obtained.

In a second aspect, an embodiment of the present application provides a deep neural network model compression apparatus, where the apparatus includes:

the acquisition module is used for acquiring the current calculation state of a network layer in the deep neural network model to be compressed;

the compression amount calculation module is used for obtaining the compression amount of the network layer through a pre-trained calculation model according to the current calculation state;

a compression module for compressing the network layer based on the compression amount; and determining the deep neural network model after network layer compression.

the acquisition module is specifically configured to:

Optionally, the obtaining module is further configured to:

and, for the next network layer, obtaining the current calculation state of the network layer in the deep neural network model to be compressed, until all the network layers in the deep neural network model to be compressed are compressed.

Optionally, the apparatus further comprises: a short-time fine tuning module for:

obtaining a sample set;

Optionally, the apparatus further comprises: a long-term fine tuning module for:

According to the deep neural network model compression method and device provided by the embodiments of the application, the current calculation state of a network layer in the deep neural network model to be compressed is obtained, the compression amount of the network layer is obtained from the current calculation state through a pre-trained calculation model, the network layer is compressed based on the compression amount, and the deep neural network model after network layer compression is determined. The pre-trained calculation model yields the compression amount that corresponds to the current calculation state; the calculation model used to compute the compression amount is pre-trained and has a self-learning capability.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a schematic flow chart illustrating a deep neural network model compression method according to an embodiment of the present disclosure;

FIG. 2 is a schematic flowchart illustrating an example of a deep neural network model compression method according to an embodiment of the present disclosure;

FIG. 3 is a schematic structural diagram of a deep neural network model compression apparatus according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In order to ensure the output performance of the deep neural network model, the embodiment of the application provides a deep neural network model compression method, a deep neural network model compression device, an electronic device and a machine-readable storage medium.

Next, a method for compressing a deep neural network model provided in the embodiment of the present application is first described.

The deep neural network model compression method provided by the embodiments of the present application may be executed by an electronic device that runs intelligent algorithms. The electronic device may be an intelligent device with target detection and segmentation, behavior detection and recognition, or speech recognition functions, for example a remote computer, a remote server, a smart camera, or a smart voice device, and it includes at least a processor equipped with a core processing chip. The method may be implemented as at least one of software, a hardware circuit, or a logic circuit provided in the executing device.

As shown in fig. 1, a deep neural network model compression method provided in an embodiment of the present application may include the following steps:

S101, obtaining the current calculation state of a network layer in the deep neural network model to be compressed.

The network layers in the deep neural network model to be compressed may include convolution (Conv) layers, inner product (InnerProduct) layers, and the like, and each network layer includes a parameter weight tensor used to perform network operations.

The current calculation state of a network layer is the calculation-amount information produced as the network layer currently performs its network operation. It can be represented as a vector and may include, for example, the current calculation amount of the network layer, the calculation amount already compressed, and the calculation amount still to be compressed.

Optionally, the current computation state of the network layer may specifically include: the current calculated amount, the compressed calculated amount and the calculated amount to be compressed of the network layer.

S101 may specifically be:

The preset target calculation amount of the deep neural network model to be compressed is the calculation amount which needs to be reached after the deep neural network model is compressed, and is related to the capability of a hardware platform for operating the deep neural network model, namely the preset target calculation amount can be less than or equal to the maximum calculation amount which can be borne by the hardware platform. Based on the preset target calculated amount, the maximum calculated amount which can be allocated to each network layer can be determined according to the number of network layers, the scale of the network layers and the like of the deep neural network to be compressed.

The current calculation amount of a network layer may be determined from the total calculation amount of the deep neural network model to be compressed and the current structure of each network layer (for example, the number of channels, the convolution kernel size, and the weight tensor size). The difference between the total calculation amount of the model and the preset target calculation amount is the calculation amount that the model needs to compress. For example, if the total calculation amount of the model is 130 GB and the preset target calculation amount is 100 GB, the model needs to compress a calculation amount of 30 GB. The calculation amount that each network layer needs to compress can then be determined from the current structure of each network layer.
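As a rough illustration of deriving a calculation amount from layer structure, the sketch below counts the multiply-accumulate operations of a standard 2D convolution and computes the model-level amount to compress. The counting convention (bias ignored, one multiply-accumulate per weight per output position) and the function names `conv_macs` and `model_amount_to_compress` are assumptions made for this example; the text above does not fix a particular formula or unit.

```python
def conv_macs(in_channels, out_channels, kernel_size, out_h, out_w):
    """Multiply-accumulate count of a standard 2D convolution (bias ignored).

    Assumed convention: each output element needs
    in_channels * kernel_size * kernel_size multiply-accumulates.
    """
    per_output = in_channels * kernel_size * kernel_size
    return per_output * out_channels * out_h * out_w


def model_amount_to_compress(total_amount, target_amount):
    """Amount the whole model must shed to reach the preset target (never negative)."""
    return max(total_amount - target_amount, 0)


# The example from the text: total 130, target 100, so 30 must be compressed.
print(model_amount_to_compress(130, 100))
```

Halving `out_channels` in `conv_macs` halves the count, which is why pruning filters reduces the calculation amount of a layer.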

The compressed calculation amount of a network layer is the calculation amount whose compression has already been completed in that layer, and the calculation amount to be compressed is the calculation amount the layer still needs to compress. For example, if the i-th network layer needs to compress a calculation amount of 5 GB in total and has already compressed 2 GB, the calculation amount to be compressed of the i-th network layer is 3 GB.

S102, obtaining the compression amount of the network layer through a pre-trained calculation model according to the current calculation state.
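The calculation state can be sketched as a small record that is flattened into a vector. The class `LayerState` and the helper `build_layer_state` are hypothetical names for this example, and the assumption that the per-layer amount still to be compressed is simply the layer's required total minus what it has already compressed follows the 5 GB / 2 GB / 3 GB example rather than any formula given in the text.

```python
from dataclasses import dataclass


@dataclass
class LayerState:
    current_amount: float      # current calculation amount of the layer
    compressed_amount: float   # amount already compressed
    to_compress_amount: float  # amount still to be compressed

    def as_vector(self):
        """State in vector form, suitable as input to a calculation model."""
        return [self.current_amount, self.compressed_amount, self.to_compress_amount]


def build_layer_state(current_amount, required_total, compressed_amount):
    """Build the state; the remaining amount is clamped at zero (assumption)."""
    remaining = max(required_total - compressed_amount, 0.0)
    return LayerState(current_amount, compressed_amount, remaining)


# Layer must shed 5 in total and has shed 2 so far, leaving 3.
state = build_layer_state(current_amount=20.0, required_total=5.0, compressed_amount=2.0)
print(state.as_vector())
```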

The pre-trained calculation model can be understood as a controller that records the correspondence between calculation states and compression amounts. The controller may be a hardware calculation module or a software calculation unit. The calculation model may be a conventional neural network model such as a CNN or an RNN; it is trained in advance on the correspondence between calculation state samples and compression amounts, so it computes the compression amount of a network layer with relatively high accuracy. The compression amount specifies how much is to be compressed when the network layer is compressed, for example the size of each small matrix after matrix decomposition or the number of channels to be pruned. The calculation model used to compute the compression amount is trained in a manner similar to conventional neural network models such as CNNs and RNNs, and the details are not repeated here.
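To make the state-to-compression-amount mapping concrete, here is a deliberately tiny stand-in for the controller. It is a single linear unit with a sigmoid, not the CNN or RNN the text mentions, and its weights are placeholders rather than trained values; the names `controller_ratio` and `filters_to_prune` and the rule of always keeping at least one filter are assumptions for illustration only.

```python
import math


def controller_ratio(state, weights, bias):
    """Map a state vector to a compression ratio in (0, 1).

    Stand-in for the pre-trained calculation model: linear unit + sigmoid.
    """
    z = sum(w * s for w, s in zip(weights, state)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def filters_to_prune(ratio, num_filters, keep_min=1):
    """Turn a ratio into a filter count, always keeping at least keep_min filters."""
    count = int(round(ratio * num_filters))
    return min(max(count, 0), num_filters - keep_min)


# Placeholder (untrained) parameters; a real controller would learn these.
ratio = controller_ratio([20.0, 2.0, 3.0], weights=[0.0, 0.0, 0.0], bias=0.0)
print(ratio, filters_to_prune(ratio, 256))
```

With all-zero weights the ratio is 0.5 regardless of the state; training is what would make the output depend usefully on the state.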

S103, compressing the network layer based on the compression amount.

The compression amount specifies what is to be compressed when the network layer is compressed, so the network layer can be compressed based on it. For example, if the i-th network layer includes 256 filters and steps S101 and S102 yield a compression amount of 56 filters, then 56 of the 256 filters need to be pruned.

The compression method is not specifically limited in this embodiment. When structured compression is performed by pruning filters as described above, the 56 filters may be pruned at random, or the 56 filters with the smallest sums of absolute weight values may be pruned, in ascending order of that sum. Any reasonable existing compression method may be used, and details are not repeated here.
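The second option above, pruning the filters with the smallest sums of absolute weights, can be sketched as follows. Filters are represented as flat lists of weights for simplicity, and the function name `prune_filters_by_abs_sum` is hypothetical; a real layer would also need the matching input channels of the following layer removed, which this sketch omits.

```python
def prune_filters_by_abs_sum(filters, num_to_prune):
    """Drop the num_to_prune filters whose absolute weights sum to the least.

    filters: list of filters, each a flat list of weights.
    Returns (kept_filters, kept_indices) with the original order preserved.
    """
    scores = [sum(abs(w) for w in f) for f in filters]
    # Indices sorted by ascending score; the first num_to_prune are dropped.
    ascending = sorted(range(len(filters)), key=lambda i: scores[i])
    dropped = set(ascending[:num_to_prune])
    kept_indices = [i for i in range(len(filters)) if i not in dropped]
    return [filters[i] for i in kept_indices], kept_indices


# 256 toy filters, prune 56 as in the example; 200 remain.
toy_filters = [[float(i)] for i in range(256)]
kept, kept_idx = prune_filters_by_abs_sum(toy_filters, 56)
print(len(kept), kept_idx[0])
```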

Optionally, after S103, the method for compressing a deep neural network model provided in the embodiment of the present application may further include the following steps:

and returning to execute S101 to S103 for the next network layer until all network layers in the deep neural network model to be compressed are compressed.

The compression of the deep neural network model may be performed on one or some of the network layers, and of course, in order to reduce the computation amount of the deep neural network model to the maximum extent, all the network layers may be compressed, and the steps from S101 to S103 are performed for each network layer.

S104, determining the deep neural network model after network layer compression.

After the network layers are compressed, the deep neural network model is the compressed deep neural network model, and as described above, the deep neural network model may be the deep neural network model in which all the network layers are compressed, or may be the deep neural network model in which one or a part of the network layers are compressed.

Optionally, after S104, the method for compressing a deep neural network model provided in the embodiment of the present application may further include the following steps:

obtaining a sample set;

according to the sample set and a preset iteration period, adjusting network parameters of the deep neural network model after network layer compression to obtain model precision;

and updating the model parameters of the pre-trained calculation model according to the model precision, and returning to execute S101 to S103 until obtaining a first target deep neural network model with the current calculation amount reaching a preset target calculation amount and the model precision being greater than a preset threshold value.

Because the deep neural network model has been compressed, its actual output differs to some degree from the output of the initial deep neural network model, and its precision drops. To bring the precision of the compressed model closer to the initial precision of the initial model, the network parameters of the compressed model need to be adjusted. If the compressed model were required to reach the initial precision at this stage, the adjustment iteration would take a very long time. To keep the adjustment iteration fast, the iteration period can therefore be set to a small value, that is, a short-time adjustment iteration is performed.

After one adjustment iteration, the model precision of the deep neural network model improves to some extent, and the improved model precision can be obtained. Based on this model precision, the model parameters of the calculation model that computes the compression amount can be updated, which adjusts the compression amount used for network layer compression and further improves the model precision. In this way, a first target deep neural network model can be obtained whose current calculation amount reaches the preset target calculation amount and whose model precision is greater than the preset threshold.

Optionally, after the step of obtaining the first target deep neural network model of which the current calculated amount reaches the preset target calculated amount and the model precision is greater than the preset threshold, the method for compressing the deep neural network model provided in the embodiment of the present application may further include the following steps:

After the first target deep neural network model is obtained, its model precision has improved considerably but still differs to some degree from the initial precision of the initial deep neural network model. To ensure that the model precision can reach the initial precision, adjustment iterations on the sample set continue with the first target deep neural network model until the model converges and the model precision reaches the initial precision. This adjustment process may require a long iteration period, and is therefore a long-term adjustment iteration process.

By applying this embodiment, the current calculation state of a network layer in the deep neural network model to be compressed is obtained, the compression amount of the network layer is obtained from the current calculation state through a pre-trained calculation model, the network layer is compressed based on the compression amount, and the deep neural network model after network layer compression is determined. The pre-trained calculation model yields the compression amount that corresponds to the current calculation state; the calculation model used to compute the compression amount is pre-trained and has a self-learning capability.

For convenience of understanding, the following describes in detail a deep neural network model compression method provided in an embodiment of the present application with reference to a specific example, and as shown in fig. 2, specific steps may include:

Step one: a deep neural network model Net to be compressed and the calculation amount Flops_req that must be reached after model compression are given.

Step two: the current calculation state s_i of the i-th layer of the network is computed. The state is represented as a vector and may include information such as the calculation amount of the current layer, the calculation amount the network has already compressed, and the calculation amount the network still needs to compress.

Step three: the current state s_i is input to the controller R (the controller R contains model parameters theta_R). Based on the current state s_i, the controller R gives the compression amount a_i of the current i-th layer, and the i-th layer is structurally compressed based on the compression amount a_i. Any reasonable existing structured compression method may be used, and the controller R is the pre-trained calculation model.

Step four: the operations of step two and step three are performed on layer i+1, and this is repeated until all layers in the deep neural network model have been traversed.

Step five: the compressed deep neural network model is fine-tuned on the sample set for a short time, the model precision is fed back to the controller R, and the controller R updates the model parameters theta_R according to the feedback signal in preparation for the next iteration.

Step six: the process jumps back to step one and the next iteration is carried out, until the deep neural network model that meets the required calculation amount and has the highest precision is obtained.

Step seven: finally, the deep neural network model obtained in step six is fine-tuned on the sample set for a long time until the model converges, recovering the model precision.

In the scheme, the compression amount of each network layer in the deep neural network model is adjusted through the algorithm and is continuously iterated and evolved, so that the inaccuracy and workload of manual setting of the compression amount are avoided, and the model precision can be improved to the greatest extent on the premise of meeting the calculated amount.
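The iteration in steps one to six can be sketched as a small search loop. Several parts are assumptions swapped in for things the text leaves open: the controller here is a clamped per-layer ratio rather than a neural network, `toy_accuracy` is a hypothetical proxy that replaces real short-time fine-tuning on a sample set, and the parameter update is simple hill climbing (keep a random perturbation of theta_R only if precision improves within budget), since the text does not specify the update rule. Step seven, long-term fine-tuning, is omitted.

```python
import random


def controller(state, theta_i):
    """Stand-in for controller R: fraction a_i of the layer to remove."""
    flops, _compressed, to_compress = state
    if flops <= 0 or to_compress <= 0:
        return 0.0
    # Never remove more than is still needed, and keep at least 10% of the layer.
    return max(0.0, min(theta_i, to_compress / flops, 0.9))


def compress_model(layer_flops, flops_req, theta):
    """Steps two to four: visit each layer, build s_i, obtain a_i, compress."""
    total = sum(layer_flops)
    compressed = 0.0
    actions = []
    for flops, theta_i in zip(layer_flops, theta):
        to_compress = total - flops_req - compressed
        a_i = controller((flops, compressed, to_compress), theta_i)
        compressed += flops * a_i
        actions.append(a_i)
    return actions, total - compressed


def toy_accuracy(actions):
    """Hypothetical precision proxy: heavy pruning of any one layer is penalized."""
    return 1.0 - sum(a * a for a in actions) / len(actions)


def search(layer_flops, flops_req, iterations=200, seed=0):
    """Steps five and six with a hill-climbing update of theta (assumption)."""
    rng = random.Random(seed)
    theta = [0.9] * len(layer_flops)  # start greedy so the budget is met
    best_actions, best_flops = compress_model(layer_flops, flops_req, theta)
    best_acc = toy_accuracy(best_actions)
    for _ in range(iterations):
        cand = [min(0.9, max(0.0, t + rng.uniform(-0.1, 0.1))) for t in theta]
        actions, flops = compress_model(layer_flops, flops_req, cand)
        acc = toy_accuracy(actions)
        # Accept only candidates that meet Flops_req and raise the precision proxy.
        if flops <= flops_req + 1e-9 and acc > best_acc:
            theta, best_actions, best_flops, best_acc = cand, actions, flops, acc
    return theta, best_actions, best_flops, best_acc


theta, actions, flops, acc = search([40.0, 50.0, 40.0], flops_req=100.0)
print(flops, acc)
```

The greedy starting point removes everything from the first layer; the loop can only move toward allocations that score at least as well on the proxy while still meeting Flops_req.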

Corresponding to the foregoing method embodiment, an embodiment of the present application provides a deep neural network model compression apparatus, and as shown in fig. 3, the deep neural network model compression apparatus may include:

an obtaining module 310, configured to obtain a current computation state of a network layer in a deep neural network model to be compressed;

a compression amount calculation module 320, configured to obtain, according to the current calculation state, a compression amount of the network layer through a pre-trained calculation model;

a compression module 330, configured to compress the network layer based on the compression amount; and determining the deep neural network model after network layer compression.

Optionally, the current computation state of the network layer may include: the current calculated amount, the compressed calculated amount and the calculated amount to be compressed of the network layer;

the obtaining module 310 may be specifically configured to:

Optionally, the obtaining module 310 may be further configured to:

Optionally, the apparatus may further include: a short-time fine tuning module for:

obtaining a sample set;

Optionally, the apparatus may further include: a long-term fine tuning module for:

In order to guarantee the output performance of the deep neural network model, the embodiment of the present application further provides an electronic device, as shown in fig. 4, including a processor 401 and a machine-readable storage medium 402, wherein,

a machine-readable storage medium 402 for storing machine-executable instructions executable by the processor 401;

a processor 401, which the machine-executable instructions stored on the machine-readable storage medium 402 cause to perform all the steps of the deep neural network model compression method provided by the embodiments of the present application.

The machine-readable storage medium 402 and the processor 401 may be in data communication by way of a wired or wireless connection, and the electronic device may communicate with other devices by way of a wired or wireless communication interface.

The machine-readable storage medium may include a RAM (Random Access Memory) and a NVM (Non-volatile Memory), such as at least one disk Memory. Alternatively, the machine-readable storage medium may be at least one memory device located remotely from the processor.

The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.

In this embodiment, the processor of the electronic device can realize that: the compression amount of each network layer in the deep neural network model is adjusted through the algorithm and is continuously iterated and evolved, inaccuracy and workload of manual setting of the compression amount are avoided, the output performance of the deep neural network model is guaranteed, and the model precision can be improved to the maximum extent on the premise of meeting the calculated amount.

In addition, corresponding to the deep neural network model compression method provided in the foregoing embodiments, the present application provides a machine-readable storage medium storing machine-executable instructions that cause a processor to execute all the steps of the deep neural network model compression method provided in the present application.

In this embodiment, the machine-readable storage medium stores machine-executable instructions for executing the deep neural network model compression method provided in this embodiment when running, so that the following can be implemented: the compression amount of each network layer in the deep neural network model is adjusted through the algorithm and is continuously iterated and evolved, inaccuracy and workload of manual setting of the compression amount are avoided, the output performance of the deep neural network model is guaranteed, and the model precision can be improved to the maximum extent on the premise of meeting the calculated amount.

For the embodiments of the electronic device and the machine-readable storage medium, since the contents of the related methods are substantially similar to those of the foregoing embodiments of the methods, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the embodiments of the methods.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, and the machine-readable storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and in relation to the description, reference may be made to some portions of the method embodiments.

The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims

1. A method for deep neural network model compression, the method comprising:

compressing the network layer based on the compression amount;

and determining the deep neural network model after network layer compression.

2. The method of claim 1, wherein the current computational state of the network layer comprises: the current calculated amount, the compressed calculated amount and the calculated amount to be compressed of the network layer;

3. The method of claim 1, wherein after said compressing the network layer based on the amount of compression, the method further comprises:

4. The method of claim 1, wherein after the determining the network layer compressed deep neural network model, the method further comprises:

obtaining a sample set;

5. The method according to claim 4, wherein after obtaining the first target deep neural network model with the current calculated amount reaching the preset target calculated amount and the model accuracy being greater than the preset threshold, the method further comprises:

6. An apparatus for deep neural network model compression, the apparatus comprising:

7. The apparatus of claim 6, wherein the current computation state of the network layer comprises: the current calculated amount, the compressed calculated amount and the calculated amount to be compressed of the network layer;

the acquisition module is specifically configured to:

8. The apparatus of claim 6, wherein the obtaining module is further configured to:

9. The apparatus of claim 6, further comprising: a short-time fine tuning module for:

obtaining a sample set;

10. The apparatus of claim 9, further comprising: a long-term fine tuning module for: