CN110874635B - Deep neural network model compression method and device - Google Patents

Deep neural network model compression method and device

Info

Publication number
CN110874635B
Authority
CN
China
Prior art keywords
model
neural network
compressed
network model
network layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811015359.0A
Other languages
Chinese (zh)
Other versions
CN110874635A (en)
Inventor
Zhang Yuan (张渊)
Xie Di (谢迪)
Pu Shiliang (浦世亮)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201811015359.0A priority Critical patent/CN110874635B/en
Publication of CN110874635A publication Critical patent/CN110874635A/en
Application granted granted Critical
Publication of CN110874635B publication Critical patent/CN110874635B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide a deep neural network model compression method and device. The compression method comprises the following steps: acquiring the current computation state of a network layer in a deep neural network model to be compressed; obtaining the compression amount of the network layer through a pre-trained calculation model according to the current computation state; compressing the network layer based on the compression amount; and determining the deep neural network model after network-layer compression. With this scheme, the output performance of the deep neural network model can be ensured.

Description

Deep neural network model compression method and device
Technical Field
The application relates to the technical field of deep learning, in particular to a deep neural network model compression method and device.
Background
DNN (Deep Neural Network) is an emerging field of machine learning research: a DNN parses data by mimicking the mechanisms of the human brain, building a brain-like model that learns analytically. At present, DNNs such as CNNs (Convolutional Neural Networks), RNNs (Recurrent Neural Networks), and LSTMs (Long Short-Term Memory networks) are applied successfully to target detection and segmentation, behavior detection and recognition, speech recognition, and the like.
As practical recognition and detection scenarios grow more complex, the functional requirements on DNNs keep rising: DNN network structures become more complex, the number of network layers keeps increasing, and computational complexity, hard-disk storage, and memory consumption grow greatly. The hardware platform running the DNN is therefore required to provide large computing capacity, large memory, and high bandwidth. However, hardware platform resources are usually limited, and reducing the overhead a DNN imposes on them has become a pressing problem in the development of deep learning technology.
To reduce this overhead, DNN model compression methods have been proposed: the compression amount of each network layer is set manually, and structured compression such as matrix decomposition or channel pruning is applied to each layer based on that amount, reducing each layer's computation and thus the DNN's demand on hardware platform resources. However, a manually set compression amount is easily influenced by subjective judgment, and an unreasonable setting directly degrades the output performance of the DNN model.
Disclosure of Invention
The embodiments of the present application aim to provide a deep neural network model compression method and device that ensure the output performance of the deep neural network model. The specific technical scheme is as follows:
In a first aspect, an embodiment of the present application provides a deep neural network model compression method, where the method includes:
acquiring the current computation state of a network layer in a deep neural network model to be compressed;
obtaining the compression amount of the network layer through a pre-trained calculation model according to the current computation state;
compressing the network layer based on the compression amount;
and determining the deep neural network model after network-layer compression.
Optionally, the current computation state of the network layer includes: the current computation amount of the network layer, the already-compressed computation amount, and the computation amount to be compressed;
the acquiring the current computation state of the network layer in the deep neural network model to be compressed includes:
acquiring a preset target computation amount of the deep neural network model to be compressed, and the current computation amount and already-compressed computation amount of a network layer in the deep neural network model to be compressed;
and calculating the computation amount of the network layer to be compressed according to the preset target computation amount, the current computation amount, and the already-compressed computation amount.
Optionally, after the compressing the network layer based on the compression amount, the method further includes:
for the next network layer, returning to the step of acquiring the current computation state of the network layer in the deep neural network model to be compressed, until compression is completed for all network layers in the deep neural network model to be compressed.
Optionally, after the determining the deep neural network model after network-layer compression, the method further includes:
acquiring a sample set;
adjusting network parameters of the deep neural network model after network-layer compression according to the sample set and a preset iteration period, to obtain the model accuracy;
and updating model parameters of the pre-trained calculation model according to the model accuracy, and returning to the step of acquiring the current computation state of the network layer in the deep neural network model to be compressed, until a first target deep neural network model whose current computation amount reaches the preset target computation amount and whose model accuracy is greater than a preset threshold is obtained.
Optionally, after the first target deep neural network model whose current computation amount reaches the preset target computation amount and whose model accuracy is greater than the preset threshold is obtained, the method further includes:
adjusting network parameters of the first target deep neural network model according to the sample set, until a second target deep neural network model whose model accuracy reaches the initial accuracy of the deep neural network model to be compressed is obtained.
In a second aspect, embodiments of the present application provide a deep neural network model compression apparatus, the apparatus including:
an acquisition module, configured to acquire the current computation state of a network layer in a deep neural network model to be compressed;
a compression amount calculation module, configured to obtain the compression amount of the network layer through a pre-trained calculation model according to the current computation state;
a compression module, configured to compress the network layer based on the compression amount, and to determine the deep neural network model after network-layer compression.
Optionally, the current computation state of the network layer includes: the current computation amount of the network layer, the already-compressed computation amount, and the computation amount to be compressed;
the acquisition module is specifically configured to:
acquire a preset target computation amount of the deep neural network model to be compressed, and the current computation amount and already-compressed computation amount of a network layer in the deep neural network model to be compressed;
and calculate the computation amount of the network layer to be compressed according to the preset target computation amount, the current computation amount, and the already-compressed computation amount.
Optionally, the acquisition module is further configured to:
for the next network layer, acquire the current computation state of the network layer in the deep neural network model to be compressed, until compression is completed for all network layers in the deep neural network model to be compressed.
Optionally, the apparatus further includes a short-time fine-tuning module, configured to:
acquire a sample set;
adjust network parameters of the deep neural network model after network-layer compression according to the sample set and a preset iteration period, to obtain the model accuracy;
and update model parameters of the pre-trained calculation model according to the model accuracy, and return to the step of acquiring the current computation state of the network layer in the deep neural network model to be compressed, until a first target deep neural network model whose current computation amount reaches the preset target computation amount and whose model accuracy is greater than a preset threshold is obtained.
Optionally, the apparatus further includes a long-time fine-tuning module, configured to:
adjust network parameters of the first target deep neural network model according to the sample set, until a second target deep neural network model whose model accuracy reaches the initial accuracy of the deep neural network model to be compressed is obtained.
According to the deep neural network model compression method and device, the current computation state of a network layer in the deep neural network model to be compressed is acquired; the compression amount of the network layer is obtained through a pre-trained calculation model according to that state; the network layer is compressed based on the compression amount; and the deep neural network model after network-layer compression is determined. Because the calculation model used to compute the compression amount is trained in advance and has a self-learning capability, the compression amount on which each network layer's compression is based is guaranteed to be reasonable, which in turn ensures the output performance of the deep neural network model.
Drawings
To describe the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings required by the embodiments or by the description of the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of a deep neural network model compression method according to an embodiment of the present application;
FIG. 2 is a flow chart of an example of a deep neural network model compression method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a deep neural network model compression device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without inventive effort fall within the protection scope of the present application.
To ensure the output performance of a deep neural network model, embodiments of the present application provide a deep neural network model compression method, apparatus, electronic device, and machine-readable storage medium.
The deep neural network model compression method provided in the embodiments of the present application is described first.
The execution body of the deep neural network model compression method provided in the embodiments of the present application may be an electronic device that runs intelligent algorithms. The electronic device may be a smart device with target detection and segmentation, behavior detection and recognition, or speech recognition functions, such as a remote computer, a remote server, a smart camera, or a smart speech device, and should at least include a processor carrying the core processing chip. The method may be implemented by at least one of software, a hardware circuit, and a logic circuit arranged in the execution body.
As shown in fig. 1, the method for compressing a deep neural network model provided in the embodiment of the application may include the following steps:
s101, acquiring the current calculation state of a network layer in the deep neural network model to be compressed.
Network layers in the deep neural network model to be compressed may include convolutional (Conv) layers, inner product (fully connected) layers, and the like; each network layer contains a weight tensor of parameters used to carry out network operations.
The current computation state of a network layer is the computation-related information generated as the layer currently performs network operations. It can be represented as a vector and may include, for example, the layer's current computation amount, the already-compressed computation amount, and the computation amount still to be compressed.
Optionally, the current computation state of the network layer may specifically include: the current computation amount of the network layer, the already-compressed computation amount, and the computation amount to be compressed.
S101 may specifically be:
acquiring a preset target computation amount of the deep neural network model to be compressed, and the current computation amount and already-compressed computation amount of a network layer in the deep neural network model to be compressed;
and calculating the computation amount of the network layer to be compressed according to the preset target computation amount, the current computation amount, and the already-compressed computation amount.
The preset target computation amount of the deep neural network model to be compressed is the computation amount the model must reach after compression. It is related to the capability of the hardware platform running the model: the preset target computation amount may be less than or equal to the maximum computation amount the hardware platform can bear. Based on the preset target computation amount, the maximum computation amount that can be allocated to each network layer can be determined from the number of network layers, the scale of the network layers, and so on of the deep neural network to be compressed.
The current computation amount of a network layer can be determined from the total computation amount of the deep neural network model to be compressed and the current structure of each network layer (such as the number of channels, the convolution kernel size, and the weight tensor size). The difference between the total computation amount and the preset target computation amount is the amount the model needs to compress: for example, if the total computation amount of the model to be compressed is 130 GB and the preset target computation amount is 100 GB, the model needs to compress 30 GB. From the current structure of each network layer, the amount each layer needs to compress can then be determined.
The already-compressed computation amount of a network layer is the computation it has already compressed, and the computation amount to be compressed is what it still needs to compress. For example, if the i-th network layer needs to compress 5 GB in total and 2 GB has already been compressed, its computation amount to be compressed is 3 GB.
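The bookkeeping in S101 is simple enough to sketch in code. The following is a minimal illustration, assuming computation amounts measured in one unit (GB of computation, as in the example above); the helper names (model_budget, layer_state) and the way the per-layer quota is supplied are assumptions, since the embodiment leaves the per-layer allocation scheme open.

```python
# Minimal sketch of assembling the per-layer state vector of S101.
# Assumption: computation amounts are plain numbers in one unit (e.g. GB).

def model_budget(total_amount, target_amount):
    """Computation the whole model still has to compress (e.g. 130 - 100 = 30)."""
    return total_amount - target_amount

def layer_state(layer_amount, layer_quota, compressed_amount):
    """State vector: (current amount, already-compressed amount, amount to compress)."""
    to_compress = max(layer_quota - compressed_amount, 0.0)
    return [layer_amount, compressed_amount, to_compress]

budget = model_budget(total_amount=130.0, target_amount=100.0)  # 30.0 overall
# Suppose the allocation scheme assigns the i-th layer a 5.0 share of that
# budget and 2.0 has already been compressed, as in the example above:
s_i = layer_state(layer_amount=13.0, layer_quota=5.0, compressed_amount=2.0)
print(s_i)  # [13.0, 2.0, 3.0]
```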
S102, obtaining the compression amount of the network layer through a pre-trained calculation model according to the current computation state.
The pre-trained calculation model can be understood as a controller that records the correspondence between computation states and compression amounts; the controller may be a hardware computing module or a software computing unit. The calculation model may be a conventional neural network model such as a CNN or an RNN, trained in advance on correspondences between computation-state samples and compression amounts so that it computes the compression amount of a network layer accurately. The compression amount is the data to be compressed when the network layer is compressed, such as the size of each sub-matrix after matrix decomposition or the number of channels to be pruned. The training of the network model that computes the compression amount is similar to that of conventional neural network models such as CNNs and RNNs, and is not repeated here.
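As one purely illustrative possibility, the controller could be a small PyTorch network that maps the three-dimensional state vector of S101 to a compression ratio in (0, 1); the architecture below is an assumption, since the embodiment allows any conventional CNN/RNN-style model here.

```python
# Hypothetical controller: a small network mapping the state vector to a
# compression ratio. A sketch, not the patent's fixed architecture.
import torch
import torch.nn as nn

class Controller(nn.Module):
    def __init__(self, state_dim=3, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # squashes the output to a ratio in (0, 1)
        )

    def forward(self, state):
        return self.net(state)

controller = Controller()
s_i = torch.tensor([[13.0, 2.0, 3.0]])  # state vector from S101
ratio = controller(s_i).item()          # fraction of the layer to compress
print(f"compression amount: prune {ratio:.0%} of the layer's filters")
```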
And S103, compressing the network layer based on the compression amount.
The compression amount specifies the data to be compressed when the network layer is compressed, and the layer can be compressed on that basis. For example, if the i-th network layer contains 256 filters and steps S101 and S102 yield a compression amount of 56 filters, then 56 of the 256 filters need to be pruned.
The compression method is not specifically limited in this embodiment. For structured compression by filter pruning as above, the 56 filters may be pruned at random, or in ascending order of the sum of the absolute values of their weights. Any reasonable compression method may be adopted, and details are not repeated here.
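A brief sketch of the second option above (pruning the filters whose sums of absolute weights are smallest) might look as follows; prune_filters_l1 is an illustrative helper, not an API from the patent.

```python
# Sketch of structured compression by filter pruning: drop the n_prune filters
# with the smallest sums of absolute weights (L1 norms), keep the rest.
import torch
import torch.nn as nn

def prune_filters_l1(conv: nn.Conv2d, n_prune: int) -> nn.Conv2d:
    """Return a new Conv2d keeping the filters with the largest L1 norms."""
    l1 = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one score per filter
    keep = torch.argsort(l1, descending=True)[: conv.out_channels - n_prune]
    keep, _ = torch.sort(keep)                          # preserve filter order
    pruned = nn.Conv2d(conv.in_channels, len(keep), conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

conv = nn.Conv2d(64, 256, kernel_size=3, padding=1)  # 256 filters, as in the text
conv = prune_filters_l1(conv, n_prune=56)            # cut 56, keep 200
print(conv.out_channels)                             # 200
```

In a full network, the next layer's input channels (and any batch-normalization statistics) would also have to be sliced with the same keep index so that tensor shapes stay consistent.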
Optionally, after S103, the deep neural network model compression method provided in the embodiment of the present application may further include the following steps:
for the next network layer, returning to S101 through S103, until all network layers in the deep neural network model to be compressed have been compressed.
Compression of the deep neural network model may be applied to one network layer or to some of the network layers. Of course, to reduce the model's computation amount as much as possible, all network layers may be compressed, performing steps S101 to S103 for each network layer.
S104, determining the deep neural network model after network-layer compression.
Once its network layers have been compressed, the deep neural network model is the compressed deep neural network model. As described above, this may be the model after all network layers have been compressed, or after one or some of the network layers have been compressed.
Optionally, after S104, the deep neural network model compression method provided in the embodiment of the present application may further include the following steps:
acquiring a sample set;
adjusting network parameters of the deep neural network model after network-layer compression according to the sample set and a preset iteration period, to obtain the model accuracy;
and updating model parameters of the pre-trained calculation model according to the model accuracy, and returning to S101 through S103, until a first target deep neural network model whose current computation amount reaches the preset target computation amount and whose model accuracy is greater than a preset threshold is obtained.
Because the deep neural network model has been compressed, its actual output differs somewhat from the output of the initial deep neural network and its accuracy drops. To bring the accuracy of the compressed model closer to the initial accuracy, the model's network parameters need to be adjusted. If the accuracy were required to reach the initial accuracy here, the adjustment-iteration period would be very long; to keep the iteration timely, the period can be set small, giving a short-time adjustment-iteration process.
After one adjustment iteration, the model accuracy improves to some extent, and the improved accuracy can be obtained. Based on it, the model parameters of the calculation model that computes the compression amount can be updated, so that the compression amount applied to each network layer is adjusted and the accuracy improves further, until a first target deep neural network model whose current computation amount reaches the preset target computation amount and whose model accuracy is greater than the preset threshold is obtained.
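The embodiment does not fix how the fed-back accuracy updates the calculation model's parameters. One plausible realization, shown strictly as an assumption, is a REINFORCE-style rule that treats the short-fine-tuning accuracy as a reward for the compression amounts the controller emitted during the sweep:

```python
# Hypothetical update of the controller from the accuracy feedback. Assumes
# the Controller sketched earlier; the REINFORCE-style rule is an assumption,
# since the patent only says the controller updates its parameters with the signal.
import torch

def update_controller(controller, optimizer, states, amounts, accuracy, baseline=0.0):
    """states: state vectors from the sweep; amounts: compression amounts the
    controller emitted for them; accuracy: accuracy after short fine-tuning."""
    reward = accuracy - baseline                       # centred reward signal
    optimizer.zero_grad()
    loss = torch.zeros(())
    for s, a in zip(states, amounts):
        mean = controller(s)                           # re-run to build the graph
        dist = torch.distributions.Normal(mean, 0.1)   # fixed exploration width
        loss = loss - dist.log_prob(a).sum() * reward  # reinforce good sweeps
    loss.backward()
    optimizer.step()
```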
Optionally, after the first target deep neural network model whose current computation amount reaches the preset target computation amount and whose model accuracy is greater than the preset threshold is obtained, the deep neural network model compression method provided in the embodiments of the present application may further include the following step:
adjusting network parameters of the first target deep neural network model according to the sample set, until a second target deep neural network model whose model accuracy reaches the initial accuracy of the deep neural network model to be compressed is obtained.
Although the model accuracy of the first target deep neural network model is much improved, a gap remains between it and the initial accuracy of the initial deep neural network model. To ensure the model accuracy can reach the initial accuracy, the first target deep neural network model continues to be iteratively adjusted on the sample set until the model converges and its accuracy reaches the initial accuracy. This adjustment may require a long iteration period, and is therefore a long-time adjustment-iteration process.
By applying this embodiment, the current computation state of a network layer in the deep neural network model to be compressed is acquired; the compression amount of the network layer is obtained through a pre-trained calculation model according to that state; the network layer is compressed based on the compression amount; and the deep neural network model after network-layer compression is determined. Because the calculation model used to compute the compression amount is trained in advance and has a self-learning capability, the compression amount on which each network layer's compression is based is guaranteed to be reasonable, which in turn ensures the output performance of the deep neural network model.
For ease of understanding, the deep neural network model compression method provided in the embodiments of the present application is described in detail below with a specific example. As shown in fig. 2, the specific steps are as follows:
Step one: given a deep neural network model Net to be compressed and the computation amount Flops_req the model must reach after compression.
Step two: compute the current computation state s_i of the i-th layer of the network. The state is expressed as a vector and may include information such as the computation amount of the current layer, the computation amount already compressed, and the computation amount still to be compressed.
Step three: input the current computation state s_i into the controller R (which contains model parameters θ_R). The controller R gives the compression amount a_i of the current i-th layer according to s_i, and the i-th layer is structurally compressed based on a_i. Any currently reasonable structured compression method may be used; the controller R is the pre-trained calculation model.
Step four: perform the operations of steps two and three on layer i+1, and so on until all layers in the deep neural network model have been traversed once.
Step five: on the sample set, fine-tune the compressed deep neural network model for a short time and feed the model accuracy back to the controller R; the controller R updates the model parameters θ_R with the feedback signal, in preparation for the next iteration.
Step six: jump to step one and repeat for the next iteration, until the deep neural network model that meets the computation budget with the highest accuracy is obtained.
Step seven: finally, fine-tune the deep neural network model obtained in step six on the sample set for a long time until the model converges, restoring the model accuracy.
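Putting steps one through seven together, the outer loop looks roughly like the skeleton below. All interfaces (net.layers, net.state, controller.compression_amount, and so on) are stand-ins for illustration, and the fine-tuning helpers are stubbed in place of ordinary training code:

```python
# Skeleton of the full compression loop (steps one..seven); a sketch under
# assumed interfaces, not a definitive implementation of the patent.
ACC_THRESHOLD = 0.70  # "model accuracy greater than a preset threshold"

def short_finetune(net, samples):
    return 0.75       # stub: a few training iterations, return measured accuracy

def long_finetune(net, samples):
    pass              # stub: train until convergence (step seven)

def compress(net, flops_req, controller, samples):
    while True:                                        # step six: iterate
        states, amounts = [], []
        for i in range(len(net.layers)):               # step four: sweep layers
            s_i = net.state(i, flops_req)              # step two (S101)
            a_i = controller.compression_amount(s_i)   # step three (S102)
            net.structured_compress(i, a_i)            # step three (S103)
            states.append(s_i); amounts.append(a_i)
        acc = short_finetune(net, samples)             # step five
        controller.update(states, amounts, acc)        # feedback updates theta_R
        if net.flops() <= flops_req and acc > ACC_THRESHOLD:
            break                                      # first target model reached
    long_finetune(net, samples)                        # step seven
    return net                                         # second target model
```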
In this scheme, the compression amount of each network layer in the deep neural network model is adjusted by the algorithm through continuous iterative evolution, avoiding the inaccuracy and workload of manually set compression amounts; on the premise of meeting the computation budget, the model accuracy can be improved to the greatest extent.
Corresponding to the above method embodiment, the embodiment of the present application provides a deep neural network model compression device, as shown in fig. 3, where the deep neural network model compression device may include:
an obtaining module 310, configured to obtain the current computation state of a network layer in a deep neural network model to be compressed;
a compression amount calculation module 320, configured to obtain the compression amount of the network layer through a pre-trained calculation model according to the current computation state;
a compression module 330, configured to compress the network layer based on the compression amount, and to determine the deep neural network model after network-layer compression.
Optionally, the current computation state of the network layer may include: the current computation amount of the network layer, the already-compressed computation amount, and the computation amount to be compressed;
the obtaining module 310 may be specifically configured to:
acquire a preset target computation amount of the deep neural network model to be compressed, and the current computation amount and already-compressed computation amount of a network layer in the deep neural network model to be compressed;
and calculate the computation amount of the network layer to be compressed according to the preset target computation amount, the current computation amount, and the already-compressed computation amount.
Optionally, the obtaining module 310 may be further configured to:
for the next network layer, acquire the current computation state of the network layer in the deep neural network model to be compressed, until compression is completed for all network layers in the deep neural network model to be compressed.
Optionally, the apparatus may further include a short-time fine-tuning module, configured to:
acquire a sample set;
adjust network parameters of the deep neural network model after network-layer compression according to the sample set and a preset iteration period, to obtain the model accuracy;
and update model parameters of the pre-trained calculation model according to the model accuracy, and return to the step of acquiring the current computation state of the network layer in the deep neural network model to be compressed, until a first target deep neural network model whose current computation amount reaches the preset target computation amount and whose model accuracy is greater than a preset threshold is obtained.
Optionally, the apparatus may further include a long-time fine-tuning module, configured to:
adjust network parameters of the first target deep neural network model according to the sample set, until a second target deep neural network model whose model accuracy reaches the initial accuracy of the deep neural network model to be compressed is obtained.
By applying this embodiment, the current computation state of a network layer in the deep neural network model to be compressed is acquired; the compression amount of the network layer is obtained through a pre-trained calculation model according to that state; the network layer is compressed based on the compression amount; and the deep neural network model after network-layer compression is determined. Because the calculation model used to compute the compression amount is trained in advance and has a self-learning capability, the compression amount on which each network layer's compression is based is guaranteed to be reasonable, which in turn ensures the output performance of the deep neural network model.
To ensure the output performance of the deep neural network model, an embodiment of the present application further provides an electronic device, as shown in fig. 4, comprising a processor 401 and a machine-readable storage medium 402, wherein
the machine-readable storage medium 402 is configured to store machine-executable instructions executable by the processor 401;
the processor 401 is configured to be caused, by the machine-executable instructions stored on the machine-readable storage medium 402, to perform all steps of the deep neural network model compression method provided in the embodiments of the present application.
The machine-readable storage medium 402 and the processor 401 may exchange data through a wired or wireless connection, and the electronic device may communicate with other devices through a wired or wireless communication interface.
The machine-readable storage medium may include RAM (Random Access Memory) or NVM (Non-Volatile Memory), for example at least one disk memory. Alternatively, the machine-readable storage medium may be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In this embodiment, by reading and executing the machine-executable instructions stored in the machine-readable storage medium, the processor of the electronic device can: adjust the compression amount of each network layer in the deep neural network model by the algorithm through continuous iterative evolution, avoiding the inaccuracy and workload of manually set compression amounts, ensuring the output performance of the deep neural network model, and improving the model accuracy to the greatest extent on the premise of meeting the computation budget.
In addition, corresponding to the deep neural network model compression method provided in the above embodiments, an embodiment of the present application provides a machine-readable storage medium storing machine-executable instructions that cause a processor to perform all steps of the deep neural network model compression method provided in the embodiments of the present application.
In this embodiment, the machine-readable storage medium stores machine-executable instructions that, at runtime, execute the deep neural network model compression method provided in the embodiments of the present application, and can thereby achieve: the compression amount of each network layer in the deep neural network model is adjusted by the algorithm through continuous iterative evolution, avoiding the inaccuracy and workload of manually set compression amounts, ensuring the output performance of the deep neural network model, and improving the model accuracy to the greatest extent on the premise of meeting the computation budget.
For the electronic device and machine-readable storage medium embodiments, the description is relatively brief because the method content involved is basically similar to the method embodiments described above; for relevant details, refer to the description of the method embodiments.
It is noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between the entities or actions. Moreover, the terms "comprises," "comprising," and any variations thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for apparatus, electronic devices, and machine-readable storage medium embodiments, the description is relatively simple as it is substantially similar to method embodiments, with reference to the section of the method embodiments being relevant.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modifications, equivalent substitutions, improvements, etc. that are within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (8)

1. A deep neural network model compression method, applied to an electronic device, the method comprising:
acquiring the current computation state of a network layer in a deep neural network model to be compressed;
obtaining the compression amount of the network layer through a pre-trained calculation model according to the current computation state;
compressing the network layer based on the compression amount;
determining the deep neural network model after network-layer compression, wherein the compressed deep neural network model runs on a hardware platform, and the compressed deep neural network model is used for target detection and segmentation, behavior detection and recognition, or speech recognition;
the pre-trained calculation model is a controller, and the controller contains model parameters;
the obtaining the compression amount of the network layer through a pre-trained calculation model according to the current computation state comprises: inputting the current computation state into the controller, the controller giving the compression amount of the network layer according to the current computation state;
the compressing the network layer based on the compression amount comprises: the controller performing structured compression on the network layer based on the compression amount;
after compression of all network layers in the deep neural network model is completed, the method further comprises: performing short-time fine-tuning on the compressed deep neural network model and feeding the model accuracy back to the controller, the controller updating the model parameters with the feedback signal;
the current computation state of the network layer comprises: the current computation amount of the network layer, the already-compressed computation amount, and the computation amount to be compressed;
the acquiring the current computation state of the network layer in the deep neural network model to be compressed comprises:
acquiring a preset target computation amount of the deep neural network model to be compressed, and the current computation amount and already-compressed computation amount of a network layer in the deep neural network model to be compressed, wherein the preset target computation amount is the maximum computation amount that can be borne by the hardware platform running the deep neural network model, and the hardware platform comprises at least one of a CPU, an NP, a DSP, an ASIC, an FPGA, another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component;
and calculating the computation amount of the network layer to be compressed according to the preset target computation amount, the current computation amount, and the already-compressed computation amount.
2. The method of claim 1, wherein after the compressing the network layer based on the compression amount, the method further comprises:
for the next network layer, returning to the step of acquiring the current computation state of the network layer in the deep neural network model to be compressed, until compression is completed for all network layers in the deep neural network model to be compressed.
3. The method of claim 1, wherein after the determining the deep neural network model after network-layer compression, the method further comprises:
acquiring a sample set;
adjusting network parameters of the deep neural network model after network-layer compression according to the sample set and a preset iteration period, to obtain the model accuracy;
and updating model parameters of the pre-trained calculation model according to the model accuracy, and returning to the step of acquiring the current computation state of the network layer in the deep neural network model to be compressed, until a first target deep neural network model whose current computation amount reaches the preset target computation amount and whose model accuracy is greater than a preset threshold is obtained.
4. The method of claim 3, wherein after the first target deep neural network model whose current computation amount reaches the preset target computation amount and whose model accuracy is greater than the preset threshold is obtained, the method further comprises:
adjusting network parameters of the first target deep neural network model according to the sample set, until a second target deep neural network model whose model accuracy reaches the initial accuracy of the deep neural network model to be compressed is obtained.
5. A deep neural network model compression apparatus, applied to an electronic device, the apparatus comprising:
an acquisition module, configured to acquire the current computation state of a network layer in a deep neural network model to be compressed;
a compression amount calculation module, configured to obtain the compression amount of the network layer through a pre-trained calculation model according to the current computation state;
a compression module, configured to compress the network layer based on the compression amount and to determine the deep neural network model after network-layer compression, wherein the compressed deep neural network model runs on a hardware platform, and the compressed deep neural network model is used for target detection and segmentation, behavior detection and recognition, or speech recognition;
the pre-trained calculation model is a controller, and the controller contains model parameters;
the compression amount calculation module is specifically configured to: input the current computation state into the controller, the controller giving the compression amount of the network layer according to the current computation state;
the compression module is specifically configured to: cause the controller to perform structured compression on the network layer based on the compression amount;
the apparatus further comprises a short-time fine-tuning module, configured to perform short-time fine-tuning on the compressed deep neural network model after all network layers in the deep neural network model have been compressed, and to feed the model accuracy back to the controller, the controller updating the model parameters with the feedback signal;
the current computation state of the network layer comprises: the current computation amount of the network layer, the already-compressed computation amount, and the computation amount to be compressed;
the acquisition module is specifically configured to:
acquire a preset target computation amount of the deep neural network model to be compressed, and the current computation amount and already-compressed computation amount of a network layer in the deep neural network model to be compressed, wherein the preset target computation amount is the maximum computation amount that can be borne by the hardware platform running the deep neural network model, and the hardware platform comprises at least one of a CPU, an NP, a DSP, an ASIC, an FPGA, another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component;
and calculate the computation amount of the network layer to be compressed according to the preset target computation amount, the current computation amount, and the already-compressed computation amount.
6. The apparatus of claim 5, wherein the acquisition module is further configured to:
for the next network layer, acquire the current computation state of the network layer in the deep neural network model to be compressed, until compression is completed for all network layers in the deep neural network model to be compressed.
7. The apparatus of claim 5, wherein the apparatus further comprises a short-time fine-tuning module, configured to:
acquire a sample set;
adjust network parameters of the deep neural network model after network-layer compression according to the sample set and a preset iteration period, to obtain the model accuracy;
and update model parameters of the pre-trained calculation model according to the model accuracy, and return to the step of acquiring the current computation state of the network layer in the deep neural network model to be compressed, until a first target deep neural network model whose current computation amount reaches the preset target computation amount and whose model accuracy is greater than a preset threshold is obtained.
8. The apparatus of claim 7, wherein the apparatus further comprises a long-time fine-tuning module, configured to:
adjust network parameters of the first target deep neural network model according to the sample set, until a second target deep neural network model whose model accuracy reaches the initial accuracy of the deep neural network model to be compressed is obtained.
CN201811015359.0A 2018-08-31 2018-08-31 Deep neural network model compression method and device Active CN110874635B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811015359.0A CN110874635B (en) 2018-08-31 2018-08-31 Deep neural network model compression method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811015359.0A CN110874635B (en) 2018-08-31 2018-08-31 Deep neural network model compression method and device

Publications (2)

Publication Number Publication Date
CN110874635A CN110874635A (en) 2020-03-10
CN110874635B (en) 2023-06-30

Family

ID=69715937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811015359.0A Active CN110874635B (en) 2018-08-31 2018-08-31 Deep neural network model compression method and device

Country Status (1)

Country Link
CN (1) CN110874635B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814967B (en) * 2020-09-11 2021-02-23 鹏城实验室 Method, apparatus and storage medium for calculating inferential computation of neural network model
CN114692816B (en) * 2020-12-31 2023-08-25 华为技术有限公司 Processing method and equipment of neural network model
CN113762510B (en) * 2021-09-09 2022-12-09 北京百度网讯科技有限公司 Data processing method and device for target model, electronic equipment and medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609598A (en) * 2017-09-27 2018-01-19 武汉斗鱼网络科技有限公司 Image authentication model training method, device and readable storage medium storing program for executing
CN107992940A (en) * 2017-12-12 2018-05-04 郑州云海信息技术有限公司 Implementation method and device of a kind of convolutional neural networks on FPGA

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102457463B1 (en) * 2017-01-16 2022-10-21 한국전자통신연구원 Compressed neural network system using sparse parameter and design method thereof

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609598A (en) * 2017-09-27 2018-01-19 武汉斗鱼网络科技有限公司 Image authentication model training method, device and readable storage medium storing program for executing
CN107992940A (en) * 2017-12-12 2018-05-04 郑州云海信息技术有限公司 Implementation method and device of a kind of convolutional neural networks on FPGA

Also Published As

Publication number Publication date
CN110874635A (en) 2020-03-10

Similar Documents

Publication Publication Date Title
US11829874B2 (en) Neural architecture search
US11651259B2 (en) Neural architecture search for convolutional neural networks
CN111406267B (en) Neural architecture search using performance prediction neural networks
CN110366734B (en) Optimizing neural network architecture
US10032463B1 (en) Speech processing with learned representation of user interaction history
TW202004658A (en) Self-tuning incremental model compression method in deep neural network
CN110874635B (en) Deep neural network model compression method and device
EP3686813A1 (en) Deep neural network-based method and device for quantifying activation amount
CN110874625B (en) Data processing method and device
JP7287397B2 (en) Information processing method, information processing apparatus, and information processing program
CN111079899A (en) Neural network model compression method, system, device and medium
JP2021501417A (en) Neural architecture search
JP6950756B2 (en) Neural network rank optimizer and optimization method
CN115829024B (en) Model training method, device, equipment and storage medium
CN110291540A Batch renormalization layer
CN110647974A (en) Network layer operation method and device in deep neural network
KR20200089588A (en) Electronic device and method for controlling the electronic device thereof
CN113806993A (en) Wireless sensor structure design optimization method, device, equipment and medium
WO2023207039A1 (en) Data processing method and apparatus, and device and storage medium
CN112966818A (en) Directional guide model pruning method, system, equipment and storage medium
CN116684330A (en) Traffic prediction method, device, equipment and storage medium based on artificial intelligence
WO2024109907A1 (en) Quantization method and apparatus, and recommendation method and apparatus
CN117077653A (en) Controllable generation method and device thereof
CN116778264B (en) Object classification method, image classification method and related equipment based on class reinforcement learning
CN109190039A (en) Determine the method, apparatus and computer readable storage medium of analogical object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant