CN110874635A  Deep neural network model compression method and device  Google Patents
Publication number: CN110874635A (application number CN201811015359.0A)
Authority: CN (China)
Prior art keywords: deep neural network; model; compressed; amount
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
 G06N3/00—Computing arrangements based on biological models
 G06N3/02—Computing arrangements based on biological models using neural network models
 G06N3/08—Learning methods
 G06N3/082—Learning methods modifying the architecture, e.g. adding or deleting nodes or connections, pruning

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
 G06N3/00—Computing arrangements based on biological models
 G06N3/02—Computing arrangements based on biological models using neural network models
 G06N3/04—Architectures, e.g. interconnection topology
 G06N3/0454—Architectures, e.g. interconnection topology using a combination of multiple neural nets
Abstract
The embodiments of the present application provide a deep neural network model compression method and device. The deep neural network model compression method comprises the following steps: acquiring the current calculation state of a network layer in a deep neural network model to be compressed; obtaining the compression amount of the network layer through a pre-trained calculation model according to the current calculation state; compressing the network layer based on the compression amount; and determining the deep neural network model after network-layer compression. Through this scheme, the output performance of the deep neural network model can be ensured.
Description
Technical Field
The application relates to the technical field of deep learning, in particular to a deep neural network model compression method and device.
Background
DNN (Deep Neural Network) is an emerging field of machine learning research: by establishing models that simulate the mechanisms of the human brain, it analyzes data and learns in an intelligent, brain-like manner. At present, DNNs such as CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), and LSTM (Long Short-Term Memory network) have been successfully applied to target detection and segmentation, behavior detection and recognition, voice recognition, and the like.
As actual scenes such as recognition and detection become increasingly complex, the requirements on DNN functionality keep rising; the DNN network structure becomes more and more complex, the number of network layers keeps increasing, and the computational complexity, hard-disk storage, and memory consumption grow greatly with it. The hardware platform running the DNN is therefore required to provide large computing power, large memory, high bandwidth, and so on. However, hardware platform resources are usually limited, so reducing the cost of a DNN in hardware platform resources has become an urgent problem in the development of deep learning technology.
To reduce the cost of a DNN in hardware platform resources, DNN model compression methods have been proposed: the compression amount of each network layer is set manually, and structured compression such as matrix decomposition or channel clipping is performed on each network layer according to that compression amount, which reduces the calculation amount of each network layer and thereby reduces the cost of the DNN in hardware platform resources. However, the setting of the compression amount is easily affected by human subjectivity, and an unreasonable setting directly degrades the output performance of the DNN model.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for compressing a deep neural network model, so as to ensure output performance of the deep neural network model. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a deep neural network model compression method, where the method includes:
acquiring the current calculation state of a network layer in a deep neural network model to be compressed;
obtaining the compression amount of the network layer through a pre-trained calculation model according to the current calculation state;
compressing the network layer based on the compression amount;
and determining the deep neural network model after network layer compression.
Optionally, the current computation state of the network layer includes: the current calculated amount, the compressed calculated amount and the calculated amount to be compressed of the network layer;
the acquiring the current calculation state of the network layer in the deep neural network model to be compressed includes:
acquiring a preset target calculation amount of a deep neural network model to be compressed, a current calculation amount and a compressed calculation amount of a network layer in the deep neural network model to be compressed;
and calculating the calculated amount to be compressed of the network layer according to the preset target calculated amount, the current calculated amount and the compressed calculated amount.
Optionally, after compressing the network layer based on the compression amount, the method further includes:
and, for the next network layer, returning to and executing the step of acquiring the current calculation state of the network layer in the deep neural network model to be compressed, until all the network layers in the deep neural network model to be compressed are compressed.
Optionally, after determining the network layer compressed deep neural network model, the method further includes:
obtaining a sample set;
according to the sample set and a preset iteration cycle, adjusting network parameters of the deep neural network model after the network layer compression to obtain model precision;
and updating the model parameters of the pre-trained calculation model according to the model precision, and returning to the step of acquiring the current calculation state of the network layer in the deep neural network model to be compressed, until a first target deep neural network model is obtained whose current calculation amount reaches the preset target calculation amount and whose model precision is greater than a preset threshold.
Optionally, after obtaining the first target deep neural network model with the current calculated amount reaching the preset target calculated amount and the model precision being greater than the preset threshold, the method further includes:
and adjusting the network parameters of the first target deep neural network model according to the sample set until a second target deep neural network model with the model precision reaching the initial precision of the deep neural network model to be compressed is obtained.
In a second aspect, an embodiment of the present application provides a deep neural network model compression apparatus, where the apparatus includes:
the acquisition module is used for acquiring the current calculation state of a network layer in the deep neural network model to be compressed;
the compression amount calculation module is used for obtaining the compression amount of the network layer through a pre-trained calculation model according to the current calculation state;
a compression module for compressing the network layer based on the compression amount; and determining the deep neural network model after network layer compression.
Optionally, the current computation state of the network layer includes: the current calculated amount, the compressed calculated amount and the calculated amount to be compressed of the network layer;
the acquisition module is specifically configured to:
acquiring a preset target calculation amount of a deep neural network model to be compressed, a current calculation amount and a compressed calculation amount of a network layer in the deep neural network model to be compressed;
and calculating the calculated amount to be compressed of the network layer according to the preset target calculated amount, the current calculated amount and the compressed calculated amount.
Optionally, the obtaining module is further configured to:
and, for the next network layer, obtaining the current calculation state of the network layer in the deep neural network model to be compressed, until all the network layers in the deep neural network model to be compressed are compressed.
Optionally, the apparatus further comprises: a short-term fine-tuning module for:
obtaining a sample set;
according to the sample set and a preset iteration cycle, adjusting network parameters of the deep neural network model after the network layer compression to obtain model precision;
and updating the model parameters of the pre-trained calculation model according to the model precision, and returning to the step of acquiring the current calculation state of the network layer in the deep neural network model to be compressed, until a first target deep neural network model is obtained whose current calculation amount reaches the preset target calculation amount and whose model precision is greater than a preset threshold.
Optionally, the apparatus further comprises: a long-term fine-tuning module for:
and adjusting the network parameters of the first target deep neural network model according to the sample set until a second target deep neural network model with the model precision reaching the initial precision of the deep neural network model to be compressed is obtained.
According to the deep neural network model compression method and device provided by the embodiments of the present application, the current calculation state of a network layer in the deep neural network model to be compressed is obtained; the compression amount of the network layer is obtained through a pre-trained calculation model according to that current calculation state; the network layer is compressed based on the compression amount; and the deep neural network model after network-layer compression is determined. Because the compression amount corresponding to the current calculation state is obtained with the aid of a pre-trained calculation model that has a self-learning capability, rather than being set manually, the output performance of the deep neural network model can be ensured.
Drawings
In order to more clearly illustrate the embodiments of the present application and the technical solutions in the prior art, the drawings used in the description of the embodiments and of the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic flow chart illustrating a deep neural network model compression method according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart illustrating an example of a deep neural network model compression method according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a deep neural network model compression apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to ensure the output performance of the deep neural network model, the embodiment of the application provides a deep neural network model compression method, a deep neural network model compression device, an electronic device and a machinereadable storage medium.
Next, a method for compressing a deep neural network model provided in the embodiment of the present application is first described.
The execution body of the deep neural network model compression method provided by the embodiment of the present application may be an electronic device that executes an intelligent algorithm. The electronic device may be an intelligent device with functions of target detection and segmentation, behavior detection and recognition, or voice recognition, for example a remote computer, a remote server, an intelligent camera, or an intelligent voice device, and it at least includes a processor loaded with a core processing chip. The deep neural network model compression method may be implemented by at least one of software, a hardware circuit, and a logic circuit provided in the execution body.
As shown in fig. 1, a deep neural network model compression method provided in an embodiment of the present application may include the following steps:
s101, obtaining the current calculation state of a network layer in the deep neural network model to be compressed.
The network layers in the deep neural network model to be compressed may include a convolution (Conv) layer, an inner-product (InnerProduct) layer, and the like, and each network layer includes a parameter weight tensor used for performing network operations.
The current computation state of a network layer is the computation-related information generated as the network layer currently performs network operations. It can be represented in vector form and may include, for example, the current calculation amount of the network layer, the compressed calculation amount, and the calculation amount that still needs to be compressed.
Optionally, the current computation state of the network layer may specifically include: the current calculated amount, the compressed calculated amount and the calculated amount to be compressed of the network layer.
S101 may specifically be:
acquiring a preset target calculation amount of a deep neural network model to be compressed, a current calculation amount and a compressed calculation amount of a network layer in the deep neural network model to be compressed;
and calculating the calculated amount to be compressed of the network layer according to the preset target calculated amount, the current calculated amount and the compressed calculated amount.
The preset target calculation amount of the deep neural network model to be compressed is the calculation amount which needs to be reached after the deep neural network model is compressed, and is related to the capability of a hardware platform for operating the deep neural network model, namely the preset target calculation amount can be less than or equal to the maximum calculation amount which can be borne by the hardware platform. Based on the preset target calculated amount, the maximum calculated amount which can be allocated to each network layer can be determined according to the number of network layers, the scale of the network layers and the like of the deep neural network to be compressed.
The current calculated amount of a network layer may be determined according to the total calculated amount of the deep neural network model to be compressed and the current structure of each network layer (for example, the number of channels of the network layer, the size of the convolution kernels, and the size of the weight tensor). The difference between the total calculated amount of the model to be compressed and the preset target calculated amount is the calculated amount the model needs to compress; for example, if the total calculated amount is 130GB and the preset target calculated amount is 100GB, the model needs to compress a calculated amount of 30GB. The calculated amount each network layer needs to compress can then be determined according to the current structure of each network layer.
The compressed calculated amount of a network layer is the calculated amount whose compression has already been completed in that layer, and the calculated amount to be compressed is the calculated amount the layer still needs to compress. For example, if the i-th network layer needs to compress a calculated amount of 5GB in total and has already compressed 2GB, the calculated amount to be compressed of the i-th network layer is 3GB.
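The bookkeeping described above can be sketched as follows. This is a minimal illustration only: the helper names and the idea of a fixed per-layer quota are assumptions for the example, since the patent does not fix how each layer's share of the compression budget is derived.

```python
# Hypothetical helpers illustrating the computation-state bookkeeping above.

def model_to_compress(total_flops, target_flops):
    """Calculated amount the whole model must shed, e.g. 130 - 100 = 30 (GB)."""
    return max(total_flops - target_flops, 0.0)

def compute_state(layer_flops, layer_quota, layer_compressed):
    """Current computation state of one layer as a vector:
    [current calculated amount, already-compressed amount, amount to compress].
    Mirrors the 5GB-quota / 2GB-compressed / 3GB-remaining example."""
    remaining = max(layer_quota - layer_compressed, 0.0)
    return [layer_flops, layer_compressed, remaining]
```

With the figures from the description, `model_to_compress(130.0, 100.0)` gives the 30GB model budget, and a layer with a 5GB quota that has already compressed 2GB yields a state whose last component is 3GB.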
And S102, obtaining the compression amount of the network layer through a pretrained calculation model according to the current calculation state.
The pre-trained calculation model can be understood as a controller that records the correspondence between calculation states and compression amounts; the controller may be a hardware calculation module or a software calculation unit. The calculation model may be a conventional neural network model such as a CNN or an RNN, trained in advance on the correspondence between calculation-state samples and compression amounts, so that it computes the compression amount of a network layer with high accuracy. The compression amount specifies the data that needs to be compressed when the network layer is compressed, for example the size of each small matrix after matrix decomposition, or the number of channels that need to be clipped. The training of the network model that calculates the compression amount is similar to that of conventional neural network models such as CNNs and RNNs, and is not repeated here.
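As a minimal sketch of such a controller, a small feed-forward network can map the three-element state vector to a number of filters to clip. Everything below is an assumption for illustration: the patent only requires some pre-trained CNN/RNN-like model, and the weights here are random rather than pre-trained.

```python
import numpy as np

class Controller:
    """Toy stand-in for the pre-trained calculation model: maps a
    3-dimensional computation-state vector to a compression amount.
    The weights are random here, not pre-trained - illustration only."""

    def __init__(self, state_dim=3, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(state_dim, hidden))
        self.w2 = rng.normal(scale=0.1, size=hidden)

    def compression_amount(self, state, n_filters):
        h = np.tanh(np.asarray(state, dtype=float) @ self.w1)
        ratio = 1.0 / (1.0 + np.exp(-(h @ self.w2)))  # sigmoid -> (0, 1)
        return int(round(float(ratio) * n_filters))   # filters to clip
```

A pre-trained controller would carry learned weights instead of the random initialization shown; only the state-in / compression-amount-out interface matters for the method.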
S103, compressing the network layer based on the compression amount.
The compression amount specifies the data that needs to be compressed when the network layer is compressed, and the network layer can be compressed based on it. For example, if the i-th network layer contains 256 filters and the compression amount obtained through S101 and S102 is 56 filters, then 56 of the 256 filters need to be clipped.
The compression method is not specifically limited in this embodiment. Taking the filter-clipping example of structured compression above, the 56 filters may be clipped at random, or clipped in ascending order of the sum of the absolute values of their weights. Any reasonable compression method currently available may be adopted, and details are not repeated here.
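The weight-magnitude ordering mentioned above can be sketched with NumPy; the `(n_filters, in_channels, kh, kw)` filter layout is an assumption of this example, not fixed by the patent.

```python
import numpy as np

def prune_filters_l1(weights, n_prune):
    """Clip the n_prune filters whose sum of absolute weights is smallest,
    i.e. clipping 'in the order from small to large according to the sum
    of absolute values of weights'. weights: (n_filters, in_ch, kh, kw)."""
    scores = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    keep = np.sort(np.argsort(scores)[n_prune:])  # drop the n_prune smallest
    return weights[keep]
```

For the example above one would call `prune_filters_l1(conv_weights, 56)` on a 256-filter layer, leaving 200 filters in their original order.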
Optionally, after S103, the method for compressing a deep neural network model provided in the embodiment of the present application may further include the following steps:
and returning to execute S101 to S103 for the next network layer until all network layers in the deep neural network model to be compressed are compressed.
The compression of the deep neural network model may be performed on one or some of the network layers, and of course, in order to reduce the computation amount of the deep neural network model to the maximum extent, all the network layers may be compressed, and the steps from S101 to S103 are performed for each network layer.
And S104, determining the deep neural network model after network layer compression.
After the network layers are compressed, the deep neural network model is the compressed deep neural network model, and as described above, the deep neural network model may be the deep neural network model in which all the network layers are compressed, or may be the deep neural network model in which one or a part of the network layers are compressed.
Optionally, after S104, the method for compressing a deep neural network model provided in the embodiment of the present application may further include the following steps:
obtaining a sample set;
according to the sample set and a preset iteration period, adjusting network parameters of the deep neural network model after network layer compression to obtain model precision;
and updating the model parameters of the pre-trained calculation model according to the model precision, and returning to execute S101 to S103, until a first target deep neural network model is obtained whose current calculation amount reaches the preset target calculation amount and whose model precision is greater than a preset threshold.
Because the deep neural network model has been compressed, a certain difference exists between its actual output and the output of the initial deep neural network, and its precision drops. To bring the precision of the compressed model closer to the initial precision of the initial model, the network parameters of the deep neural network model need to be adjusted. If the precision of the compressed model were required to reach the initial precision at this stage, the adjustment-iteration period would be very long; therefore, to keep the adjustment iteration fast, its period can be set small, i.e., a short-term adjustment-iteration process is performed.
After one adjustment iteration, the model precision of the deep neural network model is improved to a certain extent, and the improved model precision can be obtained. Based on that model precision, the model parameters of the calculation model that computes the compression amount can be updated, so that the compression amount used for network-layer compression is adjusted and the model precision is further improved. In this way, a first target deep neural network model can be obtained whose current calculation amount reaches the preset target calculation amount and whose model precision is greater than the preset threshold.
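The patent leaves the exact parameter-update rule open. One plausible realization of "update the model parameters according to the model precision" — an assumption of this sketch, not the patent's stated method — is a policy-gradient-style step in which the fine-tuned precision, relative to a running baseline, acts as a reward:

```python
import numpy as np

def update_controller(theta, grad_log_prob, precision, baseline, lr=0.01):
    """Hedged sketch: one REINFORCE-style update of the controller's
    parameters theta. The short-term-fine-tuned model precision, measured
    against a running baseline, scales the log-probability gradient of
    the compression amounts the controller emitted in this iteration."""
    return theta + lr * (precision - baseline) * grad_log_prob
```

Any update rule that raises the probability of compression amounts leading to higher precision would serve the same role in the iteration described above.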
Optionally, after the step of obtaining the first target deep neural network model of which the current calculated amount reaches the preset target calculated amount and the model precision is greater than the preset threshold, the method for compressing the deep neural network model provided in the embodiment of the present application may further include the following steps:
and adjusting the network parameters of the first target deep neural network model according to the sample set until a second target deep neural network model with the model precision reaching the initial precision of the deep neural network model to be compressed is obtained.
After the first target deep neural network model is obtained, its model accuracy, although greatly improved, still differs to a certain extent from the initial accuracy of the initial deep neural network model. Therefore, to ensure that the model accuracy can reach the initial accuracy, the first target deep neural network model continues to be iteratively adjusted on the sample set until the model converges and its accuracy reaches the initial accuracy. This adjustment-iteration process may require a long iteration period, and is thus a long-term adjustment-iteration process.
By applying this embodiment, the current calculation state of a network layer in the deep neural network model to be compressed is obtained; the compression amount of the network layer is obtained through a pre-trained calculation model according to that current calculation state; the network layer is compressed based on the compression amount; and the deep neural network model after network-layer compression is determined. Because the compression amount corresponding to the current calculation state is obtained with the aid of a pre-trained calculation model that has a self-learning capability, rather than being set manually, the output performance of the deep neural network model can be ensured.
For convenience of understanding, the following describes in detail a deep neural network model compression method provided in an embodiment of the present application with reference to a specific example, and as shown in fig. 2, specific steps may include:
the method comprises the following steps: given tobecompressed deep neural network model Net and calculated amount Flops required to be achieved after model compression_{req}。
Step two: computing the current computing state s of the ith layer of the network_{i}The state is represented in a vector form, and may include information such as the amount of computation of the current layer, the amount of computation that the network has compressed, the amount of computation that the network needs to compress, and the like.
Step three: calculating the current state s_{i}Input controller R (controller R contains model parameters theta)_{R}) The controller R calculates the state s according to the current state_{i}Giving the compression a of the current ilayer_{i}Based on the compression amount a_{i}And performing structural compression on the ith layer. Wherein the structure is compressedThe method can adopt any reasonable structured compression method at present, and the controller R is a pretrained calculation model.
Step four: and performing the operations of the second step and the third step on the i +1 layer, and repeating the steps until all layers in the deep neural network model are traversed.
Step five: on a sample set, carrying out shorttime fine adjustment on the compressed deep neural network model, feeding back the model precision to a controller R, and updating a model parameter theta by the controller R according to a feedback signal_{R}And preparing for the next iteration.
Step six: and jumping to the step one, and repeating the next iteration until the deep neural network model which meets the model calculation amount and has the highest precision is obtained.
Step seven: and finally, performing longterm fine adjustment on the sample set by using the deep neural network model obtained in the step six until the model converges, and recovering the model precision.
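Steps one through six can be put together as a compact skeleton. This is a structural sketch only: the controller, the short-term fine-tuning, and the compression itself are stubbed with assumed callables, and the per-layer calculation amounts are abstract numbers.

```python
def compress_model(layer_flops, flops_req, controller, fine_tune, n_iters=10):
    """Skeleton of the iterative compression loop (steps one to six).
    controller(state) -> fraction of a layer's calculation amount to remove;
    fine_tune(total_flops) -> model precision after short-term fine-tuning."""
    best, best_precision = None, -1.0
    for _ in range(n_iters):                       # step six: next iteration
        flops = list(layer_flops)                  # step one: fresh copy of Net
        compressed = 0.0
        for i in range(len(flops)):                # step four: traverse layers
            to_go = max(sum(flops) - flops_req, 0.0)
            state = [flops[i], compressed, to_go]  # step two: state s_i
            cut = min(flops[i] * controller(state), to_go)  # step three: a_i
            flops[i] -= cut
            compressed += cut
        precision = fine_tune(sum(flops))          # step five: short-term tuning
        if sum(flops) <= flops_req and precision > best_precision:
            best, best_precision = flops, precision
    return best, best_precision                    # step seven long-tunes `best`
```

A real run would also update the controller's parameters between iterations (step five's feedback), which the stub omits for brevity.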
In this scheme, the compression amount of each network layer in the deep neural network model is adjusted by the algorithm and continuously iterated and evolved, which avoids the inaccuracy and workload of setting the compression amount manually, so the model precision can be improved to the greatest extent while the required calculation amount is met.
Corresponding to the foregoing method embodiment, an embodiment of the present application provides a deep neural network model compression apparatus, and as shown in fig. 3, the deep neural network model compression apparatus may include:
an obtaining module 310, configured to obtain a current computation state of a network layer in a deep neural network model to be compressed;
a compression amount calculation module 320, configured to obtain the compression amount of the network layer through a pre-trained calculation model according to the current calculation state;
a compression module 330, configured to compress the network layer based on the compression amount; and determining the deep neural network model after network layer compression.
Optionally, the current computation state of the network layer may include: the current calculated amount, the compressed calculated amount and the calculated amount to be compressed of the network layer;
the obtaining module 310 may be specifically configured to:
acquiring a preset target calculation amount of a deep neural network model to be compressed, a current calculation amount and a compressed calculation amount of a network layer in the deep neural network model to be compressed;
and calculating the calculated amount to be compressed of the network layer according to the preset target calculated amount, the current calculated amount and the compressed calculated amount.
Optionally, the obtaining module 310 may be further configured to:
and, for the next network layer, obtain the current calculation state of the network layer in the deep neural network model to be compressed, until all the network layers in the deep neural network model to be compressed are compressed.
Optionally, the apparatus may further include: a short-term fine-tuning module for:
obtaining a sample set;
according to the sample set and a preset iteration cycle, adjusting network parameters of the deep neural network model after the network layer compression to obtain model precision;
and updating the model parameters of the pre-trained calculation model according to the model precision, and returning to the step of acquiring the current calculation state of the network layer in the deep neural network model to be compressed, until a first target deep neural network model is obtained whose current calculation amount reaches the preset target calculation amount and whose model precision is greater than a preset threshold.
Optionally, the apparatus may further include: a long-term fine-tuning module for:
and adjusting the network parameters of the first target deep neural network model according to the sample set until a second target deep neural network model with the model precision reaching the initial precision of the deep neural network model to be compressed is obtained.
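The long-term fine-tuning stage may be sketched as follows; again this is an illustrative, non-limiting example in which `finetune_epoch` is a hypothetical stand-in that runs one training pass on the sample set and returns the resulting precision.

```python
def long_term_finetune(finetune_epoch, initial_precision, max_epochs=300):
    """Keep adjusting the first target model's network parameters until
    its model precision climbs back to the initial precision of the
    uncompressed deep neural network model."""
    precision = 0.0
    for epoch in range(1, max_epochs + 1):
        precision = finetune_epoch()
        if precision >= initial_precision:
            return precision, epoch    # second target model obtained
    return precision, max_epochs       # best effort if never recovered
```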
By applying this embodiment, the current calculation state of a network layer in the deep neural network model to be compressed is acquired, the compression amount of the network layer is obtained through the pretrained calculation model according to that state, the network layer is compressed based on the compression amount, and the deep neural network model after network layer compression is determined. The compression amount corresponding to the current calculation state can thus be obtained automatically from the pretrained calculation model, which is trained in advance and has a self-learning capability.
In order to guarantee the output performance of the deep neural network model, an embodiment of the present application further provides an electronic device, as shown in fig. 4, including a processor 401 and a machine-readable storage medium 402, wherein:
the machine-readable storage medium 402 is configured to store machine-executable instructions executable by the processor 401; and
the processor 401 is configured to be caused, by the machine-executable instructions stored on the machine-readable storage medium 402, to perform all the steps of the deep neural network model compression method provided by the embodiments of the present application.
The machine-readable storage medium 402 and the processor 401 may be in data communication by way of a wired or wireless connection, and the electronic device may communicate with other devices by way of a wired or wireless communication interface.
The machine-readable storage medium may include a RAM (Random Access Memory) and an NVM (Non-Volatile Memory), such as at least one disk memory. Alternatively, the machine-readable storage medium may be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In this embodiment, the processor of the electronic device can adjust the compression amount of each network layer in the deep neural network model through the algorithm, which continuously iterates and evolves. This avoids the inaccuracy and workload of setting the compression amount manually, guarantees the output performance of the deep neural network model, and allows the model precision to be improved to the maximum extent while the calculation amount constraint is met.
In addition, corresponding to the deep neural network model compression method provided in the foregoing embodiments, the present application further provides a machine-readable storage medium storing machine-executable instructions that cause a processor to perform all the steps of the deep neural network model compression method provided in the present application.
In this embodiment, the machine-readable storage medium stores machine-executable instructions that, when run, execute the deep neural network model compression method provided in this embodiment, so that the following can be achieved: the compression amount of each network layer in the deep neural network model is adjusted through the algorithm and continuously iterates and evolves, the inaccuracy and workload of setting the compression amount manually are avoided, the output performance of the deep neural network model is guaranteed, and the model precision can be improved to the maximum extent while the calculation amount constraint is met.
For the embodiments of the electronic device and the machinereadable storage medium, since the contents of the related methods are substantially similar to those of the foregoing embodiments of the methods, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the embodiments of the methods.
It is noted that, herein, relational terms such as first and second, and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, and the machinereadable storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and in relation to the description, reference may be made to some portions of the method embodiments.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.
Claims (10)
1. A method for deep neural network model compression, the method comprising:
acquiring the current calculation state of a network layer in a deep neural network model to be compressed;
obtaining the compression amount of the network layer through a pretrained calculation model according to the current calculation state;
compressing the network layer based on the compression amount;
and determining the deep neural network model after network layer compression.
2. The method of claim 1, wherein the current computational state of the network layer comprises: the current calculated amount, the compressed calculated amount and the calculated amount to be compressed of the network layer;
the acquiring the current calculation state of the network layer in the deep neural network model to be compressed includes:
acquiring a preset target calculation amount of a deep neural network model to be compressed, a current calculation amount and a compressed calculation amount of a network layer in the deep neural network model to be compressed;
and calculating the calculated amount to be compressed of the network layer according to the preset target calculated amount, the current calculated amount and the compressed calculated amount.
3. The method of claim 1, wherein after said compressing the network layer based on the amount of compression, the method further comprises:
and returning to and executing, for the next network layer, the step of acquiring the current calculation state of the network layer in the deep neural network model to be compressed, until all the network layers in the deep neural network model to be compressed are compressed.
4. The method of claim 1, wherein after the determining the network layer compressed deep neural network model, the method further comprises:
obtaining a sample set;
according to the sample set and a preset iteration cycle, adjusting network parameters of the deep neural network model after the network layer compression to obtain model precision;
and updating the model parameters of the pretrained calculation model according to the model precision, and returning to the step of acquiring the current calculation state of the network layer in the deep neural network model to be compressed, until a first target deep neural network model is obtained whose current calculation amount reaches the preset target calculation amount and whose model precision is greater than a preset threshold.
5. The method according to claim 4, wherein after obtaining the first target deep neural network model with the current calculated amount reaching the preset target calculated amount and the model accuracy being greater than the preset threshold, the method further comprises:
and adjusting the network parameters of the first target deep neural network model according to the sample set until a second target deep neural network model with the model precision reaching the initial precision of the deep neural network model to be compressed is obtained.
6. An apparatus for deep neural network model compression, the apparatus comprising:
the acquisition module is used for acquiring the current calculation state of a network layer in the deep neural network model to be compressed;
the compression amount calculation module is used for obtaining the compression amount of the network layer through a pretrained calculation model according to the current calculation state;
a compression module for compressing the network layer based on the compression amount; and determining the deep neural network model after network layer compression.
7. The apparatus of claim 6, wherein the current computation state of the network layer comprises: the current calculated amount, the compressed calculated amount and the calculated amount to be compressed of the network layer;
the acquisition module is specifically configured to:
acquiring a preset target calculation amount of a deep neural network model to be compressed, a current calculation amount and a compressed calculation amount of a network layer in the deep neural network model to be compressed;
and calculating the calculated amount to be compressed of the network layer according to the preset target calculated amount, the current calculated amount and the compressed calculated amount.
8. The apparatus of claim 6, wherein the obtaining module is further configured to:
and, for the next network layer, obtaining the current calculation state of the network layer in the deep neural network model to be compressed, until all the network layers in the deep neural network model to be compressed are compressed.
9. The apparatus of claim 6, further comprising a short-term fine-tuning module for:
obtaining a sample set;
according to the sample set and a preset iteration cycle, adjusting network parameters of the deep neural network model after the network layer compression to obtain model precision;
and updating the model parameters of the pretrained calculation model according to the model precision, and returning to the step of acquiring the current calculation state of the network layer in the deep neural network model to be compressed, until a first target deep neural network model is obtained whose current calculation amount reaches the preset target calculation amount and whose model precision is greater than a preset threshold.
10. The apparatus of claim 9, further comprising a long-term fine-tuning module for:
and adjusting the network parameters of the first target deep neural network model according to the sample set until a second target deep neural network model with the model precision reaching the initial precision of the deep neural network model to be compressed is obtained.
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title 

CN201811015359.0A CN110874635A (en)  2018-08-31  2018-08-31  Deep neural network model compression method and device 
Publications (1)
Publication Number  Publication Date 

CN110874635A (en)  2020-03-10 
Family
ID=69715937
Family Applications (1)
Application Number  Title  Priority Date  Filing Date 

CN201811015359.0A Pending CN110874635A (en)  2018-08-31  2018-08-31  Deep neural network model compression method and device 
Country Status (1)
Country  Link 

CN (1)  CN110874635A (en) 
Cited By (3)
Publication number  Priority date  Publication date  Assignee  Title 

CN111814967A (en) * 2020-09-11  2020-10-23  Peng Cheng Laboratory  Method, apparatus and storage medium for calculating inferential computation of neural network model 
CN111814967B (en) * 2020-09-11  2021-02-23  Peng Cheng Laboratory  Method, apparatus and storage medium for calculating inferential computation of neural network model 
CN113762510A (en) * 2021-09-09  2021-12-07  Beijing Baidu Netcom Science and Technology Co., Ltd.  Data processing method and device for target model, electronic equipment and medium 
Similar Documents
Publication  Publication Date  Title 

US10521729B2 (en)  Neural architecture search for convolutional neural networks  
JP6817431B2 (en)  Neural architecture search  
US10552737B2 (en)  Artificial neural network classbased pruning  
US20200311552A1 (en)  Device and method for compressing machine learning model  
CN111406267A (en)  Neural architecture search using performancepredictive neural networks  
CN110874635A (en)  Deep neural network model compression method and device  
CN111406264A (en)  Neural architecture search  
CN111382906A (en)  Power load prediction method, system, equipment and computer readable storage medium  
CN111144561A (en)  Neural network model determining method and device  
KR20200089588A (en)  Electronic device and method for controlling the electronic device thereof  
CN110874625A (en)  Deep neural network quantification method and device  
CN110766142A (en)  Model generation method and device  
CN110363297A (en)  Neural metwork training and image processing method, device, equipment and medium  
WO2019146189A1 (en)  Neural network rank optimization device and optimization method  
CN113806993A (en)  Wireless sensor structure design optimization method, device, equipment and medium  
CN109472357A (en)  Trimming and retraining method for convolutional neural networks  
Cao et al.  LSTM network based traffic flow prediction for cellular networks  
CN111178258A (en)  Image identification method, system, equipment and readable storage medium  
CN111079899A (en)  Neural network model compression method, system, device and medium  
CN111291883A (en)  Data processing method and data processing device  
CN108734265A (en)  Compression method and device, terminal, the storage medium of deep neural network model  
CN113254472B (en)  Parameter configuration method, device, equipment and readable storage medium  
EP3767548A1 (en)  Delivery of compressed neural networks  
CN112446461A (en)  Neural network model training method and device  
CN114072809A (en)  Small and fast video processing network via neural architectural search 
Legal Events
Date  Code  Title  Description 

PB01  Publication  
SE01  Entry into force of request for substantive examination 