CN110705708A: Compression method and device of convolutional neural network model and computer storage medium (Google Patents)
Publication number: CN110705708A (application number CN201910959688.9A)
Authority: CN (China)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
 G06N3/00—Computing arrangements based on biological models
 G06N3/02—Computing arrangements based on biological models using neural network models
 G06N3/08—Learning methods
 G06N3/082—Learning methods modifying the architecture, e.g. adding or deleting nodes or connections, pruning

 G06N3/045—
Abstract
The invention discloses a compression method and apparatus for a convolutional neural network model, and a computer storage medium. The compression method comprises the following steps: prune each convolutional layer of a first convolutional neural network model separately, and determine a pruning step length for each convolutional layer; then, according to the pruning step length of each convolutional layer, prune the first convolutional neural network model layer by layer to obtain a compressed second convolutional neural network model. The invention preserves the compression effect of the model while reducing compression time and improving compression efficiency, and therefore has clear practical value.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for compressing a convolutional neural network model and a computer storage medium.
Background
With the development of deep learning techniques, a large number of machine vision algorithms have begun to use convolutional neural networks. Because of the strong feature-extraction capability of convolutional layers, neural networks use them extensively; however, convolutional layers usually contain a large number of parameters and a large amount of computation, which makes the model scale increasingly large. For this reason, convolutional neural networks are usually run on desktops or servers in the prior art. Owing to their huge model size and computation, convolutional neural networks are difficult to deploy on embedded systems with limited resources. At the same time, the demand for deploying machine vision algorithms on embedded systems keeps growing.
In recent years, model compression techniques have emerged to facilitate deployment of machine vision algorithms at the embedded end. The goal of model compression is to reduce the parameter count and computation of the model; common techniques fall into three categories: pruning, quantization, and weight decomposition. Pruning removes unimportant weights, selected by some discrimination method, to achieve compression. Quantization converts the weights into a data type occupying fewer bits, typically 2, 3, or 8 bits. Weight decomposition factorizes the weight matrix to achieve compression; common decomposition methods include CP decomposition and singular value decomposition (SVD).
However, each of these compression methods has its own problems. Pruning methods do not consider the differences between layers; usually the layer with the largest computation or parameter count is selected for pruning, so pruning efficiency is low. Quantization methods are limited in effect: after the network or the data set is replaced, the model's performance fluctuates considerably, so generality is poor. Weight decomposition methods, because the weights have already been decomposed, make it difficult to apply other compression methods on top of them, so extensibility is poor.
It is noted that the information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
The technical problem to be solved by the present invention is to overcome the defects of the prior art. A first object of the present invention is to provide a compression method for a convolutional neural network model that greatly improves pruning efficiency and pruning effect through a dynamic pruning method with adaptive step length. A second object is to provide a compression apparatus for a convolutional neural network model, and a third object is to provide a computer storage medium.
In order to achieve the first object, the invention provides a method for compressing a convolutional neural network model, which comprises the following steps:
s100: respectively pruning each convolutional layer of the first convolutional neural network model, and determining the pruning step length of each convolutional layer;
s200: and according to the pruning step length of each convolutional layer, pruning the first convolutional neural network model layer by layer to obtain a compressed second convolutional neural network model.
Optionally, the method for determining the pruning step length of each convolutional layer in step S100 includes setting an initial pruning step length and a pruning order of the convolution kernels, and then performing the following steps:
s110: perform a trial pruning on the convolutional layer to obtain a pruned intermediate convolutional neural network model;
s120: fine-tune the intermediate convolutional neural network model, and run the fine-tuned intermediate convolutional neural network model on a validation set to obtain the mAP after the trial pruning;
s130: if the mAP after the trial pruning is lower than a first preset mAP threshold, reduce the initial pruning step length to a first pruning step length; if the mAP after the trial pruning is higher than the first preset mAP threshold, increase the initial pruning step length to a second pruning step length; otherwise, determine the initial pruning step length as the pruning step length of the convolutional layer;
s140: if the change in pruning step length meets the preset change threshold, execute step S110 to continue pruning the convolutional layer; otherwise, use the current pruning step length as the pruning step length of the convolutional layer.
Optionally, the method for setting the initial pruning step size includes:
setting an initial pruning proportion;
and obtaining the initial pruning step length of the convolutional layer according to the initial pruning proportion and the number of the convolutional kernels.
Optionally, the method for setting the pruning order of the convolution kernels comprises:
calculating the sum of absolute values of weights contained in each convolution kernel;
and setting the sequence of the sum of the absolute values of the weights contained in the convolution kernel from small to large as the pruning sequence of the convolution kernel.
Optionally, before step S200, the method further includes sorting each of the convolutional layers, and taking the order of sensitivity of the convolutional layers from high to low as a pruning order of the convolutional layers.
Optionally, pruning the first convolutional neural network model layer by layer in step S200 includes: following the pruning order of the convolutional layers, pruning the convolutional layers one by one according to the pruning step length of each convolutional layer, and judging layer by layer whether pruning is complete; the next convolutional layer is pruned only after the previous convolutional layer has been fully pruned, until all convolutional layers are pruned, thereby obtaining the compressed second convolutional neural network model.
Optionally, the method for determining whether pruning of the convolutional layer is completed in step S200 includes the following steps,
step S210: fine-tune the pruned first convolutional neural network model, and run the fine-tuned model on a validation set to obtain the mAP after pruning;
step S220: if the AP of any category is 0, pruning of the convolutional layer is complete; otherwise, execute step S230;
step S230: judge whether the mAP after pruning is lower than a second preset mAP threshold; if so, pruning of the convolutional layer is complete; otherwise, go to step S240;
step S240: if the pruning step length of the convolutional layer is smaller than the number of remaining convolution kernels, continue pruning the convolutional layer with its pruning step length; otherwise, correct the pruning step length of the convolutional layer to a third pruning step length and continue pruning the convolutional layer.
Optionally, before step S100, training to obtain the first convolutional neural network model based on a target detection algorithm.
To achieve the second object of the present invention, the present invention also provides a compression apparatus for a convolutional neural network model, comprising a processor adapted to implement instructions, and a storage device adapted to store a plurality of instructions, the instructions being adapted to be loaded and executed by the processor to implement the steps of the compression method of a convolutional neural network model described above.
To achieve the third object of the present invention, the present invention also provides a computer storage medium having computer executable instructions stored thereon, which when executed implement the method of compressing a convolutional neural network model as described in any one of the above.
Compared with the prior art, the compression method of the convolutional neural network model provided by the invention has the following beneficial effects:
(1) Insensitive layers are pruned with large amplitude while sensitive layers are pruned with smaller amplitude, which effectively improves the performance of the pruned model.
(2) The step length is dynamically determined, which is beneficial to improving the pruning speed.
(3) Compared with the traditional pruning method, the method can dynamically determine the pruning step length according to the actual conditions of each layer, and greatly improves the pruning efficiency and the pruning effect.
The compression device of the convolutional neural network model and the computer storage medium have the same inventive concept, so the same beneficial effects are achieved.
Drawings
FIG. 1 is a flowchart illustrating an overall method for compressing a convolutional neural network model according to an embodiment of the present invention;
FIG. 2 is a flow chart of the compression method of the convolutional neural network model for determining the pruning step according to the sensitivity level according to the embodiment of the present invention;
FIG. 3 is a flowchart of layer-by-layer pruning in the compression method of a convolutional neural network model according to an embodiment of the present invention.
Detailed Description
To make the objects, advantages and features of the present invention more apparent, the present invention is further described below through specific examples in conjunction with the accompanying drawings. It is to be understood that the described embodiments are merely a portion of the embodiments of the invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first" and "second" in the description, claims, and drawings of the present invention are used for distinguishing purposes only; they neither describe a particular order or sequence nor indicate or imply relative importance or the number of technical features indicated.
The core idea of the invention is a compression method for a convolutional neural network model that dynamically determines the pruning step length of each layer according to the layer's sensitivity, prunes the convolutional layers layer by layer according to the sensitivity order and the determined step lengths, and moves to the next convolutional layer only after the current one is fully pruned. Insensitive layers are pruned with large amplitude and sensitive layers with small amplitude; within a convolutional layer, the convolution kernels are pruned in order of increasing importance.
Specifically, referring to fig. 1, the compression method of the convolutional neural network model provided in this embodiment includes the following steps:
s100: respectively pruning each convolutional layer of the first convolutional neural network model, and determining the pruning step length of each convolutional layer;
s200: and according to the pruning step length of each convolutional layer, pruning the first convolutional neural network model layer by layer to obtain a compressed second convolutional neural network model.
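For illustration only, the two-phase flow above can be sketched in Python. `determine_step` and `prune_layer` are hypothetical callables standing in for the procedures detailed later in the description; none of these names appear in the disclosure itself.

```python
def compress_model(model, layer_names, determine_step, prune_layer):
    """Two-phase compression (steps S100/S200): first determine a
    per-layer pruning step length, then prune layer by layer."""
    # S100: trial-prune each convolutional layer to fix its step length
    steps = {name: determine_step(model, name) for name in layer_names}
    # S200: prune the layers one by one with their determined steps
    for name in layer_names:
        model = prune_layer(model, name, steps[name])
    return model  # the compressed "second" model
```

Passing the two procedures as callables keeps the sketch independent of any particular deep learning framework.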
Obviously, in other embodiments, before performing step S100, the first convolutional neural network model may be obtained by training based on a target detection algorithm. The present invention limits neither the target detection algorithm used to obtain the first convolutional neural network model nor the application scenario of that model; that is, the above compression method can be applied in any field that uses convolutional neural network models, including but not limited to image recognition, speech recognition, and document analysis.
Specifically, the method for determining the pruning step length of each convolutional layer in step S100 further includes setting the initial pruning step length and the pruning order of the convolution kernels, and then performing the steps shown in fig. 2. For ease of understanding, the setting of the initial pruning step length and of the kernel pruning order is described first; the method for determining the pruning step length is then described in detail with reference to fig. 2.
In one embodiment, the method for setting the initial pruning step length comprises the following steps:
the first step is as follows: setting an initial pruning proportion, specifically, the initial pruning proportion is 3% to 10% of the number of convolution kernels of the convolutional layer where the initial pruning proportion is, in this embodiment, 5% of the number of convolution kernels of the convolutional layer where the initial pruning proportion is.
The second step: obtain the initial pruning step length of the convolutional layer from the initial pruning proportion and the number of convolution kernels.
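The two steps above can be sketched as follows; the function name and the ceiling-rounding rule are assumptions, since the patent only fixes the 3% to 10% proportion.

```python
import math

def initial_pruning_step(num_kernels, ratio=0.05):
    """Initial pruning step length of one convolutional layer: a fixed
    proportion (3%-10% of the kernel count, 5% in the embodiment),
    rounded up so that at least one kernel is pruned per round."""
    return max(1, math.ceil(num_kernels * ratio))
```

For a layer with 256 kernels and the embodiment's 5% proportion, this gives a step length of 13 kernels.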
As a preferred embodiment, the method for setting the pruning order of the convolution kernels comprises the following steps:
first, the sum of the absolute values of the weights included in each convolution kernel is calculated.
Then, the order of the kernels by the sum of the absolute values of their weights, from small to large, is set as the pruning order of the convolution kernels. That is, the importance of each convolution kernel is ranked by the sum of the absolute values of its weights: the larger this sum, the more important the kernel, and the lower a kernel's importance, the more likely it is to be pruned.
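The importance ranking above can be expressed compactly with NumPy. This is a sketch: the weight-array layout (out_channels, in_channels, kh, kw) and the function name are assumptions, not part of the disclosure.

```python
import numpy as np

def kernel_pruning_order(conv_weight):
    """Return kernel indices ordered for pruning, least important first.

    The importance of a kernel is the sum of the absolute values of its
    weights (an L1 norm); kernels with the smallest sum are pruned first."""
    l1 = np.abs(conv_weight).sum(axis=(1, 2, 3))  # one score per output kernel
    return np.argsort(l1, kind="stable")          # ascending: prune these first
```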
With reference to fig. 2, the method for determining the pruning step length is described as follows, and specifically includes the following steps:
S110: perform a trial pruning on the convolutional layer to obtain a pruned intermediate convolutional neural network model.
S120: fine-tune the intermediate convolutional neural network model, and run the fine-tuned intermediate model on a validation set to obtain the mAP after the trial pruning. Compared with training from scratch, fine-tuning saves a large amount of computing resources and computing time, improves computing efficiency, can even improve accuracy, and effectively avoids risks such as model non-convergence, insufficient parameter optimization, low accuracy, and poor generalization. mAP (mean Average Precision) is obtained by first computing the average precision (AP) within each category and then averaging the APs over all categories; in target detection, mAP is used to judge recognition accuracy, i.e. to measure how reliably objects are detected.
S130: correct the initial pruning step length according to the mAP after the trial pruning. If that mAP is lower than a first preset mAP threshold, reduce the initial pruning step length to a first pruning step length; if it is higher than the first preset mAP threshold, increase the initial pruning step length to a second pruning step length; otherwise, determine the initial pruning step length as the pruning step length of the convolutional layer. According to actual needs, the first preset mAP threshold may be set to 70% to 90% of the initial mAP; in this embodiment it is 80% of the initial mAP, where the initial mAP is the mAP of the first convolutional neural network model before pruning. The first pruning step length is 30% to 70% of the initial pruning step length; the second pruning step length is the initial pruning step length plus 3% to 10% of the number of convolution kernels.
Specifically, in this embodiment, when the mAP after the trial pruning is lower than 80% of the initial mAP, the pruning step length is halved, i.e. the first pruning step length is 50% of the initial pruning step length; when the mAP after the trial pruning is higher than 80% of the initial mAP, the initial pruning step length is increased by 5% of the number of convolution kernels of the convolutional layer.
S140: if the change in pruning step length meets the preset change threshold, execute step S110 to continue pruning the convolutional layer; otherwise, take the current pruning step length as the pruning step length of the convolutional layer. In this embodiment, pruning of the convolutional layer continues while the change in pruning step length is greater than 1 and less than 30% of the number of convolution kernels in the layer. It can be seen that the pruning step length of each convolutional layer of the convolutional neural network model is determined dynamically, which balances the pruning proportion against the mAP after pruning.
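Under the embodiment's concrete numbers (80% mAP threshold, halving, growth by 5% of the kernel count, change bounded by 1 and 30% of the kernel count), the step-determination loop S110-S140 might look like the following sketch. `trial_map` is a caller-supplied stand-in for "prune by this step, fine-tune, and evaluate mAP on the validation set"; all names are assumptions.

```python
def determine_layer_step(num_kernels, init_step, init_map, trial_map):
    """Dynamically determine the pruning step length for one layer."""
    step = init_step
    while True:
        m = trial_map(step)                 # S110-S120: trial prune + fine-tune + mAP
        if m < 0.8 * init_map:              # S130: too much damage, halve the step
            new_step = step // 2
        elif m > 0.8 * init_map:            # S130: room to spare, grow the step
            new_step = step + max(1, round(0.05 * num_kernels))
        else:
            return step                     # exactly at the threshold: keep it
        change = abs(new_step - step)
        # S140: iterate only while the change lies strictly between 1
        # and 30% of the layer's kernel count
        if not (1 < change < 0.3 * num_kernels):
            return step
        step = new_step
```

Note that a real implementation would also cap the number of iterations, since a pathological `trial_map` could in principle keep the change inside the bounds indefinitely.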
Thus, the pruning step size in each of the convolutional layers and the pruning order of the convolutional kernels in the convolutional layers have been dynamically determined. In one embodiment, step S200 is executed next to prune the first convolutional neural network model layer by layer according to the pruning step of each convolutional layer, so as to obtain a compressed second convolutional neural network model.
In another embodiment of the present invention, as shown in fig. 3, before step S200 the method further includes sorting the convolutional layers and taking the order of their sensitivity, from high to low, as the pruning order of the convolutional layers. That is, the convolutional layers of the first convolutional neural network are arranged in descending order of sensitivity, with the most sensitive layers first.
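This layer ordering reduces to a one-line sort; the sensitivity measurement itself is outside the snippet, and the names are assumptions.

```python
def layer_pruning_order(sensitivity):
    """Order layer names for pruning, most sensitive first, so that
    sensitive layers receive their (smaller) pruning amplitude early.

    `sensitivity` maps layer name -> measured sensitivity value."""
    return sorted(sensitivity, key=sensitivity.get, reverse=True)
```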
Specifically, pruning the first convolutional neural network model layer by layer in step S200 includes: following the pruning order of the convolutional layers, pruning them one by one according to each layer's pruning step length and judging layer by layer whether pruning is complete; the next convolutional layer is pruned only after the previous one has been fully pruned, until all convolutional layers are pruned, yielding the compressed second convolutional neural network model.
In this embodiment, the method for determining whether pruning of the convolutional layer is completed in step S200 includes the following steps:
step S210: fine-tune the pruned first convolutional neural network model, and run the fine-tuned model on a validation set to obtain the mAP after pruning.
Step S220: if the AP of any category is 0, pruning of the convolutional layer is complete; otherwise, step S230 is executed. The purpose of this step is to ensure that every category can still be detected.
Step S230: judge whether the mAP after pruning is lower than a second preset mAP threshold; if so, pruning of the convolutional layer is complete; otherwise, step S240 is performed. The second preset mAP threshold is 70% to 90% of the initial mAP; in this embodiment it is 80% of the initial mAP.
Step S240: if the pruning step length of the convolutional layer is smaller than the number of remaining convolution kernels, continue pruning the convolutional layer with its pruning step length; otherwise, correct the pruning step length of the convolutional layer to a third pruning step length and continue pruning. The third pruning step length is 10% to 50% of the number of convolution kernels remaining in the convolutional layer; in this embodiment it is 25% of the remaining kernels.
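The per-round decision S210-S240 can be condensed into a small pure function. This is a sketch using the embodiment's numbers; the 'stop'/step return convention and all names are assumptions.

```python
def pruning_decision(per_class_aps, map_after, init_map, step, remaining_kernels):
    """Decide what happens after one pruning round of a layer.

    Returns the string 'stop' when the layer is finished, otherwise the
    step length to use for the next round."""
    if any(ap == 0 for ap in per_class_aps):        # S220: a category vanished
        return "stop"
    if map_after < 0.8 * init_map:                  # S230: below the mAP floor
        return "stop"
    if step < remaining_kernels:                    # S240: step still fits
        return step
    return max(1, round(0.25 * remaining_kernels))  # S240: shrink to 25% of remainder
```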
It should be noted that the above embodiments separate the method for determining the pruning step length of a convolutional layer from the step of actually performing the pruning merely for convenience of description; this does not limit the present invention in any way. Based on the disclosure of the present invention, those skilled in the art can, without creative effort, fold the determination of each layer's pruning step length into the layer-by-layer pruning procedure in other embodiments, and such embodiments fall within the scope of the present invention.
In yet another embodiment of the present invention, there is provided a compression apparatus for a convolutional neural network model, including a processor adapted to implement instructions, and a storage device adapted to store a plurality of instructions, the instructions being adapted to be loaded and executed by the processor to implement the steps of the method for compressing a convolutional neural network model according to any one of the above embodiments; the steps have been described in detail above and are not repeated here.
In another embodiment of the present invention, there is further provided a computer-readable storage medium having computer-executable instructions stored thereon; when executed, the instructions implement the steps of the method for compressing a convolutional neural network model, the specific steps of which are described in detail above and are not repeated here.
Through the description of the above embodiments, those skilled in the art will clearly understand that the above embodiments can be implemented with software plus a necessary general-purpose hardware platform, and certainly also in hardware, though in many cases the former is the better implementation. Based on this understanding, the part of the present invention that contributes over the prior art can be embodied in the form of a software product. The computer software product is stored on a computer-readable storage medium and includes instructions for causing an apparatus, including but not limited to a computer and/or a mobile phone, to perform the methods according to the various embodiments of the present invention.
In summary, the present invention discloses a convolutional neural network compression method, apparatus, and computer-readable storage medium. The compression method uses the sum of the absolute values of the weights contained in each convolution kernel to judge the kernel's importance, and dynamically determines the pruning step length according to the actual situation of each convolutional layer. This addresses the huge parameter count and computation of conventional convolutional neural networks, effectively speeds up pruning while preserving model performance, and makes it feasible to deploy convolutional neural network models on resource-limited embedded systems.
The above embodiments merely illustrate the principles and effects of the present invention and do not limit its scope, which includes but is not limited to the configurations listed above. Any changes and modifications made by those skilled in the art according to the above disclosure fall within the scope of the claims.
Claims (10)
1. A compression method of a convolutional neural network model is characterized by comprising the following steps:
s100: respectively pruning each convolutional layer of the first convolutional neural network model, and determining the pruning step length of each convolutional layer;
s200: and according to the pruning step length of each convolutional layer, pruning the first convolutional neural network model layer by layer to obtain a compressed second convolutional neural network model.
2. The method of claim 1, wherein the step S100 of determining the pruning step for each convolutional layer comprises setting an initial pruning step and a pruning order of convolutional kernels, and then performing the following steps:
s110: performing a trial pruning on the convolutional layer to obtain a pruned intermediate convolutional neural network model;
s120: fine-tuning the intermediate convolutional neural network model, and running the fine-tuned intermediate convolutional neural network model on a validation set to obtain an mAP after the trial pruning;
s130: if the mAP after the attempted pruning is lower than a first preset mAP threshold value, reducing the initial pruning step length to a first pruning step length; if the mAP after the attempted pruning is higher than the first preset mAP threshold value, increasing the initial pruning step length to a second pruning step length; otherwise, go to step S150;
s140: if the pruning change step length meets the preset change step length threshold, executing step S110 to continue pruning the convolutional layer; otherwise, go to step S150;
s150: and taking the initial pruning step length as the pruning step length of the convolutional layer.
3. The method of claim 2, wherein the step of setting the initial pruning step size comprises:
setting an initial pruning proportion;
and obtaining the initial pruning step length of the convolutional layer according to the initial pruning proportion and the number of the convolutional kernels.
4. The method of claim 2, wherein the step of setting the pruning order of the convolution kernels comprises:
calculating the sum of absolute values of weights contained in each convolution kernel;
and setting the sequence of the sum of the absolute values of the weights contained in the convolution kernel from small to large as the pruning sequence of the convolution kernel.
5. The method of claim 1, further comprising, before step S200, sorting each of the convolutional layers, and taking the order of the sensitivity of the convolutional layer from high to low as a pruning order of the convolutional layer.
6. The method according to claim 5, wherein the step S200 of pruning the first convolutional neural network model layer by layer comprises pruning the convolutional layers one by one according to the pruning sequence of the convolutional layers and according to the pruning step length of each convolutional layer and judging whether pruning is completed one by one, and pruning the next convolutional layer after the previous convolutional layer is pruned until all convolutional layers are pruned to obtain the compressed second convolutional neural network model.
7. The method of claim 6, wherein the step S200 of determining whether the convolutional layer has been pruned comprises the following steps,
step S210: fine-tuning the pruned first convolutional neural network model, and running the fine-tuned model on a validation set to obtain the mAP after pruning;
step S220: if the AP of any category is 0, completing pruning of the convolutional layer; otherwise, executing step S230;
step S230: judging whether the mAP after pruning is lower than a second preset mAP threshold value or not, and if so, completing pruning of the convolutional layer; otherwise, go to step S240;
step S240: if the pruning step length of the convolutional layer is smaller than the number of remaining convolution kernels, continuing to prune the convolutional layer according to its pruning step length; otherwise, correcting the pruning step length of the convolutional layer to a third pruning step length and continuing to prune the convolutional layer.
8. The method according to any one of claims 1 to 7, further comprising training the first convolutional neural network model based on a target detection algorithm before step S100.
9. An apparatus for compressing a convolutional neural network model, comprising a processor adapted to execute instructions and a storage device adapted to store a plurality of instructions, the instructions being adapted to be loaded and executed by the processor to implement the steps of the compression method of a convolutional neural network model according to any one of claims 1 to 8.
10. A computer storage medium having computer-executable instructions stored thereon that, when executed, implement the compression method of a convolutional neural network model according to any one of claims 1 to 8.
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title

CN201910959688.9A CN110705708A (en)  2019-10-10  2019-10-10  Compression method and device of convolutional neural network model and computer storage medium
Publications (1)
Publication Number  Publication Date

CN110705708A (en)  2020-01-17
Family
ID=69200148
Family Applications (1)
Application Number  Title  Priority Date  Filing Date 

CN201910959688.9A Pending CN110705708A (en)  2019-10-10  2019-10-10  Compression method and device of convolutional neural network model and computer storage medium
Country Status (1)
Country  Link 

CN (1)  CN110705708A (en) 
Cited By (2)
Publication number  Priority date  Publication date  Assignee  Title

CN112100628A (en) *  2020-11-16  2020-12-18  Alipay (Hangzhou) Information Technology Co., Ltd.  Method and device for protecting safety of neural network model
WO2021208151A1 (en) *  2020-04-13  2021-10-21  SenseTime Group Limited  Model compression method, image processing method and device

Legal Events
Date  Code  Title

PB01  Publication
SE01  Entry into force of request for substantive examination
RJ01  Rejection of invention patent application after publication

Application publication date: 2020-01-17