CN113361697A - Convolution network model compression method, system and storage medium - Google Patents


Info

Publication number
CN113361697A
CN113361697A
Authority
CN
China
Prior art keywords
network model
convolutional network
converged
layer
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110797243.2A
Other languages
Chinese (zh)
Inventor
李卫东 (Li Weidong)
刘平涛 (Liu Pingtao)
罗博文 (Luo Bowen)
张招 (Zhang Zhao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Lexway Technology Development Co ltd
Shenzhen Siyue Innovation Co ltd
Original Assignee
Wuhan Lexway Technology Development Co ltd
Shenzhen Siyue Innovation Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Lexway Technology Development Co ltd, Shenzhen Siyue Innovation Co ltd filed Critical Wuhan Lexway Technology Development Co ltd
Priority: CN202110797243.2A
Publication: CN113361697A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the invention disclose a convolutional network model compression method, a compression system, and a storage medium.

Description

Convolution network model compression method, system and storage medium
Technical Field
The embodiments of the invention relate to the technical field of data processing, and in particular to a convolutional network model compression method, a compression system, and a storage medium.
Background
A convolutional network model generally faces three problems in real-world deployment: (1) storage: the excellent performance of a convolutional network model depends on millions of trainable parameters, and these parameters and the network structure information must be stored on disk and loaded into memory for inference — a model pre-trained on ImageNet needs more than 300 MB, a great burden for embedded devices; (2) run-time memory usage: during inference, the intermediate activations/responses of the model can require even more memory than the model parameters themselves, which is no problem for high-performance GPUs but a large burden for many applications with low computational power; (3) computation: convolution on high-resolution images is computationally intensive, and a large convolutional network model may take several minutes to process a single image on an embedded device, making it impractical for real applications.
Disclosure of Invention
To this end, embodiments of the present invention provide a method, a system, and a storage medium for compressing a convolutional network model, so as to solve at least one of the problems in the foregoing background art.
In order to achieve the above object, in a first aspect, an embodiment of the present invention provides a convolutional network model compression method, including:
acquiring image data, making a training data set, building a convolution network model by using a deep learning framework, and executing the following steps on the convolution network model:
performing channel sparse regularization training on the convolutional network model by using a training data set until the convolutional network model is converged;
pruning the channel of the converged convolutional network model by using the scaling factor of the BN layer;
fine-tuning the pruned convolutional network model, and judging whether the fine-tuned convolutional network model has converged;
if the fine-tuned convolutional network model has converged, saving the model parameters of the fine-tuned convolutional network model to obtain a compressed convolutional network model; and if the fine-tuned convolutional network model has not converged, repeating the above steps.
Further, acquiring image data and making a training data set, and building a convolution network model by using a deep learning framework comprises the following steps:
acquiring image data according to an application scene and making a training data set;
calculating the mean and standard deviation of the training data set;
carrying out normalization processing on the training data set according to the mean value and the standard deviation to obtain a preprocessed training data set;
configuring the channel number of a convolution network model according to the category number contained in the application scene;
the channel sparse regularization training of the convolutional network model by utilizing the training data set comprises the following steps:
and performing channel sparse regularization training on the convolutional network model by utilizing the preprocessed training data set.
Further, the channel sparse regularization training of the convolutional network model by using the preprocessed training data set comprises:
and inputting the preprocessed training data set into a convolution network model, and performing channel sparse regularization training on the convolution network model to obtain an output value of the convolution network model, updated weight parameters and a scaling factor of the BN layer.
Further, the number of scaling factors is the same as the number of BN layers.
Further, pruning the channel of the converged convolutional network model by using the scaling factor of the BN layer includes:
when a convolution network model is built, a BN layer is inserted after a convolution layer of the convolution network model, and a scaling factor and a translation parameter of the BN layer are obtained by training the convolution network model;
and pruning the channel of the converged convolutional network model by using the scaling factor and the translation parameter of the BN layer.
Further, pruning the channels of the converged convolutional network model comprises:
sorting the absolute values of the scaling factors of the BN layer corresponding to each channel of the converged convolutional network model according to a sorting rule;
intercepting the scaling factors of the BN layers at the corresponding positions after sorting as the global threshold values of all layers of the converged convolutional network model;
judging whether the scaling factor of the BN layer corresponding to each channel of the converged convolutional network model is smaller than a global threshold value or not;
and if so, cutting off a channel of the converged convolutional network model corresponding to the scaling factor of the BN layer smaller than the global threshold value.
In a second aspect, an embodiment of the present invention provides a convolutional network model compression system, where the system includes: a processor and a memory;
the memory is used for storing one or more program instructions;
a processor for executing one or more program instructions to perform any of the method steps of the above method for compressing a convolutional network model.
In a third aspect, embodiments of the present invention provide a computer storage medium having one or more program instructions embodied therein for execution by a convolutional network model compression system to perform any one of the method steps of a convolutional network model compression method as described above.
The embodiment of the invention has the following advantages:
the convolution network model compression method provided by the embodiment of the invention firstly carries out channel sparse regularization training on a convolution network model to make the convolution network model converge, then adopts the scaling factor of a BN layer to prune the channel of the converged convolution network model, then carries out fine tuning on the pruned convolution network model and judges whether the converged convolution network model is converged, and if the converged convolution network model is obtained, the compressed convolution network model is obtained, so that the size of the convolution network model and the occupation of a memory when the convolution network model operates are effectively reduced, the operation number is reduced while the precision is not influenced, the model compression and reasoning acceleration can be realized by using the traditional hardware and a deep learning software package, and other special hardware accelerators are not needed.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
The structures, ratios, and sizes shown in this specification are provided only to accompany the disclosed content, so as to be understood and read by those skilled in the art, and are not used to limit the conditions under which the invention can be implemented; any structural modification, change of proportion, or adjustment of size that does not affect the effects achievable by the invention shall still fall within the scope that the technical content disclosed by the invention can cover.
Fig. 1 is a schematic flow chart of a convolution network model compression method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another convolution network model compression method according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating a method for pruning channels of a converged convolutional network model according to an embodiment of the present invention;
fig. 4 is a schematic flowchart of a further method for pruning channels of a converged convolutional network model according to an embodiment of the present invention;
fig. 5 is a block diagram of a convolution network model compression system according to an embodiment of the present invention.
Detailed Description
The present invention is described below through particular embodiments; other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. The described embodiments are merely a part of the embodiments of the invention, and the invention is not limited to the particular embodiments disclosed. All other embodiments obtained by a person of ordinary skill in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
As shown in fig. 1, the present embodiment provides a convolution network model compression method, including:
s101, collecting image data, making a training data set, building a convolution network model by using a deep learning framework, and executing steps S102 to S105 on the convolution network model:
s102, performing channel sparse regularization training on the convolutional network model by using a training data set until the convolutional network model is converged;
s103, pruning the channel of the converged convolutional network model by using the scaling factor of the BN layer;
s104, fine tuning the pruned convolution network model, and judging whether the fine tuned convolution network model is converged;
s105, if the finely tuned convolutional network model is converged, storing model parameters of the finely tuned convolutional network model to obtain a compressed convolutional network model; if the fine-tuned convolutional network model does not converge, the above steps S102 to S105 are repeated.
Specifically, the deep learning framework may be the PyTorch deep learning framework, and the convolutional network model built with it may be, for example, an image classification model; this embodiment does not specifically limit the model type.
The convolutional network model compression method provided by this embodiment performs channel sparse regularization training on a convolutional network model until it converges, prunes the channels of the converged model using the scaling factors of its BN layers, then fine-tunes the pruned model and judges whether the fine-tuned model has converged; if it has, the compressed convolutional network model is obtained. This effectively reduces the size of the model and its run-time memory footprint, and reduces the number of operations without affecting accuracy, so that model compression and inference acceleration can be realized with conventional hardware and deep learning software packages, without any special hardware accelerator.
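The steps S102–S105 above can be sketched as a toy loop. This is a hedged illustration, not the patent's implementation: a "model" here is just a vector of BN scaling factors, and the helpers `train_channel_sparse`, `prune_by_bn_gamma`, and the `importance` vector are invented stand-ins.

```python
import numpy as np

def train_channel_sparse(gammas, importance, lam=0.05, lr=0.5, steps=50):
    # S102 (toy): each step follows the gradient of a task term pulling each
    # gamma toward its "useful" value, plus the subgradient of the L1
    # sparsity penalty lam * |gamma|, which shrinks useless gammas to ~0.
    for _ in range(steps):
        gammas = gammas - lr * ((gammas - importance) + lam * np.sign(gammas))
    return gammas

def prune_by_bn_gamma(gammas, prune_ratio):
    # S103: global threshold = the |gamma| value at the prune_ratio position
    # of the sorted absolute scaling factors.
    ranked = np.sort(np.abs(gammas))
    threshold = ranked[int(prune_ratio * len(ranked))]
    keep = np.abs(gammas) >= threshold
    return gammas[keep], keep

# Ten channels: the first five matter for the task, the last five do not.
importance = np.array([1.0, 0.9, 1.1, 0.8, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
gammas = train_channel_sparse(np.ones(10), importance)   # S102
pruned, keep = prune_by_bn_gamma(gammas, 0.5)            # S103
# S104 would now fine-tune the pruned model; in this toy, exactly the five
# useful channels survive.
```

In the toy, sparse training drives the five unimportant scaling factors close to zero while the useful ones stay large, so the global threshold cleanly separates them — the same separation the real method relies on.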
Further, as shown in fig. 2, acquiring image data and making a training data set, and building a convolutional network model by using a deep learning framework includes:
s201, acquiring image data according to an application scene and making a training data set;
in this step, an application scenario is an actual life scenario in which the convolutional network model is specifically applied, and categories included in the application scenario are, for example, different categories of fruits such as apples and bananas in one scenario, which is not specifically limited in this embodiment. And each class in the application scenario corresponds to one channel of the convolutional network model.
S202, calculating a mean value and a standard deviation of a training data set;
s203, carrying out normalization processing on the training data set according to the mean value and the standard deviation to obtain a preprocessed training data set;
s204, configuring the channel number of the convolution network model according to the category number contained in the application scene;
the channel sparse regularization training of the convolutional network model by utilizing the training data set comprises the following steps:
and performing channel sparse regularization training on the convolutional network model by utilizing the preprocessed training data set.
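Steps S202–S203 can be sketched in NumPy as follows; the array shapes and variable names are illustrative assumptions, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
# A stand-in training set: 100 images, 3 channels, 32 x 32 pixels (N x C x H x W)
images = rng.uniform(0.0, 255.0, size=(100, 3, 32, 32))

# S202: per-channel mean and standard deviation over all images and pixels
mean = images.mean(axis=(0, 2, 3))   # shape (3,)
std = images.std(axis=(0, 2, 3))

# S203: normalize each channel to zero mean and unit variance
normalized = (images - mean[None, :, None, None]) / std[None, :, None, None]
```

After this preprocessing, `normalized.mean(axis=(0, 2, 3))` is approximately zero and the per-channel standard deviation is approximately one, which is the preprocessed training data set fed to the sparse regularization training.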
Further, the channel sparse regularization training of the convolutional network model by using the preprocessed training data set comprises:
and inputting the preprocessed training data set into a convolution network model, and performing channel sparse regularization training on the convolution network model to obtain an output value of the convolution network model, updated weight parameters and a scaling factor of the BN layer.
In this embodiment, a scaling factor γ is introduced for each channel and multiplied with that channel's output. The network weights and the scaling factors are then trained jointly; finally, the channels with small scaling factors are pruned directly and the pruned network is fine-tuned. The objective used for channel sparse regularization training of the convolutional network model is:

L = Σ_(x,y) l(f(x, W), y) + λ Σ_(γ∈Γ) g(γ)    (1)

where x is an input value, y the corresponding target output, W the weight parameters, l(·) the task loss, γ a scaling factor of a BN layer, Γ the set of all such scaling factors, λ a balance factor, and g(·) a penalty term on the BN scaling factors γ. L1 regularization is chosen, i.e. g(γ) = |γ|, the L1 norm. Subgradient descent serves as the optimization method for the non-smooth (non-differentiable) L1 penalty; alternatively, a smooth-L1 regularization term replaces the penalty so that subgradient steps at the non-smooth point are avoided as far as possible.
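The penalty terms just described can be illustrated with a toy NumPy computation: a task loss plus λ times the L1 penalty on the BN scaling factors, with the smooth-L1 alternative alongside. The concrete numbers are invented for the example.

```python
import numpy as np

def l1_penalty(gammas):
    # g(gamma) = |gamma|: non-smooth at 0, optimized by subgradient descent
    return np.abs(gammas).sum()

def smooth_l1_penalty(gammas, delta=1.0):
    # Quadratic near zero, linear in the tails: differentiable everywhere,
    # so plain gradient descent can replace subgradient descent.
    a = np.abs(gammas)
    return np.where(a < delta, 0.5 * a**2 / delta, a - 0.5 * delta).sum()

gammas = np.array([0.8, -0.05, 0.01, 1.2])   # illustrative BN scaling factors
lam = 0.01                                   # balance factor lambda
task_loss = 0.37                             # stand-in for the data term
total = task_loss + lam * l1_penalty(gammas)  # the objective's overall value
```

During training, the L1 term contributes `lam * sign(gamma)` to each scaling factor's (sub)gradient, which is what steadily shrinks unimportant channels' factors toward zero.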
Further, the number of scaling factors is the same as the number of BN layers.
Specifically, in the formula (1), the number of the scaling factors γ is the same as the number of BN layers.
Further, as shown in fig. 3, pruning the channels of the converged convolutional network model by using the scaling factor of the BN layer includes:
s301, when a convolution network model is built, inserting a BN layer into the convolution layer of the convolution network model, and training the convolution network model to obtain a scaling factor and a translation parameter of the BN layer;
and S302, pruning the channel of the converged convolution network model by using the scaling factor and the translation parameter of the BN layer.
In this embodiment, the formulas used by the BN layer when pruning the channels of the converged convolutional network model are:

ẑ = (z_input − μ_B) / √(σ_B² + ε)    (2)
z_output = γ · ẑ + β    (3)

where z_input is the input value of the BN layer, z_output the output value of the BN layer, γ the scaling factor of the BN layer, β the translation parameter, μ_B the mini-batch mean, σ_B² the mini-batch variance, and ε a small nonzero constant.
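The BN transform described here can be sketched in NumPy as follows (variable names are illustrative). Note how a channel whose γ is zero carries no information from its input, which is exactly why near-zero scaling factors mark prunable channels.

```python
import numpy as np

def bn_forward(z_input, gamma, beta, eps=1e-5):
    # Normalize with the mini-batch mean and variance, then apply the
    # per-channel scale (gamma) and shift (beta).
    mu = z_input.mean(axis=0)
    var = z_input.var(axis=0)
    z_hat = (z_input - mu) / np.sqrt(var + eps)
    return gamma * z_hat + beta

batch = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])   # 3 samples, 2 channels
out = bn_forward(batch,
                 gamma=np.array([1.0, 0.0]),
                 beta=np.array([0.0, 0.5]))
# Channel 1 is standardized; channel 2 (gamma = 0) collapses to the constant
# beta = 0.5 regardless of its input.
```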
In particular, pruning a channel essentially prunes all input and output connections associated with that channel, directly yielding a narrow network without requiring any special sparse computation package. The scaling factors of the BN layers play the role of channel selectors: because the regularization term on the scaling factors is jointly optimized with the weight loss function through formula (1), the network can automatically identify and cut off unimportant channels with almost no effect on its generalization performance.
A BN layer is inserted after each convolutional layer of the convolutional network model, which introduces the BN layer's scaling factor and translation parameter. Reusing the BN layer's own scaling factor as the network-slimming scaling factor therefore brings no additional cost to the network.
Further, as shown in fig. 4, pruning the channels of the converged convolutional network model includes:
s401, sorting the absolute values of the scaling factors of the BN layers corresponding to each channel of the converged convolutional network model according to a sorting rule;
s402, intercepting the scaling factors of the BN layers at the corresponding positions after sorting as the global threshold values of all layers of the converged convolutional network model;
s403, judging whether the scaling factor of the BN layer corresponding to each channel of the converged convolutional network model is smaller than a global threshold value or not;
and S404, if so, clipping a channel of the converged convolutional network model corresponding to the scaling factor of the BN layer smaller than the global threshold.
Specifically, after the regularization term on the BN scaling factors is introduced, many scaling factors in the model tend toward 0. The channels whose BN scaling factors are close to 0 are then pruned. For example, suppose the feature map after a convolution has dimensions h × w × c, where h and w are its height and width and c is the number of channels; feeding it into the BN layer yields the normalized feature map, and each of the c channel maps corresponds to one set of scaling and shift parameters of the BN layer. Pruning the channels with small BN scaling factors therefore amounts to directly removing the convolution kernels that produce the corresponding channel maps.
The criterion for a "small" BN scaling factor depends on a global threshold set for all layers of the entire convolutional network model, which may be defined as a proportion of all BN scaling-factor values. Suppose 70% of the channels in the convolutional network model need to be pruned: the absolute values of the BN scaling factors are first sorted according to a sorting rule, for example from small to large (this embodiment does not specifically limit the rule), and then, according to a preset interception rule, the scaling factor at the 70% position of the sorted list is intercepted as the global threshold. In this embodiment, the interception rule is to take the scaling factor at the 70% position of the ascending list as the global threshold; another preset interception rule may be adopted according to actual requirements, and this embodiment is not specifically limited. In this way, a compact convolutional network model with few parameters, a small run-time memory footprint, and low computation cost is obtained.
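The global-threshold selection just described can be sketched as follows, assuming a 70% prune ratio; the helper name is an invention for the example.

```python
import numpy as np

def channels_to_prune(gammas, prune_ratio=0.7):
    # Sort the absolute BN scaling factors from small to large, take the
    # value at the prune_ratio position as the global threshold, and mark
    # every channel below it for removal.
    ranked = np.sort(np.abs(gammas))
    threshold = ranked[int(prune_ratio * len(ranked))]
    return np.abs(gammas) < threshold, threshold

gammas = np.array([0.9, 0.01, 0.4, 0.002, 0.03,
                   0.0, 1.3, 0.05, 0.7, 0.008])
mask, thr = channels_to_prune(gammas, prune_ratio=0.7)
# mask marks 7 of these 10 channels for pruning; the largest factors survive.
```

Because the threshold is global across all layers, layers whose channels are mostly unimportant lose many channels, while layers full of large scaling factors are left nearly intact.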
In addition, combining this method with other model compression methods (quantization, low-rank decomposition) can further raise the compression ratio; combining it with other optimization and acceleration methods (TensorRT, etc.) can raise the inference speed; and if the loss of inference accuracy is too large, knowledge distillation can be combined to recover the lost accuracy effectively, yielding a thin, compact network with almost no loss of accuracy.
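As one possible form of the knowledge-distillation step mentioned above — an assumption for illustration, not specified by the patent — a common choice is the KL divergence between temperature-softened teacher and student class probabilities:

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, T=4.0):
    # KL(teacher || student) on temperature-softened probabilities,
    # scaled by T^2 as is conventional in distillation.
    p = softmax(teacher_logits, T)   # soft targets from the large model
    q = softmax(student_logits, T)   # pruned model's predictions
    return float(np.sum(p * np.log(p / q))) * T * T

teacher = np.array([4.0, 1.0, -2.0])
student = np.array([3.5, 1.5, -1.0])
loss = distill_loss(teacher, student)   # small positive value
```

The pruned network would be fine-tuned on a weighted sum of this distillation loss and the ordinary task loss, pulling its soft predictions back toward those of the uncompressed model.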
In a second aspect, as shown in fig. 5, an embodiment of the present invention provides a convolutional network model compression system, which includes: a processor 501 and a memory 502;
memory 502 is used to store one or more program instructions;
a processor 501 for executing one or more program instructions for performing any of the above method steps of the convolutional network model compression method.
In a third aspect, embodiments of the present invention provide a computer storage medium having one or more program instructions embodied therein for execution by a convolutional network model compression system to perform any one of the method steps of a convolutional network model compression method as described above.
In an embodiment of the invention, the processor may be an integrated circuit chip having signal processing capability. The Processor may be a general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware component.
The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The processor reads the information in the storage medium and completes the steps of the method in combination with the hardware.
The storage medium may be a memory, for example, which may be volatile memory or nonvolatile memory, or which may include both volatile and nonvolatile memory.
The nonvolatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory.
The volatile Memory may be a Random Access Memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that the functionality described in the present invention may be implemented in a combination of hardware and software in one or more of the examples described above. When software is applied, the corresponding functionality may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (8)

1. A method of compressing a convolutional network model, comprising:
acquiring image data, making a training data set, building a convolution network model by using a deep learning framework, and executing the following steps on the convolution network model:
performing channel sparse regularization training on the convolutional network model by using the training data set until the convolutional network model is converged;
pruning the channel of the converged convolutional network model by using the scaling factor of the BN layer;
fine-tuning the pruned convolutional network model, and judging whether the fine-tuned convolutional network model has converged;
if the fine-tuned convolutional network model has converged, saving the model parameters of the fine-tuned convolutional network model to obtain a compressed convolutional network model; and if the fine-tuned convolutional network model has not converged, repeating the above steps.
2. The method for compressing the convolutional network model according to claim 1, wherein the acquiring image data and making a training data set, and the building of the convolutional network model by using the deep learning framework comprises the following steps:
acquiring image data according to an application scene and making a training data set;
calculating a mean and a standard deviation of the training data set;
performing normalization processing on the training data set according to the mean value and the standard deviation to obtain a preprocessed training data set;
configuring the channel number of a convolution network model according to the category number contained in the application scene;
the training of the sparse regularization channel of the convolutional network model by using the training data set comprises:
and performing channel sparse regularization training on the convolutional network model by using the preprocessed training data set.
3. The method of compressing a convolutional network model as claimed in claim 2, wherein the training of the convolutional network model with the preprocessed training data set for channel sparsity regularization comprises:
and inputting the preprocessed training data set into the convolution network model, and performing channel sparse regularization training on the convolution network model to obtain an output value of the convolution network model, updated weight parameters and a scaling factor of the BN layer.
4. The convolutional network model compression method of claim 3, wherein the number of the scaling factors is the same as the number of BN layers.
5. The convolutional network model compression method of claim 1, wherein pruning the channels of the converged convolutional network model by using the scaling factor of the BN layer comprises:
when a convolutional network model is built, inserting a BN layer behind a convolutional layer of the convolutional network model, and training the convolutional network model to obtain a scaling factor and a translation parameter of the BN layer;
and pruning the channel of the converged convolutional network model by using the scaling factor and the translation parameter of the BN layer.
6. The convolutional network model compression method of claim 5, wherein pruning the channels of the converged convolutional network model comprises:
sorting the absolute values of the BN-layer scaling factors corresponding to the channels of the converged convolutional network model according to a sorting rule;
taking the BN-layer scaling factor at the corresponding position in the sorted sequence as the global threshold for all layers of the converged convolutional network model;
judging whether the BN-layer scaling factor corresponding to each channel of the converged convolutional network model is smaller than the global threshold; and
if so, pruning the channels of the converged convolutional network model whose BN-layer scaling factors are smaller than the global threshold.
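The steps of claim 6 amount to collecting the absolute scaling factors of every channel across all BN layers, sorting them, reading a single global threshold off the sorted sequence, and marking channels below that threshold for removal. A hedged sketch follows; the `prune_ratio` parameter is an illustrative assumption, since the claim does not fix how the threshold position is chosen:

```python
import numpy as np

def global_prune_mask(gammas, prune_ratio=0.5):
    """Sort the absolute BN scaling factors of every channel in the converged
    model, take the value at the chosen position as a single global threshold,
    and return per-layer keep/prune masks (True = keep the channel)."""
    all_abs = np.sort(np.concatenate([np.abs(g) for g in gammas]))
    threshold = all_abs[int(len(all_abs) * prune_ratio)]
    masks = [np.abs(g) >= threshold for g in gammas]
    return masks, threshold

# Two hypothetical BN layers with three channels each.
layer1 = np.array([0.8, 0.01, 0.3])
layer2 = np.array([0.02, 0.9, 0.05])
masks, thr = global_prune_mask([layer1, layer2], prune_ratio=0.5)
```

Channels whose mask entry is False would then be cut from both the BN layer and the preceding convolutional layer, after which the slimmed network is typically fine-tuned.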
7. A convolutional network model compression system, the system comprising: a processor and a memory;
the memory is configured to store one or more program instructions;
the processor is configured to execute the one or more program instructions to perform the method of any one of claims 1-6.
8. A computer storage medium, comprising one or more program instructions for execution by a convolutional network model compression system to perform the method of any one of claims 1-6.
CN202110797243.2A 2021-07-14 2021-07-14 Convolution network model compression method, system and storage medium Pending CN113361697A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110797243.2A CN113361697A (en) 2021-07-14 2021-07-14 Convolution network model compression method, system and storage medium

Publications (1)

Publication Number Publication Date
CN113361697A true CN113361697A (en) 2021-09-07

Family

ID=77539485

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116185307A (en) * 2023-04-24 2023-05-30 之江实验室 Storage method and device of model data, storage medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111199282A (en) * 2019-12-31 2020-05-26 的卢技术有限公司 Pruning method and device for convolutional neural network model
CN112101547A (en) * 2020-09-14 2020-12-18 中国科学院上海微系统与信息技术研究所 Pruning method and device for network model, electronic equipment and storage medium
JP2021022050A (en) * 2019-07-25 2021-02-18 国立大学法人 和歌山大学 Neural network compression method, neural network compression device, computer program, and method of producing compressed neural network data
CN112990420A (en) * 2019-12-02 2021-06-18 北京华航无线电测量研究所 Pruning method for convolutional neural network model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YIN Wenfeng; LIANG Lingyan; PENG Huimin; CAO Qichun; ZHAO Jian; DONG Gang; ZHAO Yaqian; ZHAO Kun: "Research Progress of Convolutional Neural Network Compression and Acceleration Techniques", Computer Systems & Applications, no. 09 *
BAI Shilei; YIN Kexin; ZHU Jianqi: "Traffic Sign Detection Algorithm Based on Lightweight YOLOv3", Computer and Modernization, no. 09 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination