CN108985453A - Deep neural network model compression method based on asymmetric ternary weight quantization - Google Patents
- Publication number
- CN108985453A (application CN201810674698.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
Abstract
The invention discloses a deep neural network model compression method based on asymmetric ternary weight quantization, comprising: during deep neural network training, before each forward pass, quantizing the floating-point weights of every network layer to asymmetric ternary values, while the parameter-update stage uses the original floating-point network weights; and storing the trained deep neural network in compressed form. The method removes the redundant parameters of the deep neural network, compresses the network model, and effectively improves the recognition accuracy of the quantization method on large datasets.
Description
Technical field
The present invention relates to the field of compression techniques for convolutional neural networks, and more particularly to a deep neural network model compression method based on asymmetric ternary weight quantization.
Background art
In recent years, with the rapid development of deep learning algorithms, deep neural networks have achieved state-of-the-art results in a series of machine learning tasks such as speech recognition, image classification, and natural language processing. However, a typical deep neural network usually has millions of parameters, which makes it difficult to deploy on embedded devices with only limited storage and computing resources; how to compress deep neural network models has therefore become an important research direction in deep learning.
At present, typical model compression methods fall into two categories. The first optimizes the network structure to reduce the number of parameters; the ICLR 2016 best paper Deep Compression describes such a method in detail and can achieve compression ratios of several tens of times, but methods of this kind are difficult to implement and their steps are complex. The second reduces network storage by lowering numerical precision, for example the currently common binarized networks (BinaryConnect) and symmetric ternary networks (Ternary Weight Networks); these methods achieve accuracy no lower than the floating-point network on smaller datasets, but suffer a large accuracy loss on larger datasets such as ImageNet.
The most recent ternary weight quantization method (Ternary Weight Networks) quantizes the network weights to the ternary values $\{-\alpha, 0, +\alpha\}$ using the rule:

$$W_i^t = \begin{cases} +\alpha, & W_i > \Delta \\ 0, & |W_i| \le \Delta \\ -\alpha, & W_i < -\Delta \end{cases}$$

In choosing this quantization rule, the method assumes that after training the positive and negative weights of the network follow the same distribution, which significantly limits the expressive power of the ternary weight network.
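As a concrete illustration of the symmetric prior-art rule above, the following sketch maps a weight vector to $\{-\alpha, 0, +\alpha\}$ with a single threshold for both signs. The function name and the hand-picked `delta`/`alpha` values are ours, for illustration only:

```python
# Illustrative sketch (not from the patent text) of the symmetric
# Ternary Weight Networks rule: every weight is mapped to
# {-alpha, 0, +alpha} using one threshold delta for both signs.

def twn_quantize(weights, delta, alpha):
    """Symmetric ternary quantization: one threshold/scale for +/- weights."""
    return [alpha if w > delta else -alpha if w < -delta else 0.0
            for w in weights]

# Example with hand-picked delta and alpha:
w = [0.9, -0.8, 0.1, -0.05, 0.7]
print(twn_quantize(w, delta=0.3, alpha=0.8))  # [0.8, -0.8, 0.0, 0.0, 0.8]
```

The asymmetric method of the invention replaces the single `delta`/`alpha` pair with separate values for the positive and negative weights.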
Summary of the invention
In view of the above technical problems, the object of the present invention is to provide a deep neural network model compression method based on asymmetric ternary weight quantization that removes the redundant parameters of the deep neural network, compresses the network model, and effectively improves the recognition accuracy of the quantization method on large datasets.
The technical solution of the present invention is as follows:
A deep neural network model compression method based on asymmetric ternary weight quantization, comprising the following steps:
S01: during deep neural network training, before each forward pass, quantizing the floating-point weights of every network layer to asymmetric ternary values, the parameter-update stage using the original floating-point network weights;
S02: storing the trained deep neural network in compressed form.
In a preferred technical solution, the ternary values $W_l^t$ are:

$$W_{li}^t = \begin{cases} +\alpha_{lp}, & W_{li} > \Delta_{lp} \\ 0, & -\Delta_{ln} \le W_{li} \le \Delta_{lp} \\ -\alpha_{ln}, & W_{li} < -\Delta_{ln} \end{cases}$$

where $l$ denotes the corresponding network layer, $\Delta_{lp}$ and $\Delta_{ln}$ are the thresholds used in the quantization process, and $\alpha_{lp}$ and $\alpha_{ln}$ are the corresponding scale factors.
In a preferred technical solution, the loss introduced by quantization is reduced by L2-norm minimization, with the formula:

$$\alpha_{lp}^*, \alpha_{ln}^*, \Delta_{lp}^*, \Delta_{ln}^* = \arg\min \lVert W_l - W_l^t \rVert_2^2$$

For any given threshold $\Delta_{lp}$, the scale factor $\alpha_{lp}$ is:

$$\alpha_{lp} = \frac{1}{|I_{\Delta_{lp}}|} \sum_{i \in I_{\Delta_{lp}}} |W_{li}|$$

where $I_{\Delta_{lp}} = \{ i \mid W_{li} > \Delta_{lp} \}$ and $|I_{\Delta_{lp}}|$ denotes the number of elements in $I_{\Delta_{lp}}$;

the threshold factor $\Delta_{lp}$ is:

$$\Delta_{lp}^* = \arg\max_{\Delta > 0} \frac{1}{|I_{\Delta}^{p}|} \Big( \sum_{i \in I_{\Delta}^{p}} |W_{li}| \Big)^2, \qquad I_{\Delta}^{p} = \{ i \mid W_{li} > \Delta \}$$

and analogously for the negative side.
In a preferred technical solution, approximate values of the threshold factors are obtained by approximate calculation:

$$\Delta_{lp} \approx \frac{0.7}{|I_p|} \sum_{i \in I_p} W_{li}, \qquad \Delta_{ln} \approx \frac{0.7}{|I_n|} \sum_{i \in I_n} |W_{li}|$$

where $I_p = \{ i \mid W_{li} \ge 0,\ i = 1, 2, \dots, n \}$ and $I_n = \{ i \mid W_{li} < 0,\ i = 1, 2, \dots, n \}$.
In a preferred technical solution, compressed storage uses 2-bit coding; during compression, 16 ternary values are stored as one 32-bit fixed-point integer by shift operations.
The invention also discloses a deep neural network model compression device based on asymmetric ternary weight quantization, comprising:
an asymmetric ternary weight network training module, which, during deep neural network training and before each forward pass, quantizes the floating-point weights of every network layer to asymmetric ternary values, the parameter-update stage using the original floating-point network weights;
an asymmetric ternary weight storage module, which stores the trained deep neural network in compressed form.
In a preferred technical solution, the ternary values $W_l^t$ are:

$$W_{li}^t = \begin{cases} +\alpha_{lp}, & W_{li} > \Delta_{lp} \\ 0, & -\Delta_{ln} \le W_{li} \le \Delta_{lp} \\ -\alpha_{ln}, & W_{li} < -\Delta_{ln} \end{cases}$$

where $l$ denotes the corresponding network layer, $\Delta_{lp}$ and $\Delta_{ln}$ are the thresholds used in the quantization process, and $\alpha_{lp}$ and $\alpha_{ln}$ are the corresponding scale factors.
In a preferred technical solution, the loss introduced by quantization is reduced by L2-norm minimization, with the formula:

$$\alpha_{lp}^*, \alpha_{ln}^*, \Delta_{lp}^*, \Delta_{ln}^* = \arg\min \lVert W_l - W_l^t \rVert_2^2$$

For any given threshold $\Delta_{lp}$, the scale factor $\alpha_{lp}$ is:

$$\alpha_{lp} = \frac{1}{|I_{\Delta_{lp}}|} \sum_{i \in I_{\Delta_{lp}}} |W_{li}|$$

where $I_{\Delta_{lp}} = \{ i \mid W_{li} > \Delta_{lp} \}$ and $|I_{\Delta_{lp}}|$ denotes the number of elements in $I_{\Delta_{lp}}$;

the threshold factor $\Delta_{lp}$ is:

$$\Delta_{lp}^* = \arg\max_{\Delta > 0} \frac{1}{|I_{\Delta}^{p}|} \Big( \sum_{i \in I_{\Delta}^{p}} |W_{li}| \Big)^2, \qquad I_{\Delta}^{p} = \{ i \mid W_{li} > \Delta \}$$

and analogously for the negative side.
In a preferred technical solution, approximate values of the threshold factors are obtained by approximate calculation:

$$\Delta_{lp} \approx \frac{0.7}{|I_p|} \sum_{i \in I_p} W_{li}, \qquad \Delta_{ln} \approx \frac{0.7}{|I_n|} \sum_{i \in I_n} |W_{li}|$$

where $I_p = \{ i \mid W_{li} \ge 0,\ i = 1, 2, \dots, n \}$ and $I_n = \{ i \mid W_{li} < 0,\ i = 1, 2, \dots, n \}$.
In a preferred technical solution, compressed storage uses 2-bit coding; during compression, 16 ternary values are stored as one 32-bit fixed-point integer by shift operations.
Compared with the prior art, the invention has the following advantages:
1. Positive and negative weights are constrained separately to improve the expressive power of the ternary network, and the relationship between the positive/negative thresholds and the corresponding scale factors is obtained through the L2 constraint, reducing the loss introduced during quantization and effectively improving the recognition accuracy of the quantization method on large datasets.
2. The redundant parameters of the deep neural network are removed and the network model is compressed, reducing the storage required by the deep neural network model so that it can more easily be ported to and executed on embedded devices.
Brief description of the drawings
The invention is further described below with reference to the accompanying drawings and embodiments:
Fig. 1 is the training flowchart of the quantized network of the present invention;
Fig. 2 is a schematic diagram of the weight coding method of the present invention;
Fig. 3 shows the accuracy of the quantized VGG16 network on the CIFAR-10 dataset;
Fig. 4 shows the accuracy of the quantized AlexNet network on ImageNet.
Specific embodiments
The above scheme is further described below in conjunction with specific embodiments. It should be understood that these embodiments serve to illustrate the present invention and are not intended to limit its scope. The implementation conditions used in the examples may be further adjusted according to specific circumstances; implementation conditions that are not specified are usually those of routine experiments.
Embodiment:
A deep neural network generally contains millions of parameters, which makes it difficult to deploy on devices with only limited resources; however, most of a network's parameters are usually redundant. The main object of the present invention is therefore to remove the redundant parameters and achieve model compression. The technique is implemented in three main steps:
(1) Asymmetric ternary weight quantization:
During network training, the asymmetric ternary weight quantization method quantizes the traditional floating-point network weights to the ternary values $\{-\alpha_{ln}, 0, +\alpha_{lp}\}$ using a threshold rule:

$$W_{li}^t = \begin{cases} +\alpha_{lp}, & W_{li} > \Delta_{lp} \\ 0, & -\Delta_{ln} \le W_{li} \le \Delta_{lp} \\ -\alpha_{ln}, & W_{li} < -\Delta_{ln} \end{cases} \quad (1)$$

where $\Delta_{lp}$ and $\Delta_{ln}$ are the thresholds used in the quantization process; any floating-point weight is assigned to one of the ternary values according to the range it falls in. $\alpha_{lp}$ and $\alpha_{ln}$ are the corresponding scale factors, used to reduce the loss introduced by quantization. The threshold rule above thus involves four independent parameter factors: $\Delta_{lp}$, $\Delta_{ln}$, $\alpha_{lp}$, and $\alpha_{ln}$.
The present invention reduces the loss introduced by quantization using L2-norm minimization:

$$\alpha_{lp}^*, \alpha_{ln}^*, \Delta_{lp}^*, \Delta_{ln}^* = \arg\min \lVert W_l - W_l^t \rVert_2^2 \quad (2)$$

Substituting formula (1) into (2), the objective can be rewritten as:

$$\sum_{i \in I_{\Delta_{lp}}} (W_{li} - \alpha_{lp})^2 + \sum_{i \in I_{\Delta_{ln}}} (W_{li} + \alpha_{ln})^2 + c \quad (3)$$

where $I_{\Delta_{lp}} = \{ i \mid W_{li} > \Delta_{lp} \}$, $I_{\Delta_{ln}} = \{ i \mid W_{li} < -\Delta_{ln} \}$, and $|I_{\Delta}|$ denotes the number of elements in $I_{\Delta}$. The positive and negative threshold factors are independent of each other, and $c$ is an independent constant unrelated to $\Delta$. The solution of formula (3) can therefore be converted, separately for each sign, into:

$$\alpha_{lp}^*, \Delta_{lp}^* = \arg\min \Big( |I_{\Delta_{lp}}|\,\alpha_{lp}^2 - 2\,\alpha_{lp} \sum_{i \in I_{\Delta_{lp}}} |W_{li}| \Big) \quad (4)$$

For any given threshold $\Delta_{lp}$ or $\Delta_{ln}$, the scale factors can be calculated as:

$$\alpha_{lp} = \frac{1}{|I_{\Delta_{lp}}|} \sum_{i \in I_{\Delta_{lp}}} |W_{li}| \quad (5) \qquad \alpha_{ln} = \frac{1}{|I_{\Delta_{ln}}|} \sum_{i \in I_{\Delta_{ln}}} |W_{li}| \quad (6)$$

Substituting $\alpha_{lp}$ and $\alpha_{ln}$ back into formula (4), the threshold factors can be calculated as:

$$\Delta_{lp}^* = \arg\max_{\Delta > 0} \frac{1}{|I_{\Delta}^{p}|} \Big( \sum_{i \in I_{\Delta}^{p}} |W_{li}| \Big)^2 \quad (7) \qquad \Delta_{ln}^* = \arg\max_{\Delta > 0} \frac{1}{|I_{\Delta}^{n}|} \Big( \sum_{i \in I_{\Delta}^{n}} |W_{li}| \Big)^2 \quad (8)$$

with $I_{\Delta}^{p} = \{ i \mid W_{li} > \Delta \}$ and $I_{\Delta}^{n} = \{ i \mid W_{li} < -\Delta \}$. In the formulas above $\Delta$ is positive. Since formulas (7) and (8) have no exact closed-form solution, the experiments assume that after training the positive and negative parts of the network weights $W_l$ each still approximately follow the expected distribution, so approximate values of the threshold factors can be obtained by approximate calculation:

$$\Delta_{lp} \approx \frac{0.7}{|I_p|} \sum_{i \in I_p} W_{li} \quad (9) \qquad \Delta_{ln} \approx \frac{0.7}{|I_n|} \sum_{i \in I_n} |W_{li}| \quad (10)$$

where $I_p = \{ i \mid W_{li} \ge 0,\ i = 1, 2, \dots, n \}$ and $I_n = \{ i \mid W_{li} < 0,\ i = 1, 2, \dots, n \}$. Finally, combining formulas (5), (6), (9), and (10) and substituting into formula (1), the original floating-point weights can be quantized to the corresponding ternary weight values, realizing the discretization of the network weights.
(2) Asymmetric ternary weight network training
The asymmetric ternary weight quantization method constrains every layer's floating-point weights to the ternary values $\{-\alpha_{ln}, 0, +\alpha_{lp}\}$, which greatly reduces parameter redundancy and effectively prevents overfitting. However, directly applying the quantization method to an already trained network severely degrades its accuracy, so the quantization method must be incorporated into the network's training process to reduce the accuracy loss. The training procedure is similar to that of a traditional floating-point network and is shown in Fig. 1.
Fig. 1 illustrates the two key points of asymmetric ternary weight network training. The first is that quantization must be applied before each forward pass: the network's loss value is computed from the quantized weights, mainly so that the influence of the quantization method on the final result is captured. The second is that the parameter-update stage uses the original floating-point network weights rather than the quantized ternary weights, so that small gradient updates are accumulated and the network always moves toward the optimum.
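These two key points can be sketched as a toy training loop. Everything here (the one-weight "network", the symmetric stand-in quantizer, the learning rate) is an illustrative assumption, not the patent's implementation; the point is only that the forward pass and gradient use the quantized weight while the update is applied to the kept floating-point weight:

```python
def quantize(ws, delta=0.5, alpha=1.0):
    # Stand-in symmetric ternarizer; the patent's asymmetric rule
    # would be substituted here.
    return [alpha if w > delta else -alpha if w < -delta else 0.0
            for w in ws]

# Toy one-weight model fitting y = 2*x with squared loss.
float_w = [0.1]                    # master floating-point weight
lr = 0.05
data = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5)]
for epoch in range(100):
    for x, y in data:
        (qw,) = quantize(float_w)  # ternarize before the forward pass
        pred = qw * x              # forward pass uses the ternary weight
        grad = 2 * (pred - y) * x  # gradient of (pred - y)^2 w.r.t. weight
        float_w[0] -= lr * grad    # ...but the FLOAT weight is updated
```

Because the float weight accumulates the small updates, it eventually crosses the threshold and the quantized weight snaps from 0 to +alpha; that accumulation would be lost if the ternary weight itself were updated, which is exactly the second key point above.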
(3) Asymmetric ternary weight storage
After training, every layer's weights in the asymmetric ternary weight network are quantized to $\{-\alpha_{ln}, 0, +\alpha_{lp}\}$, where $l$ denotes the corresponding network layer, but the ternary weights are still held in floating-point representation. To compress the model storage, this technique uses 2-bit coding; the specific coding scheme is shown in Fig. 2. A 2-bit code can represent four values, three of which are used here. During compression, 16 ternary values can be stored as one 32-bit fixed-point integer by shift operations, which in theory yields a model compression ratio of about 16x.
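The packing step can be sketched as follows. The particular code assignment (0 → `00`, positive → `01`, negative → `10`, with the fourth code `11` unused) is our assumption for illustration; the patent specifies only that three of the four 2-bit codes are used:

```python
CODE = {0: 0b00, 1: 0b01, -1: 0b10}       # ternary sign -> 2-bit code
DECODE = {v: k for k, v in CODE.items()}  # 2-bit code -> ternary sign

def pack16(signs):
    """Pack 16 ternary signs (-1, 0, +1) into one 32-bit integer."""
    assert len(signs) == 16
    word = 0
    for i, s in enumerate(signs):
        word |= CODE[s] << (2 * i)        # shift each 2-bit code into place
    return word & 0xFFFFFFFF

def unpack16(word):
    """Recover the 16 ternary signs from a packed 32-bit integer."""
    return [DECODE[(word >> (2 * i)) & 0b11] for i in range(16)]
```

The per-layer scale factors $\alpha_{lp}$ and $\alpha_{ln}$ would be stored separately, so only the signs need packing: 16 weights shrink from 16 x 32 bits to 32 bits, the roughly 16x ratio stated above.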
The training curves of the asymmetric ternary weight network (Asymmetric Ternary Networks, ATNs) on the CIFAR-10 and ImageNet datasets are shown in Figs. 3 and 4. Compared with the traditional ternary weight network (Ternary Weight Networks, TWNs), the present invention effectively improves the recognition accuracy of the quantized network on the CIFAR-10 and ImageNet datasets, with the specific results shown in Tables 1 and 2:
Table 1: VGG network accuracy on the CIFAR-10 dataset
Table 2: AlexNet network accuracy on the ImageNet dataset
As can be seen, on the CIFAR-10 dataset ATNs improves recognition accuracy by 0.41% over TWNs and is even 0.33% higher than the floating-point network. On the ImageNet dataset, ATNs improves accuracy by 2.25% over TWNs and is only 0.63% below the floating-point network, effectively improving the recognition accuracy of the quantization method on large datasets.
The foregoing examples merely illustrate the technical concept and features of the invention; their purpose is to enable those skilled in the art to understand and implement the invention, and they are not intended to limit its scope. Any equivalent transformation or modification made according to the spirit and essence of the present invention shall fall within its protection scope.
Claims (10)
1. A deep neural network model compression method based on asymmetric ternary weight quantization, characterized by comprising the following steps:
S01: during deep neural network training, before each forward pass, quantizing the floating-point weights of every network layer to asymmetric ternary values, the parameter-update stage using the original floating-point network weights;
S02: storing the trained deep neural network in compressed form.
2. The deep neural network model compression method based on asymmetric ternary weight quantization according to claim 1, characterized in that the ternary values $W_l^t$ are:

$$W_{li}^t = \begin{cases} +\alpha_{lp}, & W_{li} > \Delta_{lp} \\ 0, & -\Delta_{ln} \le W_{li} \le \Delta_{lp} \\ -\alpha_{ln}, & W_{li} < -\Delta_{ln} \end{cases}$$

where $l$ denotes the corresponding network layer, $\Delta_{lp}$ and $\Delta_{ln}$ are the thresholds used in the quantization process, and $\alpha_{lp}$ and $\alpha_{ln}$ are the corresponding scale factors.
3. The deep neural network model compression method based on asymmetric ternary weight quantization according to claim 2, characterized in that the loss introduced by quantization is reduced by L2-norm minimization, with the formula:

$$\alpha_{lp}^*, \alpha_{ln}^*, \Delta_{lp}^*, \Delta_{ln}^* = \arg\min \lVert W_l - W_l^t \rVert_2^2$$

For any given threshold $\Delta_{lp}$, the scale factor $\alpha_{lp}$ is:

$$\alpha_{lp} = \frac{1}{|I_{\Delta_{lp}}|} \sum_{i \in I_{\Delta_{lp}}} |W_{li}|$$

where $I_{\Delta_{lp}} = \{ i \mid W_{li} > \Delta_{lp} \}$ and $|I_{\Delta_{lp}}|$ denotes the number of elements in $I_{\Delta_{lp}}$;

the threshold factor $\Delta_{lp}$ is:

$$\Delta_{lp}^* = \arg\max_{\Delta > 0} \frac{1}{|I_{\Delta}^{p}|} \Big( \sum_{i \in I_{\Delta}^{p}} |W_{li}| \Big)^2, \qquad I_{\Delta}^{p} = \{ i \mid W_{li} > \Delta \}$$

and analogously for the negative side.
4. The deep neural network model compression method based on asymmetric ternary weight quantization according to claim 3, characterized in that approximate values of the threshold factors are obtained by approximate calculation:

$$\Delta_{lp} \approx \frac{0.7}{|I_p|} \sum_{i \in I_p} W_{li}, \qquad \Delta_{ln} \approx \frac{0.7}{|I_n|} \sum_{i \in I_n} |W_{li}|$$

where $I_p = \{ i \mid W_{li} \ge 0,\ i = 1, 2, \dots, n \}$ and $I_n = \{ i \mid W_{li} < 0,\ i = 1, 2, \dots, n \}$.
5. The deep neural network model compression method based on asymmetric ternary weight quantization according to claim 1, characterized in that compressed storage uses 2-bit coding, and during compression 16 ternary values are stored as one 32-bit fixed-point integer by shift operations.
6. A deep neural network model compression device based on asymmetric ternary weight quantization, characterized by comprising:
an asymmetric ternary weight network training module, which, during deep neural network training and before each forward pass, quantizes the floating-point weights of every network layer to asymmetric ternary values, the parameter-update stage using the original floating-point network weights;
an asymmetric ternary weight storage module, which stores the trained deep neural network in compressed form.
7. The deep neural network model compression device based on asymmetric ternary weight quantization according to claim 6, characterized in that the ternary values $W_l^t$ are:

$$W_{li}^t = \begin{cases} +\alpha_{lp}, & W_{li} > \Delta_{lp} \\ 0, & -\Delta_{ln} \le W_{li} \le \Delta_{lp} \\ -\alpha_{ln}, & W_{li} < -\Delta_{ln} \end{cases}$$

where $l$ denotes the corresponding network layer, $\Delta_{lp}$ and $\Delta_{ln}$ are the thresholds used in the quantization process, and $\alpha_{lp}$ and $\alpha_{ln}$ are the corresponding scale factors.
8. The deep neural network model compression device based on asymmetric ternary weight quantization according to claim 7, characterized in that the loss introduced by quantization is reduced by L2-norm minimization, with the formula:

$$\alpha_{lp}^*, \alpha_{ln}^*, \Delta_{lp}^*, \Delta_{ln}^* = \arg\min \lVert W_l - W_l^t \rVert_2^2$$

For any given threshold $\Delta_{lp}$, the scale factor $\alpha_{lp}$ is:

$$\alpha_{lp} = \frac{1}{|I_{\Delta_{lp}}|} \sum_{i \in I_{\Delta_{lp}}} |W_{li}|$$

where $I_{\Delta_{lp}} = \{ i \mid W_{li} > \Delta_{lp} \}$ and $|I_{\Delta_{lp}}|$ denotes the number of elements in $I_{\Delta_{lp}}$;

the threshold factor $\Delta_{lp}$ is:

$$\Delta_{lp}^* = \arg\max_{\Delta > 0} \frac{1}{|I_{\Delta}^{p}|} \Big( \sum_{i \in I_{\Delta}^{p}} |W_{li}| \Big)^2, \qquad I_{\Delta}^{p} = \{ i \mid W_{li} > \Delta \}$$

and analogously for the negative side.
9. The deep neural network model compression device based on asymmetric ternary weight quantization according to claim 8, characterized in that approximate values of the threshold factors are obtained by approximate calculation:

$$\Delta_{lp} \approx \frac{0.7}{|I_p|} \sum_{i \in I_p} W_{li}, \qquad \Delta_{ln} \approx \frac{0.7}{|I_n|} \sum_{i \in I_n} |W_{li}|$$

where $I_p = \{ i \mid W_{li} \ge 0,\ i = 1, 2, \dots, n \}$ and $I_n = \{ i \mid W_{li} < 0,\ i = 1, 2, \dots, n \}$.
10. The deep neural network model compression device based on asymmetric ternary weight quantization according to claim 6, characterized in that compressed storage uses 2-bit coding, and during compression 16 ternary values are stored as one 32-bit fixed-point integer by shift operations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810674698.3A CN108985453A (en) | 2018-06-27 | 2018-06-27 | Deep neural network model compression method based on the quantization of asymmetric ternary weight |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108985453A true CN108985453A (en) | 2018-12-11 |
Family
ID=64538977
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810674698.3A Pending CN108985453A (en) | 2018-06-27 | 2018-06-27 | Deep neural network model compression method based on the quantization of asymmetric ternary weight |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108985453A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110942148A (en) * | 2019-12-11 | 2020-03-31 | 北京工业大学 | Adaptive asymmetric quantization deep neural network model compression method |
CN111353517A (en) * | 2018-12-24 | 2020-06-30 | 杭州海康威视数字技术股份有限公司 | License plate recognition method and device and electronic equipment |
CN111681263A (en) * | 2020-05-25 | 2020-09-18 | 厦门大学 | Multi-scale antagonistic target tracking algorithm based on three-value quantization |
CN112561050A (en) * | 2019-09-25 | 2021-03-26 | 杭州海康威视数字技术股份有限公司 | Neural network model training method and device |
CN114492779A (en) * | 2022-02-16 | 2022-05-13 | 安谋科技(中国)有限公司 | Method for operating neural network model, readable medium and electronic device |
WO2022148071A1 (en) * | 2021-01-07 | 2022-07-14 | 苏州浪潮智能科技有限公司 | Image feature extraction method, apparatus and device, and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104751228A (en) * | 2013-12-31 | 2015-07-01 | 安徽科大讯飞信息科技股份有限公司 | Method and system for constructing deep neural network |
CN107644254A (en) * | 2017-09-09 | 2018-01-30 | 复旦大学 | A kind of convolutional neural networks weight parameter quantifies training method and system |
CN107688849A (en) * | 2017-07-28 | 2018-02-13 | 北京深鉴科技有限公司 | A kind of dynamic strategy fixed point training method and device |
Non-Patent Citations (1)
Title |
---|
JIE DING: "Asymmetric Ternary Networks", 《2017 INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE》 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111353517A (en) * | 2018-12-24 | 2020-06-30 | 杭州海康威视数字技术股份有限公司 | License plate recognition method and device and electronic equipment |
CN111353517B (en) * | 2018-12-24 | 2023-09-26 | 杭州海康威视数字技术股份有限公司 | License plate recognition method and device and electronic equipment |
CN112561050A (en) * | 2019-09-25 | 2021-03-26 | 杭州海康威视数字技术股份有限公司 | Neural network model training method and device |
CN112561050B (en) * | 2019-09-25 | 2023-09-05 | 杭州海康威视数字技术股份有限公司 | Neural network model training method and device |
CN110942148A (en) * | 2019-12-11 | 2020-03-31 | 北京工业大学 | Adaptive asymmetric quantization deep neural network model compression method |
CN111681263A (en) * | 2020-05-25 | 2020-09-18 | 厦门大学 | Multi-scale antagonistic target tracking algorithm based on three-value quantization |
CN111681263B (en) * | 2020-05-25 | 2022-05-03 | 厦门大学 | Multi-scale antagonistic target tracking algorithm based on three-value quantization |
WO2022148071A1 (en) * | 2021-01-07 | 2022-07-14 | 苏州浪潮智能科技有限公司 | Image feature extraction method, apparatus and device, and storage medium |
CN114492779A (en) * | 2022-02-16 | 2022-05-13 | 安谋科技(中国)有限公司 | Method for operating neural network model, readable medium and electronic device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108985453A (en) | Deep neural network model compression method based on the quantization of asymmetric ternary weight | |
WO2020238237A1 (en) | Power exponent quantization-based neural network compression method | |
CN107644254A (en) | A kind of convolutional neural networks weight parameter quantifies training method and system | |
WO2021258752A1 (en) | 4-bit quantization method and system for neural network | |
CN108764317A (en) | A kind of residual error convolutional neural networks image classification method based on multichannel characteristic weighing | |
CN110276451A (en) | One kind being based on the normalized deep neural network compression method of weight | |
CN108664993B (en) | Dense weight connection convolutional neural network image classification method | |
CN107395211A (en) | A kind of data processing method and device based on convolutional neural networks model | |
CN110443172A (en) | A kind of object detection method and system based on super-resolution and model compression | |
CN111931906A (en) | Deep neural network mixing precision quantification method based on structure search | |
CN113660113B (en) | Self-adaptive sparse parameter model design and quantization transmission method for distributed machine learning | |
CN110188877A (en) | A kind of neural network compression method and device | |
CN109409505A (en) | A method of the compression gradient for distributed deep learning | |
CN109325590A (en) | For realizing the device for the neural network processor that computational accuracy can be changed | |
CN110942148B (en) | Adaptive asymmetric quantization deep neural network model compression method | |
CN110837890A (en) | Weight value fixed-point quantization method for lightweight convolutional neural network | |
CN112488304A (en) | Heuristic filter pruning method and system in convolutional neural network | |
CN107748913A (en) | A kind of general miniaturization method of deep neural network | |
CN117521763A (en) | Artificial intelligent model compression method integrating regularized pruning and importance pruning | |
CN114756517A (en) | Visual Transformer compression method and system based on micro-quantization training | |
CN111831358A (en) | Weight precision configuration method, device, equipment and storage medium | |
CN110110852A (en) | A kind of method that deep learning network is transplanted to FPAG platform | |
CN114707637A (en) | Neural network quantitative deployment method, system and storage medium | |
CN110263917A (en) | A kind of neural network compression method and device | |
WO2020253692A1 (en) | Quantification method for deep learning network parameters |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181211 |