CN108985453A - Deep neural network model compression method based on asymmetric ternary weight quantization - Google Patents

Deep neural network model compression method based on asymmetric ternary weight quantization

Info

Publication number
CN108985453A
Authority
CN
China
Prior art keywords
deep neural
neural network
weight
quantization
ternary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810674698.3A
Other languages
Chinese (zh)
Inventor
吴俊敏
丁杰
吴焕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Institute for Advanced Study USTC
Original Assignee
Suzhou Institute for Advanced Study USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Institute for Advanced Study USTC filed Critical Suzhou Institute for Advanced Study USTC
Priority to CN201810674698.3A priority Critical patent/CN108985453A/en
Publication of CN108985453A publication Critical patent/CN108985453A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • H - ELECTRICITY
    • H03 - ELECTRONIC CIRCUITRY
    • H03M - CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 - Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 - Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a deep neural network model compression method based on asymmetric ternary weight quantization, comprising: during deep neural network training, before each forward pass, quantizing the floating-point weights of each network layer to asymmetric ternary values, while the parameter-update stage uses the original floating-point network weights; and storing the trained deep neural network in compressed form. The method removes redundant parameters of the deep neural network, compresses the network model, and effectively improves the recognition accuracy of the quantization method on large datasets.

Description

Deep neural network model compression method based on asymmetric ternary weight quantization
Technical field
The present invention relates to the field of convolutional neural network compression, and more particularly to a deep neural network model compression method based on asymmetric ternary weight quantization.
Background technique
In recent years, with the rapid development of deep learning algorithms, deep neural networks have achieved state-of-the-art results in a series of machine learning tasks such as speech recognition, image classification, and natural language processing. However, a typical deep neural network usually carries millions of parameters, which makes it difficult to deploy on embedded devices with only limited storage and computing resources; how to compress deep neural network models has therefore become an important research direction in current deep learning.
At present, typical model compression methods fall into two categories. The first optimizes the network structure to reduce the number of parameters; the ICLR 2016 best paper Deep Compression describes such methods in detail, and compression ratios of several tens of times can be achieved, but these methods are difficult to implement and their steps are relatively complex. The second reduces network storage by lowering numerical precision, as in the currently common binarized networks (BinaryConnect) and symmetric ternary networks (Ternary Weight Networks). These methods achieve accuracy no lower than that of floating-point networks on smaller datasets, but their accuracy loss on larger datasets such as ImageNet is considerable.
The most recent ternary weight quantization method (Ternary Weight Networks) quantizes the network weights into the ternary values {-α, 0, +α} using the following quantization rule:
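For reference, the symmetric rule published as Ternary Weight Networks uses a single per-layer threshold Δ_l and scaling factor α_l; written here as a reconstruction of the published method, it is:

$$
W_{li}^{t}=\begin{cases}+\alpha_l, & W_{li}>\Delta_l\\ 0, & |W_{li}|\le\Delta_l\\ -\alpha_l, & W_{li}<-\Delta_l\end{cases},
\qquad
\Delta_l\approx\frac{0.7}{n}\sum_{i=1}^{n}|W_{li}|,
\qquad
\alpha_l=\frac{1}{|I_{\Delta_l}|}\sum_{i\in I_{\Delta_l}}|W_{li}|,
$$

where $I_{\Delta_l}=\{i\mid |W_{li}|>\Delta_l\}$.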
In choosing this quantization rule it is assumed that, after training, the positive and negative weights of the network follow the same distribution, which significantly limits the expressive power of the ternary weight network.
Summary of the invention
In view of the above technical problems, the object of the present invention is to provide a deep neural network model compression method based on asymmetric ternary weight quantization that removes redundant parameters of the deep neural network, compresses the network model, and effectively improves the recognition accuracy of the quantization method on large datasets.
The technical scheme of the invention is as follows:
A deep neural network model compression method based on asymmetric ternary weight quantization, comprising the following steps:
S01: during deep neural network training, before each forward pass, quantizing the floating-point weights of each network layer to asymmetric ternary values, the parameter-update stage using the original floating-point network weights;
S02: storing the trained deep neural network in compressed form.
In a preferred technical solution, the ternary values W_l^t are obtained by asymmetric thresholding of the floating-point weights, where l denotes the corresponding network layer, Δ_p^l and Δ_n^l are the thresholds used in the quantization process, and α_p^l and α_n^l are the corresponding scaling factors.
In a preferred technical solution, the loss introduced by the quantization process is reduced by L2-norm minimization: for any given thresholds Δ_p^l and Δ_n^l, the scaling factors α_p^l and α_n^l are computed in closed form from the weights that exceed the corresponding threshold, where |I_Δp^l| and |I_Δn^l| denote the number of elements in the index sets I_Δp^l and I_Δn^l; the threshold factors Δ_p^l and Δ_n^l are then obtained from the same minimization.
In a preferred technical solution, approximations of the threshold factors are obtained by an approximate calculation over the index sets I_p = {i | W_li ≥ 0, i = 1, 2, ..., n} and I_n = {i | W_li < 0, i = 1, 2, ..., n}.
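A plausible reconstruction of these quantities, consistent with the description above and with the cited Asymmetric Ternary Networks paper (the exact symbols and the 0.7 constant are assumptions, not quoted from the patent), is:

$$
W_{li}^{t}=\begin{cases}+\alpha_p^l, & W_{li}>\Delta_p^l\\ 0, & -\Delta_n^l\le W_{li}\le\Delta_p^l\\ -\alpha_n^l, & W_{li}<-\Delta_n^l\end{cases}
$$

$$
\alpha_p^l=\frac{1}{|I_{\Delta_p}^l|}\sum_{i\in I_{\Delta_p}^l} W_{li},
\qquad
\alpha_n^l=\frac{1}{|I_{\Delta_n}^l|}\sum_{i\in I_{\Delta_n}^l} |W_{li}|,
$$

with $I_{\Delta_p}^l=\{i\mid W_{li}>\Delta_p^l\}$ and $I_{\Delta_n}^l=\{i\mid W_{li}<-\Delta_n^l\}$, and the approximate thresholds

$$
\Delta_p^l\approx\frac{0.7}{|I_p|}\sum_{i\in I_p}|W_{li}|,
\qquad
\Delta_n^l\approx\frac{0.7}{|I_n|}\sum_{i\in I_n}|W_{li}|.
$$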
In a preferred technical solution, compression storage uses 2-bit encoding; during compression, 16 ternary values are packed into one 32-bit fixed-point integer by shift operations.
The invention also discloses a deep neural network model compression apparatus based on asymmetric ternary weight quantization, comprising:
an asymmetric ternary weight network training module which, during deep neural network training and before each forward pass, quantizes the floating-point weights of each network layer to asymmetric ternary values, the parameter-update stage using the original floating-point network weights;
an asymmetric ternary weight storage module which stores the trained deep neural network in compressed form.
In a preferred technical solution, the ternary values W_l^t are obtained by asymmetric thresholding of the floating-point weights, where l denotes the corresponding network layer, Δ_p^l and Δ_n^l are the thresholds used in the quantization process, and α_p^l and α_n^l are the corresponding scaling factors.
In a preferred technical solution, the loss introduced by the quantization process is reduced by L2-norm minimization: for any given thresholds Δ_p^l and Δ_n^l, the scaling factors α_p^l and α_n^l are computed in closed form from the weights that exceed the corresponding threshold, where |I_Δp^l| and |I_Δn^l| denote the number of elements in the index sets I_Δp^l and I_Δn^l; the threshold factors Δ_p^l and Δ_n^l are then obtained from the same minimization.
In a preferred technical solution, approximations of the threshold factors are obtained by an approximate calculation over the index sets I_p = {i | W_li ≥ 0, i = 1, 2, ..., n} and I_n = {i | W_li < 0, i = 1, 2, ..., n}.
In a preferred technical solution, compression storage uses 2-bit encoding; during compression, 16 ternary values are packed into one 32-bit fixed-point integer by shift operations.
Compared with the prior art, the invention has the following advantages:
1. Different constraints are applied to the positive and negative weights to improve the expressive power of the ternary network, and the relationship between the positive and negative thresholds and the corresponding scaling factors is obtained through an L2 constraint, reducing the loss introduced during quantization and effectively improving the recognition accuracy of the quantization method on large datasets.
2. Redundant parameters of the deep neural network are removed and the network model is compressed, reducing the storage required by the deep neural network model so that it can be ported to and executed on embedded devices more easily.
Detailed description of the invention
The invention will be further described below with reference to the accompanying drawings and embodiments:
Fig. 1 is the training flow chart of the quantized network of the present invention;
Fig. 2 is a schematic diagram of the weight encoding method of the present invention;
Fig. 3 shows the accuracy of the quantized VGG16 network on the CIFAR-10 dataset;
Fig. 4 shows the accuracy of the quantized AlexNet network on ImageNet.
Specific embodiment
The above scheme is further described below in conjunction with specific embodiments. It should be understood that these embodiments are intended to illustrate the invention and not to limit its scope. The implementation conditions used in the examples may be further adjusted according to the conditions of a specific manufacturer; implementation conditions that are not specified are usually those of routine experiments.
Embodiment:
Deep neural networks generally contain millions of parameters, which makes them difficult to apply on devices with only limited resources, yet most of a network's parameters are usually redundant. The main object of the present invention is therefore to remove the redundant parameters and implement model compression. The technique is realized in three main steps:
(1) Asymmetric ternary weight quantization process:
The asymmetric ternary weight quantization method quantizes the traditional floating-point network weights into the ternary values {-α_n^l, 0, +α_p^l} during network training, using a threshold-based rule: Δ_p^l and Δ_n^l are the thresholds used in the quantization process, and any floating-point number is assigned to one of the three ternary values according to its range; α_p^l and α_n^l are the corresponding scaling factors, used to reduce the loss introduced by the quantization process. In the threshold rule set out above there are four independent parameter factors: Δ_p^l, Δ_n^l, α_p^l, and α_n^l.
The present invention uses L2-norm minimization to reduce the loss introduced by the quantization process; the formula is as follows:
Substituting formula (1) into formula (2), the objective can be rewritten in a form in which |I_Δp^l| and |I_Δn^l| denote the number of elements in I_Δp^l and I_Δn^l, the positive and negative threshold factors are independent of each other, and the remaining term is a constant independent of the quantization parameters.
The solution of formula (3) can then be converted into formula (4).
For any given thresholds Δ_p^l and Δ_n^l, the scaling factors α_p^l and α_n^l can be calculated as in formulas (5) and (6).
Substituting these back into formula (4), the threshold factors Δ_p^l and Δ_n^l can be calculated as in formulas (7) and (8).
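Assuming the derivation mirrors the TWN one carried out separately on the positive and negative parts, formulas (7) and (8) would take the form of the following maximizations (a reconstruction, not quoted from the patent):

$$
\Delta_p^{l*}=\arg\max_{\Delta_p>0}\ \frac{1}{|I_{\Delta_p}^l|}\Bigl(\sum_{i\in I_{\Delta_p}^l} W_{li}\Bigr)^{2},
\qquad
\Delta_n^{l*}=\arg\max_{\Delta_n>0}\ \frac{1}{|I_{\Delta_n}^l|}\Bigl(\sum_{i\in I_{\Delta_n}^l} |W_{li}|\Bigr)^{2}.
$$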
The threshold factors in the above formulas are positive values. Since formulas (7) and (8) have no exact closed-form solution, it is assumed in the experiments that after training the positive and negative parts of the network weights W_l still essentially follow their basic distributions, so that approximate values of the threshold factors can be obtained by the approximate calculation of formulas (9) and (10).
Here I_p = {i | W_li ≥ 0, i = 1, 2, ..., n} and I_n = {i | W_li < 0, i = 1, 2, ..., n}. Finally, combining formulas (5), (6), (9), and (10) and substituting into formula (1), the original floating-point weights can be quantized into the corresponding ternary weight values, realizing the discretization of the network weights.
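As an illustration of this step, the following Python/NumPy sketch quantizes one layer's weights under the reconstructed formulas above (the helper name, the 0.7 constant, and the exact form of the scaling factors are assumptions consistent with a TWN-style derivation, not the patent's reference implementation):

```python
import numpy as np

def asymmetric_ternary_quantize(w):
    """Quantize one layer's floating-point weights to {-alpha_n, 0, +alpha_p}."""
    shape = w.shape
    w = w.astype(np.float32).ravel()
    pos, neg = w[w >= 0], w[w < 0]

    # Approximate threshold factors: assumed 0.7 * mean(|w|) over each sign set.
    delta_p = 0.7 * np.mean(pos) if pos.size else 0.0
    delta_n = 0.7 * np.mean(np.abs(neg)) if neg.size else 0.0

    # Index sets of weights that survive thresholding.
    i_dp, i_dn = w > delta_p, w < -delta_n

    # Scaling factors: mean magnitude of the surviving weights on each side.
    alpha_p = w[i_dp].mean() if i_dp.any() else 0.0
    alpha_n = np.abs(w[i_dn]).mean() if i_dn.any() else 0.0

    w_t = np.zeros_like(w)
    w_t[i_dp] = alpha_p
    w_t[i_dn] = -alpha_n
    return w_t.reshape(shape), (delta_p, delta_n, alpha_p, alpha_n)
```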
(2) Asymmetric ternary weight network training process
The asymmetric ternary weight quantization method constrains each layer's floating-point weights to ternary values, which greatly reduces parameter redundancy and effectively prevents overfitting. However, directly applying the quantization method to an already trained network severely impacts its accuracy, so the quantization method needs to be incorporated into the network training process to reduce the loss in accuracy. The training method is similar to that of a traditional floating-point network, and the training process is shown in Fig. 1.
Fig. 1 highlights two key points of asymmetric ternary weight network training. First, the quantization method is applied before each forward pass, and the loss of the network is computed from the quantized weights; the main purpose is to capture the influence of the quantization method on the final result. Second, the parameter-update stage uses the original floating-point network weights rather than the quantized ternary weights, so that small gradient updates are accumulated and the network always moves in the optimal direction.
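A minimal PyTorch-style sketch of this training scheme follows; the names model, loader, optimizer, and criterion, the restriction to tensors with more than one dimension, and the reuse of the asymmetric_ternary_quantize helper above are illustrative assumptions rather than the patent's reference implementation:

```python
import torch

def train_quantized(model, loader, optimizer, criterion, epochs=1):
    """Quantize before each forward pass; update the full-precision weights."""
    for _ in range(epochs):
        for inputs, targets in loader:
            # Weight tensors to quantize (biases / BN parameters stay in floating point).
            weights = [p for p in model.parameters() if p.dim() > 1]
            fp_copy = [p.detach().clone() for p in weights]

            # Quantize each layer to asymmetric ternary values before the forward pass,
            # so the loss reflects the effect of quantization on the final result.
            with torch.no_grad():
                for p in weights:
                    w_t, _ = asymmetric_ternary_quantize(p.detach().cpu().numpy())
                    p.copy_(torch.as_tensor(w_t, device=p.device))

            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()

            # Restore the original floating-point weights before the update, so that
            # small gradient steps accumulate and the network keeps moving toward the optimum.
            with torch.no_grad():
                for p, w in zip(weights, fp_copy):
                    p.copy_(w)
            optimizer.step()
```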
(3) Asymmetric ternary weight storage method
After the asymmetric ternary weight network has been trained, the weights of each network layer are quantized into ternary values, where l denotes the corresponding network layer, but the ternary weights are still represented in floating point. To compress the model storage, this technique adopts 2-bit encoding; the specific encoding is shown in Fig. 2. A 2-bit code can represent four values, of which three are used here. During compression, 16 ternary values are packed into one 32-bit fixed-point integer by shift operations, which in theory yields a model compression ratio of about 16x.
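A minimal sketch of such a packing scheme is shown below; the specific 2-bit codes (0b00 for 0, 0b01 for +α_p, 0b10 for -α_n) are an assumption, since Fig. 2 defines the actual mapping, and the two per-layer scaling factors are assumed to be stored separately as ordinary floats:

```python
import numpy as np

def pack_ternary(w_t):
    """Pack ternary weights into 32-bit words, 16 two-bit codes per word."""
    codes = np.zeros(w_t.size, dtype=np.uint32)
    codes[w_t.ravel() > 0] = 0b01
    codes[w_t.ravel() < 0] = 0b10
    pad = (-codes.size) % 16                      # pad to a multiple of 16 codes
    codes = np.concatenate([codes, np.zeros(pad, dtype=np.uint32)]).reshape(-1, 16)
    shifts = np.arange(16, dtype=np.uint32) * 2   # each code gets its own 2-bit slot
    return (codes << shifts).sum(axis=1).astype(np.uint32)

def unpack_ternary(words, n, alpha_p, alpha_n):
    """Recover n ternary weights from packed 32-bit words."""
    shifts = np.arange(16, dtype=np.uint32) * 2
    codes = ((words[:, None] >> shifts) & 0b11).ravel()[:n]
    w_t = np.zeros(n, dtype=np.float32)
    w_t[codes == 0b01] = alpha_p
    w_t[codes == 0b10] = -alpha_n
    return w_t
```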
The training curves of the asymmetric ternary weight network (Asymmetric Ternary Networks, ATNs) on the CIFAR-10 and ImageNet datasets are shown in Figs. 3 and 4. Compared with the traditional ternary weight network (Ternary Weight Networks, TWNs), the present invention effectively improves the recognition accuracy of the quantized network on the CIFAR-10 and ImageNet datasets; the specific results are shown in Tables 1 and 2:
Table 1: VGG network accuracy on the CIFAR-10 dataset
Table 2: AlexNet network accuracy on the ImageNet dataset
It can be seen that on the CIFAR-10 dataset ATNs improve recognition accuracy by 0.41% compared with TWNs, and are also 0.33% higher than the floating-point network. On the ImageNet dataset, ATNs improve accuracy by 2.25% compared with TWNs and lose only 0.63% relative to the floating-point network, effectively improving the recognition accuracy of the quantization method on large datasets.
The foregoing examples merely illustrate the technical concept and features of the invention; their purpose is to enable a person skilled in the art to understand the content of the invention and implement it accordingly, and they are not intended to limit the scope of the invention. All equivalent transformations or modifications made according to the spirit and essence of the invention shall be covered by the protection scope of the invention.

Claims (10)

1. A deep neural network model compression method based on asymmetric ternary weight quantization, characterized by comprising the following steps:
S01: during deep neural network training, before each forward pass, quantizing the floating-point weights of each network layer to asymmetric ternary values, the parameter-update stage using the original floating-point network weights;
S02: storing the trained deep neural network in compressed form.
2. The deep neural network model compression method based on asymmetric ternary weight quantization according to claim 1, characterized in that the ternary values W_l^t are obtained by asymmetric thresholding of the floating-point weights, wherein l represents the corresponding network layer, Δ_p^l and Δ_n^l are the thresholds used in the quantization process, and α_p^l and α_n^l are the corresponding scaling factors.
3. The deep neural network model compression method based on asymmetric ternary weight quantization according to claim 2, characterized in that the loss introduced by the quantization process is reduced by L2-norm minimization: for any given thresholds Δ_p^l and Δ_n^l, the scaling factors α_p^l and α_n^l are computed in closed form from the weights that exceed the corresponding threshold, wherein |I_Δp^l| and |I_Δn^l| denote the number of elements in the index sets I_Δp^l and I_Δn^l, and the threshold factors Δ_p^l and Δ_n^l are obtained from the same minimization.
4. The deep neural network model compression method based on asymmetric ternary weight quantization according to claim 3, characterized in that approximations of the threshold factors are obtained by an approximate calculation over the index sets I_p = {i | W_li ≥ 0, i = 1, 2, ..., n} and I_n = {i | W_li < 0, i = 1, 2, ..., n}.
5. The deep neural network model compression method based on asymmetric ternary weight quantization according to claim 1, characterized in that compression storage uses 2-bit encoding, and during compression 16 ternary values are packed into one 32-bit fixed-point integer by shift operations.
6. A deep neural network model compression apparatus based on asymmetric ternary weight quantization, characterized by comprising:
an asymmetric ternary weight network training module which, during deep neural network training and before each forward pass, quantizes the floating-point weights of each network layer to asymmetric ternary values, the parameter-update stage using the original floating-point network weights;
an asymmetric ternary weight storage module which stores the trained deep neural network in compressed form.
7. The deep neural network model compression apparatus based on asymmetric ternary weight quantization according to claim 6, characterized in that the ternary values W_l^t are obtained by asymmetric thresholding of the floating-point weights, wherein l represents the corresponding network layer, Δ_p^l and Δ_n^l are the thresholds used in the quantization process, and α_p^l and α_n^l are the corresponding scaling factors.
8. The deep neural network model compression apparatus based on asymmetric ternary weight quantization according to claim 7, characterized in that the loss introduced by the quantization process is reduced by L2-norm minimization: for any given thresholds Δ_p^l and Δ_n^l, the scaling factors α_p^l and α_n^l are computed in closed form from the weights that exceed the corresponding threshold, wherein |I_Δp^l| and |I_Δn^l| denote the number of elements in the index sets I_Δp^l and I_Δn^l, and the threshold factors Δ_p^l and Δ_n^l are obtained from the same minimization.
9. The deep neural network model compression apparatus based on asymmetric ternary weight quantization according to claim 8, characterized in that approximations of the threshold factors are obtained by an approximate calculation over the index sets I_p = {i | W_li ≥ 0, i = 1, 2, ..., n} and I_n = {i | W_li < 0, i = 1, 2, ..., n}.
10. The deep neural network model compression apparatus based on asymmetric ternary weight quantization according to claim 6, characterized in that compression storage uses 2-bit encoding, and during compression 16 ternary values are packed into one 32-bit fixed-point integer by shift operations.
CN201810674698.3A 2018-06-27 2018-06-27 Deep neural network model compression method based on the quantization of asymmetric ternary weight Pending CN108985453A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810674698.3A CN108985453A (en) 2018-06-27 2018-06-27 Deep neural network model compression method based on the quantization of asymmetric ternary weight

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810674698.3A CN108985453A (en) 2018-06-27 2018-06-27 Deep neural network model compression method based on the quantization of asymmetric ternary weight

Publications (1)

Publication Number Publication Date
CN108985453A true CN108985453A (en) 2018-12-11

Family

ID=64538977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810674698.3A Pending CN108985453A (en) 2018-06-27 2018-06-27 Deep neural network model compression method based on the quantization of asymmetric ternary weight

Country Status (1)

Country Link
CN (1) CN108985453A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110942148A (en) * 2019-12-11 2020-03-31 北京工业大学 Adaptive asymmetric quantization deep neural network model compression method
CN111353517A (en) * 2018-12-24 2020-06-30 杭州海康威视数字技术股份有限公司 License plate recognition method and device and electronic equipment
CN111681263A (en) * 2020-05-25 2020-09-18 厦门大学 Multi-scale adversarial target tracking algorithm based on ternary quantization
CN112561050A (en) * 2019-09-25 2021-03-26 杭州海康威视数字技术股份有限公司 Neural network model training method and device
CN114492779A (en) * 2022-02-16 2022-05-13 安谋科技(中国)有限公司 Method for operating neural network model, readable medium and electronic device
WO2022148071A1 (en) * 2021-01-07 2022-07-14 苏州浪潮智能科技有限公司 Image feature extraction method, apparatus and device, and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104751228A (en) * 2013-12-31 2015-07-01 安徽科大讯飞信息科技股份有限公司 Method and system for constructing deep neural network
CN107644254A (en) * 2017-09-09 2018-01-30 复旦大学 Convolutional neural network weight parameter quantization training method and system
CN107688849A (en) * 2017-07-28 2018-02-13 北京深鉴科技有限公司 Dynamic strategy fixed-point training method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104751228A (en) * 2013-12-31 2015-07-01 安徽科大讯飞信息科技股份有限公司 Method and system for constructing deep neural network
CN107688849A (en) * 2017-07-28 2018-02-13 北京深鉴科技有限公司 Dynamic strategy fixed-point training method and device
CN107644254A (en) * 2017-09-09 2018-01-30 复旦大学 Convolutional neural network weight parameter quantization training method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIE DING: "Asymmetric Ternary Networks", 2017 International Conference on Tools with Artificial Intelligence *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353517A (en) * 2018-12-24 2020-06-30 杭州海康威视数字技术股份有限公司 License plate recognition method and device and electronic equipment
CN111353517B (en) * 2018-12-24 2023-09-26 杭州海康威视数字技术股份有限公司 License plate recognition method and device and electronic equipment
CN112561050A (en) * 2019-09-25 2021-03-26 杭州海康威视数字技术股份有限公司 Neural network model training method and device
CN112561050B (en) * 2019-09-25 2023-09-05 杭州海康威视数字技术股份有限公司 Neural network model training method and device
CN110942148A (en) * 2019-12-11 2020-03-31 北京工业大学 Adaptive asymmetric quantization deep neural network model compression method
CN111681263A (en) * 2020-05-25 2020-09-18 厦门大学 Multi-scale adversarial target tracking algorithm based on ternary quantization
CN111681263B (en) * 2020-05-25 2022-05-03 厦门大学 Multi-scale adversarial target tracking algorithm based on ternary quantization
WO2022148071A1 (en) * 2021-01-07 2022-07-14 苏州浪潮智能科技有限公司 Image feature extraction method, apparatus and device, and storage medium
CN114492779A (en) * 2022-02-16 2022-05-13 安谋科技(中国)有限公司 Method for operating neural network model, readable medium and electronic device

Similar Documents

Publication Publication Date Title
CN108985453A (en) Deep neural network model compression method based on the quantization of asymmetric ternary weight
WO2020238237A1 (en) Power exponent quantization-based neural network compression method
CN107644254A (en) Convolutional neural network weight parameter quantization training method and system
WO2021258752A1 (en) 4-bit quantization method and system for neural network
CN108764317A (en) Residual convolutional neural network image classification method based on multi-channel feature weighting
CN110276451A (en) Deep neural network compression method based on weight normalization
CN108664993B (en) Dense weight connection convolutional neural network image classification method
CN107395211A (en) Data processing method and device based on a convolutional neural network model
CN110443172A (en) Object detection method and system based on super-resolution and model compression
CN111931906A (en) Deep neural network mixed-precision quantization method based on structure search
CN113660113B (en) Self-adaptive sparse parameter model design and quantization transmission method for distributed machine learning
CN110188877A (en) Neural network compression method and device
CN109409505A (en) Gradient compression method for distributed deep learning
CN109325590A (en) Device for implementing a neural network processor with variable computational accuracy
CN110942148B (en) Adaptive asymmetric quantization deep neural network model compression method
CN110837890A (en) Weight value fixed-point quantization method for lightweight convolutional neural network
CN112488304A (en) Heuristic filter pruning method and system in convolutional neural network
CN107748913A (en) General miniaturization method for deep neural networks
CN117521763A (en) Artificial intelligence model compression method integrating regularized pruning and importance pruning
CN114756517A (en) Visual Transformer compression method and system based on micro-quantization training
CN111831358A (en) Weight precision configuration method, device, equipment and storage medium
CN110110852A (en) Method for transplanting a deep learning network to an FPAG platform
CN114707637A (en) Neural network quantitative deployment method, system and storage medium
CN110263917A (en) Neural network compression method and device
WO2020253692A1 (en) Quantification method for deep learning network parameters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181211