CN108229681A - A kind of neural network model compression method, system, device and readable storage medium storing program for executing - Google Patents
- Publication number: CN108229681A (application CN201711465541.1A)
- Authority: CN (China)
- Prior art keywords: neural network, network model, quantified, stored, value
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a neural network model compression method, system, device and computer-readable storage medium. The method includes: pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized; quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored; and storing the neural network model to be stored in a compressed format. It can be seen that the neural network model compression method provided by embodiments of the present invention prunes the neural network model and then quantizes it with the INQ algorithm; while effectively ensuring that the compressed model loses no accuracy, it reduces the model size, and can therefore solve the problem of excessive resource consumption and accelerate computation.
Description
Technical field
The present invention relates to the field of artificial intelligence, and more specifically to a neural network model compression method, system, device and computer-readable storage medium.
Background art
In the current era, whether in daily life or on the Internet, one term is unavoidable: AI (Artificial Intelligence). AI applications have penetrated many fields, such as face recognition, speech recognition, text processing, Go playing, game playing, autonomous driving, image beautification, lip reading, and even fracturing simulation of geological strata. In many of these fields, AI's accuracy and problem-solving ability already exceed those of humans, so it has very broad application prospects and room for imagination. Among the algorithmic techniques in the AI field, deep learning has attracted wide attention from academia and industry since it won the ImageNet competition by an overwhelming margin in 2012; scientists, researchers, enterprises and online communities all over the world are actively studying and advancing the research and development of deep learning neural network models.
As deep learning achieves breakthroughs in various fields, the demand to apply it to real-life scenarios grows ever stronger, especially today, when mobile and portable devices greatly facilitate people's lives and deep learning can greatly enhance the intelligence and entertainment value of these devices. Deploying deep learning neural network models on mobile terminals and embedded systems has therefore become an urgent need.
However, when deep learning neural network models are actually deployed, the model size is usually excessive: a neural network model typically ranges from tens to hundreds of megabytes. For a mobile terminal, a file of this size consumes too much traffic and bandwidth when downloaded, and the resulting transmission latency is intolerable to users; for some embedded systems, storage space is very limited, and there may simply not be enough space to store such a large neural network model file.
Meanwhile, the demands on computing resources and computing power are high. When a large neural network model is used for computation, a mobile terminal or embedded system either cannot provide the computing resources it requires or computes too slowly, causing response latency too high to satisfy practical application scenarios.
In addition, neural network models also consume considerable power. During neural network computation, the processor needs to read the parameters of the neural network model frequently, so a larger neural network model brings a correspondingly higher number of memory accesses, and frequent memory access greatly increases power consumption.
Although common model compression methods reduce the model size and store the model parameters as sparse matrices, the accuracy of the model inevitably declines. Other compression methods retrain the compressed model to reduce the loss of accuracy, but the runtime performance of model inference then drops significantly.
Therefore, how to preserve the accuracy of a neural network model while compressing it is a problem to be solved by those skilled in the art.
Summary of the invention
The purpose of the present invention is to provide a neural network model compression method, system, device and computer-readable storage medium, so as to preserve the accuracy of a neural network model while compressing it.
To achieve the above object, an embodiment of the present invention provides the following technical scheme:
A neural network model compression method, including:
pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized;
quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored;
storing the neural network model to be stored in a compressed format.
Wherein, the neural network pruning method includes the dynamic network surgery (DNS) algorithm.
Wherein, pruning the neural network model to be pruned using a neural network pruning method to obtain the neural network model to be quantized includes:
S201: determining a first training dataset, a network model to be pruned and an initial iteration count, wherein every entry of the first binary mask matrix corresponding to each layer's weight parameters in the network model to be pruned is initialized to 1;
S202: updating the weight of each subscript in every layer's weight parameters using the formula
$W_k^{(i,j)} \leftarrow W_k^{(i,j)} - \beta\,\frac{\partial L(W_k \odot T_k)}{\partial (W_k^{(i,j)} T_k^{(i,j)})},\ \forall (i,j) \in I$
to obtain updated weight parameters for every layer; where $W_k^{(i,j)}$ denotes the weight coefficient with subscript $(i,j)$ in layer $k$ of the network to be pruned; $T_k^{(i,j)}$ denotes the first binary mask of the weight with subscript $(i,j)$ in layer $k$; $\beta$ is a positive learning rate; $L(\cdot)$ denotes the loss function; $\odot$ denotes the Hadamard product operator; $I$ denotes the subscript range of the weight coefficient matrix $W_k$;
S203: updating the binary mask of each weight in every layer's weight parameters using the formula
$T_k^{(i,j)} = h_k(W_k^{(i,j)})$
to obtain the updated first binary mask matrix corresponding to every layer's weight parameters; wherein $a_k$ and $b_k$ are preset thresholds, and the function $h_k(\cdot)$ means: when the absolute value of the weight $W_k^{(i,j)}$ is less than $a_k$, the binary mask $T_k^{(i,j)}$ is updated to 0; when the absolute value of $W_k^{(i,j)}$ is greater than $b_k$, the binary mask $T_k^{(i,j)}$ is updated to 1; when the absolute value of $W_k^{(i,j)}$ lies between $a_k$ and $b_k$, the value of $T_k^{(i,j)}$ is not updated;
S204: updating the iteration count and the learning rate in a predetermined manner;
S205: judging whether the current iteration count exceeds a preset value; if not, returning to S202; if so, determining the neural network model to be quantized from the per-layer weight parameters obtained after this update and the corresponding first binary mask matrices obtained after this update.
Wherein, quantizing the neural network model to be quantized using the INQ algorithm to obtain the neural network model to be stored includes:
S301: determining a second training set and a reference model, initializing the weight parameters of the reference model with the weight parameters of the neural network model to be quantized, and initializing to 1 the second binary mask matrix corresponding to each layer's weight parameters in the reference model;
S302: among the weight parameters whose second binary mask value is 1, determining a weight group to be quantized and a weight group to be trained according to a preset weight quantization proportion;
S303: quantizing the weight group to be quantized, updating to 0 the binary mask entries corresponding to the weight parameters in the quantized weight group, and updating the quantization rate; wherein the quantization rate is the proportion, among all weight parameters, of weight parameters whose binary mask value is 0;
S304: updating the weight parameters of the weight group to be trained;
S305: judging whether the iteration count reaches a predetermined threshold and the quantization rate reaches 100%; if so, determining the neural network model to be stored from all the quantized weight parameters; if not, returning to S302.
Wherein, storing the neural network model to be stored in a compressed format includes: storing the weight parameters of the neural network model to be stored with a predetermined number of bits.
In order to solve the above technical problems, the present invention also provides a neural network model compression system, including:
a pruning module, for pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized;
a quantization module, for quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored;
a storage module, for storing the neural network model to be stored in a compressed format.
Wherein, the neural network pruning method includes the dynamic network surgery (DNS) algorithm.
Wherein, the storage module is specifically for storing the weight parameters of the neural network model to be stored with a predetermined number of bits.
The present invention also provides a neural network model compression device, including:
a memory, for storing a computer program;
a processor, for implementing the steps of the neural network model compression method when executing the computer program.
The present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the neural network model compression method.
From the above scheme it can be seen that the neural network model compression method provided by the present invention includes: pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized; quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored; and storing the neural network model to be stored in a compressed format.
It can be seen that, by pruning the neural network model, the method provided by the present invention can reduce the model size and thus solve the problem of excessive resource consumption; by quantizing it with the INQ algorithm after pruning, it can effectively ensure that the compressed model loses no accuracy.
Description of the drawings
To explain the embodiments of the present invention or the technical schemes of the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a neural network model compression method disclosed by an embodiment of the present invention;
Fig. 2 is a schematic diagram of weight changes during DNS pruning disclosed by an embodiment of the present invention;
Fig. 3 is a flow chart of a specific network model compression method disclosed by an embodiment of the present invention;
Fig. 4 is a flow chart of a specific neural network model compression method disclosed by an embodiment of the present invention;
Fig. 5 is a structural diagram of a neural network model compression system disclosed by an embodiment of the present invention.
Specific embodiment
The technical schemes in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
The embodiments of the present invention disclose a neural network model compression method, system, device and computer-readable storage medium, so as to preserve the accuracy of a neural network model while compressing it.
Referring to Fig. 1, a neural network model compression method provided by an embodiment of the present invention specifically includes:
S101: pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized.
In this scheme, the neural network model to be pruned is first pruned using a neural network pruning method, so as to set the values of some weight parameters in the neural network to 0 and turn them into useless weights; during forward propagation these weights then have no effect on the prediction result of the neural network. It should be noted that the above partial weight parameters are generally the weight parameters with small absolute values.
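The pruning idea described here can be sketched as follows (an illustrative sketch, not the patent's implementation; the function name and threshold value are hypothetical):

```python
import numpy as np

def prune_small_weights(weights, threshold):
    # Build a binary mask that keeps only weights whose absolute value
    # reaches the threshold; masked weights contribute nothing to the
    # forward pass, since the effective weights are weights * mask.
    mask = (np.abs(weights) >= threshold).astype(weights.dtype)
    return weights * mask, mask

w = np.array([[0.8, -0.02], [0.003, -1.5]])
pruned, mask = prune_small_weights(w, threshold=0.05)
# the two small-magnitude entries are zeroed; the large ones survive
```

In a real network the same masking would be applied layer by layer, with per-layer thresholds.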
As a preference, the neural network pruning method may be the DNS (Dynamic Network Surgery) algorithm.
Specifically, the result obtained after pruning the neural network model to be pruned serves as the neural network model to be quantized, which is then subjected to quantization.
S102: quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored.
A neural network parameter quantization method is applied to the pruned result obtained in S101, i.e. the neural network model to be quantized, so that the quantized result can be stored. As a preference, the quantization method may be the INQ (Incremental Network Quantization) algorithm; in the final quantized result, the value of each weight parameter is either an integer power of 2 or 0.
It should be noted that the INQ technique proposes the idea of incremental neural network quantization, whose core is the introduction of three operations: parameter grouping, quantization and retraining. In its implementation, the parameters of each layer of a full-precision floating-point network model are first divided into two groups: the parameters in the first group are directly quantized and fixed, while the parameters in the other group are retrained to compensate for the accuracy loss caused by quantization. These three operations are then applied iteratively to the full-precision floating-point part remaining after retraining, until the model is fully quantized. By cleverly coupling the parameter grouping, quantization and retraining operations, the performance loss caused by model quantization is suppressed, so in practice the technique is applicable to neural network models of arbitrary structure.
In addition, during model quantization the INQ technique constrains all parameters to a binary-exponent representation including zero, which makes the final model highly suitable for hardware deployment and acceleration. On an FPGA, for example, complicated full-precision floating-point multiplication can be replaced directly by simple shift operations.
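As a toy illustration of that hardware observation (assuming integer fixed-point data; this example is not from the patent), multiplying by a power-of-two weight reduces to a bit shift:

```python
def shift_multiply(x: int, exponent: int) -> int:
    # x * 2**exponent as a shift: left shift for non-negative exponents,
    # arithmetic right shift (divide by a power of two) for negative ones.
    return x << exponent if exponent >= 0 else x >> -exponent

assert shift_multiply(13, 3) == 13 * 2 ** 3   # 104
assert shift_multiply(96, -5) == 96 // 2 ** 5 # 3
```

This is why restricting weights to {0, ±2^n} lets an FPGA replace its multipliers with shifters.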
S103: storing the neural network model to be stored in a preset compressed format.
Specifically, after the neural network model is quantized, its quantized result can be stored, using a compressed storage format for the quantized parameter values. As a preference, the compressed storage format may be low-bit storage.
For example, with a preset bit width of 4, the storage format is as shown in Table 1, where the actual value is the weight parameter value of the network model to be stored, i.e. the quantized weight parameter value, and the 4-bit representation is the corresponding 4-bit code for that value.
Table 1
4-bit code | Actual value | 4-bit code | Actual value |
0000 | 0.00 | 1000 | 2^-1 |
0001 | -2^-1 | 1001 | 2^-2 |
0010 | -2^-2 | 1010 | 2^-3 |
0011 | -2^-3 | 1011 | 2^-4 |
0100 | -2^-4 | 1100 | 2^-5 |
0101 | -2^-5 | 1101 | 2^-6 |
0110 | -2^-6 | 1110 | 2^-7 |
0111 | -2^-7 | 1111 | (unused) |
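The mapping in Table 1 can be expressed as a small codec. The following is a hypothetical reconstruction for illustration (the names are not from the patent), with code 1111 left unused as in the table:

```python
# Build the code table: 0000 -> 0.0, 0001..0111 -> -2^-1 .. -2^-7,
# 1000..1110 -> 2^-1 .. 2^-7; code 1111 has no corresponding value.
CODE_TO_VALUE = {0b0000: 0.0}
for e in range(1, 8):
    CODE_TO_VALUE[e] = -(2.0 ** -e)             # codes 0001..0111
    CODE_TO_VALUE[0b1000 + e - 1] = 2.0 ** -e   # codes 1000..1110

VALUE_TO_CODE = {v: c for c, v in CODE_TO_VALUE.items()}

def encode(value: float) -> int:
    return VALUE_TO_CODE[value]

def decode(code: int) -> float:
    return CODE_TO_VALUE[code]
```

Two such 4-bit codes can then be packed into each stored byte, halving storage relative to 8-bit codes and cutting it 8x relative to 32-bit floats.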
It can be seen that the neural network model compression method provided by the embodiment of the present invention prunes the neural network model and then quantizes it with the INQ algorithm; while effectively ensuring that the compressed model loses no accuracy, it can reduce the model size, and can therefore solve the problem of excessive resource consumption and accelerate computation.
An embodiment of the present invention provides a specific neural network model compression method. Unlike the above embodiment, this embodiment further defines and explains S101 of the above embodiment; the other content is substantially the same as the above embodiment and may be referred to there, so it is not repeated here. Referring to Fig. 2 and Fig. 3, S101 specifically includes:
S201: determining a first training dataset, a network model to be pruned and an initial iteration count, wherein every entry of the first binary mask matrix corresponding to each layer's weight parameters in the network model to be pruned is initialized to 1.
It should be noted that pruning means setting some weight parameters in the neural network to 0. The weight changes during pruning are shown in Fig. 2; from left to right, Fig. 2 shows in turn: the initial network model to be pruned; the network model with the weights to be pruned determined, where the filled weights are those identified as weights to be pruned; and the network model after pruning, in which the weights to be pruned have been set to 0.
Specifically, first prepare a training dataset X and a reference model $\{\widehat{W}_k : 0 \le k \le C\}$, i.e. a trained, intact neural network model, where C is the number of layers of the neural network model and $\widehat{W}_k$ is the model parameter of layer k. Hyperparameters also need to be set, such as the learning rate, the learning rate update rule, and the parameters controlling the pruning rate.
Initialize the network model parameters $W_k$ to be pruned as $\widehat{W}_k$, and initialize every binary mask matrix $T_k$ to 1, for the range $0 \le k \le C$, where C is the number of layers of the neural network model. $T_k^{(i,j)}$ denotes the binary mask of the weight with subscript (i,j) in layer k of the neural network, i.e. the mask blob; its value is 0 or 1, where 0 means the corresponding weight is deleted and 1 means the corresponding weight is retained. The shape of $T_k$ is identical to that of $W_k$.
S202: updating every layer's weight parameters using the formula
$W_k^{(i,j)} \leftarrow W_k^{(i,j)} - \beta\,\frac{\partial L(W_k \odot T_k)}{\partial (W_k^{(i,j)} T_k^{(i,j)})},\ \forall (i,j) \in I$
where $W_k^{(i,j)}$ denotes the weight coefficient with subscript (i,j) in layer k of the network to be pruned; $T_k^{(i,j)}$ denotes the first binary mask of that weight; $\beta$ is a positive learning rate; $L(\cdot)$ denotes the loss function; $\odot$ denotes the Hadamard product operator; and I denotes the subscript range of the weight coefficient matrix $W_k$.
Concretely, a batch of data is chosen from the training dataset X, the masked weights $(W_0 \odot T_0), \ldots, (W_C \odot T_C)$ are used as the weights for forward propagation (where $\odot$ is the Hadamard product operator), the loss of the forward pass is computed, the gradient of the loss function is computed and back-propagated, and the weight matrix $W_k$ of each layer is updated according to the formula above. Note that every entry of $W_k$ is updated, including entries currently masked out, which allows a mistakenly pruned weight to grow back and be restored later.
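Under the assumption of a toy quadratic loss (everything below is illustrative; only the update rule itself comes from the text above), one S202 step might look like:

```python
import numpy as np

# One DNS-style S202 step on a toy loss L(V) = 0.5 * ||V - target||^2,
# where the forward pass uses the masked weights V = W ⊙ T.  The update
# is applied to ALL entries of W, masked or not, so a weight pruned
# earlier can later grow back past the threshold and be revived.
W = np.array([[0.9, 0.01], [0.02, -1.2]])
T = np.array([[1.0, 0.0], [0.0, 1.0]])
target = np.array([[1.0, 0.0], [0.0, -1.0]])
beta = 0.1

grad = W * T - target   # dL/dV evaluated at V = W ⊙ T for this toy loss
W = W - beta * grad     # update every weight, not just the unmasked ones
# W[0][0]: 0.9 - 0.1 * (0.9 - 1.0) = 0.91
# W[1][1]: -1.2 - 0.1 * (-1.2 + 1.0) = -1.18
```

In a real network, `grad` would come from backpropagation through the masked forward pass rather than from a closed-form loss.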
S203: updating the first binary mask matrix corresponding to every layer's weight parameters using the formula
$T_k^{(i,j)} = h_k(W_k^{(i,j)})$
wherein $a_k$ and $b_k$ are preset thresholds with $a_k < b_k$, which respectively decide whether a binary mask is updated. The function $h_k(\cdot)$ means: if the absolute value of the weight $W_k^{(i,j)}$ is less than $a_k$, the binary mask $T_k^{(i,j)}$ becomes 0, meaning $W_k^{(i,j)}$ will be pruned; if the absolute value of $W_k^{(i,j)}$ is greater than $b_k$, the binary mask $T_k^{(i,j)}$ becomes 1, meaning $W_k^{(i,j)}$ will be retained; if the absolute value of $W_k^{(i,j)}$ lies between $a_k$ and $b_k$, the value of $T_k^{(i,j)}$ is temporarily unchanged, meaning whether $W_k^{(i,j)}$ is retained depends on the value of $T_k^{(i,j)}$ before the update.
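A minimal sketch of the $h_k$ rule (illustrative names; the threshold values are chosen arbitrarily):

```python
import numpy as np

def update_mask(W, T, a_k, b_k):
    # h_k: prune weights whose magnitude drops below a_k, revive weights
    # whose magnitude exceeds b_k, and leave the mask untouched in the
    # hysteresis band [a_k, b_k].  Requires a_k < b_k.
    T_new = T.copy()
    T_new[np.abs(W) < a_k] = 0
    T_new[np.abs(W) > b_k] = 1
    return T_new

W = np.array([0.005, 0.05, 0.5])
T = np.array([1, 0, 0])
# 0.005 < a_k -> pruned; 0.05 sits in the band (keeps old mask 0); 0.5 > b_k -> revived
new_T = update_mask(W, T, a_k=0.01, b_k=0.1)
```

The band between $a_k$ and $b_k$ prevents weights near the threshold from flipping in and out of the mask on every iteration.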
S204: updating the iteration count and the learning rate in a predetermined manner.
Specifically, the learning rate is updated according to the learning rate update policy preset in S201, and the iteration count is updated at the same time, e.g. incremented by 1.
S205: judging whether the current iteration count exceeds a preset value; if not, returning to S202 to continue iterating; if it exceeds the preset value, outputting the pruned weight parameters $\{W_k : 0 \le k \le C\}$ and their corresponding binary mask matrices $\{T_k : 0 \le k \le C\}$, and determining the neural network model to be quantized from these weight parameters and binary mask matrices.
It can be seen that the specific neural network model compression method provided by this embodiment of the present invention prunes the neural network to be pruned using the DNS algorithm, setting the values of some weight parameters to 0 so that they no longer affect the prediction result of the neural network during forward propagation. The model parameters are greatly reduced, so the demand on computing resources is also reduced.
An embodiment of the present invention provides a specific neural network model compression method. Unlike the above embodiments, this embodiment further defines and explains S102 of the above embodiments; the other steps are substantially the same as in the above embodiments and are not repeated here. Referring to Fig. 4, S102 specifically includes:
S301: determining a second training set and a reference model, initializing the weight parameters of the reference model with the weight parameters of the neural network model to be quantized, and initializing to 1 the values in the second binary mask matrix corresponding to each layer's weight parameters in the reference model.
Specifically, prepare a training dataset X and a reference model $\{\widehat{W}_k : 0 \le k \le C\}$, i.e. a trained, intact neural network model, where C is the number of layers of the neural network model and $\widehat{W}_k$ is the model parameter of layer k; set the hyperparameters, including the quantization proportion parameters.
Set the weight quantization proportions $\{\sigma_1, \sigma_2, \ldots, \sigma_N\}$ and initialize the binary mask matrix $T_k$ of each layer to 1. It should be noted that the binary mask matrix in this embodiment of the present invention is used to indicate whether the corresponding parameter has been quantized: 0 means already quantized, 1 means not yet quantized.
S302: among the weight parameters whose second binary mask value is 1, determining a weight group to be quantized and a weight group to be trained according to the preset weight quantization proportion.
Specifically, initialize for each layer of the neural network the weight group $A_l^{(1)}$ that needs to be quantized and the weight group $A_l^{(2)}$ that needs to be retrained. For layer l of the neural network, the weight grouping satisfies:
$A_l^{(1)} \cup A_l^{(2)} = \{W_l(i,j)\},\qquad A_l^{(1)} \cap A_l^{(2)} = \emptyset$
where $A_l^{(1)}$ denotes the weight group that will be quantized and $A_l^{(2)}$ denotes the weight group needing retraining. When grouping the weights, $T_l(i,j) = 0$ means $W_l(i,j) \in A_l^{(1)}$ and $T_l(i,j) = 1$ means $W_l(i,j) \in A_l^{(2)}$.
According to the weight quantization proportions $\{\sigma_1, \sigma_2, \ldots, \sigma_N\}$, each layer's weight parameters are divided into $A_l^{(1)}$ and $A_l^{(2)}$, and the corresponding binary mask matrix $T_k$ is updated.
According to the range of the weight values in each layer's quantization weight group $A_l^{(1)}$, the quantized value set $P_l$ is determined:
$P_l = \{\pm 2^{n_1}, \ldots, \pm 2^{n_2}, 0\}$
where $n_1$ and $n_2$ are integers satisfying $n_2 \le n_1$. In this way, $n_1$ and $n_2$ constrain the nonzero entries of $W_l$ to the range $[2^{n_2}, 2^{n_1}]$ or $[-2^{n_1}, -2^{n_2}]$, and entries of $W_l$ with sufficiently small absolute value (below the smallest quantization threshold) will be pruned. In the INQ algorithm the low-bit quantization bit width b is set in advance, so only $n_1$ needs to be computed; $n_2$ can then be calculated from $n_1$ and b. The calculation formula of $n_1$ is:
$n_1 = \lfloor \log_2(4s/3) \rfloor,\qquad s = \max(\mathrm{abs}(W_l))$
where floor(·) denotes rounding down, max(·) takes the maximum of all input elements, and abs(·) takes the absolute value of each element. After obtaining $n_1$, we get $n_2 = n_1 + 2 - 2^{b-1}$.
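The computation of $n_1$, $n_2$ and $P_l$ above can be sketched directly (a sketch under the formulas just stated; the function name and example weights are illustrative):

```python
import math

def quantized_value_set(weights, b):
    # n1 = floor(log2(4s/3)) with s = max |w|; n2 = n1 + 2 - 2**(b-1).
    # P_l contains zero plus +/- every power of two from 2**n2 to 2**n1.
    s = max(abs(w) for w in weights)
    n1 = math.floor(math.log2(4 * s / 3))
    n2 = n1 + 2 - 2 ** (b - 1)
    powers = [2.0 ** n for n in range(n2, n1 + 1)]
    return n1, n2, sorted({0.0} | set(powers) | {-p for p in powers})

n1, n2, P = quantized_value_set([0.62, -0.4, 0.05], b=4)
# s = 0.62 -> 4s/3 ≈ 0.827 -> n1 = -1, n2 = -7; |P| = 15, which fits the
# 15 used codes of the 4-bit table above
```

Note that $|P_l| = 2(n_1 - n_2 + 1) + 1 = 2^b - 1$, leaving exactly one b-bit code unused, consistent with Table 1.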
S303: quantizing the weight group to be quantized, updating to 0 the binary mask matrix entries corresponding to the weight parameters in the quantized weight group, and updating the quantization rate; wherein the quantization rate is the proportion, among all weight parameters, of weight parameters whose binary mask value is 0.
Specifically, the weight parameters in $A_l^{(1)}$ are quantized according to the following formula:
$\widehat{W}_l(i,j) = \beta\,\mathrm{sgn}(W_l(i,j))$ if $(\alpha + \beta)/2 \le |W_l(i,j)| < 3\beta/2$, and $\widehat{W}_l(i,j) = 0$ otherwise,
where $\alpha$ and $\beta$ are adjacent elements in the set $P_l$.
S304: retraining the weight parameters of the weight group to be trained.
Specifically, forward propagation is performed using the updated, quantized weight parameters, and then during backpropagation the weight parameters in $A_l^{(2)}$ are updated by stochastic gradient descent according to the following formula:
$W_l(i,j) \leftarrow W_l(i,j) - \gamma\,\frac{\partial L}{\partial W_l(i,j)}\,T_l(i,j)$
where $\gamma$ is a positive learning rate, L is the loss function and $T_l(i,j)$ is the binary mask matrix, whose shape is exactly the same as that of $W_l(i,j)$ and whose values are 0 or 1. During the weight update, $T_l(i,j) = 0$ means the corresponding weight $W_l(i,j)$ has been quantized and is not updated; $T_l(i,j) = 1$ means the corresponding weight $W_l(i,j)$ has not been quantized and is updated normally.
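The masked update of S304 can be sketched as follows (toy gradient values; the names and numbers are illustrative, not from the patent):

```python
import numpy as np

def retrain_step(W, T, grad, gamma):
    # Only weights whose mask is 1 (not yet quantized) receive the SGD
    # update; quantized weights (mask 0) stay frozen at their
    # power-of-two values.
    return W - gamma * grad * T

W = np.array([0.5, 0.3, -0.25])
T = np.array([0.0, 1.0, 0.0])      # only the middle weight still trains
grad = np.array([0.2, 0.1, -0.4])  # assumed gradient of the loss w.r.t. W
W_new = retrain_step(W, T, grad, gamma=0.1)
# only W[1] moves: 0.3 - 0.1 * 0.1 = 0.29; the frozen weights are unchanged
```

This is how the still-floating-point weights absorb the error introduced by quantizing their neighbours.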
S305: judging whether the current iteration count reaches the predetermined threshold and the quantization rate reaches 100%; if so, determining the neural network model to be stored from all the quantized weight parameters; if not, updating the iteration count and returning to S302.
Specifically, judge whether the current iteration count has reached the predetermined threshold and the quantization rate has reached 100%, i.e. $T_k$ is all 0. If so, the iteration stops and the quantized weight parameters $\{W_k : 0 \le k \le C\}$ are output, each weight parameter value being an integer power of 2 or 0.
It can be seen that the specific neural network compression method provided by this embodiment of the present invention quantizes the neural network to be quantized using the INQ algorithm, and through repeated retraining during quantization ensures that the compressed model loses no accuracy.
A neural network compression system provided by an embodiment of the present invention is introduced below; the neural network compression system described below and the neural network compression method described above may be referred to each other.
Referring to Fig. 5, a neural network compression system provided by an embodiment of the present invention specifically includes:
a pruning module 401, for pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized.
In this scheme, the pruning module 401 first prunes the neural network model to be pruned using a neural network pruning method, setting the values of some weight parameters in the neural network to 0 and turning them into useless weights, so that during forward propagation these weights have no effect on the prediction result of the neural network. It should be noted that the above partial weight parameters are generally the weight parameters with small absolute values.
As a preference, the neural network pruning method may be the DNS (Dynamic Network Surgery) algorithm.
Specifically, the result obtained by the pruning module 401 after pruning the neural network model to be pruned serves as the neural network model to be quantized, which is then subjected to quantization.
A quantization module 402, for quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored.
Quantization modules 402 using neural network parameter quantization method to after cutting the obtained cutting of module 401 as a result, i.e.
The quantized result that neural network model to be quantified is quantified can be carried out storing.As preference, the method for quantization
INQ (Incremental Network Quantization, cumulative network quantization) algorithm, the nerve finally obtained may be used
Network parameter quantization after as a result, i.e. the value of weight parameter is quantified as 2 whole power or 0.
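A minimal sketch of the power-of-two quantization just described is given below. The exponent bounds `p_min` and `p_max` are illustrative assumptions (INQ derives its exponent range from the weight distribution), and the function name is hypothetical:

```python
import numpy as np

def quantize_pow2(w, p_min=-4, p_max=2):
    """Map a weight to 0 or +/-2^p (power-of-two quantization sketch).

    Weights too small to reach the smallest representable power of two
    are quantized to 0; large weights are clipped to 2^p_max.
    """
    if w == 0:
        return 0.0
    p = int(np.round(np.log2(abs(w))))   # nearest exponent
    if p < p_min:
        return 0.0                       # below range -> quantize to 0
    p = min(p, p_max)                    # clip overly large weights
    return float(np.sign(w) * 2.0 ** p)
```

Restricting weights to powers of two also allows multiplications to be replaced by bit shifts at inference time, which is one motivation for this quantization scheme.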
A storage module 403, configured to store the neural network model to be stored using a preset compressed format.
Specifically, after the neural network model has been quantized, the storage module 403 can store the quantized result, i.e. store the quantized parameter values using a compressed storage format. As a preferred option, the compressed storage format may be low-bit storage.
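A sketch of what low-bit storage can mean in practice: after power-of-two quantization, each weight is representable by a small integer code (e.g. an index into a codebook of powers of two, with one code reserved for 0), so a few bits per weight suffice instead of a 32-bit float. The 3-bit code width and the function names below are illustrative assumptions:

```python
def pack_codes(codes, bits=3):
    """Pack small integer codes into a byte string, 'bits' bits per code."""
    buf, acc, nbits = bytearray(), 0, 0
    for c in codes:
        acc |= (c & ((1 << bits) - 1)) << nbits
        nbits += bits
        while nbits >= 8:
            buf.append(acc & 0xFF)   # flush a full byte
            acc >>= 8
            nbits -= 8
    if nbits:
        buf.append(acc & 0xFF)       # flush the remainder
    return bytes(buf)

def unpack_codes(data, n, bits=3):
    """Inverse of pack_codes: recover n codes of 'bits' bits each."""
    out, acc, nbits, i = [], 0, 0, 0
    for _ in range(n):
        while nbits < bits:
            acc |= data[i] << nbits
            i += 1
            nbits += 8
        out.append(acc & ((1 << bits) - 1))
        acc >>= bits
        nbits -= bits
    return out
```

With 3-bit codes, five weights occupy two bytes instead of twenty, a roughly 10x reduction before any further entropy coding.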
It can be seen that, in the neural network model compression system provided in an embodiment of the present invention, the pruning module 401 prunes the neural network model, which reduces the model size and thus solves the problem of excessive resource consumption; meanwhile, after pruning, the quantization module 402 quantizes the model using the INQ algorithm, which effectively ensures that the accuracy of the compressed model is not lost.
A neural network model compression apparatus provided in an embodiment of the present invention is introduced below; the apparatus described below and the neural network model compression method described above may be cross-referenced.
The neural network model compression apparatus provided in an embodiment of the present invention specifically includes:
a memory, configured to store a computer program; and
a processor, configured to implement, when executing the computer program, the steps of the neural network model compression method according to any of the above embodiments.
A computer-readable storage medium provided in an embodiment of the present invention is introduced below; the computer-readable storage medium described below and the neural network model compression method described above may be cross-referenced.
The computer-readable storage medium provided in an embodiment of the present invention stores a computer program which, when executed by a processor, implements the steps of the neural network model compression method according to any of the above embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A neural network model compression method, characterized by comprising:
pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized;
quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored; and
storing the neural network model to be stored using a preset compressed format.
2. The method according to claim 1, characterized in that the neural network pruning method comprises the Dynamic Network Surgery (DNS) algorithm.
3. The method according to claim 2, characterized in that pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized comprises:
S201: determining a first training data set, the network model to be pruned, and an initial iteration number, wherein each value in the first binary mask matrix corresponding to each layer of weight parameters in the network model to be pruned is initialized to 1;
S202: updating each layer of weight parameters using the formula W_k^(i,j) ← W_k^(i,j) − β · ∂L(W_k ⊙ T_k) / ∂(W_k^(i,j) T_k^(i,j)), for all (i,j) ∈ I; wherein W_k^(i,j) denotes the weight coefficient with subscript (i,j) in the k-th layer of the neural network to be pruned; T_k^(i,j) denotes the first binary mask of the weight with subscript (i,j) in the k-th layer; β is a positive learning rate; L(·) denotes the loss function; ⊙ denotes the Hadamard product operator; and I denotes the subscript range of the weight coefficient matrix W_k;
S203: updating the first binary mask matrix corresponding to each layer of weight parameters using the formula T_k^(i,j) = h_k(W_k^(i,j)); wherein a_k and b_k are preset boundaries, and the function h_k(·) is defined such that when the absolute value of the weight W_k^(i,j) is less than a_k, the binary mask T_k^(i,j) is updated to 0; when the absolute value of W_k^(i,j) is greater than b_k, the binary mask T_k^(i,j) is updated to 1; and when the absolute value of W_k^(i,j) lies between a_k and b_k, the value of T_k^(i,j) is not updated;
S204: updating the iteration number and the learning rate in a predetermined manner;
S205: judging whether the current iteration number exceeds a preset value; if not, returning to S202; if so, determining the neural network model to be quantized from each layer of weight parameters obtained after this update and from the first binary mask matrix, corresponding to each layer of weight parameters, obtained after this update.
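For illustration only, steps S201 to S205 above can be sketched as the following loop. The quadratic toy loss, the fixed learning rate `beta`, and the function name `dns_prune` are assumptions for this sketch and are not part of the claim:

```python
import numpy as np

def dns_prune(W, grad_fn, a_k, b_k, beta=0.1, n_iter=100):
    """Sketch of S201-S205: alternate weight updates and mask updates.

    grad_fn(M) returns dL/dM for the masked weights M = W * T; the loss
    and the learning-rate schedule of S204 are application-specific, so a
    fixed beta is used here as a simplification.
    """
    T = np.ones_like(W)                          # S201: mask initialized to 1
    for _ in range(n_iter):                      # S205: iterate up to a preset count
        W = W - beta * grad_fn(W * T)            # S202: update all weights
        T = np.where(np.abs(W) < a_k, 0.0, T)    # S203: prune small weights
        T = np.where(np.abs(W) > b_k, 1.0, T)    #       splice large weights back
    return W, T                                  # S205: pruned model = (W, T)

# Toy usage: drive weights toward a sparse target with a quadratic loss.
target = np.array([0.0, 1.0, 0.0, -1.0])
W0 = np.array([0.3, 0.8, -0.2, -0.7])
W, T = dns_prune(W0, lambda M: M - target, a_k=0.05, b_k=0.1)
```

Note that all weights continue to receive gradients computed from the masked network, which is what allows a pruned weight to be spliced back in S203 if it later grows past b_k.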
4. The method according to claim 1, characterized in that quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored comprises:
S301: determining a second training set and a reference model; initializing the weight parameters of the reference model with the weight parameters of the neural network model to be quantized; and initializing each value in the second binary mask matrix corresponding to each layer of weight parameters in the reference model to 1;
S302: determining, among the weight parameters whose second binary mask is 1, a weight group to be quantized and a weight group to be trained according to a preset weight quantization ratio;
S303: quantizing the weight group to be quantized, updating the binary mask corresponding to each weight parameter in the quantized weight group to 0, and updating the quantization rate; wherein the quantization rate is the proportion of weight parameters whose binary mask is 0 among all weight parameters;
S304: retraining the weight parameters of the weight group to be trained;
S305: judging whether the iteration number reaches a preset threshold and the quantization rate reaches 100%; if so, determining the neural network model to be stored from all the quantized weight parameters; if not, returning to S302.
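For illustration only, one round of steps S302 and S303 can be sketched as follows; the retraining of S304 is omitted, and the function name, the largest-magnitude-first grouping rule, and the exponent rounding are assumptions of this sketch:

```python
import numpy as np

def inq_step(W, mask, ratio):
    """Sketch of S302-S303: quantize a share of the still-unquantized weights.

    Picks the largest-magnitude weights among those with mask == 1,
    quantizes each to 0 or a signed power of two, flips its mask to 0,
    and reports the updated quantization rate.
    """
    idx = np.flatnonzero(mask == 1)              # S302: unquantized weights
    k = int(np.ceil(len(idx) * ratio))           # group size for this round
    chosen = idx[np.argsort(-np.abs(W[idx]))[:k]]
    for i in chosen:                             # S303: quantize the group
        w = W[i]
        W[i] = 0.0 if w == 0 else np.sign(w) * 2.0 ** np.round(np.log2(abs(w)))
        mask[i] = 0                              # mark as quantized, frozen
    quant_rate = np.mean(mask == 0)              # S303: updated quantization rate
    return W, mask, quant_rate
```

Repeatedly calling `inq_step` (with retraining of the `mask == 1` weights between calls, per S304) until the quantization rate reaches 100% corresponds to the loop terminated by S305.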
5. The method according to any one of claims 1 to 4, characterized in that storing the neural network model to be stored using a compressed format comprises: storing the weight parameters of the neural network model to be stored according to a predetermined bit width.
6. A neural network model compression system, characterized by comprising:
a pruning module, configured to prune a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized;
a quantization module, configured to quantize the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored; and
a storage module, configured to store the neural network model to be stored using a compressed format.
7. The system according to claim 6, characterized in that the neural network pruning method comprises the Dynamic Network Surgery algorithm.
8. The system according to claim 6 or 7, characterized in that the storage module is specifically configured to store the weight parameters of the neural network model to be stored according to a predetermined bit width.
9. A neural network model compression apparatus, characterized by comprising:
a memory, configured to store a computer program; and
a processor, configured to implement, when executing the computer program, the steps of the neural network model compression method according to any one of claims 1 to 5.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the neural network model compression method according to any one of claims 1 to 5 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711465541.1A CN108229681A (en) | 2017-12-28 | 2017-12-28 | A kind of neural network model compression method, system, device and readable storage medium storing program for executing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711465541.1A CN108229681A (en) | 2017-12-28 | 2017-12-28 | A kind of neural network model compression method, system, device and readable storage medium storing program for executing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108229681A true CN108229681A (en) | 2018-06-29 |
Family
ID=62646571
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711465541.1A Pending CN108229681A (en) | 2017-12-28 | 2017-12-28 | A kind of neural network model compression method, system, device and readable storage medium storing program for executing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108229681A (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108962247A (en) * | 2018-08-13 | 2018-12-07 | 南京邮电大学 | Based on gradual neural network multidimensional voice messaging identifying system and its method |
CN109344893A (en) * | 2018-09-25 | 2019-02-15 | 华中师范大学 | A kind of image classification method and system based on mobile terminal |
CN109376854A (en) * | 2018-11-02 | 2019-02-22 | 矽魅信息科技(上海)有限公司 | More truth of a matter logarithmic quantization method and devices for deep neural network |
CN109634401A (en) * | 2018-12-29 | 2019-04-16 | 联想(北京)有限公司 | A kind of control method and electronic equipment |
CN109635935A (en) * | 2018-12-29 | 2019-04-16 | 北京航空航天大学 | Depth convolutional neural networks model adaptation quantization method based on the long cluster of mould |
CN109766993A (en) * | 2018-12-13 | 2019-05-17 | 浙江大学 | A kind of convolutional neural networks compression method of suitable hardware |
CN109978144A (en) * | 2019-03-29 | 2019-07-05 | 联想(北京)有限公司 | A kind of model compression method and system |
CN110245753A (en) * | 2019-05-27 | 2019-09-17 | 东南大学 | A kind of neural network compression method based on power exponent quantization |
WO2020019236A1 (en) * | 2018-07-26 | 2020-01-30 | Intel Corporation | Loss-error-aware quantization of a low-bit neural network |
CN110782021A (en) * | 2019-10-25 | 2020-02-11 | 浪潮电子信息产业股份有限公司 | Image classification method, device, equipment and computer readable storage medium |
CN110866603A (en) * | 2018-12-29 | 2020-03-06 | 中科寒武纪科技股份有限公司 | Data processing method and processor |
CN110929837A (en) * | 2018-09-19 | 2020-03-27 | 北京搜狗科技发展有限公司 | Neural network model compression method and device |
CN111191784A (en) * | 2018-11-14 | 2020-05-22 | 辉达公司 | Transposed sparse matrix multiplied by dense matrix for neural network training |
WO2020133364A1 (en) * | 2018-12-29 | 2020-07-02 | 华为技术有限公司 | Neural network compression method and apparatus |
CN111598227A (en) * | 2020-05-20 | 2020-08-28 | 字节跳动有限公司 | Data processing method and device, electronic equipment and computer readable storage medium |
CN112085186A (en) * | 2019-06-12 | 2020-12-15 | 上海寒武纪信息科技有限公司 | Neural network quantitative parameter determination method and related product |
WO2021143070A1 (en) * | 2020-01-16 | 2021-07-22 | 北京智芯微电子科技有限公司 | Compression method and apparatus for deep neural network model, and storage medium |
CN113298248A (en) * | 2020-07-20 | 2021-08-24 | 阿里巴巴集团控股有限公司 | Processing method and device for neural network model and electronic equipment |
CN113642710A (en) * | 2021-08-16 | 2021-11-12 | 北京百度网讯科技有限公司 | Network model quantification method, device, equipment and storage medium |
CN109086819B (en) * | 2018-07-26 | 2023-12-05 | 北京京东尚科信息技术有限公司 | Method, system, equipment and medium for compressing caffemul model |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106557812A (en) * | 2016-11-21 | 2017-04-05 | 北京大学 | The compression of depth convolutional neural networks and speeding scheme based on dct transform |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106557812A (en) * | 2016-11-21 | 2017-04-05 | 北京大学 | The compression of depth convolutional neural networks and speeding scheme based on dct transform |
Non-Patent Citations (4)
Title |
---|
AOJUN ZHOU 等: "Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights", 《ARXIV:1702.03044V1》 * |
SONG HAN 等: "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding", 《ARXIV: 1510.00149V5》 * |
YIWEN GUO 等: "Dynamic Network Surgery for Efficient DNNs", 《ARXIV:1608.04493V2》 * |
刘南平 等: "《数据通信技术》", 31 July 2004 * |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020019236A1 (en) * | 2018-07-26 | 2020-01-30 | Intel Corporation | Loss-error-aware quantization of a low-bit neural network |
CN109086819B (en) * | 2018-07-26 | 2023-12-05 | 北京京东尚科信息技术有限公司 | Method, system, equipment and medium for compressing caffemul model |
CN108962247B (en) * | 2018-08-13 | 2023-01-31 | 南京邮电大学 | Multi-dimensional voice information recognition system and method based on progressive neural network |
CN108962247A (en) * | 2018-08-13 | 2018-12-07 | 南京邮电大学 | Based on gradual neural network multidimensional voice messaging identifying system and its method |
CN110929837A (en) * | 2018-09-19 | 2020-03-27 | 北京搜狗科技发展有限公司 | Neural network model compression method and device |
CN109344893A (en) * | 2018-09-25 | 2019-02-15 | 华中师范大学 | A kind of image classification method and system based on mobile terminal |
CN109376854A (en) * | 2018-11-02 | 2019-02-22 | 矽魅信息科技(上海)有限公司 | More truth of a matter logarithmic quantization method and devices for deep neural network |
CN109376854B (en) * | 2018-11-02 | 2022-08-16 | 矽魅信息科技(上海)有限公司 | Multi-base logarithm quantization device for deep neural network |
CN111191784A (en) * | 2018-11-14 | 2020-05-22 | 辉达公司 | Transposed sparse matrix multiplied by dense matrix for neural network training |
CN109766993B (en) * | 2018-12-13 | 2020-12-18 | 浙江大学 | Convolutional neural network compression method suitable for hardware |
CN109766993A (en) * | 2018-12-13 | 2019-05-17 | 浙江大学 | A kind of convolutional neural networks compression method of suitable hardware |
CN110866603A (en) * | 2018-12-29 | 2020-03-06 | 中科寒武纪科技股份有限公司 | Data processing method and processor |
WO2020133364A1 (en) * | 2018-12-29 | 2020-07-02 | 华为技术有限公司 | Neural network compression method and apparatus |
CN110866603B (en) * | 2018-12-29 | 2024-04-16 | 中科寒武纪科技股份有限公司 | Data processing method and processor |
CN109635935A (en) * | 2018-12-29 | 2019-04-16 | 北京航空航天大学 | Depth convolutional neural networks model adaptation quantization method based on the long cluster of mould |
CN113168565A (en) * | 2018-12-29 | 2021-07-23 | 华为技术有限公司 | Neural network compression method and device |
CN109634401B (en) * | 2018-12-29 | 2023-05-02 | 联想(北京)有限公司 | Control method and electronic equipment |
CN109634401A (en) * | 2018-12-29 | 2019-04-16 | 联想(北京)有限公司 | A kind of control method and electronic equipment |
CN109635935B (en) * | 2018-12-29 | 2022-10-14 | 北京航空航天大学 | Model adaptive quantization method of deep convolutional neural network based on modular length clustering |
CN109978144B (en) * | 2019-03-29 | 2021-04-13 | 联想(北京)有限公司 | Model compression method and system |
CN109978144A (en) * | 2019-03-29 | 2019-07-05 | 联想(北京)有限公司 | A kind of model compression method and system |
CN110245753A (en) * | 2019-05-27 | 2019-09-17 | 东南大学 | A kind of neural network compression method based on power exponent quantization |
CN112085186A (en) * | 2019-06-12 | 2020-12-15 | 上海寒武纪信息科技有限公司 | Neural network quantitative parameter determination method and related product |
CN112085186B (en) * | 2019-06-12 | 2024-03-05 | 上海寒武纪信息科技有限公司 | Method for determining quantization parameter of neural network and related product |
CN110782021A (en) * | 2019-10-25 | 2020-02-11 | 浪潮电子信息产业股份有限公司 | Image classification method, device, equipment and computer readable storage medium |
CN110782021B (en) * | 2019-10-25 | 2023-07-14 | 浪潮电子信息产业股份有限公司 | Image classification method, device, equipment and computer readable storage medium |
WO2021143070A1 (en) * | 2020-01-16 | 2021-07-22 | 北京智芯微电子科技有限公司 | Compression method and apparatus for deep neural network model, and storage medium |
CN111598227B (en) * | 2020-05-20 | 2023-11-03 | 字节跳动有限公司 | Data processing method, device, electronic equipment and computer readable storage medium |
CN111598227A (en) * | 2020-05-20 | 2020-08-28 | 字节跳动有限公司 | Data processing method and device, electronic equipment and computer readable storage medium |
CN113298248A (en) * | 2020-07-20 | 2021-08-24 | 阿里巴巴集团控股有限公司 | Processing method and device for neural network model and electronic equipment |
CN113642710A (en) * | 2021-08-16 | 2021-11-12 | 北京百度网讯科技有限公司 | Network model quantification method, device, equipment and storage medium |
CN113642710B (en) * | 2021-08-16 | 2023-10-31 | 北京百度网讯科技有限公司 | Quantification method, device, equipment and storage medium of network model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108229681A (en) | A kind of neural network model compression method, system, device and readable storage medium storing program for executing | |
CN110378468B (en) | Neural network accelerator based on structured pruning and low bit quantization | |
Sohoni et al. | Low-memory neural network training: A technical report | |
CN109635936A (en) | A kind of neural networks pruning quantization method based on retraining | |
CN112367353A (en) | Mobile edge computing unloading method based on multi-agent reinforcement learning | |
CN106570559A (en) | Data processing method and device based on neural network | |
CN107395211A (en) | A kind of data processing method and device based on convolutional neural networks model | |
CN109886397A (en) | A kind of neural network structure beta pruning compression optimization method for convolutional layer | |
CN110175628A (en) | A kind of compression algorithm based on automatic search with the neural networks pruning of knowledge distillation | |
WO2020238237A1 (en) | Power exponent quantization-based neural network compression method | |
CN106557812A (en) | The compression of depth convolutional neural networks and speeding scheme based on dct transform | |
CN109635935A (en) | Depth convolutional neural networks model adaptation quantization method based on the long cluster of mould | |
CN109635922A (en) | A kind of distribution deep learning parameter quantization communication optimization method and system | |
CN108734264A (en) | Deep neural network model compression method and device, storage medium, terminal | |
CN107886164A (en) | A kind of convolutional neural networks training, method of testing and training, test device | |
CN110751265A (en) | Lightweight neural network construction method and system and electronic equipment | |
CN112508190A (en) | Method, device and equipment for processing structured sparse parameters and storage medium | |
CN112329910A (en) | Deep convolutional neural network compression method for structure pruning combined quantization | |
CN109978144A (en) | A kind of model compression method and system | |
CN109145107A (en) | Subject distillation method, apparatus, medium and equipment based on convolutional neural networks | |
CN106156142B (en) | Text clustering processing method, server and system | |
CN110263917B (en) | Neural network compression method and device | |
CN112598129A (en) | Adjustable hardware-aware pruning and mapping framework based on ReRAM neural network accelerator | |
CN115470889A (en) | Network-on-chip autonomous optimal mapping exploration system and method based on reinforcement learning | |
CN112819157B (en) | Neural network training method and device, intelligent driving control method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180629 |