CN108229681A - Neural network model compression method, system, device and readable storage medium - Google Patents

Neural network model compression method, system, device and readable storage medium

Info

Publication number
CN108229681A
CN108229681A
Authority
CN
China
Prior art keywords
neural network
network model
quantified
stored
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711465541.1A
Other languages
Chinese (zh)
Inventor
谢启凯
吴韶华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201711465541.1A
Publication of CN108229681A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Abstract

The invention discloses a neural network model compression method, system, device and computer-readable storage medium. The method includes pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized; quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored; and storing the neural network model to be stored in a compressed format. It can be seen that the neural network model compression method provided by the embodiments of the present invention prunes the neural network model and then quantizes it with the INQ algorithm; while effectively ensuring that the accuracy of the compressed model is not lost, it reduces the model size, so it can solve the problem of excessive resource consumption and accelerate computation.

Description

Neural network model compression method, system, device and readable storage medium
Technical field
The present invention relates to the field of artificial intelligence, and more specifically to a neural network model compression method, system, device and computer-readable storage medium.
Background technology
In the current era, whether in daily life or in the Internet world, one term is inescapable: AI (Artificial Intelligence). AI applications have penetrated many areas, such as face recognition, speech recognition, text processing, Go playing, video-game playing, autonomous driving, photo beautification, lip reading, and even geological fracturing simulation. In many of these areas, the accuracy and problem-solving ability of AI already exceed those of humans, so it has very broad application prospects and room for imagination. Among the algorithmic techniques of the AI field, deep learning has attracted wide attention from academia and industry since it won the 2012 ImageNet competition by an overwhelming margin; scientists, researchers, enterprises and online communities around the world are actively studying and advancing the research and development of deep-learning neural network models.
As deep learning achieves breakthroughs in various fields, the demand to apply it to real-life scenarios has grown accordingly, especially today, when mobile and portable electronic devices greatly facilitate people's lives and deep learning can greatly improve the intelligence and entertainment value of these devices. Deploying deep-learning neural network models on mobile terminals and embedded systems has therefore become an urgent need.
However, when deep-learning neural network models are actually deployed, the model size is usually excessive: a neural network model commonly ranges from tens of megabytes to over a hundred megabytes. For a mobile terminal, the traffic and bandwidth consumed in downloading a file of this size, and the resulting long transmission delay, are intolerable for users; and for some embedded systems, storage space is very limited and may simply be insufficient to store such a large neural network model file.
At the same time, the requirements on computing resources and computing power are high. When a large neural network model is used for computation, a mobile terminal or embedded system either cannot provide the computing resources it requires or computes too slowly, so the response latency is too high to meet practical application scenarios.
In addition, neural network models also consume considerable power. During neural network computation, the processor needs to read the model parameters frequently, so a larger neural network model brings a correspondingly higher number of memory accesses, and frequent memory access greatly increases power consumption.
Although common model compression methods reduce the model size by storing the model parameters as sparse matrices, the accuracy of the model inevitably declines. Other compression methods retrain the compressed model to reduce the accuracy loss, but the inference performance of the model then drops significantly.
Therefore, how to preserve the accuracy of a neural network model while compressing it is a problem to be solved by those skilled in the art.
Summary of the invention
The purpose of the present invention is to provide a neural network model compression method, system, device and computer-readable storage medium, so as to preserve the accuracy of a neural network model while compressing it.
To achieve the above object, embodiments of the present invention provide the following technical solutions:
A neural network model compression method, including:
pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized;
quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored;
storing the neural network model to be stored in a compressed format.
Wherein, the neural network pruning method includes the dynamic network surgery (DNS) algorithm.
Wherein, pruning the neural network model to be pruned using a neural network pruning method to obtain the neural network model to be quantized includes:
S201: determining a first training dataset, the network model to be pruned and an initial iteration count, wherein each entry of the first binary mask matrix corresponding to each layer's weight parameters in the network model to be pruned is initialized to 1;
S202: updating the weight at each index in each layer's weight parameters using the formula
$W_k^{(i,j)} \leftarrow W_k^{(i,j)} - \beta \dfrac{\partial L(W_k \odot T_k)}{\partial\left(W_k^{(i,j)} T_k^{(i,j)}\right)}, \quad \forall (i,j) \in I,$
to obtain each layer's updated weight parameters; where $W_k^{(i,j)}$ denotes the weight coefficient with index $(i,j)$ in layer $k$ of the neural network to be pruned; $T_k^{(i,j)}$ denotes the first binary mask of the weight with index $(i,j)$ in layer $k$; $\beta$ is a positive learning rate; $L(\cdot)$ denotes the loss function; $\odot$ denotes the Hadamard product operator; and $I$ denotes the index range of the weight coefficient matrix $W_k$;
S203: updating the binary mask of each weight in each layer's weight parameters using the formula
$T_k^{(i,j)} = h_k\!\left(W_k^{(i,j)}\right) = \begin{cases} 0, & \left|W_k^{(i,j)}\right| < a_k \\ T_k^{(i,j)}, & a_k \le \left|W_k^{(i,j)}\right| < b_k \\ 1, & \left|W_k^{(i,j)}\right| \ge b_k \end{cases}$
to obtain the updated first binary mask matrix corresponding to each layer's weight parameters; where $a_k$ and $b_k$ are preset thresholds, and the function $h_k(\cdot)$ means: when the absolute value of the weight $W_k^{(i,j)}$ is less than $a_k$, the binary mask $T_k^{(i,j)}$ is updated to 0; when it is greater than $b_k$, the mask is updated to 1; and when it lies between $a_k$ and $b_k$, the mask is left unchanged;
S204: updating the iteration count and the learning rate in a predetermined manner;
S205: judging whether the current iteration count exceeds a preset value; if not, returning to S202; if so, determining the neural network model to be quantized from each layer's weight parameters obtained after this update and the corresponding first binary mask matrices obtained after this update.
Wherein, quantizing the neural network model to be quantized using the INQ algorithm to obtain the neural network model to be stored includes:
S301: determining a second training set and a reference model, initializing the weight parameters of the reference model with the weight parameters of the neural network model to be quantized, and initializing to 1 the second binary mask matrix corresponding to each layer's weight parameters in the reference model;
S302: determining, according to a preset weight quantization proportion, a weight group to be quantized and a weight group to be retrained among the weight parameters whose second binary mask is 1;
S303: quantizing the weight group to be quantized, updating to 0 the binary mask entries corresponding to the quantized weight parameters, and updating the quantization rate; wherein the quantization rate is the proportion of weight parameters whose binary mask is 0 among all weight parameters;
S304: retraining the weight parameters of the weight group to be retrained;
S305: judging whether the iteration count reaches a preset threshold and the quantization rate reaches 100%; if so, determining the neural network model to be stored from all quantized weight parameters; if not, returning to S302.
Wherein, storing the neural network model to be stored in a compressed format includes: storing the weight parameters of the neural network model to be stored with a predetermined number of bits.
To solve the above technical problem, the present invention also provides a neural network model compression system, including:
a pruning module, configured to prune a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized;
a quantization module, configured to quantize the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored;
a storage module, configured to store the neural network model to be stored in a compressed format.
Wherein, the neural network pruning method includes the dynamic network surgery algorithm.
Wherein, the storage module is specifically configured to store the weight parameters of the neural network model to be stored with a predetermined number of bits.
The present invention also provides a neural network model compression device, including:
a memory for storing a computer program;
a processor for implementing the steps of the neural network model compression method when executing the computer program.
The present invention also provides a computer-readable storage medium having a computer program stored thereon; when executed by a processor, the computer program implements the steps of the neural network model compression method.
It can be seen from the above solutions that the neural network model compression method provided by the present invention includes pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized; quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored; and storing the neural network model to be stored in a compressed format.
Thus, in the neural network model compression method provided by the present invention, pruning the neural network model reduces the model size and therefore solves the problem of excessive resource consumption, while quantizing it with the INQ algorithm after pruning effectively ensures that the accuracy of the compressed model is not lost.
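For concreteness, the three stages above can be summarized in a minimal Python sketch. This is an illustration with deliberately simplified stand-ins (magnitude thresholding instead of full DNS, one-shot rounding instead of incremental INQ), not code from the patent; the real procedures are detailed in the embodiments below.

    import numpy as np

    def compress_model(W, threshold=0.05):
        # S101: prune -- zero out small-magnitude weights (full DNS below).
        Wp = np.where(np.abs(W) >= threshold, W, 0.0)
        # S102: quantize -- snap surviving weights to the nearest power of two
        # (a one-shot stand-in for the incremental INQ procedure below).
        mag = np.abs(Wp)
        e = np.where(mag > 0, np.round(np.log2(np.where(mag > 0, mag, 1.0))), 0.0)
        Wq = np.sign(Wp) * 2.0 ** e
        # S103: store -- the quantized values would then be packed into
        # low-bit codes, as shown in Table 1 below.
        return Wq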
Description of the drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flow chart of a neural network model compression method disclosed by an embodiment of the present invention;
Fig. 2 is a schematic diagram of weight changes during DNS pruning disclosed by an embodiment of the present invention;
Fig. 3 is a flow chart of a specific network model compression method disclosed by an embodiment of the present invention;
Fig. 4 is a flow chart of a specific neural network model compression method disclosed by an embodiment of the present invention;
Fig. 5 is a structural diagram of a neural network model compression system disclosed by an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
The embodiments of the present invention disclose a neural network model compression method, system, device and computer-readable storage medium, so as to preserve the accuracy of a neural network model while compressing it.
Referring to Fig. 1, a neural network model compression method provided by an embodiment of the present invention specifically includes:
S101: pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized.
In this solution, the neural network model to be pruned is first pruned using a neural network pruning method, so that the values of some weight parameters in the neural network are set to 0 and these weights become inactive; thus, during forward propagation, these weights have no influence on the prediction result of the neural network. It should be noted that these weight parameters are generally the ones with small absolute values.
As a preferred option, the neural network pruning method may be the DNS (Dynamic Network Surgery) algorithm.
Specifically, the result obtained after pruning the neural network model to be pruned serves as the neural network model to be quantized, which is then subjected to quantization.
S102: quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored.
The pruned result obtained in S101, i.e., the neural network model to be quantized, is quantized using a neural network parameter quantization method, and the quantization result can then be stored. As a preferred option, the quantization method may be the INQ (Incremental Network Quantization) algorithm; in the final quantized result, the value of each weight parameter is quantized to an integer power of 2 or to 0.
It should be noted that the INQ technique proposes the idea of incremental neural network quantization, whose core is the introduction of three operations: parameter grouping, quantization and retraining. In implementation, the parameters of each layer of a full-precision floating-point network model are first divided into two groups: the parameters in the first group are directly quantized and fixed, while the parameters in the other group are retrained to compensate for the accuracy loss caused by quantization. The three operations are then applied iteratively to the full-precision floating-point part that has finished retraining, until the model is completely quantized. By skillfully coupling parameter grouping, quantization and retraining, the performance loss caused by model quantization is suppressed, so in practice the technique is applicable to neural network models of arbitrary structure.
In addition, during model quantization the INQ technique constrains all parameters to powers of two or zero, which makes the final model highly suitable for hardware deployment and acceleration. On an FPGA, for example, a complex full-precision floating-point multiplication can be directly replaced by a simple shift operation.
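To illustrate this point, the following minimal sketch (an illustration, not code from the patent) shows how multiplying a fixed-point activation by a weight of the form ±2^n reduces to a bit shift; it assumes integer activations, with a negative n meaning a truncating right shift, as in fixed-point hardware:

    def multiply_by_power_of_two(x: int, sign: int, n: int) -> int:
        # A shift replaces the multiplication; right shifts truncate,
        # which corresponds to fixed-point rounding toward minus infinity.
        shifted = x << n if n >= 0 else x >> -n
        return shifted if sign >= 0 else -shifted

    assert multiply_by_power_of_two(12, +1, 3) == 12 * 2**3   # 96
    assert multiply_by_power_of_two(12, -1, -2) == -(12 >> 2) # -3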
S103: storing the neural network model to be stored in a preset compressed format.
Specifically, after the neural network model is quantized, the quantization result can be stored, using a compressed storage format for the quantized parameter values. As a preferred option, the compressed storage format may be low-bit storage.
For example, with a preset bit width of 4, the storage format is as shown in Table 1, where the actual value is the weight parameter value of the network model to be stored, i.e., the quantized weight parameter value, and the 4-bit code is the 4-bit representation corresponding to that value.
Table 1
4-bit code   Actual value   |   4-bit code   Actual value
0000         0.00           |   1000         2^-1
0001         -2^-1          |   1001         2^-2
0010         -2^-2          |   1010         2^-3
0011         -2^-3          |   1011         2^-4
0100         -2^-4          |   1100         2^-5
0101         -2^-5          |   1101         2^-6
0110         -2^-6          |   1110         2^-7
0111         -2^-7          |   1111         (unused)
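A minimal sketch of this 4-bit encoding and decoding, assuming exactly the codebook of Table 1 (the function names are illustrative, not from the patent):

    import numpy as np

    # Codebook per Table 1: list index = 4-bit code; code 1111 is unused.
    CODEBOOK = [0.0] + [-2.0**-k for k in range(1, 8)] + [2.0**-k for k in range(1, 8)]

    def encode(quantized_weights):
        # Map each quantized weight to its 4-bit code (index into CODEBOOK).
        return np.array([CODEBOOK.index(w) for w in quantized_weights], dtype=np.uint8)

    def decode(codes):
        return np.array([CODEBOOK[c] for c in codes])

    codes = encode([0.0, -0.5, 0.25])   # -> [0, 1, 9], i.e. 0000, 0001, 1001

Two such codes can then be packed into each stored byte, giving the 4-bit-per-weight storage described above.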
It can be seen that the neural network model compression method provided by this embodiment of the present invention prunes the neural network model and then quantizes it with the INQ algorithm; while effectively ensuring that the accuracy of the compressed model is not lost, it reduces the model size, so it can solve the problem of excessive resource consumption and accelerate computation.
An embodiment of the present invention provides a specific neural network model compression method. Unlike the above embodiment, this embodiment further defines and explains S101 of the above embodiment; the other contents are substantially the same as in the above embodiment, may be referred to there, and are not repeated here. Referring to Fig. 2 and Fig. 3, S101 specifically includes:
S201: determining a first training dataset, the network model to be pruned and an initial iteration count, wherein each entry of the first binary mask matrix corresponding to each layer's weight parameters in the network model to be pruned is initialized to 1.
It should be noted that pruning means setting some weight parameters in the neural network to 0. The weight changes during pruning are shown in Fig. 3: from left to right, Fig. 3 shows the initial network model to be pruned; the network model with the weights to be pruned determined, where the filled weights are the ones determined to be pruned; and the pruned network model, in which the weights to be pruned have been set to 0.
Specifically, first prepare the training dataset X and a reference model $\{W_k : 0 \le k \le C\}$, i.e., a trained intact neural network model, where C is the number of layers of the neural network model and $W_k$ is the model parameter of layer k; hyperparameters also need to be set, such as the learning rate, the learning-rate update rule, and the parameters controlling the pruning rate.
Initialize the network model parameters $W_k$ to be pruned from the reference model, and initialize the binary mask matrices $T_k$ to 1, for $0 \le k \le C$, where C is the number of layers of the neural network model. $T_k^{(i,j)}$ denotes the binary mask of the weight with index $(i,j)$ in layer k of the neural network, i.e., a mask blob whose entries take the value 0 or 1: 0 means the corresponding weight is removed, and 1 means it is retained. $T_k$ has the same shape as $W_k$.
S202: updating each layer's weight parameters using the formula
$W_k^{(i,j)} \leftarrow W_k^{(i,j)} - \beta \dfrac{\partial L(W_k \odot T_k)}{\partial\left(W_k^{(i,j)} T_k^{(i,j)}\right)}, \quad \forall (i,j) \in I,$
where $W_k^{(i,j)}$ denotes the weight coefficient with index $(i,j)$ in layer k of the neural network to be pruned; $T_k^{(i,j)}$ denotes the first binary mask of that weight; $\beta$ is a positive learning rate; $L(\cdot)$ denotes the loss function; $\odot$ denotes the Hadamard product operator; and $I$ denotes the index range of the weight coefficient matrix $W_k$.
Concretely, a batch of data is chosen from the training dataset X, the masked weights $(W_0 \odot T_0), \ldots, (W_C \odot T_C)$ are used as the weights for forward propagation ($\odot$ being the Hadamard product operator), and the forward-propagation loss is computed; the gradient $\partial L / \partial (W_k \odot T_k)$ of the loss function is then computed and back-propagated, and each layer's weight matrix $W_k$ is updated according to the formula above.
S203: updating the first binary mask matrix corresponding to each layer's weight parameters using the formula
$T_k^{(i,j)} = h_k\!\left(W_k^{(i,j)}\right) = \begin{cases} 0, & \left|W_k^{(i,j)}\right| < a_k \\ T_k^{(i,j)}, & a_k \le \left|W_k^{(i,j)}\right| < b_k \\ 1, & \left|W_k^{(i,j)}\right| \ge b_k \end{cases}$
where $a_k$ and $b_k$ ($a_k < b_k$) are preset thresholds that decide whether a binary mask is updated. The function $h_k(\cdot)$ means: if the absolute value of the weight $W_k^{(i,j)}$ is less than $a_k$, the binary mask $T_k^{(i,j)}$ becomes 0, meaning $W_k^{(i,j)}$ will be pruned; if it is greater than $b_k$, the mask becomes 1, meaning $W_k^{(i,j)}$ will be retained; if it lies between $a_k$ and $b_k$, the mask is temporarily unchanged, meaning that whether $W_k^{(i,j)}$ is retained depends on the mask's value before the update.
S204: updating the iteration count and the learning rate in a predetermined manner.
Specifically, the learning rate is updated according to the update policy preset in S201, and the iteration count is updated at the same time, for example incremented by 1.
S205: judging whether the current iteration count exceeds a preset value; if not, returning to S202; if so, determining the neural network model to be quantized from each layer's weight parameters obtained after this update and the corresponding first binary mask matrices obtained after this update.
Specifically, if the current iteration count does not exceed the preset value, iteration continues and the method returns to S202; if it exceeds the preset value, the pruned weight parameters $\{W_k : 0 \le k \le C\}$ and their corresponding binary mask matrices $\{T_k : 0 \le k \le C\}$ are output, and the neural network model to be quantized is determined from these weight parameters and binary mask matrices.
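A minimal numpy sketch of one DNS iteration for a single layer, per S202 and S203, follows. It is an illustration under simplifying assumptions: the masked gradient $\partial L / \partial (W \odot T)$ is assumed to come from ordinary backpropagation through the masked forward pass, and the threshold and learning-rate schedules are omitted.

    import numpy as np

    def dns_step(W, T, grad_masked, beta, a, b):
        # S202: every weight is updated, including currently pruned ones,
        # so that a mistakenly pruned weight can later be spliced back in.
        W = W - beta * grad_masked
        # S203: three-way mask update h_k -- prune below a, splice in at or
        # above b, and leave the mask unchanged in between.
        absW = np.abs(W)
        T = np.where(absW < a, 0, np.where(absW >= b, 1, T))
        return W, T

    # Forward propagation then uses the masked weights W * T (Hadamard product).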
It can be seen that the specific neural network model compression method provided by this embodiment of the present invention prunes the neural network to be pruned using the DNS algorithm, setting the values of some weight parameters to 0 so that they do not influence the prediction result of the neural network during forward propagation. The model parameters are greatly reduced, so the demand on computing resources is also reduced.
An embodiment of the present invention provides a specific neural network model compression method. Unlike the above embodiments, this embodiment further defines and explains S102 of the above embodiment; the other steps are substantially the same as in the above embodiments and are not repeated here. Referring to Fig. 4, S102 specifically includes:
S301: determining a second training set and a reference model, initializing the weight parameters of the reference model with the weight parameters of the neural network model to be quantized, and initializing to 1 the values in the second binary mask matrix corresponding to each layer's weight parameters in the reference model.
Specifically, prepare the training dataset X and a reference model $\{W_l : 0 \le l \le C\}$, i.e., a trained intact neural network model, where C is the number of layers of the neural network model and $W_l$ is the model parameter of layer l; hyperparameters are set, including the quantization proportion parameters.
Set the weight quantization proportions $\{\sigma_1, \sigma_2, \ldots, \sigma_N\}$ and initialize each layer's binary mask matrix $T_l$ to 1. It should be noted that in this embodiment the binary mask matrix indicates whether the corresponding parameter has been quantized: 0 means it has been quantized, and 1 means it has not.
S302: determining, according to the preset weight quantization proportion, the weight group to be quantized and the weight group to be retrained among the weight parameters whose second binary mask is 1.
Specifically, for each layer of the neural network, the weight group $A_l^{(1)}$ that needs to be quantized and the weight group $A_l^{(2)}$ that needs to be retrained are initialized. For layer l of the neural network, the weight grouping is defined as follows:
$A_l^{(1)} \cup A_l^{(2)} = \{W_l(i,j)\}, \qquad A_l^{(1)} \cap A_l^{(2)} = \emptyset,$
where $A_l^{(1)}$ denotes the weight group that will be quantized and $A_l^{(2)}$ denotes the weight group that needs retraining. In the grouping, $T_l(i,j) = 0$ means $W_l(i,j) \in A_l^{(1)}$ and $T_l(i,j) = 1$ means $W_l(i,j) \in A_l^{(2)}$.
According to the weight quantization proportions $\{\sigma_1, \sigma_2, \ldots, \sigma_N\}$, each layer's weight parameters are divided into $A_l^{(1)}$ and $A_l^{(2)}$, and the corresponding binary mask matrix $T_l$ is updated.
According to the range of the weight values in each layer's quantization group $A_l^{(1)}$, the set of quantized values $P_l$ is determined:
$P_l = \left\{\pm 2^{n_1}, \ldots, \pm 2^{n_2}, 0\right\},$
where $n_1$ and $n_2$ are integers with $n_2 \le n_1$. In this way, $n_1$ and $n_2$ constrain the nonzero entries of $W_l$ to the ranges $\left[-2^{n_1}, -2^{n_2}\right]$ and $\left[2^{n_2}, 2^{n_1}\right]$, and values in $W_l$ whose absolute value falls below the smallest quantized magnitude will be pruned to 0. In the INQ algorithm the low-bit quantization bit width b is set in advance, so only $n_1$ needs to be computed, after which $n_2$ can be calculated from $n_1$ and b. The formula for $n_1$ is
$n_1 = \left\lfloor \log_2\!\left(\frac{4s}{3}\right) \right\rfloor, \qquad s = \max(\mathrm{abs}(W_l)),$
where floor() denotes rounding down, max() takes the maximum of all input elements, and abs() takes the absolute value of each element. After $n_1$ is obtained, $n_2 = n_1 + 2 - 2^{(b-1)}$.
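As a worked example, suppose b = 4 and the largest absolute weight in the layer is s = 0.9: then $n_1 = \lfloor \log_2(4 \times 0.9 / 3) \rfloor = \lfloor \log_2 1.2 \rfloor = 0$ and $n_2 = 0 + 2 - 2^3 = -6$, so $P_l = \{\pm 2^0, \pm 2^{-1}, \ldots, \pm 2^{-6}, 0\}$, which is 15 values and hence fits a 4-bit code with one codeword unused, as in Table 1.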
S303: quantizing the weight group to be quantized, updating to 0 the binary mask entries corresponding to the quantized weight parameters, and updating the quantization rate; wherein the quantization rate is the proportion of weight parameters whose binary mask is 0 among all weight parameters.
Specifically, the weight parameters in $A_l^{(1)}$ are quantized according to the following formula:
$\hat{W}_l(i,j) = \begin{cases} \beta\,\mathrm{sgn}\!\left(W_l(i,j)\right), & \frac{\alpha+\beta}{2} \le \left|W_l(i,j)\right| < \frac{3\beta}{2} \\ 0, & \text{otherwise,} \end{cases}$
where $\alpha$ and $\beta$ are adjacent elements in the set $P_l$.
S304: retraining the weight parameters of the weight group to be retrained.
Specifically, forward propagation is performed with the updated (quantized) weight parameters, and during backpropagation the weight parameters in $A_l^{(2)}$ are updated according to the following formula (stochastic gradient descent):
$W_l(i,j) \leftarrow W_l(i,j) - \gamma \frac{\partial L}{\partial W_l(i,j)}\, T_l(i,j),$
where $\gamma$ is a positive learning rate, L is the loss function, and $T_l(i,j)$ is the binary mask matrix, identical in shape to $W_l(i,j)$, with entries 0 or 1. During the weight update, $T_l(i,j) = 0$ means the corresponding weight $W_l(i,j)$ has been quantized and is not updated, while $T_l(i,j) = 1$ means it has not been quantized and is updated normally.
S305: judging whether the current iteration count reaches a preset threshold and the quantization rate reaches 100%; if so, determining the neural network model to be stored from all quantized weight parameters; if not, updating the iteration count and returning to S302.
Specifically, it is judged whether the current iteration count has reached the preset threshold and the quantization rate has reached 100%, i.e., all entries of $T_k$ are 0. If so, iteration stops and the quantized weight parameters $\{W_k : 0 \le k \le C\}$ are output, each weight parameter's value being an integer power of 2 or 0.
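A minimal numpy sketch of one INQ round for a single layer, per S302 to S304, follows. It is an illustration under simplifying assumptions: retraining is reduced to a single masked SGD step with a supplied gradient, and the magnitude-based choice of which weights to quantize first is borrowed from the pruning-inspired strategy of the INQ paper rather than stated in this text.

    import numpy as np

    def quantize_weight(w, n1, n2):
        # Rule of S303: w -> beta*sgn(w) if (alpha+beta)/2 <= |w| < 3*beta/2,
        # where alpha is the element of P_l just below beta (0 below 2^n2);
        # n1 is chosen from s so the largest weight lands in the top bracket.
        for e in range(n1, n2 - 1, -1):              # magnitudes, largest first
            beta = 2.0 ** e
            alpha = 2.0 ** (e - 1) if e > n2 else 0.0
            if (alpha + beta) / 2 <= abs(w) < 1.5 * beta:
                return float(np.sign(w)) * beta
        return 0.0                                   # too small: pruned to 0

    def inq_round(W, T, sigma, grad, gamma, b):
        s = np.max(np.abs(W))
        n1 = int(np.floor(np.log2(4.0 * s / 3.0)))
        n2 = n1 + 2 - 2 ** (b - 1)
        # S302: among still-unquantized weights (T == 1), take the fraction
        # sigma with the largest magnitudes as the group to quantize now.
        idx = np.argwhere(T == 1)
        order = np.argsort(-np.abs(W[tuple(idx.T)]))
        chosen = idx[order[:int(np.ceil(sigma * len(idx)))]]
        # S303: quantize the chosen group to powers of two (or 0) and mark
        # those positions as quantized by clearing their mask bit.
        for i, j in chosen:
            W[i, j] = quantize_weight(W[i, j], n1, n2)
            T[i, j] = 0
        # S304: masked SGD step -- only not-yet-quantized weights move.
        W = W - gamma * grad * T
        return W, T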
It can be seen that the specific neural network compression method provided by this embodiment of the present invention quantizes the neural network to be quantized using the INQ algorithm, and through repeated retraining during quantization ensures that the accuracy of the compressed model is not lost.
A neural network compression system provided by an embodiment of the present invention is introduced below; the neural network compression system described below and the neural network compression method described above may be referred to each other.
Referring to Fig. 5, a neural network compression system provided by an embodiment of the present invention specifically includes:
a pruning module 401, configured to prune a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized.
In this solution, the pruning module 401 first prunes the neural network model to be pruned using a neural network pruning method, setting the values of some weight parameters in the neural network to 0 and turning them into inactive weights, so that during forward propagation these weights have no influence on the prediction result of the neural network. It should be noted that these weight parameters are generally the ones with small absolute values.
As a preferred option, the neural network pruning method may be the DNS (Dynamic Network Surgery) algorithm.
Specifically, the pruning module 401 prunes the neural network model to be pruned and takes the result as the neural network model to be quantized, which is then quantized.
a quantization module 402, configured to quantize the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored.
The quantization module 402 quantizes the pruned result obtained by the pruning module 401, i.e., the neural network model to be quantized, using a neural network parameter quantization method, and the quantization result can then be stored. As a preferred option, the quantization method may be the INQ (Incremental Network Quantization) algorithm; in the final quantized result, the value of each weight parameter is quantized to an integer power of 2 or to 0.
a storage module 403, configured to store the neural network model to be stored in a preset compressed format.
Specifically, after the neural network model is quantized, the storage module 403 can store the quantization result, using a compressed storage format for the quantized parameter values. As a preferred option, the compressed storage format may be low-bit storage.
It can be seen that in the neural network model compression system provided by this embodiment of the present invention, the pruning module 401 prunes the neural network model, which reduces the model size and therefore solves the problem of excessive resource consumption; meanwhile, after pruning, the quantization module 402 quantizes the model with the INQ algorithm, which effectively ensures that the accuracy of the compressed model is not lost.
A neural network model compression device provided by an embodiment of the present invention is introduced below; the neural network model compression device described below and the neural network model compression method described above may be referred to each other.
A neural network model compression device provided by an embodiment of the present invention specifically includes:
a memory, for storing a computer program;
a processor, for implementing the steps of the neural network model compression method described in any of the above embodiments when executing the computer program.
A computer-readable storage medium provided by an embodiment of the present invention is introduced below; the computer-readable storage medium described below and the neural network model compression method described above may be referred to each other.
A computer program is stored on the computer-readable storage medium provided by the embodiment of the present invention; when executed by a processor, the computer program implements the steps of the neural network model compression method described in any of the above embodiments.
Each embodiment in this specification is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be referred to each other.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A neural network model compression method, characterized by including:
pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized;
quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored;
storing the neural network model to be stored in a preset compressed format.
2. The method according to claim 1, characterized in that the neural network pruning method includes the dynamic network surgery algorithm.
3. The method according to claim 2, characterized in that pruning the neural network model to be pruned using a neural network pruning method to obtain the neural network model to be quantized includes:
S201: determining a first training dataset, the network model to be pruned and an initial iteration count, wherein the values in the first binary mask matrix corresponding to each layer's weight parameters in the network model to be pruned are initialized to 1;
S202: updating each layer's weight parameters using the formula
$W_k^{(i,j)} \leftarrow W_k^{(i,j)} - \beta \dfrac{\partial L(W_k \odot T_k)}{\partial\left(W_k^{(i,j)} T_k^{(i,j)}\right)}, \quad \forall (i,j) \in I,$
where $W_k^{(i,j)}$ denotes the weight coefficient with index $(i,j)$ in layer k of the neural network to be pruned; $T_k^{(i,j)}$ denotes the first binary mask of that weight; $\beta$ is a positive learning rate; $L(\cdot)$ denotes the loss function; $\odot$ denotes the Hadamard product operator; and $I$ denotes the index range of the weight coefficient matrix $W_k$;
S203: updating the first binary mask matrix corresponding to each layer's weight parameters using the formula
$T_k^{(i,j)} = h_k\!\left(W_k^{(i,j)}\right) = \begin{cases} 0, & \left|W_k^{(i,j)}\right| < a_k \\ T_k^{(i,j)}, & a_k \le \left|W_k^{(i,j)}\right| < b_k \\ 1, & \left|W_k^{(i,j)}\right| \ge b_k \end{cases}$
where $a_k$ and $b_k$ are preset thresholds, and the function $h_k(\cdot)$ means: when the absolute value of the weight $W_k^{(i,j)}$ is less than $a_k$, the binary mask $T_k^{(i,j)}$ is updated to 0; when it is greater than $b_k$, the mask is updated to 1; and when it lies between $a_k$ and $b_k$, the mask is left unchanged;
S204: updating the iteration count and the learning rate in a predetermined manner;
S205: judging whether the current iteration count exceeds a preset value; if not, returning to S202; if so, determining the neural network model to be quantized from each layer's weight parameters obtained after this update and the corresponding first binary mask matrices obtained after this update.
4. The method according to claim 1, characterized in that quantizing the neural network model to be quantized using the INQ algorithm to obtain the neural network model to be stored includes:
S301: determining a second training set and a reference model, initializing the weight parameters of the reference model with the weight parameters of the neural network model to be quantized, and initializing to 1 the second binary mask matrix corresponding to each layer's weight parameters in the reference model;
S302: determining, according to a preset weight quantization proportion, a weight group to be quantized and a weight group to be retrained among the weight parameters whose second binary mask is 1;
S303: quantizing the weight group to be quantized, updating to 0 the binary mask entries corresponding to the quantized weight parameters, and updating the quantization rate; wherein the quantization rate is the proportion of weight parameters whose binary mask is 0 among all weight parameters;
S304: retraining the weight parameters of the weight group to be retrained;
S305: judging whether the iteration count reaches a preset threshold and the quantization rate reaches 100%; if so, determining the neural network model to be stored from all quantized weight parameters; if not, returning to S302.
5. The method according to any one of claims 1 to 4, characterized in that storing the neural network model to be stored in a compressed format includes: storing the weight parameters of the neural network model to be stored with a predetermined number of bits.
6. A neural network model compression system, characterized by including:
a pruning module, configured to prune a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized;
a quantization module, configured to quantize the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored;
a storage module, configured to store the neural network model to be stored in a compressed format.
7. The system according to claim 6, characterized in that the neural network pruning method includes the dynamic network surgery algorithm.
8. The system according to claim 6 or 7, characterized in that the storage module is specifically configured to store the weight parameters of the neural network model to be stored with a predetermined number of bits.
9. A neural network model compression device, characterized by including:
a memory, for storing a computer program;
a processor, for implementing the steps of the neural network model compression method according to any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when executed by a processor the computer program implements the steps of the neural network model compression method according to any one of claims 1 to 5.
CN201711465541.1A 2017-12-28 2017-12-28 Neural network model compression method, system, device and readable storage medium Pending CN108229681A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711465541.1A CN108229681A (en) 2017-12-28 2017-12-28 Neural network model compression method, system, device and readable storage medium

Publications (1)

Publication Number Publication Date
CN108229681A true CN108229681A (en) 2018-06-29

Family

ID=62646571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711465541.1A Pending CN108229681A (en) 2017-12-28 2017-12-28 A kind of neural network model compression method, system, device and readable storage medium storing program for executing

Country Status (1)

Country Link
CN (1) CN108229681A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106557812A (en) * 2016-11-21 2017-04-05 北京大学 The compression of depth convolutional neural networks and speeding scheme based on dct transform

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
AOJUN ZHOU et al.: "Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights", arXiv:1702.03044v1 *
SONG HAN et al.: "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding", arXiv:1510.00149v5 *
YIWEN GUO et al.: "Dynamic Network Surgery for Efficient DNNs", arXiv:1608.04493v2 *
刘南平 et al.: "Data Communication Technology" (《数据通信技术》), 31 July 2004 *

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020019236A1 (en) * 2018-07-26 2020-01-30 Intel Corporation Loss-error-aware quantization of a low-bit neural network
CN109086819B (en) * 2018-07-26 2023-12-05 北京京东尚科信息技术有限公司 Method, system, equipment and medium for compressing a caffemodel
CN108962247B (en) * 2018-08-13 2023-01-31 南京邮电大学 Multi-dimensional voice information recognition system and method based on progressive neural networks
CN108962247A (en) * 2018-08-13 2018-12-07 南京邮电大学 Multi-dimensional voice information recognition system and method based on progressive neural networks
CN110929837A (en) * 2018-09-19 2020-03-27 北京搜狗科技发展有限公司 Neural network model compression method and device
CN109344893A (en) * 2018-09-25 2019-02-15 华中师范大学 Image classification method and system based on a mobile terminal
CN109376854A (en) * 2018-11-02 2019-02-22 矽魅信息科技(上海)有限公司 Multi-base logarithmic quantization method and device for deep neural networks
CN109376854B (en) * 2018-11-02 2022-08-16 矽魅信息科技(上海)有限公司 Multi-base logarithmic quantization device for deep neural networks
CN111191784A (en) * 2018-11-14 2020-05-22 辉达公司 Transposed sparse matrix multiplied by dense matrix for neural network training
CN109766993B (en) * 2018-12-13 2020-12-18 浙江大学 Convolutional neural network compression method suitable for hardware
CN109766993A (en) * 2018-12-13 2019-05-17 浙江大学 Convolutional neural network compression method suitable for hardware
CN110866603A (en) * 2018-12-29 2020-03-06 中科寒武纪科技股份有限公司 Data processing method and processor
WO2020133364A1 (en) * 2018-12-29 2020-07-02 华为技术有限公司 Neural network compression method and apparatus
CN110866603B (en) * 2018-12-29 2024-04-16 中科寒武纪科技股份有限公司 Data processing method and processor
CN109635935A (en) * 2018-12-29 2019-04-16 北京航空航天大学 Model-adaptive quantization method for deep convolutional neural networks based on modulus-length clustering
CN113168565A (en) * 2018-12-29 2021-07-23 华为技术有限公司 Neural network compression method and device
CN109634401B (en) * 2018-12-29 2023-05-02 联想(北京)有限公司 Control method and electronic equipment
CN109634401A (en) * 2018-12-29 2019-04-16 联想(北京)有限公司 Control method and electronic equipment
CN109635935B (en) * 2018-12-29 2022-10-14 北京航空航天大学 Model-adaptive quantization method for deep convolutional neural networks based on modulus-length clustering
CN109978144B (en) * 2019-03-29 2021-04-13 联想(北京)有限公司 Model compression method and system
CN109978144A (en) * 2019-03-29 2019-07-05 联想(北京)有限公司 Model compression method and system
CN110245753A (en) * 2019-05-27 2019-09-17 东南大学 Neural network compression method based on power-exponent quantization
CN112085186A (en) * 2019-06-12 2020-12-15 上海寒武纪信息科技有限公司 Neural network quantization parameter determination method and related product
CN112085186B (en) * 2019-06-12 2024-03-05 上海寒武纪信息科技有限公司 Method for determining quantization parameters of a neural network and related product
CN110782021A (en) * 2019-10-25 2020-02-11 浪潮电子信息产业股份有限公司 Image classification method, device, equipment and computer readable storage medium
CN110782021B (en) * 2019-10-25 2023-07-14 浪潮电子信息产业股份有限公司 Image classification method, device, equipment and computer readable storage medium
WO2021143070A1 (en) * 2020-01-16 2021-07-22 北京智芯微电子科技有限公司 Compression method and apparatus for deep neural network model, and storage medium
CN111598227B (en) * 2020-05-20 2023-11-03 字节跳动有限公司 Data processing method, device, electronic equipment and computer readable storage medium
CN111598227A (en) * 2020-05-20 2020-08-28 字节跳动有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN113298248A (en) * 2020-07-20 2021-08-24 阿里巴巴集团控股有限公司 Processing method and device for neural network model and electronic equipment
CN113642710A (en) * 2021-08-16 2021-11-12 北京百度网讯科技有限公司 Network model quantization method, device, equipment and storage medium
CN113642710B (en) * 2021-08-16 2023-10-31 北京百度网讯科技有限公司 Quantization method, device, equipment and storage medium for network models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180629)