CN108229681A - A kind of neural network model compression method, system, device and readable storage medium storing program for executing - Google Patents
- Publication number: CN108229681A (application CN201711465541.1A)
- Authority: CN (China)
- Prior art keywords: neural network, network model, quantified, stored, value
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a neural network model compression method, system, device and computer-readable storage medium. The method includes: pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized; quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored; and storing the neural network model to be stored in a compressed format. It can be seen that the neural network model compression method provided by embodiments of the present invention prunes the neural network model and then quantizes it with the INQ algorithm; while effectively ensuring that the compressed model loses no accuracy, it reduces the model size, and can therefore solve the problem of excessive resource consumption and accelerate computation.
Description
Technical field
The present invention relates to the field of artificial intelligence, and more specifically to a neural network model compression method, system, device and computer-readable storage medium.
Background art
In the current era, whether in daily life or on the Internet, one term is unavoidable: AI (Artificial Intelligence). AI applications have penetrated many fields, such as face recognition, speech recognition, text processing, Go playing, game playing, autonomous driving, image beautification, lip reading, and even fracturing simulation of geological strata. In many of these fields, AI's accuracy and problem-solving ability already exceed those of humans, so it has very broad application prospects and room for imagination. Among the algorithmic techniques in the AI field, deep learning has attracted wide attention from academia and industry since it won the ImageNet competition by an overwhelming margin in 2012; scientists, researchers, enterprises and online communities all over the world are actively studying and advancing the research and development of deep learning neural network models.
As deep learning achieves breakthroughs in various fields, the demand to apply it to real-life scenarios grows ever stronger, especially today, when mobile and portable devices greatly facilitate people's lives and deep learning can greatly enhance the intelligence and entertainment value of these devices. Deploying deep learning neural network models on mobile terminals and embedded systems has therefore become an urgent need.
However, when deep learning neural network models are actually deployed, the model size is usually excessive: a neural network model typically ranges from tens to hundreds of megabytes. For a mobile terminal, a file of this size consumes too much traffic and bandwidth when downloaded, and the resulting transmission latency is intolerable to users; for some embedded systems, storage space is very limited, and there may simply not be enough space to store such a large neural network model file.
Meanwhile, the demands on computing resources and computing power are high. When a large neural network model is used for computation, a mobile terminal or embedded system either cannot provide the computing resources it requires or computes too slowly, causing response latency too high to satisfy practical application scenarios.
In addition, neural network models also consume considerable power. During neural network computation, the processor needs to read the parameters of the neural network model frequently, so a larger neural network model brings a correspondingly higher number of memory accesses, and frequent memory access greatly increases power consumption.
Although common model compression methods reduce the model size and store the model parameters as sparse matrices, the accuracy of the model inevitably declines. Other compression methods retrain the compressed model to reduce the loss of accuracy, but the runtime performance of model inference then drops significantly.
Therefore, how to preserve the accuracy of a neural network model while compressing it is a problem to be solved by those skilled in the art.
Summary of the invention
The purpose of the present invention is to provide a neural network model compression method, system, device and computer-readable storage medium, so as to preserve the accuracy of a neural network model while compressing it.
To achieve the above object, an embodiment of the present invention provides the following technical scheme:
A neural network model compression method, including:
pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized;
quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored;
storing the neural network model to be stored in a compressed format.
Wherein, the neural network pruning method includes the dynamic network surgery (DNS) algorithm.
Wherein, pruning the neural network model to be pruned using a neural network pruning method to obtain the neural network model to be quantized includes:
S201: determining a first training dataset, a network model to be pruned and an initial iteration count, wherein every entry of the first binary mask matrix corresponding to each layer's weight parameters in the network model to be pruned is initialized to 1;
S202: updating the weight of each subscript in every layer's weight parameters using the formula
$W_k^{(i,j)} \leftarrow W_k^{(i,j)} - \beta\,\frac{\partial L(W_k \odot T_k)}{\partial (W_k^{(i,j)} T_k^{(i,j)})},\ \forall (i,j) \in I$
to obtain updated weight parameters for every layer; where $W_k^{(i,j)}$ denotes the weight coefficient with subscript $(i,j)$ in layer $k$ of the network to be pruned; $T_k^{(i,j)}$ denotes the first binary mask of the weight with subscript $(i,j)$ in layer $k$; $\beta$ is a positive learning rate; $L(\cdot)$ denotes the loss function; $\odot$ denotes the Hadamard product operator; $I$ denotes the subscript range of the weight coefficient matrix $W_k$;
S203: updating the binary mask of each weight in every layer's weight parameters using the formula
$T_k^{(i,j)} = h_k(W_k^{(i,j)})$
to obtain the updated first binary mask matrix corresponding to every layer's weight parameters; wherein $a_k$ and $b_k$ are preset thresholds, and the function $h_k(\cdot)$ means: when the absolute value of the weight $W_k^{(i,j)}$ is less than $a_k$, the binary mask $T_k^{(i,j)}$ is updated to 0; when the absolute value of $W_k^{(i,j)}$ is greater than $b_k$, the binary mask $T_k^{(i,j)}$ is updated to 1; when the absolute value of $W_k^{(i,j)}$ lies between $a_k$ and $b_k$, the value of $T_k^{(i,j)}$ is not updated;
S204: updating the iteration count and the learning rate in a predetermined manner;
S205: judging whether the current iteration count exceeds a preset value; if not, returning to S202; if so, determining the neural network model to be quantized from the per-layer weight parameters obtained after this update and the corresponding first binary mask matrices obtained after this update.
Wherein, quantizing the neural network model to be quantized using the INQ algorithm to obtain the neural network model to be stored includes:
S301: determining a second training set and a reference model, initializing the weight parameters of the reference model with the weight parameters of the neural network model to be quantized, and initializing to 1 the second binary mask matrix corresponding to each layer's weight parameters in the reference model;
S302: among the weight parameters whose second binary mask value is 1, determining a weight group to be quantized and a weight group to be trained according to a preset weight quantization proportion;
S303: quantizing the weight group to be quantized, updating to 0 the binary mask entries corresponding to the weight parameters in the quantized weight group, and updating the quantization rate; wherein the quantization rate is the proportion, among all weight parameters, of weight parameters whose binary mask value is 0;
S304: updating the weight parameters of the weight group to be trained;
S305: judging whether the iteration count reaches a predetermined threshold and the quantization rate reaches 100%; if so, determining the neural network model to be stored from all the quantized weight parameters; if not, returning to S302.
Wherein, storing the neural network model to be stored in a compressed format includes: storing the weight parameters of the neural network model to be stored with a predetermined number of bits.
In order to solve the above technical problems, the present invention also provides a neural network model compression system, including:
a pruning module, for pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized;
a quantization module, for quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored;
a storage module, for storing the neural network model to be stored in a compressed format.
Wherein, the neural network pruning method includes the dynamic network surgery (DNS) algorithm.
Wherein, the storage module is specifically for storing the weight parameters of the neural network model to be stored with a predetermined number of bits.
The present invention also provides a neural network model compression device, including:
a memory, for storing a computer program;
a processor, for implementing the steps of the neural network model compression method when executing the computer program.
The present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the neural network model compression method.
From the above scheme it can be seen that the neural network model compression method provided by the present invention includes: pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized; quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored; and storing the neural network model to be stored in a compressed format.
It can be seen that, by pruning the neural network model, the method provided by the present invention can reduce the model size and thus solve the problem of excessive resource consumption; by quantizing it with the INQ algorithm after pruning, it can effectively ensure that the compressed model loses no accuracy.
Description of the drawings
To explain the embodiments of the present invention or the technical schemes of the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a neural network model compression method disclosed by an embodiment of the present invention;
Fig. 2 is a schematic diagram of weight changes during DNS pruning disclosed by an embodiment of the present invention;
Fig. 3 is a flow chart of a specific network model compression method disclosed by an embodiment of the present invention;
Fig. 4 is a flow chart of a specific neural network model compression method disclosed by an embodiment of the present invention;
Fig. 5 is a structural diagram of a neural network model compression system disclosed by an embodiment of the present invention.
Specific embodiment
The technical schemes in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
The embodiments of the present invention disclose a neural network model compression method, system, device and computer-readable storage medium, so as to preserve the accuracy of a neural network model while compressing it.
Referring to Fig. 1, a neural network model compression method provided by an embodiment of the present invention specifically includes:
S101: pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized.
In this scheme, the neural network model to be pruned is first pruned using a neural network pruning method, so as to set the values of some weight parameters in the neural network to 0 and turn them into useless weights; during forward propagation these weights then have no effect on the prediction result of the neural network. It should be noted that the above partial weight parameters are generally the weight parameters with small absolute values.
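The pruning idea described here can be sketched as follows (an illustrative sketch, not the patent's implementation; the function name and threshold value are hypothetical):

```python
import numpy as np

def prune_small_weights(weights, threshold):
    # Build a binary mask that keeps only weights whose absolute value
    # reaches the threshold; masked weights contribute nothing to the
    # forward pass, since the effective weights are weights * mask.
    mask = (np.abs(weights) >= threshold).astype(weights.dtype)
    return weights * mask, mask

w = np.array([[0.8, -0.02], [0.003, -1.5]])
pruned, mask = prune_small_weights(w, threshold=0.05)
# the two small-magnitude entries are zeroed; the large ones survive
```

In a real network the same masking would be applied layer by layer, with per-layer thresholds.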
As a preference, the neural network pruning method may be the DNS (Dynamic Network Surgery) algorithm.
Specifically, the result obtained after pruning the neural network model to be pruned serves as the neural network model to be quantized, which is then subjected to quantization.
S102: quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored.
A neural network parameter quantization method is applied to the pruned result obtained in S101, i.e. the neural network model to be quantized, so that the quantized result can be stored. As a preference, the quantization method may be the INQ (Incremental Network Quantization) algorithm; in the final quantized result, the value of each weight parameter is either an integer power of 2 or 0.
It should be noted that the INQ technique proposes the idea of incremental neural network quantization, whose core is the introduction of three operations: parameter grouping, quantization and retraining. In its implementation, the parameters of each layer of a full-precision floating-point network model are first divided into two groups: the parameters in the first group are directly quantized and fixed, while the parameters in the other group are retrained to compensate for the accuracy loss caused by quantization. These three operations are then applied iteratively to the full-precision floating-point part remaining after retraining, until the model is fully quantized. By cleverly coupling the parameter grouping, quantization and retraining operations, the performance loss caused by model quantization is suppressed, so in practice the technique is applicable to neural network models of arbitrary structure.
In addition, during model quantization the INQ technique constrains all parameters to a binary-exponent representation including zero, which makes the final model highly suitable for hardware deployment and acceleration. On an FPGA, for example, complicated full-precision floating-point multiplication can be replaced directly by simple shift operations.
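As a toy illustration of that hardware observation (assuming integer fixed-point data; this example is not from the patent), multiplying by a power-of-two weight reduces to a bit shift:

```python
def shift_multiply(x: int, exponent: int) -> int:
    # x * 2**exponent as a shift: left shift for non-negative exponents,
    # arithmetic right shift (divide by a power of two) for negative ones.
    return x << exponent if exponent >= 0 else x >> -exponent

assert shift_multiply(13, 3) == 13 * 2 ** 3   # 104
assert shift_multiply(96, -5) == 96 // 2 ** 5 # 3
```

This is why restricting weights to {0, ±2^n} lets an FPGA replace its multipliers with shifters.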
S103: storing the neural network model to be stored in a preset compressed format.
Specifically, after the neural network model is quantized, its quantized result can be stored, using a compressed storage format for the quantized parameter values. As a preference, the compressed storage format may be low-bit storage.
For example, with a preset bit width of 4, the storage format is as shown in Table 1, where the actual value is the weight parameter value of the network model to be stored, i.e. the quantized weight parameter value, and the 4-bit representation is the corresponding 4-bit code for that value.
Table 1
4-bit code | Actual value | 4-bit code | Actual value |
0000 | 0.00 | 1000 | 2^-1 |
0001 | -2^-1 | 1001 | 2^-2 |
0010 | -2^-2 | 1010 | 2^-3 |
0011 | -2^-3 | 1011 | 2^-4 |
0100 | -2^-4 | 1100 | 2^-5 |
0101 | -2^-5 | 1101 | 2^-6 |
0110 | -2^-6 | 1110 | 2^-7 |
0111 | -2^-7 | 1111 | (unused) |
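The mapping in Table 1 can be expressed as a small codec. The following is a hypothetical reconstruction for illustration (the names are not from the patent), with code 1111 left unused as in the table:

```python
# Build the code table: 0000 -> 0.0, 0001..0111 -> -2^-1 .. -2^-7,
# 1000..1110 -> 2^-1 .. 2^-7; code 1111 has no corresponding value.
CODE_TO_VALUE = {0b0000: 0.0}
for e in range(1, 8):
    CODE_TO_VALUE[e] = -(2.0 ** -e)             # codes 0001..0111
    CODE_TO_VALUE[0b1000 + e - 1] = 2.0 ** -e   # codes 1000..1110

VALUE_TO_CODE = {v: c for c, v in CODE_TO_VALUE.items()}

def encode(value: float) -> int:
    return VALUE_TO_CODE[value]

def decode(code: int) -> float:
    return CODE_TO_VALUE[code]
```

Two such 4-bit codes can then be packed into each stored byte, halving storage relative to 8-bit codes and cutting it 8x relative to 32-bit floats.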
It can be seen that the neural network model compression method provided by the embodiment of the present invention prunes the neural network model and then quantizes it with the INQ algorithm; while effectively ensuring that the compressed model loses no accuracy, it can reduce the model size, and can therefore solve the problem of excessive resource consumption and accelerate computation.
An embodiment of the present invention provides a specific neural network model compression method. Unlike the above embodiment, this embodiment further defines and explains S101 of the above embodiment; the other content is substantially the same as the above embodiment and may be referred to there, so it is not repeated here. Referring to Fig. 2 and Fig. 3, S101 specifically includes:
S201: determining a first training dataset, a network model to be pruned and an initial iteration count, wherein every entry of the first binary mask matrix corresponding to each layer's weight parameters in the network model to be pruned is initialized to 1.
It should be noted that pruning means setting some weight parameters in the neural network to 0. The weight changes during pruning are shown in Fig. 2; from left to right, Fig. 2 shows in turn: the initial network model to be pruned; the network model with the weights to be pruned determined, where the filled weights are those identified as weights to be pruned; and the network model after pruning, in which the weights to be pruned have been set to 0.
Specifically, first prepare a training dataset X and a reference model $\{\widehat{W}_k : 0 \le k \le C\}$, i.e. a trained, intact neural network model, where C is the number of layers of the neural network model and $\widehat{W}_k$ is the model parameter of layer k. Hyperparameters also need to be set, such as the learning rate, the learning rate update rule, and the parameters controlling the pruning rate.
Initialize the network model parameters $W_k$ to be pruned as $\widehat{W}_k$, and initialize every binary mask matrix $T_k$ to 1, for the range $0 \le k \le C$, where C is the number of layers of the neural network model. $T_k^{(i,j)}$ denotes the binary mask of the weight with subscript (i,j) in layer k of the neural network, i.e. the mask blob; its value is 0 or 1, where 0 means the corresponding weight is deleted and 1 means the corresponding weight is retained. The shape of $T_k$ is identical to that of $W_k$.
S202: updating every layer's weight parameters using the formula
$W_k^{(i,j)} \leftarrow W_k^{(i,j)} - \beta\,\frac{\partial L(W_k \odot T_k)}{\partial (W_k^{(i,j)} T_k^{(i,j)})},\ \forall (i,j) \in I$
where $W_k^{(i,j)}$ denotes the weight coefficient with subscript (i,j) in layer k of the network to be pruned; $T_k^{(i,j)}$ denotes the first binary mask of that weight; $\beta$ is a positive learning rate; $L(\cdot)$ denotes the loss function; $\odot$ denotes the Hadamard product operator; and I denotes the subscript range of the weight coefficient matrix $W_k$.
Concretely, a batch of data is chosen from the training dataset X, the masked weights $(W_0 \odot T_0), \ldots, (W_C \odot T_C)$ are used as the weights for forward propagation (where $\odot$ is the Hadamard product operator), the loss of the forward pass is computed, the gradient of the loss function is computed and back-propagated, and the weight matrix $W_k$ of each layer is updated according to the formula above. Note that every entry of $W_k$ is updated, including entries currently masked out, which allows a mistakenly pruned weight to grow back and be restored later.
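Under the assumption of a toy quadratic loss (everything below is illustrative; only the update rule itself comes from the text above), one S202 step might look like:

```python
import numpy as np

# One DNS-style S202 step on a toy loss L(V) = 0.5 * ||V - target||^2,
# where the forward pass uses the masked weights V = W ⊙ T.  The update
# is applied to ALL entries of W, masked or not, so a weight pruned
# earlier can later grow back past the threshold and be revived.
W = np.array([[0.9, 0.01], [0.02, -1.2]])
T = np.array([[1.0, 0.0], [0.0, 1.0]])
target = np.array([[1.0, 0.0], [0.0, -1.0]])
beta = 0.1

grad = W * T - target   # dL/dV evaluated at V = W ⊙ T for this toy loss
W = W - beta * grad     # update every weight, not just the unmasked ones
# W[0][0]: 0.9 - 0.1 * (0.9 - 1.0) = 0.91
# W[1][1]: -1.2 - 0.1 * (-1.2 + 1.0) = -1.18
```

In a real network, `grad` would come from backpropagation through the masked forward pass rather than from a closed-form loss.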
S203: updating the first binary mask matrix corresponding to every layer's weight parameters using the formula
$T_k^{(i,j)} = h_k(W_k^{(i,j)})$
wherein $a_k$ and $b_k$ are preset thresholds with $a_k < b_k$, which respectively decide whether a binary mask is updated. The function $h_k(\cdot)$ means: if the absolute value of the weight $W_k^{(i,j)}$ is less than $a_k$, the binary mask $T_k^{(i,j)}$ becomes 0, meaning $W_k^{(i,j)}$ will be pruned; if the absolute value of $W_k^{(i,j)}$ is greater than $b_k$, the binary mask $T_k^{(i,j)}$ becomes 1, meaning $W_k^{(i,j)}$ will be retained; if the absolute value of $W_k^{(i,j)}$ lies between $a_k$ and $b_k$, the value of $T_k^{(i,j)}$ is temporarily unchanged, meaning whether $W_k^{(i,j)}$ is retained depends on the value of $T_k^{(i,j)}$ before the update.
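A minimal sketch of the $h_k$ rule (illustrative names; the threshold values are chosen arbitrarily):

```python
import numpy as np

def update_mask(W, T, a_k, b_k):
    # h_k: prune weights whose magnitude drops below a_k, revive weights
    # whose magnitude exceeds b_k, and leave the mask untouched in the
    # hysteresis band [a_k, b_k].  Requires a_k < b_k.
    T_new = T.copy()
    T_new[np.abs(W) < a_k] = 0
    T_new[np.abs(W) > b_k] = 1
    return T_new

W = np.array([0.005, 0.05, 0.5])
T = np.array([1, 0, 0])
# 0.005 < a_k -> pruned; 0.05 sits in the band (keeps old mask 0); 0.5 > b_k -> revived
new_T = update_mask(W, T, a_k=0.01, b_k=0.1)
```

The band between $a_k$ and $b_k$ prevents weights near the threshold from flipping in and out of the mask on every iteration.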
S204: updating the iteration count and the learning rate in a predetermined manner.
Specifically, the learning rate is updated according to the learning rate update policy preset in S201, and the iteration count is updated at the same time, e.g. incremented by 1.
S205: judging whether the current iteration count exceeds a preset value; if not, returning to S202 to continue iterating; if it exceeds the preset value, outputting the pruned weight parameters $\{W_k : 0 \le k \le C\}$ and their corresponding binary mask matrices $\{T_k : 0 \le k \le C\}$, and determining the neural network model to be quantized from these weight parameters and binary mask matrices.
It can be seen that the specific neural network model compression method provided by this embodiment of the present invention prunes the neural network to be pruned using the DNS algorithm, setting the values of some weight parameters to 0 so that they no longer affect the prediction result of the neural network during forward propagation. The model parameters are greatly reduced, so the demand on computing resources is also reduced.
An embodiment of the present invention provides a specific neural network model compression method. Unlike the above embodiments, this embodiment further defines and explains S102 of the above embodiments; the other steps are substantially the same as in the above embodiments and are not repeated here. Referring to Fig. 4, S102 specifically includes:
S301: determining a second training set and a reference model, initializing the weight parameters of the reference model with the weight parameters of the neural network model to be quantized, and initializing to 1 the values in the second binary mask matrix corresponding to each layer's weight parameters in the reference model.
Specifically, prepare a training dataset X and a reference model $\{\widehat{W}_k : 0 \le k \le C\}$, i.e. a trained, intact neural network model, where C is the number of layers of the neural network model and $\widehat{W}_k$ is the model parameter of layer k; set the hyperparameters, including the quantization proportion parameters.
Set the weight quantization proportions $\{\sigma_1, \sigma_2, \ldots, \sigma_N\}$ and initialize the binary mask matrix $T_k$ of each layer to 1. It should be noted that the binary mask matrix in this embodiment of the present invention is used to indicate whether the corresponding parameter has been quantized: 0 means already quantized, 1 means not yet quantized.
S302: among the weight parameters whose second binary mask value is 1, determining a weight group to be quantized and a weight group to be trained according to the preset weight quantization proportion.
Specifically, initialize for each layer of the neural network the weight group $A_l^{(1)}$ that needs to be quantized and the weight group $A_l^{(2)}$ that needs to be retrained. For layer l of the neural network, the weight grouping satisfies:
$A_l^{(1)} \cup A_l^{(2)} = \{W_l(i,j)\},\qquad A_l^{(1)} \cap A_l^{(2)} = \emptyset$
where $A_l^{(1)}$ denotes the weight group that will be quantized and $A_l^{(2)}$ denotes the weight group needing retraining. When grouping the weights, $T_l(i,j) = 0$ means $W_l(i,j) \in A_l^{(1)}$ and $T_l(i,j) = 1$ means $W_l(i,j) \in A_l^{(2)}$.
According to the weight quantization proportions $\{\sigma_1, \sigma_2, \ldots, \sigma_N\}$, each layer's weight parameters are divided into $A_l^{(1)}$ and $A_l^{(2)}$, and the corresponding binary mask matrix $T_k$ is updated.
According to the range of the weight values in each layer's quantization weight group $A_l^{(1)}$, the quantized value set $P_l$ is determined:
$P_l = \{\pm 2^{n_1}, \ldots, \pm 2^{n_2}, 0\}$
where $n_1$ and $n_2$ are integers satisfying $n_2 \le n_1$. In this way, $n_1$ and $n_2$ constrain the nonzero entries of $W_l$ to the range $[2^{n_2}, 2^{n_1}]$ or $[-2^{n_1}, -2^{n_2}]$, and entries of $W_l$ with sufficiently small absolute value (below the smallest quantization threshold) will be pruned. In the INQ algorithm the low-bit quantization bit width b is set in advance, so only $n_1$ needs to be computed; $n_2$ can then be calculated from $n_1$ and b. The calculation formula of $n_1$ is:
$n_1 = \lfloor \log_2(4s/3) \rfloor,\qquad s = \max(\mathrm{abs}(W_l))$
where floor(·) denotes rounding down, max(·) takes the maximum of all input elements, and abs(·) takes the absolute value of each element. After obtaining $n_1$, we get $n_2 = n_1 + 2 - 2^{b-1}$.
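The computation of $n_1$, $n_2$ and $P_l$ above can be sketched directly (a sketch under the formulas just stated; the function name and example weights are illustrative):

```python
import math

def quantized_value_set(weights, b):
    # n1 = floor(log2(4s/3)) with s = max |w|; n2 = n1 + 2 - 2**(b-1).
    # P_l contains zero plus +/- every power of two from 2**n2 to 2**n1.
    s = max(abs(w) for w in weights)
    n1 = math.floor(math.log2(4 * s / 3))
    n2 = n1 + 2 - 2 ** (b - 1)
    powers = [2.0 ** n for n in range(n2, n1 + 1)]
    return n1, n2, sorted({0.0} | set(powers) | {-p for p in powers})

n1, n2, P = quantized_value_set([0.62, -0.4, 0.05], b=4)
# s = 0.62 -> 4s/3 ≈ 0.827 -> n1 = -1, n2 = -7; |P| = 15, which fits the
# 15 used codes of the 4-bit table above
```

Note that $|P_l| = 2(n_1 - n_2 + 1) + 1 = 2^b - 1$, leaving exactly one b-bit code unused, consistent with Table 1.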
S303: quantizing the weight group to be quantized, updating to 0 the binary mask matrix entries corresponding to the weight parameters in the quantized weight group, and updating the quantization rate; wherein the quantization rate is the proportion, among all weight parameters, of weight parameters whose binary mask value is 0.
Specifically, the weight parameters in $A_l^{(1)}$ are quantized according to the following formula:
$\widehat{W}_l(i,j) = \beta\,\mathrm{sgn}(W_l(i,j))$ if $(\alpha + \beta)/2 \le |W_l(i,j)| < 3\beta/2$, and $\widehat{W}_l(i,j) = 0$ otherwise,
where $\alpha$ and $\beta$ are adjacent elements in the set $P_l$.
S304: retraining the weight parameters of the weight group to be trained.
Specifically, forward propagation is performed using the updated, quantized weight parameters, and then during backpropagation the weight parameters in $A_l^{(2)}$ are updated by stochastic gradient descent according to the following formula:
$W_l(i,j) \leftarrow W_l(i,j) - \gamma\,\frac{\partial L}{\partial W_l(i,j)}\,T_l(i,j)$
where $\gamma$ is a positive learning rate, L is the loss function and $T_l(i,j)$ is the binary mask matrix, whose shape is exactly the same as that of $W_l(i,j)$ and whose values are 0 or 1. During the weight update, $T_l(i,j) = 0$ means the corresponding weight $W_l(i,j)$ has been quantized and is not updated; $T_l(i,j) = 1$ means the corresponding weight $W_l(i,j)$ has not been quantized and is updated normally.
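The masked update of S304 can be sketched as follows (toy gradient values; the names and numbers are illustrative, not from the patent):

```python
import numpy as np

def retrain_step(W, T, grad, gamma):
    # Only weights whose mask is 1 (not yet quantized) receive the SGD
    # update; quantized weights (mask 0) stay frozen at their
    # power-of-two values.
    return W - gamma * grad * T

W = np.array([0.5, 0.3, -0.25])
T = np.array([0.0, 1.0, 0.0])      # only the middle weight still trains
grad = np.array([0.2, 0.1, -0.4])  # assumed gradient of the loss w.r.t. W
W_new = retrain_step(W, T, grad, gamma=0.1)
# only W[1] moves: 0.3 - 0.1 * 0.1 = 0.29; the frozen weights are unchanged
```

This is how the still-floating-point weights absorb the error introduced by quantizing their neighbours.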
S305: judging whether the current iteration count reaches the predetermined threshold and the quantization rate reaches 100%; if so, determining the neural network model to be stored from all the quantized weight parameters; if not, updating the iteration count and returning to S302.
Specifically, judge whether the current iteration count has reached the predetermined threshold and the quantization rate has reached 100%, i.e. $T_k$ is all 0. If so, the iteration stops and the quantized weight parameters $\{W_k : 0 \le k \le C\}$ are output, each weight parameter value being an integer power of 2 or 0.
It can be seen that the specific neural network compression method provided by this embodiment of the present invention quantizes the neural network to be quantized using the INQ algorithm, and through repeated retraining during quantization ensures that the compressed model loses no accuracy.
A neural network compression system provided by an embodiment of the present invention is introduced below; the neural network compression system described below and the neural network compression method described above may be referred to each other.
Referring to Fig. 5, a neural network compression system provided by an embodiment of the present invention specifically includes:
a pruning module 401, for pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized.
In this scheme, the pruning module 401 first prunes the neural network model to be pruned using a neural network pruning method, setting the values of some weight parameters in the neural network to 0 and turning them into useless weights, so that during forward propagation these weights have no effect on the prediction result of the neural network. It should be noted that the above partial weight parameters are generally the weight parameters with small absolute values.
As a preference, the neural network pruning method may be the DNS (Dynamic Network Surgery) algorithm.
Specifically, the result obtained by the pruning module 401 after pruning the neural network model to be pruned serves as the neural network model to be quantized, which is then subjected to quantization.
A quantization module 402, for quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored.
Quantization modules 402 using neural network parameter quantization method to after cutting the obtained cutting of module 401 as a result, i.e.
The quantized result that neural network model to be quantified is quantified can be carried out storing.As preference, the method for quantization
INQ (Incremental Network Quantization, cumulative network quantization) algorithm, the nerve finally obtained may be used
Network parameter quantization after as a result, i.e. the value of weight parameter is quantified as 2 whole power or 0.
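A minimal sketch of the power-of-two quantization just described is given below. The exponent bounds `p_min` and `p_max` are illustrative assumptions (INQ derives its exponent range from the weight distribution), and the function name is hypothetical:

```python
import numpy as np

def quantize_pow2(w, p_min=-4, p_max=2):
    """Map a weight to 0 or +/-2^p (power-of-two quantization sketch).

    Weights too small to reach the smallest representable power of two
    are quantized to 0; large weights are clipped to 2^p_max.
    """
    if w == 0:
        return 0.0
    p = int(np.round(np.log2(abs(w))))   # nearest exponent
    if p < p_min:
        return 0.0                       # below range -> quantize to 0
    p = min(p, p_max)                    # clip overly large weights
    return float(np.sign(w) * 2.0 ** p)
```

Restricting weights to powers of two also allows multiplications to be replaced by bit shifts at inference time, which is one motivation for this quantization scheme.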
A storage module 403, configured to store the neural network model to be stored using a preset compressed format.
Specifically, after the neural network model has been quantized, the storage module 403 can store the quantized result, i.e. store the quantized parameter values using a compressed storage format. As a preferred option, the compressed storage format may be low-bit storage.
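A sketch of what low-bit storage can mean in practice: after power-of-two quantization, each weight is representable by a small integer code (e.g. an index into a codebook of powers of two, with one code reserved for 0), so a few bits per weight suffice instead of a 32-bit float. The 3-bit code width and the function names below are illustrative assumptions:

```python
def pack_codes(codes, bits=3):
    """Pack small integer codes into a byte string, 'bits' bits per code."""
    buf, acc, nbits = bytearray(), 0, 0
    for c in codes:
        acc |= (c & ((1 << bits) - 1)) << nbits
        nbits += bits
        while nbits >= 8:
            buf.append(acc & 0xFF)   # flush a full byte
            acc >>= 8
            nbits -= 8
    if nbits:
        buf.append(acc & 0xFF)       # flush the remainder
    return bytes(buf)

def unpack_codes(data, n, bits=3):
    """Inverse of pack_codes: recover n codes of 'bits' bits each."""
    out, acc, nbits, i = [], 0, 0, 0
    for _ in range(n):
        while nbits < bits:
            acc |= data[i] << nbits
            i += 1
            nbits += 8
        out.append(acc & ((1 << bits) - 1))
        acc >>= bits
        nbits -= bits
    return out
```

With 3-bit codes, five weights occupy two bytes instead of twenty, a roughly 10x reduction before any further entropy coding.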
It can be seen that, in the neural network model compression system provided in an embodiment of the present invention, the pruning module 401 prunes the neural network model, which reduces the model size and thus solves the problem of excessive resource consumption; meanwhile, after pruning, the quantization module 402 quantizes the model using the INQ algorithm, which effectively ensures that the accuracy of the compressed model is not lost.
A neural network model compression apparatus provided in an embodiment of the present invention is introduced below; the apparatus described below and the neural network model compression method described above may be cross-referenced.
The neural network model compression apparatus provided in an embodiment of the present invention specifically includes:
a memory, configured to store a computer program; and
a processor, configured to implement, when executing the computer program, the steps of the neural network model compression method according to any of the above embodiments.
A computer-readable storage medium provided in an embodiment of the present invention is introduced below; the computer-readable storage medium described below and the neural network model compression method described above may be cross-referenced.
The computer-readable storage medium provided in an embodiment of the present invention stores a computer program which, when executed by a processor, implements the steps of the neural network model compression method according to any of the above embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A neural network model compression method, characterized by comprising:
pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized;
quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored; and
storing the neural network model to be stored using a preset compressed format.
2. The method according to claim 1, characterized in that the neural network pruning method comprises the Dynamic Network Surgery (DNS) algorithm.
3. The method according to claim 2, characterized in that pruning a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized comprises:
S201: determining a first training data set, the network model to be pruned, and an initial iteration number, wherein each value in the first binary mask matrix corresponding to each layer of weight parameters in the network model to be pruned is initialized to 1;
S202: updating each layer of weight parameters using the formula W_k^(i,j) ← W_k^(i,j) − β · ∂L(W_k ⊙ T_k) / ∂(W_k^(i,j) T_k^(i,j)), for all (i,j) ∈ I; wherein W_k^(i,j) denotes the weight coefficient with subscript (i,j) in the k-th layer of the neural network to be pruned; T_k^(i,j) denotes the first binary mask of the weight with subscript (i,j) in the k-th layer; β is a positive learning rate; L(·) denotes the loss function; ⊙ denotes the Hadamard product operator; and I denotes the subscript range of the weight coefficient matrix W_k;
S203: updating the first binary mask matrix corresponding to each layer of weight parameters using the formula T_k^(i,j) = h_k(W_k^(i,j)); wherein a_k and b_k are preset boundaries, and the function h_k(·) is defined such that when the absolute value of the weight W_k^(i,j) is less than a_k, the binary mask T_k^(i,j) is updated to 0; when the absolute value of W_k^(i,j) is greater than b_k, the binary mask T_k^(i,j) is updated to 1; and when the absolute value of W_k^(i,j) lies between a_k and b_k, the value of T_k^(i,j) is not updated;
S204: updating the iteration number and the learning rate in a predetermined manner;
S205: judging whether the current iteration number exceeds a preset value; if not, returning to S202; if so, determining the neural network model to be quantized from each layer of weight parameters obtained after this update and from the first binary mask matrix, corresponding to each layer of weight parameters, obtained after this update.
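For illustration only, steps S201 to S205 above can be sketched as the following loop. The quadratic toy loss, the fixed learning rate `beta`, and the function name `dns_prune` are assumptions for this sketch and are not part of the claim:

```python
import numpy as np

def dns_prune(W, grad_fn, a_k, b_k, beta=0.1, n_iter=100):
    """Sketch of S201-S205: alternate weight updates and mask updates.

    grad_fn(M) returns dL/dM for the masked weights M = W * T; the loss
    and the learning-rate schedule of S204 are application-specific, so a
    fixed beta is used here as a simplification.
    """
    T = np.ones_like(W)                          # S201: mask initialized to 1
    for _ in range(n_iter):                      # S205: iterate up to a preset count
        W = W - beta * grad_fn(W * T)            # S202: update all weights
        T = np.where(np.abs(W) < a_k, 0.0, T)    # S203: prune small weights
        T = np.where(np.abs(W) > b_k, 1.0, T)    #       splice large weights back
    return W, T                                  # S205: pruned model = (W, T)

# Toy usage: drive weights toward a sparse target with a quadratic loss.
target = np.array([0.0, 1.0, 0.0, -1.0])
W0 = np.array([0.3, 0.8, -0.2, -0.7])
W, T = dns_prune(W0, lambda M: M - target, a_k=0.05, b_k=0.1)
```

Note that all weights continue to receive gradients computed from the masked network, which is what allows a pruned weight to be spliced back in S203 if it later grows past b_k.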
4. The method according to claim 1, characterized in that quantizing the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored comprises:
S301: determining a second training set and a reference model; initializing the weight parameters of the reference model with the weight parameters of the neural network model to be quantized; and initializing each value in the second binary mask matrix corresponding to each layer of weight parameters in the reference model to 1;
S302: determining, among the weight parameters whose second binary mask is 1, a weight group to be quantized and a weight group to be trained according to a preset weight quantization ratio;
S303: quantizing the weight group to be quantized, updating the binary mask corresponding to each weight parameter in the quantized weight group to 0, and updating the quantization rate; wherein the quantization rate is the proportion of weight parameters whose binary mask is 0 among all weight parameters;
S304: retraining the weight parameters of the weight group to be trained;
S305: judging whether the iteration number reaches a preset threshold and the quantization rate reaches 100%; if so, determining the neural network model to be stored from all the quantized weight parameters; if not, returning to S302.
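For illustration only, one round of steps S302 and S303 can be sketched as follows; the retraining of S304 is omitted, and the function name, the largest-magnitude-first grouping rule, and the exponent rounding are assumptions of this sketch:

```python
import numpy as np

def inq_step(W, mask, ratio):
    """Sketch of S302-S303: quantize a share of the still-unquantized weights.

    Picks the largest-magnitude weights among those with mask == 1,
    quantizes each to 0 or a signed power of two, flips its mask to 0,
    and reports the updated quantization rate.
    """
    idx = np.flatnonzero(mask == 1)              # S302: unquantized weights
    k = int(np.ceil(len(idx) * ratio))           # group size for this round
    chosen = idx[np.argsort(-np.abs(W[idx]))[:k]]
    for i in chosen:                             # S303: quantize the group
        w = W[i]
        W[i] = 0.0 if w == 0 else np.sign(w) * 2.0 ** np.round(np.log2(abs(w)))
        mask[i] = 0                              # mark as quantized, frozen
    quant_rate = np.mean(mask == 0)              # S303: updated quantization rate
    return W, mask, quant_rate
```

Repeatedly calling `inq_step` (with retraining of the `mask == 1` weights between calls, per S304) until the quantization rate reaches 100% corresponds to the loop terminated by S305.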
5. The method according to any one of claims 1 to 4, characterized in that storing the neural network model to be stored using a compressed format comprises: storing the weight parameters of the neural network model to be stored according to a predetermined bit width.
6. A neural network model compression system, characterized by comprising:
a pruning module, configured to prune a neural network model to be pruned using a neural network pruning method to obtain a neural network model to be quantized;
a quantization module, configured to quantize the neural network model to be quantized using the INQ algorithm to obtain a neural network model to be stored; and
a storage module, configured to store the neural network model to be stored using a compressed format.
7. The system according to claim 6, characterized in that the neural network pruning method comprises the Dynamic Network Surgery algorithm.
8. The system according to claim 6 or 7, characterized in that the storage module is specifically configured to store the weight parameters of the neural network model to be stored according to a predetermined bit width.
9. A neural network model compression apparatus, characterized by comprising:
a memory, configured to store a computer program; and
a processor, configured to implement, when executing the computer program, the steps of the neural network model compression method according to any one of claims 1 to 5.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the neural network model compression method according to any one of claims 1 to 5 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711465541.1A CN108229681A (en) | 2017-12-28 | 2017-12-28 | A kind of neural network model compression method, system, device and readable storage medium storing program for executing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711465541.1A CN108229681A (en) | 2017-12-28 | 2017-12-28 | A kind of neural network model compression method, system, device and readable storage medium storing program for executing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108229681A true CN108229681A (en) | 2018-06-29 |
Family
ID=62646571
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711465541.1A Pending CN108229681A (en) | 2017-12-28 | 2017-12-28 | A kind of neural network model compression method, system, device and readable storage medium storing program for executing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108229681A (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108962247A (en) * | 2018-08-13 | 2018-12-07 | 南京邮电大学 | Based on gradual neural network multidimensional voice messaging identifying system and its method |
CN109344893A (en) * | 2018-09-25 | 2019-02-15 | 华中师范大学 | A kind of image classification method and system based on mobile terminal |
CN109376854A (en) * | 2018-11-02 | 2019-02-22 | 矽魅信息科技(上海)有限公司 | More truth of a matter logarithmic quantization method and devices for deep neural network |
CN109634401A (en) * | 2018-12-29 | 2019-04-16 | 联想(北京)有限公司 | A kind of control method and electronic equipment |
CN109635935A (en) * | 2018-12-29 | 2019-04-16 | 北京航空航天大学 | Depth convolutional neural networks model adaptation quantization method based on the long cluster of mould |
CN109766993A (en) * | 2018-12-13 | 2019-05-17 | 浙江大学 | A kind of convolutional neural networks compression method of suitable hardware |
CN109978144A (en) * | 2019-03-29 | 2019-07-05 | 联想(北京)有限公司 | A kind of model compression method and system |
CN110245753A (en) * | 2019-05-27 | 2019-09-17 | 东南大学 | A kind of neural network compression method based on power exponent quantization |
WO2020019236A1 (en) * | 2018-07-26 | 2020-01-30 | Intel Corporation | Loss-error-aware quantization of a low-bit neural network |
CN110782021A (en) * | 2019-10-25 | 2020-02-11 | 浪潮电子信息产业股份有限公司 | Image classification method, device, equipment and computer readable storage medium |
CN110866603A (en) * | 2018-12-29 | 2020-03-06 | 中科寒武纪科技股份有限公司 | Data processing method and processor |
CN110929837A (en) * | 2018-09-19 | 2020-03-27 | 北京搜狗科技发展有限公司 | Neural network model compression method and device |
CN111191784A (en) * | 2018-11-14 | 2020-05-22 | 辉达公司 | Transposed sparse matrix multiplied by dense matrix for neural network training |
WO2020133364A1 (en) * | 2018-12-29 | 2020-07-02 | 华为技术有限公司 | Neural network compression method and apparatus |
CN111598227A (en) * | 2020-05-20 | 2020-08-28 | 字节跳动有限公司 | Data processing method and device, electronic equipment and computer readable storage medium |
CN112085186A (en) * | 2019-06-12 | 2020-12-15 | 上海寒武纪信息科技有限公司 | Neural network quantitative parameter determination method and related product |
WO2021143070A1 (en) * | 2020-01-16 | 2021-07-22 | 北京智芯微电子科技有限公司 | Compression method and apparatus for deep neural network model, and storage medium |
CN113298248A (en) * | 2020-07-20 | 2021-08-24 | 阿里巴巴集团控股有限公司 | Processing method and device for neural network model and electronic equipment |
CN113642710A (en) * | 2021-08-16 | 2021-11-12 | 北京百度网讯科技有限公司 | Network model quantification method, device, equipment and storage medium |
CN109086819B (en) * | 2018-07-26 | 2023-12-05 | 北京京东尚科信息技术有限公司 | Method, system, equipment and medium for compressing caffemul model |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106557812A (en) * | 2016-11-21 | 2017-04-05 | 北京大学 | The compression of depth convolutional neural networks and speeding scheme based on dct transform |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106557812A (en) * | 2016-11-21 | 2017-04-05 | 北京大学 | The compression of depth convolutional neural networks and speeding scheme based on dct transform |
Non-Patent Citations (4)
Title |
---|
AOJUN ZHOU 等: "Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights", 《ARXIV:1702.03044V1》 * |
SONG HAN 等: "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding", 《ARXIV: 1510.00149V5》 * |
YIWEN GUO 等: "Dynamic Network Surgery for Efficient DNNs", 《ARXIV:1608.04493V2》 * |
刘南平 等: "《数据通信技术》", 31 July 2004 * |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020019236A1 (en) * | 2018-07-26 | 2020-01-30 | Intel Corporation | Loss-error-aware quantization of a low-bit neural network |
CN109086819B (en) * | 2018-07-26 | 2023-12-05 | 北京京东尚科信息技术有限公司 | Method, system, equipment and medium for compressing caffemul model |
CN108962247B (en) * | 2018-08-13 | 2023-01-31 | 南京邮电大学 | Multi-dimensional voice information recognition system and method based on progressive neural network |
CN108962247A (en) * | 2018-08-13 | 2018-12-07 | 南京邮电大学 | Based on gradual neural network multidimensional voice messaging identifying system and its method |
CN110929837A (en) * | 2018-09-19 | 2020-03-27 | 北京搜狗科技发展有限公司 | Neural network model compression method and device |
CN109344893A (en) * | 2018-09-25 | 2019-02-15 | 华中师范大学 | A kind of image classification method and system based on mobile terminal |
CN109376854A (en) * | 2018-11-02 | 2019-02-22 | 矽魅信息科技(上海)有限公司 | More truth of a matter logarithmic quantization method and devices for deep neural network |
CN109376854B (en) * | 2018-11-02 | 2022-08-16 | 矽魅信息科技(上海)有限公司 | Multi-base logarithm quantization device for deep neural network |
CN111191784A (en) * | 2018-11-14 | 2020-05-22 | 辉达公司 | Transposed sparse matrix multiplied by dense matrix for neural network training |
CN109766993B (en) * | 2018-12-13 | 2020-12-18 | 浙江大学 | Convolutional neural network compression method suitable for hardware |
CN109766993A (en) * | 2018-12-13 | 2019-05-17 | 浙江大学 | A kind of convolutional neural networks compression method of suitable hardware |
CN110866603A (en) * | 2018-12-29 | 2020-03-06 | 中科寒武纪科技股份有限公司 | Data processing method and processor |
WO2020133364A1 (en) * | 2018-12-29 | 2020-07-02 | 华为技术有限公司 | Neural network compression method and apparatus |
CN110866603B (en) * | 2018-12-29 | 2024-04-16 | 中科寒武纪科技股份有限公司 | Data processing method and processor |
CN109635935A (en) * | 2018-12-29 | 2019-04-16 | 北京航空航天大学 | Depth convolutional neural networks model adaptation quantization method based on the long cluster of mould |
CN113168565A (en) * | 2018-12-29 | 2021-07-23 | 华为技术有限公司 | Neural network compression method and device |
CN109634401B (en) * | 2018-12-29 | 2023-05-02 | 联想(北京)有限公司 | Control method and electronic equipment |
CN109634401A (en) * | 2018-12-29 | 2019-04-16 | 联想(北京)有限公司 | A kind of control method and electronic equipment |
CN109635935B (en) * | 2018-12-29 | 2022-10-14 | 北京航空航天大学 | Model adaptive quantization method of deep convolutional neural network based on modular length clustering |
CN109978144B (en) * | 2019-03-29 | 2021-04-13 | 联想(北京)有限公司 | Model compression method and system |
CN109978144A (en) * | 2019-03-29 | 2019-07-05 | 联想(北京)有限公司 | A kind of model compression method and system |
CN110245753A (en) * | 2019-05-27 | 2019-09-17 | 东南大学 | A kind of neural network compression method based on power exponent quantization |
CN112085186A (en) * | 2019-06-12 | 2020-12-15 | 上海寒武纪信息科技有限公司 | Neural network quantitative parameter determination method and related product |
CN112085186B (en) * | 2019-06-12 | 2024-03-05 | 上海寒武纪信息科技有限公司 | Method for determining quantization parameter of neural network and related product |
CN110782021A (en) * | 2019-10-25 | 2020-02-11 | 浪潮电子信息产业股份有限公司 | Image classification method, device, equipment and computer readable storage medium |
CN110782021B (en) * | 2019-10-25 | 2023-07-14 | 浪潮电子信息产业股份有限公司 | Image classification method, device, equipment and computer readable storage medium |
WO2021143070A1 (en) * | 2020-01-16 | 2021-07-22 | 北京智芯微电子科技有限公司 | Compression method and apparatus for deep neural network model, and storage medium |
CN111598227B (en) * | 2020-05-20 | 2023-11-03 | 字节跳动有限公司 | Data processing method, device, electronic equipment and computer readable storage medium |
CN111598227A (en) * | 2020-05-20 | 2020-08-28 | 字节跳动有限公司 | Data processing method and device, electronic equipment and computer readable storage medium |
CN113298248A (en) * | 2020-07-20 | 2021-08-24 | 阿里巴巴集团控股有限公司 | Processing method and device for neural network model and electronic equipment |
CN113642710A (en) * | 2021-08-16 | 2021-11-12 | 北京百度网讯科技有限公司 | Network model quantification method, device, equipment and storage medium |
CN113642710B (en) * | 2021-08-16 | 2023-10-31 | 北京百度网讯科技有限公司 | Quantification method, device, equipment and storage medium of network model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108229681A (en) | A kind of neural network model compression method, system, device and readable storage medium storing program for executing | |
CN110378468B (en) | Neural network accelerator based on structured pruning and low bit quantization | |
Sohoni et al. | Low-memory neural network training: A technical report | |
CN109635936A (en) | A kind of neural networks pruning quantization method based on retraining | |
CN112367353A (en) | Mobile edge computing unloading method based on multi-agent reinforcement learning | |
CN106570559A (en) | Data processing method and device based on neural network | |
CN107395211A (en) | A kind of data processing method and device based on convolutional neural networks model | |
CN109886397A (en) | A kind of neural network structure beta pruning compression optimization method for convolutional layer | |
CN110175628A (en) | A kind of compression algorithm based on automatic search with the neural networks pruning of knowledge distillation | |
WO2020238237A1 (en) | Power exponent quantization-based neural network compression method | |
CN106557812A (en) | The compression of depth convolutional neural networks and speeding scheme based on dct transform | |
CN109635935A (en) | Depth convolutional neural networks model adaptation quantization method based on the long cluster of mould | |
CN109635922A (en) | A kind of distribution deep learning parameter quantization communication optimization method and system | |
CN108734264A (en) | Deep neural network model compression method and device, storage medium, terminal | |
CN107886164A (en) | A kind of convolutional neural networks training, method of testing and training, test device | |
CN110751265A (en) | Lightweight neural network construction method and system and electronic equipment | |
CN112508190A (en) | Method, device and equipment for processing structured sparse parameters and storage medium | |
CN112329910A (en) | Deep convolutional neural network compression method for structure pruning combined quantization | |
CN109978144A (en) | A kind of model compression method and system | |
CN109145107A (en) | Subject distillation method, apparatus, medium and equipment based on convolutional neural networks | |
CN106156142B (en) | Text clustering processing method, server and system | |
CN110263917B (en) | Neural network compression method and device | |
CN112598129A (en) | Adjustable hardware-aware pruning and mapping framework based on ReRAM neural network accelerator | |
CN115470889A (en) | Network-on-chip autonomous optimal mapping exploration system and method based on reinforcement learning | |
CN112819157B (en) | Neural network training method and device, intelligent driving control method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180629 |