CN109389216A - Dynamic pruning method, apparatus and storage medium for a neural network - Google Patents
- Publication number: CN109389216A
- Application number: CN201710656730.0A
- Authority
- CN
- China
- Prior art keywords
- pruning
- training
- stage
- probability
- iterations
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Abstract
The invention discloses a dynamic pruning method, apparatus and storage medium for a neural network. The method comprises: setting a pruning-rate target, a pruning range, pruning objects and a number of pruning stages; determining the pruning-rate target of each pruning stage according to the overall pruning-rate target and the number of stages; determining the network training parameters of each pruning stage according to the network training parameters of the original neural network model and the number of pruning stages; and, based on the network training parameters of each stage, performing network training stage by stage and, in each pruning stage, pruning the pruning objects within the pruning range according to that stage's pruning-rate target. With the proposed scheme, good compression and acceleration are obtained while the accuracy of the neural network is preserved, convergence is fast, and training efficiency is high.
Description
Technical field
The present invention relates to the field of deep learning, and in particular to a dynamic pruning method, apparatus and storage medium for a neural network.
Background art
In recent years, with the rapid development of artificial intelligence, deep learning neural networks have achieved great success in pattern-recognition tasks such as image classification, object detection, image segmentation, speech recognition and machine translation. In these fields, deep models far outperform traditional shallow models and have even reached human-level performance in some respects. However, well-performing neural networks usually have large numbers of parameters, which makes their computational complexity high. This complexity manifests both in space (large model storage volume and memory footprint at run time) and in time (tens of billions of floating-point operations for a single inference). Compressing and accelerating neural networks is therefore particularly important, especially for applications running on embedded devices, dedicated integrated hardware and large-scale data-processing centers.
At present, the mainstream method for compressing and accelerating deep neural networks is network pruning. For example, the paper "Learning both weights and connections for efficient neural network" by Han, Song et al. proposes a static, iteratively applied pruning strategy for deep neural networks; its compression ratio is considerable and the performance loss is limited, but the training time is too long and efficiency is low. The paper "Dynamic Network Surgery for Efficient DNNs" by Guo, Yiwen et al. proposes a dynamic pruning strategy for deep networks; its training time is short, but the compression ratio is uncontrollable and the performance loss is larger. The effect of current network-pruning techniques is therefore unsatisfactory: training time, compression ratio and network performance cannot all be achieved at once.
Summary of the invention
In view of the above problems, the present invention provides a dynamic pruning method, apparatus and storage medium for a neural network that solve these problems.
According to one aspect of the present invention, a dynamic pruning method for a neural network is provided, comprising:
setting a pruning-rate target, a pruning range, pruning objects and a number of pruning stages;
determining the pruning-rate target of each pruning stage according to the overall pruning-rate target and the number of stages;
determining the network training parameters of each pruning stage according to the network training parameters of the original neural network model and the number of pruning stages;
based on the network training parameters of each pruning stage, performing network training stage by stage and, in each pruning stage, pruning the pruning objects within the pruning range according to that stage's pruning-rate target.
Optionally, in the method of the invention, the pruning objects comprise one of the following: weight connections, filters and convolution kernels.
Optionally, in the method of the invention, determining the network training parameters of each pruning stage according to the network training parameters of the original neural network model and the number of pruning stages comprises:
determining the training iteration count of each pruning stage according to the training iteration count of the original model and the number of pruning stages;
determining the initial training parameters of each pruning stage according to the training iteration count and training parameters of the original model and the cumulative training iterations completed before each pruning stage.
Optionally, in the method of the invention, the training iteration count determined for each pruning stage is Mi = α × M / N, where M is the original training iteration count, N is the number of pruning stages, α is a preset iteration-count impact factor, and 1 ≤ i ≤ N.
Optionally, in the method of the invention, determining the initial training parameters of each pruning stage according to the original model's training iteration count, training parameters and the cumulative training iterations before each stage comprises:
estimating, from the original training iteration count and training parameters of the neural network, the training parameters at any iteration point of the original training process;
determining the iteration point matching the cumulative iterations completed before the current pruning stage and estimating, from the training parameters at that matched point, the initial training parameters of the current stage.
Optionally, in the method of the invention, pruning the pruning objects within the pruning range specifically comprises:
in each pruning stage, before each training iteration, determining the probability of a pruning operation based on a probability function and the current training state of the network, the probability function decreasing gradually as the iteration count increases;
pruning the pruning objects within the pruning range when the probability of the pruning operation exceeds a specified threshold.
Optionally, in the method of the invention, the specified threshold comprises the ratio of a number produced by a random-number generator to the upper bound of the numbers that generator can produce.
Optionally, in the method of the invention, determining the probability of a pruning operation based on a probability function and the current training state comprises:
generating a baseline pruning-excitation probability from the probability function;
determining a pruning-probability impact factor, less than or equal to 1, from the training state of the current network;
multiplying the excitation probability by the pruning-probability impact factor to obtain the probability of the pruning operation.
Optionally, in the method of the invention, determining the pruning-probability impact factor from the training state of the current network comprises:
judging by stall detection whether the network is in a stagnant training state; if so, setting the pruning-probability impact factor to a value smaller than a first threshold, and otherwise to a value greater than a second threshold, the second threshold being greater than or equal to the first.
Optionally, in the method of the invention, pruning the pruning objects within the pruning range specifically comprises:
determining the importance of the pruning objects within the pruning range;
setting a pruning threshold for the pruning objects according to the pruning-rate target of the current stage;
comparing the importance of each pruning object with the pruning threshold and updating its connection state according to the comparison.
Optionally, in the method of the invention, the pruning threshold is either a single value or an interval.
When the pruning threshold is a single value, updating the connection state of a pruning object according to the comparison comprises:
setting the connection state of the pruning object to connected when its importance is greater than or equal to the pruning threshold, and to disconnected when its importance is below the pruning threshold.
When the pruning threshold is an interval, updating the connection state of a pruning object according to the comparison comprises:
setting the connection state of the pruning object to disconnected when its importance is below the lower bound of the interval, to connected when its importance is above the upper bound of the interval, and keeping its current connection state when its importance falls within the interval.
Optionally, in the method of the invention, after the pruning objects within the pruning range have been pruned, the method further comprises:
discarding the weight-decay update term, the momentum update term, or the entire update for the weights of pruning objects whose connection state within the pruning range is disconnected.
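As an illustrative sketch only (the patent does not fix an optimizer), this update rule can be read against plain SGD with momentum and L2 weight decay; the function and parameter names below are ours, not the patent's:

```python
def sgd_step(w, grad, vel, mask, lr=0.01, momentum=0.9, decay=5e-4):
    """SGD with momentum and weight decay; for disconnected weights
    (mask == 0) the decay and momentum contributions are discarded,
    so pruned weights are driven only by the raw gradient."""
    new_w, new_vel = [], []
    for wi, gi, vi, mi in zip(w, grad, vel, mask):
        if mi:
            v = momentum * vi + gi + decay * wi
        else:
            v = gi  # drop decay and momentum terms for pruned weights
        new_vel.append(v)
        new_w.append(wi - lr * v)
    return new_w, new_vel
```

Dropping these terms for disconnected weights prevents decay and stale momentum from pulling a pruned weight further from zero, which matters because a dynamically pruned connection may later be restored.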
According to another aspect of the invention, a dynamic pruning apparatus for a neural network is provided, comprising:
a setup module for setting the pruning-rate target, pruning range, pruning objects and number of pruning stages;
a stage-target determination module for determining the pruning-rate target of each pruning stage from the overall pruning-rate target and the number of stages;
a stage-parameter determination module for determining the network training parameters of each pruning stage from the network training parameters of the original model and the number of stages;
a training-and-pruning module for performing network training stage by stage based on each stage's training parameters and, in each pruning stage, pruning the pruning objects within the pruning range according to that stage's pruning-rate target.
Optionally, in the apparatus of the invention, the pruning objects comprise one of the following: weight connections, filters and convolution kernels.
Optionally, in the apparatus of the invention, the stage-parameter determination module comprises:
an iteration-count determination sub-module for determining the training iteration count of each pruning stage from the original model's training iteration count and the number of stages;
a training-parameter determination sub-module for determining the initial training parameters of each pruning stage from the original model's training iteration count and training parameters and the cumulative training iterations completed before each stage.
Optionally, in the apparatus of the invention, the training iteration count determined by the iteration-count determination sub-module for each pruning stage is Mi = α × M / N, where M is the original training iteration count, N is the number of pruning stages, α is a preset iteration-count impact factor, and 1 ≤ i ≤ N.
Optionally, in the apparatus of the invention, the training-parameter determination sub-module is specifically configured to estimate, from the original training iteration count and training parameters of the network, the training parameters at any iteration point of the original training process, determine the iteration point matching the cumulative iterations before the current pruning stage, and estimate the initial training parameters of the current stage from the training parameters at that matched point.
Optionally, in the apparatus of the invention, the training-and-pruning module comprises:
a probability-determination sub-module for determining, in each pruning stage and before each training iteration, the probability of a pruning operation from a probability function and the current training state, the probability function decreasing gradually as the iteration count increases;
a pruning-judgement sub-module for triggering the pruning of the pruning objects within the pruning range when the probability of the pruning operation exceeds a specified threshold.
Optionally, in the apparatus of the invention, the specified threshold comprises the ratio of a number produced by a random-number generator to the upper bound of the numbers that generator can produce.
Optionally, in the apparatus of the invention, the probability-determination sub-module is specifically configured to generate a baseline pruning-excitation probability from the probability function, determine a pruning-probability impact factor from the current training state, and multiply the excitation probability by the impact factor to obtain the probability of the pruning operation; the pruning-probability impact factor is less than or equal to 1.
Optionally, in the apparatus of the invention, the probability-determination sub-module is specifically configured to judge by stall detection whether the network is in a stagnant training state and, if so, set the pruning-probability impact factor to a value smaller than a first threshold, and otherwise to a value greater than a second threshold, the second threshold being greater than or equal to the first.
Optionally, in the apparatus of the invention, the training-and-pruning module comprises:
an information-determination sub-module for determining the importance of the pruning objects within the pruning range;
a threshold-setting sub-module for setting a pruning threshold for the pruning objects according to the pruning-rate target of the current stage;
a pruning sub-module for comparing the importance of each pruning object with the pruning threshold and updating its connection state according to the comparison.
Optionally, in the apparatus of the invention, the pruning threshold is either a single value or an interval.
When the pruning threshold is a single value, the pruning sub-module sets a pruning object's connection state to connected when its importance is greater than or equal to the pruning threshold, and to disconnected when its importance is below the pruning threshold.
When the pruning threshold is an interval, the pruning sub-module sets a pruning object's connection state to disconnected when its importance is below the lower bound of the interval, to connected when its importance is above the upper bound, and keeps its current connection state when its importance falls within the interval.
Optionally, in the apparatus of the invention, the training-and-pruning module is further configured, after the pruning objects within the pruning range have been pruned, to discard the weight-decay update term, the momentum update term, or the entire update for the weights of pruning objects whose connection state within the pruning range is disconnected.
According to a third aspect of the present invention, a computer-readable storage medium is also provided, on which a computer program is stored; when executed by a processor, the program implements the steps of the dynamic pruning method of the present invention.
The dynamic pruning scheme proposed by the present invention adopts a stage-by-stage pruning mode, using a different pruning-rate target and different training parameters in each stage, so that the pruning rate is controllable; it effectively improves the convergence speed of training and reduces the impact of pruning on network performance.
Furthermore, by combining a probability function with the network training state to adaptively adjust the pruning frequency, the proposed scheme further improves convergence speed and reduces the impact of pruning on network performance.
In short, the proposed dynamic pruning scheme produces, within a short training period, a sparsified neural network whose pruning rate is controllable and whose performance loss is not obvious, thereby achieving a significant technical effect.
The above is merely an overview of the technical scheme of the present invention. In order that the technical means of the invention may be understood more clearly and implemented in accordance with the contents of the specification, and that the above and other objects, features and advantages of the invention may be more readily apparent, specific embodiments of the invention are set forth below.
Brief description of the drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art by reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a flowchart of a dynamic pruning method for a neural network provided by a first embodiment of the invention;
Fig. 2 is a structural block diagram of a dynamic pruning apparatus for a neural network provided by a second embodiment of the invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the disclosure are shown in the drawings, it should be understood that the disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
The embodiments of the present invention propose a dynamic pruning method, apparatus and storage medium for a neural network. They use a stage-by-stage dynamic pruning mode and combine a probability function with the network training state to adaptively adjust the pruning frequency, so that good compression and acceleration are obtained while the accuracy of the neural network is preserved, convergence is fast, and training efficiency is high. The implementation of the scheme described in the embodiments is explained in detail below through several specific embodiments.
A first embodiment of the invention provides a dynamic pruning method for a neural network which, as shown in Fig. 1, comprises the following steps:
Step S101: set the pruning-rate target, the pruning range, the pruning objects and the number of pruning stages.
In the embodiment of the invention, the pruning-rate target is target, with target ∈ R and 0 < target < 1, where R denotes the real numbers. In specific applications, target can be given different values according to the actual scenario. Typical (but not limiting) settings are: for layers or networks with higher redundancy, target can be set larger, i.e. more pruning objects are removed; for layers or networks with lower redundancy, target can be set smaller, i.e. more pruning objects are retained.
Further, regarding the pruning range: in specific applications, certain layers of the neural network can be pruned according to the actual scenario. Certain layers of the whole network can be pruned individually, specific filters of certain layers can be pruned, or all inner-product-based layers of the whole network can be pruned. That is, those skilled in the art can set the pruning range flexibly as required.
Further, regarding the number of pruning stages: in specific applications, a suitable number of pruning stages can be chosen according to the actual scenario. In one particular embodiment, the setting principle is that the number of pruning stages is proportional to the pruning rate, inversely proportional to the redundancy of the network, and proportional to the complexity of the network.
Further, in the embodiment of the invention, the types of pruning objects include, but are not limited to, weight connections, filters and convolution kernels. That is, the embodiments of the invention support dynamic pruning based on weight connections, dynamic pruning based on filters, and dynamic pruning based on convolution kernels.
Step S102: determine the pruning-rate target of each pruning stage according to the overall pruning-rate target and the number of pruning stages.
In one particular embodiment, assume there are N pruning stages in total and that each stage uniformly prunes a proportion r, with r ∈ R and 0 < r ≤ target. Then r can be calculated by formula (1):

r = 1 − (1 − target)^(1/N)    formula (1)

Assuming the pruning-rate target of the i-th pruning stage is milestone_i, it can be calculated by formula (2):

milestone_i = 1 − (1 − r)^i, 1 ≤ i ≤ N    formula (2)
Of course, in specific applications, those skilled in the art may also perform the progressive pruning in a non-uniform manner according to the demands of the actual scenario. The embodiment of the invention does not uniquely restrict how non-uniform pruning-rate targets are calculated; any target-setting scheme that realizes non-uniform pruning falls within the protective scope of the inventive idea.
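As a minimal numerical sketch, not part of the patent text (the function name is ours), formulas (1) and (2) compose so that the last stage's milestone recovers the overall target:

```python
def stage_targets(target: float, n_stages: int) -> list:
    """Per-stage cumulative pruning-rate targets for uniform staging.

    Formula (1): r = 1 - (1 - target) ** (1 / N)
    Formula (2): milestone_i = 1 - (1 - r) ** i, 1 <= i <= N
    """
    r = 1.0 - (1.0 - target) ** (1.0 / n_stages)
    return [1.0 - (1.0 - r) ** i for i in range(1, n_stages + 1)]

milestones = stage_targets(target=0.9, n_stages=3)
# milestone_N = 1 - (1 - r)^N = target, so the final stage hits the goal.
```

The milestones increase monotonically, so each stage only ever tightens the sparsity reached by the previous one.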
Step S103: determine the network training parameters of each pruning stage according to the network training parameters of the original neural network model and the number of pruning stages.
In the embodiment of the invention, this step specifically comprises:
Step 31: determine the training iteration count of each pruning stage according to the training iteration count of the original model and the number of pruning stages.
In an optional embodiment, assume the original training iteration count is M and the number of pruning stages is N. The iteration count Mi of each pruning stage is then calculated by formula (3):

Mi = α × M / N    formula (3)

where 1 ≤ i ≤ N and α is the iteration-count impact factor. In practical applications, α may be set to 1, in which case the whole pruning process updates the pruning-rate target and the training parameters with a uniform step; a non-linear setting may also be used, increasing the iteration counts of the later pruning stages to account for the reduced redundancy and to preserve network performance.
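A one-line reading of formula (3), offered as an illustrative sketch only (the uniform case α = 1 simply splits the original budget evenly; names are ours):

```python
def stage_iterations(m_total: int, n_stages: int, alpha: float = 1.0) -> list:
    """Iteration budget per pruning stage, formula (3): Mi = alpha * M / N.

    With alpha = 1, every stage receives a uniform share of the original
    training budget M; a stage-dependent alpha would realize the
    non-linear variant mentioned in the text.
    """
    return [int(alpha * m_total / n_stages) for _ in range(n_stages)]
```

For example, `stage_iterations(90000, 3)` splits a 90000-iteration budget into three stages of 30000 iterations each.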
Step 32: determine the initial training parameters of each pruning stage according to the original model's training iteration count and training parameters and the cumulative training iterations completed before each stage.
In one particular embodiment, step 32 is realized as follows:
estimate, from the original training iteration count and training parameters of the neural network, the training parameters at any iteration point of the original training process;
determine the iteration point matching the cumulative iterations completed before the current pruning stage, and estimate the initial training parameters of the current stage from the training parameters at that matched point.
In the embodiment of the invention, after all pruning stages have been processed in this way, the training iteration count and the corresponding initial training parameters of every pruning stage have been obtained.
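As an illustrative sketch only, the matching step can be read as sampling the original learning-rate schedule at the iterations already consumed; the step-decay schedule and all names below are assumptions, since the patent does not fix a schedule:

```python
def original_lr(iter_idx: int, base_lr: float = 0.1,
                decay: float = 0.1, step: int = 30000) -> float:
    """Assumed step-decay schedule of the original training run."""
    return base_lr * decay ** (iter_idx // step)

def stage_init_lr(stage_iters: list, stage_idx: int) -> float:
    """Initial learning rate of stage `stage_idx` (0-based): sample the
    original schedule at the cumulative iterations consumed before it."""
    consumed = sum(stage_iters[:stage_idx])
    return original_lr(consumed)
```

With `stage_iters = [30000, 30000, 30000]`, stage 1 starts at the original base rate, while stage 3 resumes at the rate the original run would have reached after 60000 iterations.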
Step S104: based on the network training parameters of each pruning stage, perform network training stage by stage and, in each pruning stage, prune the pruning objects within the pruning range according to that stage's pruning-rate target.
In an optional embodiment, in each pruning stage the triggering of a single pruning operation is jointly determined by a probability function and the training state of the current network. The specific implementation comprises:
Step 41: in each pruning stage, before each training iteration, determine the probability of a pruning operation based on the probability function and the current training state; the probability function decreases gradually as the iteration count increases.
Step 42: judge whether the probability of the pruning operation exceeds the specified threshold; if so, execute step 43; otherwise execute step 44.
Step 43: when the probability of the pruning operation exceeds the specified threshold, prune the pruning objects within the pruning range.
Step 44: when the probability of the pruning operation is less than or equal to the specified threshold, do not prune the pruning objects within the pruning range; proceed directly to the subsequent training steps, namely forward propagation, backward propagation and the weight update.
Specifically, in step 41 of the embodiment of the invention, a baseline pruning-excitation probability is first generated by the probability function. Assume the excitation probability is p; then p can be calculated by formula (4):

p = (1 + γ × iter)^(−power)    formula (4)

In formula (4), γ and power are preset parameters, and iter is the iteration count, counted within the current pruning stage or cumulatively. p is a function that decreases gradually with the iteration count.
Next, a pruning-probability impact factor β, 0 < β ≤ 1, is generated according to the training state of the current network. Specifically, stall detection (plateau detection) is first used to judge whether the network is in a stagnant training state. If training is stagnating, β is set to a value smaller than a first threshold; otherwise β is set to a value greater than a second threshold, where the second threshold is greater than or equal to the first. Both thresholds are real numbers less than 1 and can be set flexibly as required. In one exemplary embodiment, the first threshold is set to 0.1 and the second threshold to 0.8.
More specifically, if the training state of the network is stagnating, the impact factor β is set to a value much smaller than 1; if the performance of the network is still improving as the iteration count increases, β is set close to or equal to 1.
Specifically, in step 42 of the embodiment of the invention, formula (5) determines whether a single pruning operation is performed:

p × β > rand() / RAND_MAX    formula (5)

where rand() is the number produced by the random-number generator and RAND_MAX is the upper bound of the numbers that generator can produce. If the inequality holds, the pruning operation is executed, i.e. step 43; otherwise the flow goes to step 44. In this way, the pruning frequency of the network decreases gradually as the training iteration count increases; meanwhile, when the network is converging well the pruning frequency is higher and the structure changes more, whereas when convergence stagnates the pruning frequency is lower and the structure is more stable.
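A minimal sketch of the gating in steps 41 and 42, assuming the (1 + γ·iter)^(−power) form for formula (4); the parameter values and names here are illustrative assumptions, not the patent's:

```python
import random

def excitation(iter_idx: int, gamma: float = 1e-4, power: float = 1.0) -> float:
    """Baseline pruning-excitation probability p, decreasing with iterations."""
    return (1.0 + gamma * iter_idx) ** -power

def should_prune(iter_idx: int, stalled: bool, rng: random.Random) -> bool:
    """Formula (5): prune when p * beta > rand() / RAND_MAX.

    beta is the impact factor from stall detection: much smaller than 1
    when training stagnates, close to 1 while the network still improves.
    """
    beta = 0.05 if stalled else 1.0
    return excitation(iter_idx) * beta > rng.random()
```

Early in training with healthy convergence, p·β ≈ 1 and pruning fires on almost every iteration; a stalled network is pruned only rarely, keeping its structure stable while it recovers.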
Further, in an optional embodiment, pruning the pruning objects within the pruning range according to the pruning-rate target of the corresponding stage comprises:
Step 51: determine the importance of the pruning objects within the pruning range.
In practical applications there are multiple criteria for judging the importance of a pruning object. Taking weight connections as an example, the importance of a weight can be determined by its absolute value; it can also be determined during training by the weight's influence on the output activation of its neuron, or by the importance of the input activation of that neuron.
Step 52, according to the cutting rate target of current generation, the cutting threshold value for cutting object is set;
Step 53, the importance for cutting object is compared with the cutting threshold value, and according to comparison result, more
The new connection status for cutting object, namely disconnect the connection for cutting object or reset the connection for having disconnected and having cut object.
In an alternate embodiment of the present invention where, in order to improve relative efficiency, by the important of determining each cutting object
Property is ranked up.
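A minimal sketch of step 51 and the sorting step, using the absolute-value criterion named above; the helper names are illustrative, not from the patent:

```python
def importance_by_magnitude(weights):
    # One criterion named in the text: a weight connection's importance
    # is the absolute value of the weight.
    return [abs(w) for w in weights]

def ranked_indices(weights):
    # Sort once, least important first, so that the threshold
    # comparisons in the pruning step are cheap.
    imp = importance_by_magnitude(weights)
    return sorted(range(len(imp)), key=imp.__getitem__)

print(ranked_indices([0.5, -0.1, 2.0, -0.7]))  # [1, 0, 3, 2]
```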
The pruning threshold in step 52 of the embodiment of the present invention is either a single value (i.e. a dividing line) or an interval.
When the pruning threshold is a dividing line, updating the connection state of the pruning object according to the comparison result comprises:
updating the connection state of the pruning object to connected when its importance is greater than or equal to the pruning threshold, and to disconnected when its importance is less than the pruning threshold.
Specifically, assume the pruning object is a weight connection, a given weight is w with importance f_w, and its connection state is mask_w ∈ {0, 1}, where 0 denotes disconnected and 1 denotes connected. With the dividing line b_single determined from the pruning-rate target of the current stage, there is:

mask_w = 1, if f_w ≥ b_single; mask_w = 0, if f_w < b_single    formula (6)
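A sketch of the dividing-line update of formula (6); the function name and list representation are illustrative:

```python
def update_mask_single(importance, b_single):
    # Formula (6): mask_w = 1 (connected) if f_w >= b_single, else 0.
    # A disconnected weight whose importance rises back above the
    # dividing line is thereby reconnected.
    return [1 if f >= b_single else 0 for f in importance]

print(update_mask_single([0.05, 0.4, 0.9], b_single=0.3))  # [0, 1, 1]
```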
When the pruning threshold is an interval, updating the connection state of the pruning object according to the comparison result comprises:
updating the connection state of the pruning object to disconnected when its importance is below the lower limit of the interval; updating it to connected when its importance is above the upper limit; and keeping the current connection state when the importance falls within the interval.
Specifically, assume again that the pruning object is a weight connection, a given weight is w with importance f_w, and its connection state is mask_w ∈ {0, 1}, where 0 denotes disconnected and 1 denotes connected. With the pruning interval [b_lower, b_upper] determined from the pruning-rate target of the current stage, there is:

mask_w = 0, if f_w < b_lower; mask_w unchanged, if b_lower ≤ f_w ≤ b_upper; mask_w = 1, if f_w > b_upper    formula (7)

According to formula (6) or formula (7) above, the connection state of a given weight can be set.
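The interval (hysteresis) update of formula (7) can be sketched likewise; names are illustrative:

```python
def update_mask_interval(importance, mask, b_lower, b_upper):
    # Formula (7): below b_lower -> disconnect (0); above b_upper ->
    # connect (1); inside [b_lower, b_upper] -> keep the current state.
    new_mask = []
    for f, m in zip(importance, mask):
        if f < b_lower:
            new_mask.append(0)
        elif f > b_upper:
            new_mask.append(1)
        else:
            new_mask.append(m)
    return new_mask

print(update_mask_interval([0.1, 0.5, 0.9], [1, 0, 0], 0.3, 0.7))  # [0, 0, 1]
```

The dead band between b_lower and b_upper keeps weights whose importance hovers near the target from being disconnected and reconnected on every pruning operation.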
The method of the embodiment of the present invention prunes a pruning object by updating its connection state; that is, once the connection state of a pruning object is disconnected, the object no longer takes part in the training process. To illustrate this, weight connections are again taken as the pruning objects.
Assume the set of weights of the i-th pruned layer is W_i and the set of their connection states obtained in step 53 is Mask_i. In forward and backward propagation, the set of weights W_i* that actually takes effect is obtained by formula (8):

W_i* = W_i × Mask_i    formula (8)

According to formula (8), after pruning, the forward propagation can be expressed by formula (9):

A_(i+1) = W_i* ⊙ A_i, 1 ≤ i ≤ L    formula (9)

where A_i is the input of the i-th layer, ⊙ denotes convolution (or matrix multiplication), and the network has L layers in total.
According to formula (8), after pruning, the backward propagation can be expressed by formula (10):

∇_i = ∂Loss(W_i*) / ∂W_i*    formula (10)

where ∇_i is the gradient of the i-th layer and Loss(·) is the loss function of the network.
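A minimal NumPy sketch of formulas (8)–(10) for a single dense layer; the linear layer (standing in for "convolution or matrix multiplication"), the omission of activations, and the shapes are assumptions for illustration:

```python
import numpy as np

def masked_forward(A, weights, masks):
    # Formulas (8) and (9): only W_i* = W_i x Mask_i (element-wise
    # product) takes part in the forward pass through the L layers.
    for W, M in zip(weights, masks):
        A = (W * M) @ A
    return A

def masked_grad(M, upstream, A_in):
    # Formula (10), sketched for one dense layer: the gradient flows
    # through W_i* only, so pruned positions receive zero gradient.
    return (upstream @ A_in.T) * M

W = np.array([[1.0, 2.0], [3.0, 4.0]])
M = np.array([[1.0, 0.0], [1.0, 1.0]])   # one connection disconnected
A = np.array([[1.0], [1.0]])
print(masked_forward(A, [W], [M]))       # [[1.] [7.]]
```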
In addition, it should be made clear that in this embodiment a pruned weight does not necessarily take part in the parameter update, but it always takes part in the importance ranking.
In practice, for a given pruned layer i, according to its connection-state set Mask_i, the weight-decay part of the update may be dropped for disconnected weights, the momentum part may be dropped, or the whole update may be dropped. Whether the update is dropped entirely or only in part, all weights keep taking part in the ranking, so their importance ordering is continually refreshed as training iterations pass, and the connections of weights near the pruning-rate target keep being disconnected or reconnected.
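One of the update-dropping variants described above, dropping both the decay and momentum contributions for disconnected weights while keeping the raw gradient, can be sketched as follows; the SGD form and all hyperparameter values are illustrative:

```python
def sgd_step(w, grad, mask, velocity, lr=0.1, decay=1e-4, mu=0.9):
    # Decay and momentum are masked out for disconnected weights
    # (mask 0), but the raw gradient still flows, so a pruned weight's
    # importance can still grow and trigger reconnection later.
    new_w = []
    for j in range(len(w)):
        d = decay * w[j] * mask[j]                       # decay dropped if 0
        velocity[j] = mu * velocity[j] * mask[j] + grad[j] + d
        new_w.append(w[j] - lr * velocity[j])
    return new_w

v = [0.0, 0.0]
out = sgd_step([1.0, 2.0], [0.1, 0.1], [1.0, 0.0], v)
print(out)  # decay applied only to the first (connected) weight
```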
In summary, in the method of the embodiment of the present invention, for each pruning stage the pruning-rate target, network training parameters and pruning range of the current stage are obtained first, and the prune-retrain iteration cycle is then entered. Within each prune-retrain cycle, whether a pruning action is triggered is determined jointly by the preset probability function and the current convergence state of the network. In a single pruning operation, the importance of the pruning objects is updated and sorted first; then, based on the current pruning-rate target and pruning range, each pruning object's connection is either disconnected or restored. After the whole flow, the original neural network is converted into a new neural network whose connection structure is sparse but whose inference performance essentially matches the original. The technical scheme of the present invention obtains good compression and acceleration while preserving network accuracy, converges quickly, and trains efficiently.
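The stage-by-stage flow summarized above can be sketched as follows, assuming, purely for illustration, linearly spaced intermediate rate targets (the patent fixes only the final target and the number of stages):

```python
def staged_pruning(total_rate, n_stages, train_stage, prune_stage):
    # Each stage i gets an intermediate pruning-rate target and runs
    # its own prune-retrain cycle; the two callbacks stand in for the
    # training and pruning steps detailed in the text.
    for i in range(1, n_stages + 1):
        rate_i = total_rate * i / n_stages
        train_stage(i, rate_i)
        prune_stage(i, rate_i)

log = []
staged_pruning(0.9, 3,
               train_stage=lambda i, r: log.append(("train", i, round(r, 2))),
               prune_stage=lambda i, r: log.append(("prune", i, round(r, 2))))
print(log)
```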
Further, to demonstrate that the present invention is practical, it was verified on the widely used tasks of character recognition, image classification and object detection. The experiments used MNIST, CIFAR-10 and Pascal VOC 2007&2012 as data sets: MNIST contains 60,000 training images and 10,000 test images in 10 classes; CIFAR-10 contains 50,000 training images and 10,000 test images in 10 classes; Pascal VOC 2007&2012 contains 8,218 training images and 8,333 test images. The widely used LeNet-5 network (2 convolutional layers, 2 fully connected layers), the Cifar network (10 convolutional layers) and YOLO (8 convolutional layers, 1 fully connected layer; Joseph Redmon, You Only Look Once: Unified, Real-Time Object Detection, 2015) were chosen as the deep-learning network models to be verified. At a high compression rate (20×), the results are shown in Table 1.
Table 1: performance drop and training duration of each pruning experiment
In Table 1, the traditional static pruning method refers to the paper "Learning both weights and connections for efficient neural network" by Han, Song et al.; the traditional dynamic pruning method refers to the paper "Dynamic Network Surgery for Efficient DNNs" by Guo, Yiwen et al.
As Table 1 shows, compared with the traditional static pruning method, the present invention affects network performance after compression by roughly the same amount, but the compression time, i.e. the network training and convergence time, is substantially shorter (about 3×). Compared with the traditional dynamic pruning method, the present invention takes roughly the same compression time, but the performance drop after compression is significantly smaller. The present invention is therefore a practical and efficient pruning method for deep-learning neural networks.
It should be pointed out that the method of the embodiment of the present invention is a pruning method for deep-learning neural networks. It is applicable to any scenario in which an original neural network model is to be accelerated (software products as well as dedicated hardware devices). The pruning range of the method covers all network layers based on the inner-product principle, including but not limited to the convolutional and fully connected layers in CNN models, RNN models (or LSTM models), and the like.
In addition, pruning is a form of network de-redundancy, so moderate pruning can also improve network performance. The method can therefore also be used in any scenario in which an original neural network model is to be optimized (software products as well as dedicated hardware devices), with the same optimization range of all inner-product-based network layers.
In the second embodiment of the present invention, a dynamic pruning device for a neural network is provided. As shown in Fig. 2, the device comprises:
a setup module 210 for setting a pruning-rate target, a pruning range, pruning objects and a number of pruning stages;
a stage-target determining module 220 for determining the pruning-rate target of each pruning stage according to the pruning-rate target and the number of pruning stages;
a stage-parameter determining module 230 for determining the network training parameters of each pruning stage according to the network training parameters of the original neural network model and the number of pruning stages;
a training and pruning module 240 for carrying out network training stage by stage based on the network training parameters of each pruning stage and, in each pruning stage, pruning the pruning objects within the pruning range according to the pruning-rate target of the corresponding stage.
Based on the above structure and principle, several specific and preferred implementations are given below to refine and optimize the functions of the device, so that the scheme of the present invention is more convenient and accurate to implement. It should be pointed out that, where no conflict arises, the following features may be combined arbitrarily.
In the embodiment of the present invention, the pruning object comprises one of the following: a weight connection, a filter, or a convolution kernel.
Further, in the embodiment of the present invention, the stage-parameter determining module 230 comprises:
an iteration-count determining submodule 231 for determining the number of training iterations of each pruning stage according to the number of training iterations of the original neural network model and the number of pruning stages;
a training-parameter determining submodule 232 for determining the initial training parameters of each pruning stage according to the number of training iterations and training parameters of the original neural network model and the cumulative number of pruning-training iterations before each pruning stage.
Further, in the embodiment of the present invention, the number of training iterations of each pruning stage determined by the iteration-count determining submodule 231 is M_i = α × M / N, where M is the original number of training iterations, N is the number of pruning stages, α is a preset iteration-count impact factor, and 1 ≤ i ≤ N.
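The per-stage budget M_i = α × M / N is a one-line computation; the values below are illustrative:

```python
def stage_iterations(M, N, alpha):
    # M_i = alpha * M / N: every stage trains for the same share of the
    # original iteration budget M, scaled by the impact factor alpha.
    return [alpha * M / N for _ in range(N)]

print(stage_iterations(M=90000, N=3, alpha=0.5))  # [15000.0, 15000.0, 15000.0]
```

With α < 1 the total pruning-training budget α × M is smaller than the original budget M, which is consistent with the faster convergence reported above.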
Further, in the embodiment of the present invention, the training-parameter determining submodule 232 is specifically configured to estimate, from the original number of training iterations and the training parameters of the neural network, the training parameters of any iteration cycle of the original training process; to determine the iteration cycle matching the cumulative number of pruning-training iterations before the current pruning stage; and, from the training parameters of that matched iteration cycle, to estimate the initial training parameters of the current pruning stage.
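A sketch of this estimation for one common training parameter, the learning rate, assuming, as an illustration not taken from the patent, that the original training used a step-decay schedule:

```python
def step_schedule(t, step=10000, gamma=0.1):
    # Assumed original schedule: the learning-rate multiplier drops by
    # a factor gamma every `step` iterations.
    return gamma ** (t // step)

def init_learning_rate(base_lr, cumulative_iters, schedule):
    # Estimate the parameter the original schedule would have used at
    # the matched iteration (the cumulative pruning-training count),
    # and use it to initialize the current pruning stage.
    return base_lr * schedule(cumulative_iters)

print(init_learning_rate(0.01, 25000, step_schedule))
```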
Further, in the embodiment of the present invention, the training and pruning module 240 comprises:
a probability determining submodule 241 for determining, in each pruning stage and before each training iteration, the probability of a pruning operation based on a probability function and the current training state of the network, wherein the probability function decreases gradually as the number of iterations increases;
a pruning judging submodule 242 for triggering the pruning of the pruning objects within the pruning range when the probability of the pruning operation is greater than a specified threshold.
Further, in the embodiment of the present invention, the specified threshold is the ratio of a number produced by a random number generator to the upper bound of the numbers it can produce.
Further, in the embodiment of the present invention, the probability determining submodule 241 is specifically configured to generate a baseline pruning-excitation probability from the probability function, determine a pruning impact probability factor from the current training state of the network, and multiply the excitation probability by the impact probability factor to obtain the probability of the pruning operation, wherein the pruning impact probability factor is less than or equal to 1.
In a particular embodiment of the present invention, the probability determining submodule 241 is specifically configured to judge, by stall detection, whether the network is in a training-stagnation state; if so, the pruning impact probability factor is set to a value smaller than a first threshold, and otherwise to a value greater than a second threshold, wherein the second threshold is greater than or equal to the first threshold.
Further, in the embodiment of the present invention, the training and pruning module 240 comprises:
an information determining submodule 243 for determining the importance of the pruning objects within the pruning range;
a threshold setting submodule 244 for setting the pruning threshold of the pruning objects according to the pruning-rate target of the current stage;
a pruning submodule 245 for comparing the importance of each pruning object with the pruning threshold and updating its connection state according to the comparison result.
Wherein, the pruning threshold is either a single value or an interval.
Further, in the embodiment of the present invention, when the pruning threshold is a single value, the pruning submodule 245 updates the connection state of a pruning object to connected when its importance is greater than or equal to the threshold, and to disconnected when its importance is less than the threshold.
Further, in the embodiment of the present invention, when the pruning threshold is an interval, the pruning submodule 245 updates the connection state of a pruning object to disconnected when its importance is below the lower limit of the interval, updates it to connected when its importance is above the upper limit, and keeps the current connection state when the importance falls within the interval.
Further, in the embodiment of the present invention, the training and pruning module 240 is also configured, after the pruning objects within the pruning range have been pruned, to drop the weight-decay part, the momentum part, or the whole of the update for those weights whose connection state within the pruning range is disconnected.
The dynamic pruning device for a neural network proposed by the embodiment of the present invention adopts stage-by-stage dynamic pruning, combining a probability function with the network training state to adaptively adjust the pruning frequency, and can within a short training period produce a sparsified neural network with a high compression rate and no conspicuous performance drop. The present invention thus provides a dynamic pruning scheme for neural networks whose pruning rate is controllable and whose convergence is fast, achieving a significant technical effect.
In the third embodiment of the present invention, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the steps of the dynamic pruning method of the neural network of the first embodiment are realized. Since the dynamic pruning method is discussed in detail in the first embodiment, it is not repeated here.
Those of ordinary skill in the art will appreciate that the storage medium in this embodiment may include but is not limited to: ROM, RAM, magnetic disk, optical disc, and the like.
All the embodiments in this specification are described progressively; the same or similar parts of the embodiments may refer to each other, and each embodiment highlights its differences from the others. In particular, since the device and storage-medium embodiments are substantially similar to the method embodiment, their description is relatively brief; for related details, see the corresponding part of the method embodiment.
In short, the foregoing are merely preferred embodiments of the present invention and are not intended to limit its scope. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (25)
1. A dynamic pruning method for a neural network, comprising:
setting a pruning-rate target, a pruning range, pruning objects and a number of pruning stages;
determining the pruning-rate target of each pruning stage according to the pruning-rate target and the number of pruning stages;
determining the network training parameters of each pruning stage according to the network training parameters of an original neural network model and the number of pruning stages;
carrying out network training stage by stage based on the network training parameters of each pruning stage and, in each pruning stage, pruning the pruning objects within the pruning range according to the pruning-rate target of the corresponding stage.
2. The method of claim 1, wherein the pruning object comprises one of the following: a weight connection, a filter, or a convolution kernel.
3. The method of claim 1, wherein determining the network training parameters of each pruning stage according to the network training parameters of the original neural network model and the number of pruning stages comprises:
determining the number of training iterations of each pruning stage according to the number of training iterations of the original neural network model and the number of pruning stages;
determining the initial training parameters of each pruning stage according to the number of training iterations and training parameters of the original neural network model and the cumulative number of pruning-training iterations before each pruning stage.
4. The method of claim 3, wherein the determined number of training iterations of each pruning stage is M_i = α × M / N, where M is the original number of training iterations, N is the number of pruning stages, α is a preset iteration-count impact factor, and 1 ≤ i ≤ N.
5. The method of claim 3, wherein determining the initial training parameters of each pruning stage according to the number of training iterations and training parameters of the original neural network model and the cumulative number of pruning-training iterations before each pruning stage comprises:
estimating, from the original number of training iterations and the training parameters of the neural network, the training parameters of any iteration cycle of the original training process;
determining the iteration cycle matching the cumulative number of pruning-training iterations before the current pruning stage, and estimating the initial training parameters of the current pruning stage from the training parameters of the matched iteration cycle.
6. The method of claim 1, wherein pruning the pruning objects within the pruning range specifically comprises:
in each pruning stage, before each training iteration, determining the probability of a pruning operation based on a probability function and the current training state of the network, wherein the probability function decreases gradually as the number of iterations increases;
pruning the pruning objects within the pruning range when the probability of the pruning operation is greater than a specified threshold.
7. The method of claim 6, wherein the specified threshold is the ratio of a number produced by a random number generator to the upper bound of the numbers it can produce.
8. The method of claim 6, wherein determining the probability of a pruning operation based on a probability function and the current training state of the network comprises:
generating a baseline pruning-excitation probability from the probability function;
determining, from the current training state of the network, a pruning impact probability factor that is less than or equal to 1;
multiplying the excitation probability by the pruning impact probability factor to obtain the probability of the pruning operation.
9. The method of claim 8, wherein determining the pruning impact probability factor from the current training state of the network comprises:
judging, by stall detection, whether the network is in a training-stagnation state; if so, setting the pruning impact probability factor to a value smaller than a first threshold, and otherwise to a value greater than a second threshold, wherein the second threshold is greater than or equal to the first threshold.
10. The method of claim 1 or 6, wherein pruning the pruning objects within the pruning range specifically comprises:
determining the importance of the pruning objects within the pruning range;
setting a pruning threshold for the pruning objects according to the pruning-rate target of the current stage;
comparing the importance of each pruning object with the pruning threshold and updating its connection state according to the comparison result.
11. The method of claim 10, wherein the pruning threshold is either a single value or an interval;
when the pruning threshold is a single value, updating the connection state of the pruning object according to the comparison result comprises: updating the connection state to connected when the importance of the pruning object is greater than or equal to the pruning threshold, and to disconnected when the importance is less than the pruning threshold;
when the pruning threshold is an interval, updating the connection state of the pruning object according to the comparison result comprises: updating the connection state to disconnected when the importance of the pruning object is below the lower limit of the interval, to connected when the importance is above the upper limit, and keeping the current connection state when the importance falls within the interval.
12. The method of claim 1, further comprising, after pruning the pruning objects within the pruning range:
dropping the weight-decay part, the momentum part, or the whole of the update for those weights whose connection state within the pruning range is disconnected.
13. A dynamic pruning device for a neural network, comprising:
a setup module for setting a pruning-rate target, a pruning range, pruning objects and a number of pruning stages;
a stage-target determining module for determining the pruning-rate target of each pruning stage according to the pruning-rate target and the number of pruning stages;
a stage-parameter determining module for determining the network training parameters of each pruning stage according to the network training parameters of an original neural network model and the number of pruning stages;
a training and pruning module for carrying out network training stage by stage based on the network training parameters of each pruning stage and, in each pruning stage, pruning the pruning objects within the pruning range according to the pruning-rate target of the corresponding stage.
14. The device of claim 13, wherein the pruning object comprises one of the following: a weight connection, a filter, or a convolution kernel.
15. The device of claim 13, wherein the stage-parameter determining module comprises:
an iteration-count determining submodule for determining the number of training iterations of each pruning stage according to the number of training iterations of the original neural network model and the number of pruning stages;
a training-parameter determining submodule for determining the initial training parameters of each pruning stage according to the number of training iterations and training parameters of the original neural network model and the cumulative number of pruning-training iterations before each pruning stage.
16. The device of claim 15, wherein the number of training iterations of each pruning stage determined by the iteration-count determining submodule is M_i = α × M / N, where M is the original number of training iterations, N is the number of pruning stages, α is a preset iteration-count impact factor, and 1 ≤ i ≤ N.
17. The device of claim 15, wherein the training-parameter determining submodule is specifically configured to estimate, from the original number of training iterations and the training parameters of the neural network, the training parameters of any iteration cycle of the original training process, determine the iteration cycle matching the cumulative number of pruning-training iterations before the current pruning stage, and estimate the initial training parameters of the current pruning stage from the training parameters of the matched iteration cycle.
18. The device of claim 13, wherein the training and pruning module comprises:
a probability determining submodule for determining, in each pruning stage and before each training iteration, the probability of a pruning operation based on a probability function and the current training state of the network, wherein the probability function decreases gradually as the number of iterations increases;
a pruning judging submodule for triggering the pruning of the pruning objects within the pruning range when the probability of the pruning operation is greater than a specified threshold.
19. The device of claim 18, wherein the specified threshold is the ratio of a number produced by a random number generator to the upper bound of the numbers it can produce.
20. The device of claim 18, wherein the probability determining submodule is specifically configured to generate a baseline pruning-excitation probability from the probability function, determine a pruning impact probability factor from the current training state of the network, and multiply the excitation probability by the pruning impact probability factor to obtain the probability of the pruning operation, wherein the pruning impact probability factor is less than or equal to 1.
21. The device of claim 20, wherein the probability determining submodule is specifically configured to judge, by stall detection, whether the network is in a training-stagnation state; if so, the pruning impact probability factor is set to a value smaller than a first threshold, and otherwise to a value greater than a second threshold, wherein the second threshold is greater than or equal to the first threshold.
22. The device of claim 13 or 18, wherein the training and pruning module comprises:
an information determining submodule for determining the importance of the pruning objects within the pruning range;
a threshold setting submodule for setting the pruning threshold of the pruning objects according to the pruning-rate target of the current stage;
a pruning submodule for comparing the importance of each pruning object with the pruning threshold and updating its connection state according to the comparison result.
23. The device of claim 22, wherein the pruning threshold is either a single value or an interval;
when the pruning threshold is a single value, the pruning submodule updates the connection state of a pruning object to connected when its importance is greater than or equal to the pruning threshold, and to disconnected when its importance is less than the pruning threshold;
when the pruning threshold is an interval, the pruning submodule updates the connection state of a pruning object to disconnected when its importance is below the lower limit of the interval, updates it to connected when its importance is above the upper limit, and keeps the current connection state when the importance falls within the interval.
24. The device of claim 13, wherein the training and pruning module is also configured, after the pruning objects within the pruning range have been pruned, to drop the weight-decay part, the momentum part, or the whole of the update for those weights whose connection state within the pruning range is disconnected.
25. A computer-readable storage medium on which a computer program is stored, wherein, when the program is executed by a processor, the steps of the dynamic pruning method for a neural network of any one of claims 1 to 12 are realized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710656730.0A CN109389216B (en) | 2017-08-03 | 2017-08-03 | Dynamic cutting method and device of neural network and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109389216A true CN109389216A (en) | 2019-02-26 |
CN109389216B CN109389216B (en) | 2021-06-18 |
Family
ID=65412918
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710656730.0A Active CN109389216B (en) | 2017-08-03 | 2017-08-03 | Dynamic cutting method and device of neural network and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109389216B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106297778A (en) * | 2015-05-21 | 2017-01-04 | 中国科学院声学研究所 | The neutral net acoustic model method of cutting out based on singular value decomposition of data-driven |
CN106709565A (en) * | 2016-11-16 | 2017-05-24 | 广州视源电子科技股份有限公司 | Optimization method and device for neural network |
CN106778701A (en) * | 2017-01-20 | 2017-05-31 | 福州大学 | A kind of fruits and vegetables image-recognizing method of the convolutional neural networks of addition Dropout |
Non-Patent Citations (1)
Title |
---|
SONG HAN et al.: "Learning both Weights and Connections for Efficient Neural Networks", Advances in Neural Information Processing Systems * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110070181A (en) * | 2019-04-30 | 2019-07-30 | 深圳朴生智能科技有限公司 | A kind of optimization method of the deep learning for edge calculations equipment |
CN110633798A (en) * | 2019-09-12 | 2019-12-31 | 北京金山数字娱乐科技有限公司 | Parameter updating method and device in distributed training |
CN112598110A (en) * | 2020-12-04 | 2021-04-02 | 北京迈格威科技有限公司 | Neural network construction method, device, equipment and medium |
CN112598110B (en) * | 2020-12-04 | 2024-05-07 | 北京迈格威科技有限公司 | Neural network construction method, device, equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109711532B (en) | Acceleration method for realizing sparse convolutional neural network inference aiming at hardware | |
CN108108809B (en) | Hardware architecture for reasoning and accelerating convolutional neural network and working method thereof | |
CN108764317B (en) | Residual convolutional neural network image classification method based on multipath feature weighting | |
CN109472356A (en) | A kind of accelerator and method of restructural neural network algorithm | |
CN109389216A (en) | The dynamic tailor method, apparatus and storage medium of neural network | |
CN106779068A (en) | The method and apparatus for adjusting artificial neural network | |
CN107169557A (en) | A kind of method being improved to cuckoo optimized algorithm | |
CN108038507A (en) | Local receptor field extreme learning machine image classification method based on particle group optimizing | |
CN108647776A (en) | A kind of convolutional neural networks convolution expansion process circuit and method | |
CN111831355B (en) | Weight precision configuration method, device, equipment and storage medium | |
CN106777449A (en) | Distribution Network Reconfiguration based on binary particle swarm algorithm | |
CN107784360A (en) | Step-by-step movement convolutional neural networks beta pruning compression method | |
CN109583586B (en) | Convolution kernel processing method and device in voice recognition or image recognition | |
CN109492761A (en) | Realize FPGA accelerator, the method and system of neural network | |
CN110069444A (en) | A kind of computing unit, array, module, hardware system and implementation method | |
CN111831359B (en) | Weight precision configuration method, device, equipment and storage medium | |
CN108985444A (en) | A kind of convolutional neural networks pruning method inhibited based on node | |
CN110600020A (en) | Gradient transmission method and device | |
CN114841611A (en) | Method for solving job shop scheduling based on improved ocean predator algorithm | |
CN109767002A (en) | A kind of neural network accelerated method based on muti-piece FPGA collaboration processing | |
CN117521763A (en) | Artificial intelligent model compression method integrating regularized pruning and importance pruning | |
CN109063835A (en) | The compression set and method of neural network | |
CN107748914A (en) | Artificial neural network computing circuit | |
CN115983366A (en) | Model pruning method and system for federal learning | |
KR102643431B1 (en) | Apparatus and method for accelerating deep neural network learning for deep reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||