Specific embodiments
Preferred embodiments of the disclosure are described more fully below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the disclosure, it should be appreciated that the disclosure may be realized in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In this application, the improvement of artificial neural networks according to the present invention is illustrated mainly by taking a long short-term memory (LSTM) model for speech recognition as an example. The scheme of this application is applicable to various artificial neural networks, including deep neural networks (DNN), recurrent neural networks (RNN) and convolutional neural networks (CNN), and is particularly suitable for the above-mentioned LSTM model, which belongs to the RNN family.
Basic concepts of neural network compression
An artificial neural network (ANN) is a mathematical computing model that imitates the behavioral characteristics of animal neural networks and performs distributed parallel information processing. A neural network contains a large number of interconnected nodes, also called "neurons". Each neuron processes the weighted input values from other adjacent neurons through a specific output function, also called an "activation function", and the strength of information transmission between neurons is defined by so-called "weights". The algorithm continuously learns by itself and adjusts these weights.
Early neural networks had only two layers, an input layer and an output layer. Since they could not process complex logic, their practicality was considerably restricted. Deep neural networks (DNNs) greatly improve the ability of neural networks to process complex logic by adding hidden intermediate layers between the input and output layers. Fig. 1 shows a schematic diagram of a DNN model. It should be understood that in practical applications a DNN may have a large-scale structure much more complex than that shown in Fig. 1, but its basic structure is still as shown in Fig. 1.
Speech recognition sequentially maps an analog speech signal to a specific set of symbols. In recent years, methods based on artificial neural networks have achieved results in the field of speech recognition that far exceed all conventional methods and have become the mainstream of the whole industry, with deep neural networks finding extremely wide application.
A recurrent neural network (RNN) is a commonly used deep neural network model. Unlike a traditional feed-forward (BP) neural network, an RNN introduces directed cycles and can handle problems in which the inputs are correlated over time. In speech recognition this temporal correlation of the signal is very strong: recognizing a word in a sentence, for example, depends closely on the sequence of words that precedes it. Recurrent neural networks therefore have a very wide range of applications in the field of speech recognition.
To solve the problem of memorizing long-term information, Hochreiter & Schmidhuber proposed the long short-term memory (LSTM) model in 1997. An LSTM network is a kind of RNN that replaces the simple repeating neural network module inside an ordinary RNN with a complex structure of interacting connections. Fig. 2 shows a schematic diagram of the LSTM neural network model. LSTM networks have likewise achieved very good results in speech recognition.
When designing and training a deep neural network, a larger network has stronger expressive power and can represent stronger nonlinear relations between the input features and the outputs of the network. However, when learning the actually useful patterns, such a larger network is more easily influenced by noise in the training set, so that the learned patterns deviate from what is expected. Because this training-set noise is ubiquitous and differs from dataset to dataset, a network trained on one dataset is liable to overfit under the influence of the noise.
With rapid development in recent years, the scale of neural networks keeps growing; advanced networks may have hundreds of layers and hundreds of millions of connections. Since ever-larger neural networks consume a large amount of computation and memory-access resources, model compression becomes particularly important.
In a neural network, especially a deep neural network, the connections among neurons are mathematically represented as a series of matrices. Although a trained network predicts accurately, its matrices are all dense, i.e. "filled with non-zero elements". As neural networks become more complex, computation on these dense matrices consumes a large amount of storage and computing resources. The resulting low speed and high cost confront deployment on mobile terminals with enormous difficulty, thereby greatly constraining the development of neural networks.
Recent studies indicate that, in the model matrices obtained by training a neural network, only the elements with larger weights represent important connections, while the elements with smaller weights can be removed (set to zero), with the corresponding neurons pruned at the same time. The accuracy of the network declines after pruning, but the loss of accuracy can be reduced by retraining (fine-tuning), which adjusts the magnitudes of the weights still retained in the model matrices. Model compression sparsifies the dense matrices of a neural network; it can effectively reduce the amount of storage and computation and achieve acceleration while maintaining accuracy. Model compression is particularly important for dedicated sparse neural network accelerators. Fig. 3 shows a schematic flow of network compression by pruning and retraining. Fig. 4 correspondingly shows the distribution of nodes (neurons) and connections (synapses) before and after pruning.
After pruning, the degree of sparsification of a model matrix can be expressed by a compression ratio. At present, the compression ratio is usually selected by sensitivity analysis. Under the same compression ratio, different matrices in the same neural network have completely different effects on network accuracy. For example, a single-layer LSTM network contains nine matrices: Wgx, Wix, Wfx, Wox, Wgr, Wir, Wfr, Wor and Wrm. Compressing the matrix Wrm to 10% density causes a drastic drop in network accuracy (that is, a drastic rise in word error rate), whereas compressing the matrix Wor to 10% leaves the accuracy of the network basically unchanged. The prior art therefore usually uses sensitivity analysis: the network accuracy is tested for each matrix under different compression ratios, a suitable compression ratio is chosen as an initial value, and it is fine-tuned on this basis to give the final compression ratio. For example, each matrix in a single-layer LSTM network is compressed at nine densities, 0.1, 0.2, ..., 0.9, the word error rate (WER) of the network is tested at each density, and the smallest density whose |ΔWER| relative to the uncompressed network is below a specified threshold is chosen as the initial density for that matrix. Since the scanning interval of this parameter sweep is rather large, it may be called a coarse-grained selection method.
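By way of illustration only, this coarse-grained selection can be sketched in Python as follows. The sketch assumes magnitude-based pruning to a target density, and evaluate_wer is a hypothetical placeholder for measuring WER on a validation set:

```python
import numpy as np

def prune_to_density(mat: np.ndarray, density: float) -> np.ndarray:
    """Keep the largest-magnitude fraction `density` of elements; zero the rest."""
    k = int(round(mat.size * density))
    if k == 0:
        return np.zeros_like(mat)
    threshold = np.sort(np.abs(mat).ravel())[-k]
    return np.where(np.abs(mat) >= threshold, mat, 0.0)

def choose_initial_density(mat, baseline_wer, evaluate_wer, max_delta=0.5):
    """Scan densities 0.1 .. 0.9 and return the smallest one whose |dWER|
    against the uncompressed baseline stays below the specified threshold."""
    for density in np.arange(0.1, 1.0, 0.1):
        wer = evaluate_wer(prune_to_density(mat, density))
        if abs(wer - baseline_wer) < max_delta:
            return density  # smallest acceptable density becomes the initial value
    return 1.0              # matrix too sensitive to prune: leave it uncompressed
```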
Mask compression and retraining of neural networks
Compressing a deep neural network is essentially the sparsification of its weight matrices. A sparsified weight matrix contains many zero-valued elements. During computation these zero elements can be skipped, reducing the number of operations needed and thereby increasing computing speed. Meanwhile, if the degree of sparsification is high (for example, a density of 15%), only the non-zero weights need to be stored, reducing storage space. However, because the compression process removes a considerable portion of the weights, the accuracy of the whole deep neural network drops substantially, and a retraining stage is needed to adjust the magnitudes of the weights still retained in the network weight matrices and restore the model accuracy of the deep neural network. In general, however, since pruning resets some weights to zero, which is equivalent to adding a new constraint in the solution space, the accuracy after reconverging to a new local optimum, although improved, is still lower than that of the deep neural network before pruning.
According to the existing retraining approach, a matrix-shaped mask (denoted M) is generated during pruning. The mask is a set of 0-1 matrices corresponding one-to-one to the weight matrices of the LSTM. These matrix-shaped masks record the distribution of the non-zero elements of the matrices after compression: an element of 1 indicates that the element at the corresponding position of the corresponding weight matrix is retained, and an element of 0 indicates that the element at the corresponding position is set to zero. It should be understood that the purpose of introducing the above 0-1 mask matrices is to constrain the corresponding weights in the weight matrices according to the value of each element of the mask (that is, to set them to zero or keep them unchanged); the matrix-shaped mask is merely one means of realizing this constraint that is convenient for computer implementation. In other words, if in concrete practice the zeroing of the corresponding weights in the weight matrices is realized in another way, it can also be regarded as an implementation with an effect equivalent to that of the matrix mask.
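By way of illustration only, the semantics of the 0-1 mask can be sketched as follows; the matrix values are purely illustrative:

```python
import numpy as np

W = np.array([[0.9, -0.1, 0.0],
              [0.05, -0.8, 0.3]])
M = (np.abs(W) >= 0.3).astype(W.dtype)  # 0-1 mask produced by a magnitude rule

W_pruned = M * W                        # element-wise product M ⊙ W keeps 1-positions
print(M)         # [[1. 0. 0.]  [0. 1. 1.]]
print(W_pruned)  # [[0.9 0. 0.]  [0. -0.8 0.3]]
```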
The principle of mask compression of a neural network is now illustrated with specific reference to the compression process of an LSTM deep neural network for speech recognition. The pruning result is stored as a matrix-shaped mask, and by using the matrix-shaped mask appropriately during the retraining stage, the accuracy of the deep neural network is preserved as far as possible while the compression ratio is kept unchanged.
1. Retraining with a mask
For an ANN in which a plurality of neurons and the connections between the neurons are represented by connection weight matrices, Fig. 5 shows a pruning-retraining method for neural network compression. The method includes a pruning step S510 and a masked retraining step S520.

In step S510, one or more unimportant weights in a weight matrix obtained by previous training are set to zero to obtain a pruned weight matrix.
Here, "unimportant weights" refers to weights that have little influence on the accuracy of the neural network model and are therefore relatively unimportant. Various rules have been used to specify which weights are unimportant, for example selecting unimportant weights according to the Hessian matrix of the cost function; however, the view currently accepted by academia and verified experimentally is that the connections represented by the weights with smaller absolute values are relatively unimportant. "Unimportant weights" therefore preferably refers to the weights in the matrix with the smaller absolute values.
In step S520, the weight matrix is retrained using the mask corresponding to the pruned weight matrix (that is, a 0-1 matrix whose zero positions correspond to the pruned positions of the weight matrix).
The above masked retraining scheme is described below with reference to formulas.
Denote the process of retraining the network to optimize the per-frame cross entropy as:

nnet_o = R_mask(nnet_i, M)   (1)

where nnet_i is the pruned input network and nnet_o is the output network. R_mask denotes a masked training process, in which only the weights that have not been pruned are updated (the information on which weights are pruned is recorded in M). Through this process, the weights remaining in the network weight matrices are adjusted, and the deep neural network gradually converges to a new local optimum.
Let M ⊙ nnet_0 denote taking the element-wise product of each matrix in nnet_0 with its corresponding mask matrix in M. Let nnet_0 be the network to be compressed; the compression process is then as follows:

(a) nnet_0 → M (prune the input network to obtain the pruning mask M)
(b) nnet_i = M ⊙ nnet_0 (element-wise product of the input network and the mask, completing the pruning)
(c) nnet_o = R_mask(nnet_i, M) (retrain the pruned network with the mask to obtain nnet_o)
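By way of illustration only, R_mask can be sketched as follows under toy assumptions: a quadratic loss stands in for the per-frame cross entropy, and multiplying the gradient by the mask keeps the pruned positions at zero:

```python
import numpy as np

def retrain_with_mask(W, M, grad_fn, lr=0.01, steps=100):
    """R_mask: gradient descent in which pruned (M == 0) weights are never updated."""
    W = M * W                        # pruned positions start at exactly zero
    for _ in range(steps):
        W = W - lr * M * grad_fn(W)  # masked gradient: only retained weights move
    return W

# Toy usage: fit W towards a target T under the pruning constraint.
rng = np.random.default_rng(0)
T = rng.standard_normal((4, 4))          # stands in for the training objective
W0 = rng.standard_normal((4, 4))
M = (np.abs(W0) > 0.5).astype(float)     # pruning mask from a magnitude rule
W = retrain_with_mask(W0, M, grad_fn=lambda W: W - T)
assert np.all(W[M == 0] == 0)            # pruned weights remain zero
```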
2. Mask-free retraining + masked retraining
Compared with the flow according to Fig. 5, a mask-free retraining step S611 can be added between the pruning step S610 and the masked retraining step S620, as shown in Fig. 6; that is, a mask-free retraining link can be inserted between operations (b) and (c):

nnet_o = R_no_mask(nnet_i)   (2)
So-called mask-free retraining means removing the constraint of the pruned shape during retraining, allowing the pruned weights to regrow. An intuitive implementation is to use the pruned network weights as the initial values of the input network for retraining, the weights that have been pruned being equivalent to an input initial value of 0. Taking the pruned network weights as the initial values of the retrained input network is thus equivalent to letting the network iterate from a better starting point, giving the relatively important weights relatively larger initial values, so that the network is more likely to suppress the interference of noise and learn valuable patterns. Theory and practice show that the new network produced by this retraining can even exceed the accuracy of the network before pruning.
However, because no mask is used, the network actually generated after this training is dense and cannot achieve the goal of compression, so the weights that were originally pruned must be set to zero again, which in turn lowers the network accuracy once more. To restore the accuracy, masked training must then be carried out so that the network converges to a local optimum in the solution space to which the pruning constraint has been added, ensuring the accuracy of the pruned deep neural network.
The new compression process after this optimization is therefore as follows, where nnet_0 is the network to be compressed, R_mask is masked retraining, and R_no_mask is mask-free retraining:

(a) nnet_0 → M (prune the input network to obtain the pruning mask M)
(b) nnet_i = M ⊙ nnet_0 (element-wise product of the input network and the mask, completing the pruning)
(c) nnet_o1 = R_no_mask(nnet_i) (retrain the pruned network without the mask to obtain nnet_o1)
(d) nnet_i2 = M ⊙ nnet_o1 (element-wise product of nnet_o1 with M, removing again the weights that regrew at the pruned positions)
(e) nnet_o2 = R_mask(nnet_i2, M) (retrain the masked network with the mask to obtain the compressed network)
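By way of illustration only, steps (a)-(e) can be sketched end to end under the same toy assumptions as above (a quadratic loss standing in for the per-frame cross entropy):

```python
import numpy as np

def retrain_no_mask(W, grad_fn, lr=0.01, steps=100):
    """R_no_mask: unconstrained fine-tuning; pruned weights may regrow."""
    W = W.copy()
    for _ in range(steps):
        W = W - lr * grad_fn(W)
    return W

def retrain_with_mask(W, M, grad_fn, lr=0.01, steps=100):
    """R_mask: masked fine-tuning; pruned (M == 0) weights stay at zero."""
    W = M * W
    for _ in range(steps):
        W = W - lr * M * grad_fn(W)
    return W

rng = np.random.default_rng(1)
T = rng.standard_normal((4, 4))
grad_fn = lambda W: W - T                         # toy quadratic-loss gradient
nnet0 = rng.standard_normal((4, 4))               # network to be compressed

M = (np.abs(nnet0) > 0.5).astype(float)           # (a) prune -> pruning mask M
nnet_i = M * nnet0                                # (b) apply mask M
nnet_o1 = retrain_no_mask(nnet_i, grad_fn)        # (c) mask-free retraining
nnet_i2 = M * nnet_o1                             # (d) remove regrown weights
nnet_o2 = retrain_with_mask(nnet_i2, M, grad_fn)  # (e) masked retraining
```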
By first lifting the accuracy of the network through mask-free retraining, the common phenomenon of accuracy dropping after network compression is largely resolved. Engineering practice shows that in some cases the accuracy of the compressed network can even increase with this optimized method.
In engineering practice, operations (d)-(e) can preferably be repeated (as shown by the arrow in Fig. 6 returning from step S620 to step S611), so that the network converges to a better local optimum. The key link on which the iteration depends is its stopping condition, for example that the error rate of the retrained network no longer exceeds that of the original network, e(nnet_o2) ≤ e(nnet_0), where for the LSTM network model for speech recognition e(nnet_0) may refer to the error rate of nnet_0.
Fig. 7 shows an example in which mask-free retraining improves network accuracy. The figure shows the whole flow, and the corresponding results, of compressing an LSTM deep neural network trained on a Chinese speech dataset of several thousand hours using the added mask-free retraining step. The abscissa is the operation step; the ordinate is the word error rate (WER), an index measuring the accuracy of the deep neural network: the lower the WER, the higher the network accuracy. The black line represents the initial WER of the network to be compressed, and the grey line represents the process of three rounds of compression. The dotted lines mark the first pruning of each round of compression; it can be seen that the WER of the network rises after each pruning. After pruning, mask-free training is performed first and the WER falls; then pruning is performed again and the WER rises again; then masked training is performed and the WER falls once more. Steps 4, 8 and 12 correspond to the results of the three rounds of compression respectively; it can be seen that the accuracy is improved compared with the initial value.
Dynamic-mask compression of neural networks
For a complex neural network model, especially a multi-layer network model, the network model matrices of the individual layers are interrelated. Some important connections may therefore exhibit small weights and be cut off, and then, after most of the unimportant connections have been cut off and retraining has been performed, come to exhibit rather large weights again. Such weights have in effect been mis-pruned. In the compression flow described above in connection with Fig. 6, when the network whose accuracy dropped after pruning is retrained, the pruning mask M is still used; that is, only the magnitudes of the weights still retained in the model matrices can be adjusted, so the mis-pruning cannot be undone. This may cause the network model to converge to a poor local optimum, affecting both the compression ratio and the model accuracy. Since the above retraining process merely finds a local optimum again on the basis of the pruning and makes no attempt to correct mis-pruning and the like, a dynamic adjustment of the matrix-shaped mask that stores the pruning result can be added in the retraining stage to correct and recover some of the mis-pruning behaviors during pruning, achieving the goal of ensuring or even improving the performance of the artificial neural network (for example, an LSTM) after pruning.
Fig. 8 shows a schematic flowchart of adjusting a neural network using a dynamic mask. As shown in Fig. 8, a method of adjusting an artificial neural network includes a pruning step S810, a mask-free retraining step S811, a mask generation step S812 and a masked retraining step S820.

In step S810, n unimportant weights among all N weights of a trained first connection weight matrix are set to zero. The unimportant weights may preferably be the n weights with the smallest absolute values among all N weights. In step S811, the pruned second connection weight matrix is retrained without forcibly constraining any weights to be zero. In step S812, a matrix-shaped mask is generated according to the third connection weight matrix obtained by the mask-free retraining. In step S820, the third connection weight matrix is retrained using the mask matrix. It should be understood that "first", "second" and "third" here are intended merely to distinguish the matrices referred to, not to impose a limitation of any other meaning, nor are they meant to correspond to the network numbering in the formulas. In addition, although step S812 is called a mask generation step, since the pruning step S810 can in one embodiment be understood as a pruning step performed using a pruning mask, step S812 can also be regarded as a mask adjustment step that dynamically adjusts an existing mask (for example, the pruning mask) according to the third weight matrix.
In the adjustment method of Fig. 6, the pruning mask M remains constant. This means that once a weight has been taken as unimportant and cut off, it can never be recovered, even if it would become very large after retraining (indicating that the pruned connection is actually important). To correct this problem, that is, to recover from the mis-pruning of those elements that prove important after retraining, the mask must be adjusted dynamically (to distinguish it from the dynamically adjusted mask, the pruning mask M is hereinafter denoted M_0):

nnet_o1 → M_1 (dynamically adjust M_0 according to the retrained network to obtain M_1)
In other words, unlike the direct masked retraining performed using the pruning mask M, the method of Fig. 8 dynamically adjusts the mask used for masked retraining according to the result of the preceding mask-free retraining.
With this dynamic adjustment added, the compression process becomes:

(a) nnet_0 → M_0 (prune the input network to obtain the pruning mask M_0)
(b) nnet_i = M_0 ⊙ nnet_0 (element-wise product of the input network and the mask, completing the pruning)
(c) nnet_o1 = R_no_mask(nnet_i) (retrain the pruned network without the mask to obtain nnet_o1)
(d) nnet_o1 → M_1 (dynamically adjust M_0 according to nnet_o1 to obtain M_1)
(e) nnet_i2 = M_1 ⊙ nnet_o1 (element-wise product of nnet_o1 with M_1, removing the regrown weights at the positions that the adjusted mask still prunes)
(f) nnet_o2 = R_mask(nnet_i2, M_1) (retrain the masked network with the mask to obtain the compressed network)
In engineering practice, the above mask-free retraining step, mask generation step and masked retraining step can be repeated until an optimized solution of the connection weight matrix is obtained. That is, operations (c)-(f) can be repeated (as shown by the arrow in Fig. 9 returning from step 920 to step 911), so that the network converges to a better optimized solution, such as a better local optimum. Each such repetition requires readjusting the matrix mask, i.e. generating M_{k+1} from M_k. In addition, out of consideration for time cost and the like in real engineering, the retraining sometimes does not converge to a local optimum but stops at an optimized solution that meets a predetermined accuracy requirement.
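By way of illustration only, the repeated (c)-(f) loop can be sketched as follows, reusing the toy retrain_no_mask and retrain_with_mask from the sketches above; adjust_mask is a placeholder for either of the adjustment rules sketched further below:

```python
def compress_with_dynamic_mask(nnet0, grad_fn, adjust_mask, rounds=3):
    """Steps (a)-(b) once, then steps (c)-(f) repeated for several rounds."""
    M = (np.abs(nnet0) > 0.5).astype(float)       # (a) initial pruning mask M_0
    net = M * nnet0                               # (b) apply the mask
    for _ in range(rounds):
        net = retrain_no_mask(net, grad_fn)       # (c) pruned weights may regrow
        M = adjust_mask(net, M)                   # (d) M_k -> M_{k+1}
        net = M * net                             # (e) remove still-pruned weights
        net = retrain_with_mask(net, M, grad_fn)  # (f) masked retraining
    return net, M
```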
The above iteration can also be carried out over a smaller range. For example, the mask-free retraining computation can be performed several times, until an optimized mask-free retraining solution is obtained, before proceeding to the mask dynamic adjustment step. Likewise, the masked retraining can be computed several times with the same dynamically adjusted mask, until an optimized masked retraining solution is obtained. Such multiple mask-free retraining computations and multiple masked retraining computations can be regarded as contained within a single mask-free retraining step and a single masked retraining step, respectively.
In one embodiment, the mask generation step may set to zero the values of the mask at the positions corresponding to the n weights with the smallest absolute values in the third connection weight matrix.
An example of this dynamic adjustment is now given intuitively with formulas. Let mat be any weight matrix of the retrained network and m_k its corresponding mask matrix in M_k, and let the number of 0-valued elements in m_k be n. Generate an all-ones matrix M'_{k+1} of the same size as m_k, sort all elements of mat from large to small in absolute value, and set to 0 the elements of M'_{k+1} at the positions corresponding to the n smallest elements; this yields M_{k+1}. This dynamic adjustment process transfers the 0 elements of the mask to wherever the weights of the network are now smallest, while keeping the compression ratio unchanged. The method in effect swaps the weights outside the mask that have become larger with the weights inside the mask that have become smaller.
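By way of illustration only, this constant-compression-ratio adjustment can be sketched as follows under the notation above:

```python
import numpy as np

def adjust_mask_fixed_ratio(mat: np.ndarray, m_k: np.ndarray) -> np.ndarray:
    """Regenerate the 0-1 mask from the retrained matrix `mat`, preserving the
    number n of zeros of m_k: the n smallest-magnitude elements are pruned."""
    n = int((m_k == 0).sum())                       # zeros to carry over
    m_next = np.ones_like(m_k)                      # all-ones matrix M'_{k+1}
    if n > 0:
        order = np.argsort(np.abs(mat), axis=None)  # ascending by magnitude
        m_next.ravel()[order[:n]] = 0               # zero the n smallest elements
    return m_next                                   # this is M_{k+1}
```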
Fig. 10 shows an example of dynamically adjusting the mask while keeping the compression ratio unchanged. In the figure, the grey shading indicates positions covered by the mask, while the grid and diagonal-line backgrounds indicate positions where the mask has been adjusted. As shown, the middle element was cut off at the start because its weight (0.3) was small, but it became large (0.7) after retraining, indicating that the connection it represents is important. The lower-middle element was retained at the start but became small (0.3) after retraining, indicating that the connection is actually unimportant. With the above optimization strategy, the middle element can now be recovered and the lower-middle element cut off.
The above mask can also be adjusted dynamically in other ways. Academia currently has no widely accepted rule for choosing the compression ratio, so a coarse-grained selection method such as a parameter sweep is typically used. This means that the chosen compression ratio is not necessarily suitable in itself, and the compression ratio can therefore preferably be fine-tuned during the retraining link. Specifically, if after retraining a considerable portion of the elements of the neural network weight matrix corresponding to the 1 values of the mask matrix differ very little in magnitude from a considerable portion of the elements corresponding to the 0 values of the mask matrix, this means that the importance of the pruned weights is basically the same as that of the unpruned ones, and these weights should not be pruned again.
Accordingly, in one embodiment, the mask generation step sets to zero the values of the mask at the positions corresponding to the weights of the third connection weight matrix whose absolute values are below a weight threshold. The weight threshold may be predetermined based on empirical values, or it may be derived from the element values of the third weight matrix. Preferably, with all N weights of the third connection weight matrix sorted from large to small by absolute value, the weight threshold may be the average of the first (N-n) weights in the sorted sequence, i.e. the average of all elements other than the n elements to be set to zero. The weight threshold may also be the average of the n weights starting from the (N-2n+1)-th weight in the sorted sequence, i.e. the average of the n smallest elements other than the n elements to be set to zero. The above threshold may also be a value based on the above average, for example a value comparable or proportional to the average.
An example of this dynamic adjustment is now given intuitively with formulas. Let mat be any weight matrix of the retrained network and m_k its corresponding mask matrix in M_k; let the number of 0-valued elements in m_k be n and the total number of elements be N. Generate a matrix M'_{k+1} identical to m_k, and let the step size for choosing the compression ratio be ε. Sort from large to small all elements of mat corresponding to 1-valued elements of m_k, take the ε·N smallest of them, x_i (i = 1, 2, ..., ε·N), and compute the average of their magnitudes, x_avg = (|x_1| + ... + |x_{ε·N}|) / (ε·N). For every element x'_i of mat satisfying ||x'_i| - x_avg| < σ, set the corresponding position of M'_{k+1} to 1; this yields M_{k+1}. The method in effect recovers those weights whose magnitudes are close to x_avg.
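By way of illustration only, this threshold-based recovery can be sketched as follows under the notation above; the values of ε and σ are illustrative:

```python
import numpy as np

def adjust_mask_recover(mat: np.ndarray, m_k: np.ndarray,
                        eps: float = 0.1, sigma: float = 0.05) -> np.ndarray:
    """Recover mask positions whose weight magnitude is within sigma of the
    average magnitude x_avg of the eps*N smallest surviving weights."""
    kept = np.sort(np.abs(mat[m_k == 1]))            # surviving weights, ascending
    k = max(1, int(eps * mat.size))                  # the eps*N smallest of them
    x_avg = kept[:k].mean()
    m_next = m_k.copy()                              # M'_{k+1} starts as a copy of m_k
    m_next[np.abs(np.abs(mat) - x_avg) < sigma] = 1  # recover near-threshold weights
    return m_next                                    # compression ratio may change
```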
Fig. 11 shows an example of dynamically adjusting the mask without keeping the compression ratio unchanged. In the figure, the grey shading indicates positions covered by the mask, while the grid and diagonal-line backgrounds indicate positions where the mask has been adjusted. As shown, the middle element (0.5) was cut off at the start because its weight was small, but after retraining it became large (0.7), the same size as some elements outside the mask, indicating that the connection it represents is important. Since the elements outside the mask have not become very small at this point, the compression ratio should be adjusted: the middle element can be recovered using the above threshold comparison scheme.
Although Fig. 10 and Fig. 11 respectively illustrate the replacement and the recovery of zeroed elements, it should be understood that, according to one of the above dynamic adjustment manners or any combination thereof, the replacement and recovery of zeroed elements may also occur simultaneously, or non-zeroed elements may additionally be deleted. In addition, although the magnitude of a weight (sometimes simply expressed above as the weight's size) is mostly used herein to characterize whether a particular weight element is important, the pruning of the weight matrices and the dynamic adjustment of the mask may also be performed according to other importance rules, and all such variations fall within the scope of the claims of the present invention.
Fig. 12 shows a schematic diagram of an adjusting apparatus capable of carrying out the ANN adjustment scheme of the present invention. The ANN adjusting apparatus 1200 may include a pruning device 1210, a mask-free retraining device 1211, a mask generating device 1212 and a masked retraining device 1220.
The pruning device 1210 may be used to set to zero n unimportant weights among all N weights of a trained first connection weight matrix. The mask-free retraining device 1211 may retrain the pruned second connection weight matrix without forcibly constraining any weights to be zero. The mask generating device 1212 generates a matrix-shaped mask according to the third connection weight matrix obtained by the mask-free retraining. The masked retraining device 1220 may then retrain the third connection weight matrix using the mask matrix.
Similarly, the mask-free retraining device 1211, the mask generating device 1212 and the masked retraining device 1220 may repeat their operations until an optimized solution of the connection weight matrix is obtained. In addition, the mask-free retraining device 1211 and the masked retraining device 1220 may each internally perform multiple retraining computations for a plurality of weight matrices (if any), until an optimized retraining solution is obtained, before proceeding to the action of the next device or ending the retraining.
The mask generating device 1212 may set to zero the values of the mask at the positions corresponding to the n weights with the smallest absolute values in the third connection weight matrix, and may zero elements according to a weight threshold. The weight threshold may be derived from the third connection weight matrix. With the weights of the third connection weight matrix sorted from large to small by absolute value, the weight threshold may be set according to one of the following averages, or be that average itself: the average of the first (N-n) weights in the sorted sequence; or the average of the n weights starting from the (N-2n+1)-th weight in the sorted sequence.
The ANN adjusting apparatus 1200 may also be used to carry out the adjustment schemes according to Figs. 5-6 of the present invention, for example carrying out the scheme of Fig. 5 using the pruning device 1210 and the masked retraining device 1220, or carrying out the scheme shown in Fig. 6 using the pruning device 1210, the mask-free retraining device 1211 and the masked retraining device 1220.
Herein, masked and mask-free retraining (finetune) refer to continuing to train on the basis of existing training, and may be understood as "fine-tuning" rather than retraining (retrain) of the neural network from scratch. The adjustment of the present invention towards a locally optimal solution of the pruned network should obviously be construed as continuing to train on the basis of training already done.
The flowcharts and block diagrams in the accompanying drawings show possible architectures, functions and operations of implementations of systems and methods according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a part of code, which contains one or more executable instructions for realizing the specified logical function. It should also be noted that in some alternative implementations the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be realized by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Various embodiments of the present invention have been described above. The above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the illustrated embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application or their improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.