CN106355248A - Deep convolution neural network training method and device - Google Patents

Deep convolution neural network training method and device

Info

Publication number
CN106355248A
Authority
CN
China
Prior art keywords
model
DCNN
training
pruning
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610738135.7A
Other languages
Chinese (zh)
Inventor
乔宇
刘家铭
王亚立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201610738135.7A priority Critical patent/CN106355248A/en
Publication of CN106355248A publication Critical patent/CN106355248A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/061 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Abstract

The present invention relates to the field of deep learning, and in particular to a deep convolutional neural network (DCNN) training method and device. The method comprises the following steps: a. pre-training a DCNN on a large-scale source dataset and pruning the DCNN; b. performing transfer learning on the pruned DCNN; c. performing model compression and pruning on the transferred DCNN using a small-scale target dataset. During transfer learning from the large-scale source dataset to the small-scale target dataset, model compression and pruning are applied to the DCNN, exploiting the complementary advantages of transfer learning and model compression. This improves transfer learning ability, reduces the DCNN's risk of overfitting and its deployment difficulty on the small-scale target dataset, and improves the predictive ability of the model on the target dataset.

Description

Deep convolutional neural network training method and device
Technical field
The present invention relates to the field of deep learning, and in particular to a deep convolutional neural network training method and device.
Background art
In recent years, with the rapid development of the Internet and computer technology, deep convolutional neural networks (DCNNs) have achieved breakthrough success in challenging tasks such as image classification and audio recognition. However, DCNN models are large and complex, and large-scale data are needed to optimize their parameters. Many real-world problems are supported only by small-scale data, and it is difficult to obtain a high-performance DCNN directly from the small-scale training data of the target task. A widely used strategy is transfer learning; in deep learning research, transfer learning is an effective technique for modeling small-scale target datasets. A large body of work shows that a DCNN trained on a large-scale source dataset learns general-purpose representations and can serve as a pre-trained model for a small-scale target dataset [Donahue et al., 2014; Yosinski et al., 2014]. That is, a large DCNN is first trained on a large-scale dataset for a generic task (the source dataset), and the pre-trained DCNN is then fine-tuned on the small-scale dataset of the target task (the target dataset) to obtain a DCNN for the target task. However, the DCNN obtained by pre-training on the source dataset contains a large amount of model redundancy, which severely restricts the ability of transfer learning and in turn degrades the DCNN's prediction performance on the target task.
To reduce this redundancy, researchers have proposed compressing and pruning DCNNs [Hinton et al., 2015; Han et al., 2015; Han et al., 2016]. Among them, [Han et al., 2015; Han et al., 2016] propose an iterative pruning strategy for DCNN model parameters that achieves a considerable compression ratio. In addition, [Hinton et al., 2015] distills the "knowledge" in a large DCNN into a small DCNN, effectively guiding the training of the small DCNN and thereby indirectly compressing the large DCNN. From the perspective of model compression, however, these pruning and compression strategies [Hinton et al., 2015; Han et al., 2015; Han et al., 2016] operate mainly on the same large-scale source dataset and do not address transfer learning to a small-scale target dataset. The problem of building a high-performance DCNN for a small-scale target dataset therefore remains.
The references related to this application include:
Yosinski, J., Clune, J., Bengio, Y., Lipson, H. (2014). How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems.
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T. (2014). DeCAF: A deep convolutional activation feature for generic visual recognition. In International Conference on Machine Learning.
Han, S., Pool, J., Tran, J., Dally, W. (2015). Learning both weights and connections for efficient neural networks. In Advances in Neural Information Processing Systems.
Hinton, G., Vinyals, O., Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
Krizhevsky, A., Sutskever, I., Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems.
Han, S., Mao, H., Dally, W. J. (2016). Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In International Conference on Learning Representations.
Summary of the invention
The present invention provides a deep convolutional neural network training method and device, aiming to solve, at least to some extent, one of the above technical problems in the prior art.
To solve the above problems, the present invention adopts the following technical solution:
A deep convolutional neural network training method, comprising the following steps:
Step a: pre-training a DCNN on a large-scale source dataset, and pruning the DCNN;
Step b: performing transfer learning on the pruned DCNN;
Step c: performing model compression on the transferred DCNN using a small-scale target dataset.
In the technical solution adopted by the embodiment of the present invention, in step a, pre-training the DCNN specifically comprises: pre-training the DCNN on the large-scale source dataset using the back-propagation algorithm and gradient descent. Pruning the DCNN specifically comprises: pruning the model using an iterative prune-retrain strategy, where each iteration consists of two steps; the first step is model pruning, in which the model weight parameters of low significance in the DCNN are set to zero, and the second step is model retraining, in which the pruned DCNN is trained using the back-propagation algorithm and gradient descent, yielding a sparsified DCNN.
In the technical solution adopted by the embodiment of the present invention, in step b, performing transfer learning on the pruned DCNN specifically comprises:
Step b1: changing the output layer of the sparsified DCNN to match the classes of the target dataset, restoring the output layer and the fully connected layers closest to the output layer to dense connectivity, and randomly initializing the model weight parameters of the fully connected layers near the output layer;
Step b2: distilling the implicit knowledge about the target dataset contained in the source dataset, and fine-tuning the sparsified DCNN using the explicit knowledge of the small-scale target dataset together with its implicit knowledge in the source dataset, thereby realizing transfer learning.
In the technical solution adopted by the embodiment of the present invention, in step b, performing transfer learning on the pruned DCNN further comprises:
Step b3: using the modified sparsified DCNN as the trunk model;
Step b4: using the DCNN pre-trained on the source dataset as the implicit-knowledge reference model;
Step b5: copying the output layer of the implicit-knowledge reference model and the fully connected layers closest to its output layer to form an additional branch of the trunk model, and attaching the additional branch to the corresponding layer of the trunk model;
Step b6: designing a main loss function by comparing the predictions of the trunk model with the corresponding labels of the target training set; designing an extra loss function by comparing the predictions of the additional branch with the corresponding outputs of the implicit-knowledge reference model; taking the total loss function as the weighted sum of the main loss function and the extra loss function; and training the trunk model and the additional branch on the target dataset with the back-propagation algorithm using the total loss function, thereby realizing transfer learning.
In the technical solution adopted by the embodiment of the present invention, in step c, performing model compression on the transferred DCNN specifically comprises: first pruning the trunk model using the iterative prune-retrain strategy, and learning the parameters of the non-zeroed connections of the trunk model and the additional branch using the total loss function; then randomly drawing a subset of samples from the target training set as input to the trunk model, obtaining the activation values of the drawn samples on a fully connected layer, cutting the neurons of low significance, and retraining with the total loss function; this iteration is repeated to complete model compression.
Another technical solution adopted by the embodiment of the present invention is a deep convolutional neural network training device, comprising:
a model pre-training module, configured to pre-train a DCNN on a large-scale source dataset and to prune the DCNN;
a transfer learning module, configured to perform transfer learning on the pruned DCNN;
a model compression module, configured to perform model compression on the transferred DCNN using a small-scale target dataset.
In the technical solution adopted by the embodiment of the present invention, the model pre-training module pre-trains the DCNN by training it on the large-scale source dataset using the back-propagation algorithm and gradient descent, and prunes the DCNN using an iterative prune-retrain strategy, where each iteration consists of two steps: the first step is model pruning, in which the model weight parameters of low significance in the DCNN are set to zero, and the second step is model retraining, in which the pruned DCNN is trained using the back-propagation algorithm and gradient descent, yielding a sparsified DCNN.
In the technical solution adopted by the embodiment of the present invention, the transfer learning module comprises:
a model modification unit, configured to change the output layer of the sparsified DCNN to match the classes of the target dataset, restore the output layer and the fully connected layers closest to the output layer to dense connectivity, and randomly initialize the model weight parameters of the fully connected layers near the output layer;
a model fine-tuning unit, configured to distill the implicit knowledge about the target dataset contained in the source dataset, and to fine-tune the sparsified DCNN using the explicit knowledge of the small-scale target dataset together with its implicit knowledge in the source dataset, thereby realizing transfer learning.
In the technical solution adopted by the embodiment of the present invention, the transfer learning module performs transfer learning on the pruned DCNN further by: using the modified sparsified DCNN as the trunk model; using the DCNN pre-trained on the source dataset as the implicit-knowledge reference model; copying the output layer of the implicit-knowledge reference model and the fully connected layers closest to its output layer to form an additional branch of the trunk model, and attaching the additional branch to the corresponding layer of the trunk model; designing a main loss function by comparing the predictions of the trunk model with the corresponding labels of the target training set; designing an extra loss function by comparing the predictions of the additional branch with the corresponding outputs of the implicit-knowledge reference model; taking the total loss function as the weighted sum of the main loss function and the extra loss function; and training the trunk model and the additional branch on the target dataset with the back-propagation algorithm using the total loss function, thereby realizing transfer learning.
In the technical solution adopted by the embodiment of the present invention, the model compression module performs model compression on the transferred DCNN by: first pruning the trunk model using the iterative prune-retrain strategy, and learning the parameters of the non-zeroed connections of the trunk model and the additional branch using the total loss function; then randomly drawing a subset of samples from the target training set as input to the trunk model, obtaining the activation values of the drawn samples on a fully connected layer, cutting the neurons of low significance, and retraining with the total loss function; this iteration is repeated to complete model compression.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects: the deep convolutional neural network training method and device of the embodiments exploit the complementary advantages of transfer learning and model compression. During transfer learning from a large-scale source dataset to a small-scale target dataset, model compression and pruning are applied to the DCNN, which improves transfer learning ability, reduces the DCNN's risk of overfitting and its deployment difficulty on the small-scale target dataset, and improves the model's predictive ability on the target dataset. The compressed DCNN obtained by the present invention is applicable to high-technology fields with limited computation and storage, such as mobile terminals, embedded devices and robots, and has considerable economic and practical value.
Brief description of the drawings
Fig. 1 is a flowchart of the deep convolutional neural network training method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the deep convolutional neural network training device according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described below in conjunction with the drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention and are not intended to limit it.
The deep convolutional neural network training method and device of the embodiments of the present invention exploit the complementary advantages of transfer learning and model compression. During transfer learning from a large-scale source dataset to a small-scale target dataset, model compression and pruning are applied to the DCNN, which improves transfer learning ability, reduces the DCNN's risk of overfitting and its deployment difficulty on the small-scale target dataset, and improves its prediction and recognition accuracy.
Specifically, referring to Fig. 1, which is a flowchart of the deep convolutional neural network training method according to an embodiment of the present invention, the method comprises the following steps:
Step 100: pre-training a DCNN on a large-scale source dataset and pruning the DCNN to obtain a sparsified DCNN;
In step 100, pre-training the DCNN specifically comprises: training a DCNN on the large-scale source dataset using the back-propagation algorithm and gradient descent. Pruning the DCNN specifically comprises: pruning the model using an iterative prune-retrain strategy. Each iteration consists of two steps. The first step is model pruning: the model weight parameters of low significance (e.g., low absolute value) in the DCNN are set to zero, so that the neural network connections corresponding to these parameters no longer take effect in the DCNN, the network structure becomes sparse, and the effect of model pruning is achieved. The second step is model retraining: the pruned DCNN is trained with the back-propagation algorithm and gradient descent, where only the parameters that have not been zeroed out are updated. By repeating this iteration several times, as many network connections as possible are deleted without affecting the classification performance of the DCNN, sparsifying the network and reducing model redundancy.
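For illustration only, the prune-retrain iteration described above could be sketched as follows in PyTorch. This is a hedged example rather than the patent's implementation: the 50% sparsity level, the SGD hyperparameters, the magnitude criterion and the helper names are assumptions.

```python
import torch

def prune_by_magnitude(model, sparsity=0.5):
    """Zero out the lowest-magnitude fraction of each weight tensor and return binary masks."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:              # skip biases and other 1-D parameters
            continue
        k = max(1, int(param.numel() * sparsity))
        threshold = param.detach().abs().flatten().kthvalue(k).values
        mask = (param.detach().abs() > threshold).float()
        param.data.mul_(mask)            # low-significance weights are set to zero
        masks[name] = mask
    return masks

def retrain(model, masks, loader, loss_fn, epochs=1, lr=1e-3):
    """Model retraining: only parameters that were not zeroed out are updated."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            for name, param in model.named_parameters():
                if name in masks and param.grad is not None:
                    param.grad.mul_(masks[name])   # keep pruned connections at zero
            opt.step()

# One prune-retrain iteration; repeating it several times yields the sparsified DCNN:
# masks = prune_by_magnitude(dcnn, sparsity=0.5)
# retrain(dcnn, masks, source_loader, torch.nn.CrossEntropyLoss())
```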
Step 200: performing transfer learning on the pruned DCNN using the explicit knowledge of the target dataset and the implicit knowledge of the source dataset, thereby transferring the DCNN to the target domain;
In step 200, performing transfer learning on the pruned DCNN specifically comprises the following steps:
Step 201: changing the output layer of the sparsified DCNN to match the classes of the target dataset, restoring the output layer and the fully connected layers closest to the output layer to dense connectivity, and randomly initializing the model weight parameters of the fully connected layers near the output layer;
In step 201, the number of fully connected layers that are restored to dense connectivity and reinitialized is not fixed; the optimal value depends on factors such as the task and the neural network architecture. In this embodiment, the number of fully connected layers subjected to this operation is preferably two.
Step 202: distilling the implicit knowledge about the target dataset contained in the source dataset, and fine-tuning the sparsified DCNN using the explicit knowledge of the small-scale target dataset together with its implicit knowledge in the source dataset, thereby realizing transfer learning;
In step 202, to improve the prediction performance of the model, this embodiment introduces into transfer learning not only the target dataset but also the implicit knowledge about the target dataset contained in the source dataset. Specifically, this embodiment modifies the DCNN model as follows:
1. The modified sparsified DCNN is used as the trunk model; when target data are fed into the trunk model, its output layer produces prediction probabilities for the target data.
2. The DCNN pre-trained on the source dataset is used as the implicit-knowledge reference model; when target data are fed into this reference model, its output layer produces soft labels (a softmax with temperature parameter t). These soft labels correspond to the source-data classes and contain the implicit knowledge about the target data that is present in the source data.
3. The output layer of the implicit-knowledge reference model and the fully connected layers closest to its output layer are copied to form an additional branch of the trunk model, and the additional branch is attached to the corresponding layer of the trunk model. The number of copied fully connected layers should equal the number of layers restored to dense connectivity and reinitialized in step 201. When target data are fed into the trunk model and passed through this additional branch, its output layer produces soft prediction probabilities (a softmax with temperature parameter t) for the implicit knowledge in the source data.
4. A main loss function is designed by comparing the predictions of the trunk model with the corresponding labels of the target training set. An extra loss function is designed by comparing the predictions of the additional branch with the corresponding outputs of the implicit-knowledge reference model; this extra loss function mainly serves to extract from the reference model the implicit knowledge about the target data contained in the source data. The total loss function is the weighted sum of the main loss function and the extra loss function. Using this total loss function, the trunk model and the additional branch are trained on the target dataset with the back-propagation algorithm, thereby realizing transfer learning (an illustrative sketch of this total loss is given below).
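As an illustration and not part of the patent, a minimal sketch of such a total loss function could look like the following, where the weighting factor alpha, the temperature t and the use of a KL divergence for the extra loss are assumptions:

```python
import torch.nn.functional as F

def total_loss(trunk_logits, branch_logits, reference_logits, target_labels,
               alpha=0.5, t=2.0):
    """Weighted sum of the main loss and the extra (soft-label) loss."""
    # Main loss: trunk-model predictions vs. hard labels of the target training set.
    main_loss = F.cross_entropy(trunk_logits, target_labels)

    # Extra loss: additional-branch predictions vs. reference-model soft labels,
    # both softened with temperature t (softmax with temperature).
    soft_targets = F.softmax(reference_logits / t, dim=1)
    log_soft_branch = F.log_softmax(branch_logits / t, dim=1)
    extra_loss = F.kl_div(log_soft_branch, soft_targets, reduction="batchmean") * (t * t)

    return main_loss + alpha * extra_loss
```

Training the trunk model and the additional branch on the target dataset by back-propagating this total loss corresponds to the weighted-sum formulation of step b6.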
Step 300: performing model compression on the transferred DCNN using the small-scale target dataset;
In step 300, after the sparsified DCNN has been transferred to the target domain by transfer learning, it is compressed using the small-scale target dataset, so that the redundancy of the sparsified DCNN on the target domain is further reduced and the model's predictive ability on the target dataset is improved. Specifically, since only the trunk model is ultimately needed to make predictions on the target test set, this embodiment applies model compression only to the trunk model. The compression strategy is similar to the iterative prune-retrain strategy of step 100, except that in each iteration pruning is applied only to the trunk model, while retraining learns the parameters of the non-zeroed connections of the trunk model and of the additional branch using the total loss function.
After the prune-retrain cycle is completed, some neurons of the fully connected layers of the trunk model are cut, further compressing the model size. Specifically, the compression proceeds as follows: a subset of samples is randomly drawn from the target training set and fed into the trunk model to obtain the activation values of these samples on a given fully connected layer; for this fully connected layer, the neurons of low significance (e.g., low average activation) are cut first, and retraining is then performed with the total loss function; this iteration is repeated to complete model compression.
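A minimal sketch of this activation-based neuron pruning is given below, again as an illustrative assumption rather than the patent's implementation; the keep ratio, the ReLU activation and the layer interfaces (two consecutive fully connected layers of the trunk model) are hypothetical.

```python
import torch

@torch.no_grad()
def prune_fc_neurons(fc_layer, next_layer, sample_batch, keep_ratio=0.75,
                     features_fn=None):
    """Cut the neurons of an FC layer whose average activation on a drawn sample subset is lowest.

    fc_layer / next_layer: consecutive torch.nn.Linear layers of the trunk model.
    sample_batch: inputs randomly drawn from the target training set.
    features_fn: maps raw inputs to the input features of fc_layer (e.g. the convolutional trunk).
    """
    feats = features_fn(sample_batch) if features_fn is not None else sample_batch
    acts = torch.relu(fc_layer(feats))          # activations of the fully connected layer
    mean_act = acts.mean(dim=0)                 # average activation per neuron
    n_keep = max(1, int(mean_act.numel() * keep_ratio))
    keep = mean_act.topk(n_keep).indices.sort().values

    # Rebuild the two Linear layers with only the kept (high-significance) neurons.
    new_fc = torch.nn.Linear(fc_layer.in_features, n_keep)
    new_fc.weight.copy_(fc_layer.weight[keep])
    new_fc.bias.copy_(fc_layer.bias[keep])
    new_next = torch.nn.Linear(n_keep, next_layer.out_features)
    new_next.weight.copy_(next_layer.weight[:, keep])
    new_next.bias.copy_(next_layer.bias)
    return new_fc, new_next   # afterwards, retrain the trunk model with the total loss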
Referring to Fig. 2, which is a schematic structural diagram of the deep convolutional neural network training device according to an embodiment of the present invention, the device comprises a model pre-training module, a transfer learning module and a model compression module.
Model pre-training module: configured to pre-train a DCNN on a large-scale source dataset and to prune the DCNN, obtaining a sparsified DCNN. The model pre-training module pre-trains the DCNN by training it on the large-scale source dataset using the back-propagation algorithm and gradient descent, and prunes the DCNN using the iterative prune-retrain strategy. Each iteration consists of two steps. The first step is model pruning: the model weight parameters of low significance (e.g., low absolute value) in the DCNN are set to zero, so that the corresponding neural network connections no longer take effect in the DCNN, achieving the effect of model pruning. The second step is model retraining: the pruned DCNN is trained with the back-propagation algorithm and gradient descent, i.e., only the parameters that have not been zeroed out are updated. By repeating this iteration several times, as many network connections as possible are deleted without affecting the classification performance of the DCNN, sparsifying the network and reducing model redundancy.
Transfer learning module: configured to perform transfer learning on the pruned DCNN using the explicit knowledge of the target dataset and the implicit knowledge of the source dataset, thereby transferring the DCNN to the target domain. Specifically, the transfer learning module comprises a model modification unit and a model fine-tuning unit.
Model modification unit: configured to change the output layer of the sparsified DCNN to match the classes of the target dataset, restore the output layer and the fully connected layers closest to the output layer to dense connectivity, and randomly initialize the model weight parameters of the fully connected layers near the output layer.
Model fine-tuning unit: configured to distill the implicit knowledge about the target dataset contained in the source dataset, and to fine-tune the sparsified DCNN using the explicit knowledge of the small-scale target dataset together with its implicit knowledge in the source dataset, thereby realizing transfer learning.
In this embodiment, to improve the prediction performance of the model, not only the target dataset but also the implicit knowledge about the target dataset contained in the source dataset is introduced into transfer learning. Specifically, the DCNN model is modified as follows:
1. The modified sparsified DCNN is used as the trunk model; when target data are fed into the trunk model, its output layer produces prediction probabilities for the target data.
2. The DCNN pre-trained on the source dataset is used as the implicit-knowledge reference model; when target data are fed into this reference model, its output layer produces soft labels (a softmax with temperature parameter t). These soft labels correspond to the source-data classes and contain the implicit knowledge about the target data that is present in the source data.
3. The output layer of the implicit-knowledge reference model and the fully connected layers closest to its output layer are copied to form an additional branch of the trunk model, and the additional branch is attached to the corresponding layer of the trunk model. When target data are fed into the trunk model and passed through this additional branch, its output layer produces soft prediction probabilities (a softmax with temperature parameter t) for the implicit knowledge in the source data.
4. A main loss function is designed by comparing the predictions of the trunk model with the corresponding labels of the target training set. An extra loss function is designed by comparing the predictions of the additional branch with the corresponding outputs of the implicit-knowledge reference model; this extra loss function mainly serves to extract from the reference model the implicit knowledge about the target data contained in the source data. The total loss function is the weighted sum of the main loss function and the extra loss function. Using this total loss function, the trunk model and the additional branch are trained on the target dataset with the back-propagation algorithm, thereby realizing transfer learning.
Model compression module: configured to perform model compression on the transferred DCNN using the small-scale target dataset. After the sparsified DCNN has been transferred to the target domain by transfer learning, it is compressed using the small-scale target dataset, so that the redundancy of the sparsified DCNN on the target domain is further reduced and the model's predictive ability on the target dataset is improved. Specifically, since only the trunk model is ultimately needed to make predictions on the target test set, this embodiment applies model compression only to the trunk model. The compression strategy is similar to the iterative prune-retrain strategy of step 100, except that in each iteration pruning is applied only to the trunk model, while retraining learns the parameters of the non-zeroed connections of the trunk model and of the additional branch using the total loss function.
After the prune-retrain cycle is completed, some neurons of the fully connected layers of the trunk model are cut, further compressing the model size. Specifically, the compression proceeds as follows: a subset of samples is randomly drawn from the target training set and fed into the trunk model to obtain the activation values of these samples on a given fully connected layer; for this fully connected layer, the neurons of low significance (e.g., low average activation) are cut first, and retraining is then performed with the total loss function; this iteration is repeated to complete model compression.
To demonstrate the practicality of the present invention, experiments were conducted on a scene recognition task of wide practical value. In the experiments, the ImageNet ILSVRC12 object image dataset (containing over one million images) was used as the large-scale source dataset, and the MIT Indoor Scene Recognition database (containing 15,620 scene images) was used as the small-scale target dataset. The widely used AlexNet model (5 convolutional layers and 3 fully connected layers) [Krizhevsky et al., 2012] was selected as the DCNN model for verification. The configuration with the highest performance and the configuration with the highest compression ratio in this embodiment were examined respectively; the results are shown in Table 1. Each group of experiments was evaluated for accuracy on the standard test set of the MIT Indoor Scene Recognition database.
Table 1: Scene recognition accuracy and compression ratio of each experimental group
As shown in Table 1, compared with traditional pruning methods, the present invention not only substantially compresses the deep neural network during transfer learning, reducing the DCNN's risk of overfitting and its deployment difficulty on the small-scale target dataset, but also improves the prediction and recognition accuracy of the DCNN after transfer learning. The present invention is therefore a practical, high-performance deep convolutional neural network training method for small-scale datasets.
The deep convolutional neural network training method and device of the embodiments of the present invention exploit the complementary advantages of transfer learning and model compression. During transfer learning from a large-scale source dataset to a small-scale target dataset, model compression and pruning are applied to the DCNN, which improves transfer learning ability, reduces the DCNN's risk of overfitting and its deployment difficulty on the small-scale target dataset, and improves the model's predictive ability on the target dataset. The compressed DCNN obtained by the present invention is applicable to high-technology fields with limited computation and storage, such as mobile terminals, embedded devices and robots, and has considerable economic and practical value.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A deep convolutional neural network training method, characterized in that it comprises the following steps:
Step a: pre-training a DCNN on a large-scale source dataset, and pruning the DCNN;
Step b: performing transfer learning on the pruned DCNN;
Step c: performing model compression on the transferred DCNN using a small-scale target dataset.
2. The deep convolutional neural network training method according to claim 1, characterized in that in step a, pre-training the DCNN specifically comprises: pre-training the DCNN on the large-scale source dataset using the back-propagation algorithm and gradient descent; and pruning the DCNN specifically comprises: pruning the model using an iterative prune-retrain strategy, each iteration consisting of two steps, where the first step is model pruning, in which the model weight parameters of low significance in the DCNN are set to zero, and the second step is model retraining, in which the pruned DCNN is trained using the back-propagation algorithm and gradient descent, yielding a sparsified DCNN.
3. The deep convolutional neural network training method according to claim 2, characterized in that in step b, performing transfer learning on the pruned DCNN specifically comprises:
Step b1: changing the output layer of the sparsified DCNN to match the classes of the target dataset, restoring the output layer and the fully connected layers closest to the output layer to dense connectivity, and randomly initializing the model weight parameters of said fully connected layers near the output layer;
Step b2: distilling the implicit knowledge about the target dataset contained in the source dataset, and fine-tuning the sparsified DCNN using the explicit knowledge of the small-scale target dataset together with its implicit knowledge in the source dataset, thereby realizing transfer learning.
4. The deep convolutional neural network training method according to claim 3, characterized in that in step b, performing transfer learning on the pruned DCNN further comprises:
Step b3: using the modified sparsified DCNN as the trunk model;
Step b4: using the DCNN pre-trained on the source dataset as the implicit-knowledge reference model;
Step b5: copying the output layer of the implicit-knowledge reference model and the fully connected layers closest to its output layer to form an additional branch of the trunk model, and attaching the additional branch to the corresponding layer of the trunk model;
Step b6: designing a main loss function by comparing the predictions of the trunk model with the corresponding labels of the target training set; designing an extra loss function by comparing the predictions of the additional branch with the corresponding outputs of the implicit-knowledge reference model; taking the total loss function as the weighted sum of the main loss function and the extra loss function; and training the trunk model and the additional branch on the target dataset with the back-propagation algorithm using the total loss function, thereby realizing transfer learning.
5. The deep convolutional neural network training method according to claim 4, characterized in that in step c, performing model compression on the transferred DCNN specifically comprises: first pruning the trunk model using the iterative prune-retrain strategy, and learning the parameters of the non-zeroed connections of the trunk model and the additional branch using the total loss function; then randomly drawing a subset of samples from the target training set as input to the trunk model, obtaining the activation values of the drawn samples on a fully connected layer, cutting the neurons of low significance, and retraining with the total loss function; this iteration being repeated to complete model compression.
6. A deep convolutional neural network training device, characterized in that it comprises:
a model pre-training module, configured to pre-train a DCNN on a large-scale source dataset and to prune the DCNN;
a transfer learning module, configured to perform transfer learning on the pruned DCNN;
a model compression module, configured to perform model compression on the transferred DCNN using a small-scale target dataset.
7. The deep convolutional neural network training device according to claim 6, characterized in that the model pre-training module pre-trains the DCNN by training it on the large-scale source dataset using the back-propagation algorithm and gradient descent, and prunes the DCNN using an iterative prune-retrain strategy, each iteration consisting of two steps, where the first step is model pruning, in which the model weight parameters of low significance in the DCNN are set to zero, and the second step is model retraining, in which the pruned DCNN is trained using the back-propagation algorithm and gradient descent, yielding a sparsified DCNN.
8. The deep convolutional neural network training device according to claim 7, characterized in that the transfer learning module comprises:
a model modification unit, configured to change the output layer of the sparsified DCNN to match the classes of the target dataset, restore the output layer and the fully connected layers closest to the output layer to dense connectivity, and randomly initialize the model weight parameters of the fully connected layers near the output layer;
a model fine-tuning unit, configured to distill the implicit knowledge about the target dataset contained in the source dataset, and to fine-tune the sparsified DCNN using the explicit knowledge of the small-scale target dataset together with its implicit knowledge in the source dataset, thereby realizing transfer learning.
9. The deep convolutional neural network training device according to claim 8, characterized in that the transfer learning module performs transfer learning on the pruned DCNN further by: using the modified sparsified DCNN as the trunk model; using the DCNN pre-trained on the source dataset as the implicit-knowledge reference model; copying the output layer of the implicit-knowledge reference model and the fully connected layers closest to its output layer to form an additional branch of the trunk model, and attaching the additional branch to the corresponding layer of the trunk model; designing a main loss function by comparing the predictions of the trunk model with the corresponding labels of the target training set; designing an extra loss function by comparing the predictions of the additional branch with the corresponding outputs of the implicit-knowledge reference model; taking the total loss function as the weighted sum of the main loss function and the extra loss function; and training the trunk model and the additional branch on the target dataset with the back-propagation algorithm using the total loss function, thereby realizing transfer learning.
10. The deep convolutional neural network training device according to claim 9, characterized in that the model compression module performs model compression on the transferred DCNN by: first pruning the trunk model using the iterative prune-retrain strategy, and learning the parameters of the non-zeroed connections of the trunk model and the additional branch using the total loss function; then randomly drawing a subset of samples from the target training set as input to the trunk model, obtaining the activation values of the drawn samples on a fully connected layer, cutting the neurons of low significance, and retraining with the total loss function; this iteration being repeated to complete model compression.
CN201610738135.7A 2016-08-26 2016-08-26 Deep convolution neural network training method and device Pending CN106355248A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610738135.7A CN106355248A (en) 2016-08-26 2016-08-26 Deep convolution neural network training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610738135.7A CN106355248A (en) 2016-08-26 2016-08-26 Deep convolution neural network training method and device

Publications (1)

Publication Number Publication Date
CN106355248A true CN106355248A (en) 2017-01-25

Family

ID=57855127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610738135.7A Pending CN106355248A (en) 2016-08-26 2016-08-26 Deep convolution neural network training method and device

Country Status (1)

Country Link
CN (1) CN106355248A (en)

Cited By (69)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107102644A (en) * 2017-06-22 2017-08-29 华南师范大学 The underwater robot method for controlling trajectory and control system learnt based on deeply
CN107239802A (en) * 2017-06-28 2017-10-10 广东工业大学 A kind of image classification method and device
CN107247989A (en) * 2017-06-15 2017-10-13 北京图森未来科技有限公司 A kind of neural network training method and device
CN107392241A (en) * 2017-07-17 2017-11-24 北京邮电大学 A kind of image object sorting technique that sampling XGBoost is arranged based on weighting
CN107480611A (en) * 2017-07-31 2017-12-15 浙江大学 A kind of crack identification method based on deep learning convolutional neural networks
CN107491790A (en) * 2017-08-25 2017-12-19 北京图森未来科技有限公司 A kind of neural network training method and device
CN108108662A (en) * 2017-11-24 2018-06-01 深圳市华尊科技股份有限公司 Deep neural network identification model and recognition methods
CN108230354A (en) * 2017-05-18 2018-06-29 深圳市商汤科技有限公司 Target following, network training method, device, electronic equipment and storage medium
CN108229682A (en) * 2018-02-07 2018-06-29 深圳市唯特视科技有限公司 A kind of image detection countercheck based on backpropagation attack
CN108334934A (en) * 2017-06-07 2018-07-27 北京深鉴智能科技有限公司 Convolutional neural networks compression method based on beta pruning and distillation
CN108446724A (en) * 2018-03-12 2018-08-24 江苏中天科技软件技术有限公司 A kind of fusion feature sorting technique
CN108459585A (en) * 2018-04-09 2018-08-28 东南大学 Power station fan method for diagnosing faults based on sparse locally embedding depth convolutional network
CN108573287A (en) * 2018-05-11 2018-09-25 浙江工业大学 A kind of training method of the image codec based on deep neural network
CN108596243A (en) * 2018-04-20 2018-09-28 西安电子科技大学 The eye movement for watching figure and condition random field attentively based on classification watches figure prediction technique attentively
CN108629288A (en) * 2018-04-09 2018-10-09 华中科技大学 A kind of gesture identification model training method, gesture identification method and system
CN108805258A (en) * 2018-05-23 2018-11-13 北京图森未来科技有限公司 A kind of neural network training method and its device, computer server
CN108876774A (en) * 2018-06-07 2018-11-23 浙江大学 A kind of people counting method based on convolutional neural networks
CN108960415A (en) * 2017-05-23 2018-12-07 上海寒武纪信息科技有限公司 Processing unit and processing system
CN109034385A (en) * 2017-06-12 2018-12-18 辉达公司 With the system and method for sparse data training neural network
CN109063835A (en) * 2018-07-11 2018-12-21 中国科学技术大学 The compression set and method of neural network
CN109272118A (en) * 2018-08-10 2019-01-25 北京达佳互联信息技术有限公司 Data training method, device, equipment and storage medium
CN109376615A (en) * 2018-09-29 2019-02-22 苏州科达科技股份有限公司 For promoting the method, apparatus and storage medium of deep learning neural network forecast performance
CN109472274A (en) * 2017-09-07 2019-03-15 富士通株式会社 The training device and method of deep learning disaggregated model
CN109492754A (en) * 2018-11-06 2019-03-19 深圳市友杰智新科技有限公司 One kind is based on deep neural network model compression and accelerated method
CN109522949A (en) * 2018-11-07 2019-03-26 北京交通大学 Model of Target Recognition method for building up and device
CN109615858A (en) * 2018-12-21 2019-04-12 深圳信路通智能技术有限公司 A kind of intelligent parking behavior judgment method based on deep learning
CN109635288A (en) * 2018-11-29 2019-04-16 东莞理工学院 A kind of resume abstracting method based on deep neural network
CN109685120A (en) * 2018-12-11 2019-04-26 中科恒运股份有限公司 Quick training method and terminal device of the disaggregated model under finite data
CN109725531A (en) * 2018-12-13 2019-05-07 中南大学 A kind of successive learning method based on gate making mechanism
CN109726045A (en) * 2017-10-27 2019-05-07 百度(美国)有限责任公司 System and method for the sparse recurrent neural network of block
CN109815864A (en) * 2019-01-11 2019-05-28 浙江工业大学 A kind of facial image age recognition methods based on transfer learning
WO2019100998A1 (en) * 2017-11-24 2019-05-31 腾讯科技(深圳)有限公司 Voice signal processing model training method, electronic device, and storage medium
WO2019106619A1 (en) * 2017-11-30 2019-06-06 International Business Machines Corporation Compression of fully connected/recurrent layers of deep network(s) through enforcing spatial locality to weight matrices and effecting frequency compression
CN109960581A (en) * 2017-12-26 2019-07-02 广东欧珀移动通信有限公司 Hardware resource configuration method, device, mobile terminal and storage medium
CN110008854A (en) * 2019-03-18 2019-07-12 中交第二公路勘察设计研究院有限公司 Unmanned plane image Highway Geological Disaster recognition methods based on pre-training DCNN
CN110008880A (en) * 2019-03-27 2019-07-12 深圳前海微众银行股份有限公司 A kind of model compression method and device
CN110059717A (en) * 2019-03-13 2019-07-26 山东大学 Convolutional neural networks automatic division method and system for breast molybdenum target data set
CN110084365A (en) * 2019-03-13 2019-08-02 西安电子科技大学 A kind of service provider system and method based on deep learning
CN110096976A (en) * 2019-04-18 2019-08-06 中国人民解放军国防科技大学 Human behavior micro-Doppler classification method based on sparse migration network
CN110245587A (en) * 2019-05-29 2019-09-17 西安交通大学 A kind of remote sensing image object detection method based on Bayes's transfer learning
CN110348422A (en) * 2019-07-18 2019-10-18 北京地平线机器人技术研发有限公司 Image processing method, device, computer readable storage medium and electronic equipment
WO2019205604A1 (en) * 2018-04-25 2019-10-31 北京市商汤科技开发有限公司 Image processing method, training method, apparatus, device, medium and program
WO2019205391A1 (en) * 2018-04-26 2019-10-31 平安科技(深圳)有限公司 Apparatus and method for generating vehicle damage classification model, and computer readable storage medium
CN110580523A (en) * 2018-06-07 2019-12-17 清华大学 Error calibration method and device for analog neural network processor
CN110647977A (en) * 2019-08-26 2020-01-03 北京空间机电研究所 Method for optimizing Tiny-YOLO network for detecting ship target on satellite
CN110648531A (en) * 2019-09-19 2020-01-03 军事科学院系统工程研究院网络信息研究所 Node mobility prediction method based on deep learning in vehicle-mounted self-organizing network
WO2020019102A1 (en) * 2018-07-23 2020-01-30 Intel Corporation Methods, systems, articles of manufacture and apparatus to train a neural network
CN110799996A (en) * 2017-06-30 2020-02-14 康蒂-特米克微电子有限公司 Knowledge transfer between different deep learning architectures
CN110858253A (en) * 2018-08-17 2020-03-03 第四范式(北京)技术有限公司 Method and system for executing machine learning under data privacy protection
CN110929839A (en) * 2018-09-20 2020-03-27 深圳市商汤科技有限公司 Method and apparatus for training neural network, electronic device, and computer storage medium
CN111091177A (en) * 2019-11-12 2020-05-01 腾讯科技(深圳)有限公司 Model compression method and device, electronic equipment and storage medium
CN111134662A (en) * 2020-02-17 2020-05-12 武汉大学 Electrocardio abnormal signal identification method and device based on transfer learning and confidence degree selection
CN111291841A (en) * 2020-05-13 2020-06-16 腾讯科技(深圳)有限公司 Image recognition model training method and device, computer equipment and storage medium
CN111310520A (en) * 2018-12-11 2020-06-19 阿里巴巴集团控股有限公司 Dish identification method, cash registering method, dish order prompting method and related device
TWI700647B (en) * 2018-09-11 2020-08-01 國立清華大學 Electronic apparatus and compression method for artificial neural network
CN109407654B (en) * 2018-12-20 2020-08-04 浙江大学 Industrial data nonlinear causal analysis method based on sparse deep neural network
CN111767996A (en) * 2018-02-27 2020-10-13 上海寒武纪信息科技有限公司 Integrated circuit chip device and related product
CN111931698A (en) * 2020-09-08 2020-11-13 平安国际智慧城市科技股份有限公司 Image deep learning network construction method and device based on small training set
CN112001477A (en) * 2020-06-19 2020-11-27 南京理工大学 Deep learning-based model optimization algorithm for target detection YOLOv3
CN112329931A (en) * 2021-01-04 2021-02-05 北京智源人工智能研究院 Countermeasure sample generation method and device based on proxy model
CN112819157A (en) * 2021-01-29 2021-05-18 商汤集团有限公司 Neural network training method and device and intelligent driving control method and device
CN113222976A (en) * 2021-05-31 2021-08-06 河海大学 Space-time image texture direction detection method and system based on DCNN and transfer learning
CN107832837B (en) * 2017-11-28 2021-09-28 南京大学 Convolutional neural network compression method and decompression method based on compressed sensing principle
CN113780535A (en) * 2021-09-27 2021-12-10 华中科技大学 Model training method and system applied to edge equipment
CN113837376A (en) * 2021-08-30 2021-12-24 厦门大学 Neural network pruning method based on dynamic coding convolution kernel fusion
US11244226B2 (en) 2017-06-12 2022-02-08 Nvidia Corporation Systems and methods for training neural networks with sparse data
US11347308B2 (en) 2019-07-26 2022-05-31 Samsung Electronics Co., Ltd. Method and apparatus with gaze tracking
WO2022116819A1 (en) * 2020-12-04 2022-06-09 北京有竹居网络技术有限公司 Model training method and apparatus, machine translation method and apparatus, and device and storage medium
WO2022127907A1 (en) * 2020-12-17 2022-06-23 Moffett Technologies Co., Limited System and method for domain specific neural network pruning

Cited By (110)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108230354B (en) * 2017-05-18 2022-05-10 深圳市商汤科技有限公司 Target tracking method, network training method, device, electronic equipment and storage medium
CN108230354A (en) * 2017-05-18 2018-06-29 深圳市商汤科技有限公司 Target following, network training method, device, electronic equipment and storage medium
CN108960415B (en) * 2017-05-23 2021-04-20 上海寒武纪信息科技有限公司 Processing apparatus and processing system
CN108960415A (en) * 2017-05-23 2018-12-07 上海寒武纪信息科技有限公司 Processing unit and processing system
WO2018223822A1 (en) * 2017-06-07 2018-12-13 北京深鉴智能科技有限公司 Pruning- and distillation-based convolutional neural network compression method
CN108334934A (en) * 2017-06-07 2018-07-27 北京深鉴智能科技有限公司 Convolutional neural networks compression method based on beta pruning and distillation
US11244226B2 (en) 2017-06-12 2022-02-08 Nvidia Corporation Systems and methods for training neural networks with sparse data
CN109034385A (en) * 2017-06-12 2018-12-18 辉达公司 With the system and method for sparse data training neural network
CN107247989A (en) * 2017-06-15 2017-10-13 北京图森未来科技有限公司 A kind of neural network training method and device
US11625594B2 (en) 2017-06-15 2023-04-11 Beijing Tusen Zhitu Technology Co., Ltd. Method and device for student training networks with teacher networks
CN107247989B (en) * 2017-06-15 2020-11-24 北京图森智途科技有限公司 Real-time computer vision processing method and device
CN107102644B (en) * 2017-06-22 2019-12-10 华南师范大学 Underwater robot track control method and control system based on deep reinforcement learning
CN107102644A (en) * 2017-06-22 2017-08-29 华南师范大学 The underwater robot method for controlling trajectory and control system learnt based on deeply
CN107239802A (en) * 2017-06-28 2017-10-10 广东工业大学 A kind of image classification method and device
CN107239802B (en) * 2017-06-28 2021-06-01 广东工业大学 Image classification method and device
CN110799996A (en) * 2017-06-30 2020-02-14 康蒂-特米克微电子有限公司 Knowledge transfer between different deep learning architectures
CN107392241A (en) * 2017-07-17 2017-11-24 北京邮电大学 A kind of image object sorting technique that sampling XGBoost is arranged based on weighting
CN107480611A (en) * 2017-07-31 2017-12-15 浙江大学 A kind of crack identification method based on deep learning convolutional neural networks
CN107480611B (en) * 2017-07-31 2020-06-30 浙江大学 Crack identification method based on deep learning convolutional neural network
CN107491790A (en) * 2017-08-25 2017-12-19 北京图森未来科技有限公司 A kind of neural network training method and device
CN109472274B (en) * 2017-09-07 2022-06-28 富士通株式会社 Training device and method for deep learning classification model
CN109472274A (en) * 2017-09-07 2019-03-15 富士通株式会社 The training device and method of deep learning disaggregated model
CN109726045A (en) * 2017-10-27 2019-05-07 百度(美国)有限责任公司 System and method for the sparse recurrent neural network of block
US11651223B2 (en) 2017-10-27 2023-05-16 Baidu Usa Llc Systems and methods for block-sparse recurrent neural networks
CN109726045B (en) * 2017-10-27 2023-07-25 百度(美国)有限责任公司 System and method for block sparse recurrent neural network
US11158304B2 (en) 2017-11-24 2021-10-26 Tencent Technology (Shenzhen) Company Limited Training method of speech signal processing model with shared layer, electronic device and storage medium
CN108108662A (en) * 2017-11-24 2018-06-01 深圳市华尊科技股份有限公司 Deep neural network identification model and recognition methods
WO2019100998A1 (en) * 2017-11-24 2019-05-31 腾讯科技(深圳)有限公司 Voice signal processing model training method, electronic device, and storage medium
CN107832837B (en) * 2017-11-28 2021-09-28 南京大学 Convolutional neural network compression method and decompression method based on compressed sensing principle
JP7300798B2 (en) 2017-11-30 2023-06-30 インターナショナル・ビジネス・マシーンズ・コーポレーション Systems, methods, computer programs, and computer readable storage media for compressing neural network data
JP2021504837A (en) * 2017-11-30 International Business Machines Corporation Fully connected/recurrent deep network compression through enforcing spatial locality to the weight matrix and providing frequency compression
WO2019106619A1 (en) * 2017-11-30 2019-06-06 International Business Machines Corporation Compression of fully connected/recurrent layers of deep network(s) through enforcing spatial locality to weight matrices and effecting frequency compression
CN111357019B (en) * 2017-11-30 2023-12-29 国际商业机器公司 Compressing fully connected/recursive layers of depth network(s) by implementing spatial locality on weight matrices and implementing frequency compression
CN111357019A (en) * 2017-11-30 2020-06-30 国际商业机器公司 Compressing fully connected/recursive layers of deep network(s) by enforcing spatial locality on weight matrices and implementing frequency compression
GB2582233A (en) * 2017-11-30 2020-09-16 Ibm Compression of fully connected/recurrent layers of deep network(s) through enforcing spatial locality to weight matrices and effecting frequency compression
CN109960581B (en) * 2017-12-26 2021-06-01 Oppo广东移动通信有限公司 Hardware resource allocation method and device, mobile terminal and storage medium
CN109960581A (en) * 2017-12-26 2019-07-02 广东欧珀移动通信有限公司 Hardware resource configuration method, device, mobile terminal and storage medium
CN108229682A (en) * 2018-02-07 深圳市唯特视科技有限公司 Image detection countermeasure method based on back-propagation attacks
CN111767996A (en) * 2018-02-27 2020-10-13 上海寒武纪信息科技有限公司 Integrated circuit chip device and related product
CN111767996B (en) * 2018-02-27 2024-03-05 上海寒武纪信息科技有限公司 Integrated circuit chip device and related products
CN108446724B (en) * 2018-03-12 2020-06-16 江苏中天科技软件技术有限公司 Fusion feature classification method
CN108446724A (en) * 2018-03-12 江苏中天科技软件技术有限公司 Fusion feature classification method
CN108629288A (en) * 2018-04-09 华中科技大学 Gesture recognition model training method, gesture recognition method and system
CN108629288B (en) * 2018-04-09 2020-05-19 华中科技大学 Gesture recognition model training method, gesture recognition method and system
CN108459585A (en) * 2018-04-09 东南大学 Power station fan fault diagnosis method based on sparse local embedding deep convolutional networks
CN108596243B (en) * 2018-04-20 2021-09-10 西安电子科技大学 Eye movement gaze prediction method based on hierarchical gaze view and conditional random field
CN108596243A (en) * 2018-04-20 西安电子科技大学 Eye movement gaze prediction method based on hierarchical gaze views and conditional random fields
WO2019205604A1 (en) * 2018-04-25 2019-10-31 北京市商汤科技开发有限公司 Image processing method, training method, apparatus, device, medium and program
US11334763B2 (en) 2018-04-25 2022-05-17 Beijing Sensetime Technology Development Co., Ltd. Image processing methods, training methods, apparatuses, devices, media, and programs
WO2019205391A1 (en) * 2018-04-26 2019-10-31 平安科技(深圳)有限公司 Apparatus and method for generating vehicle damage classification model, and computer readable storage medium
CN108573287A (en) * 2018-05-11 浙江工业大学 Training method for an image codec based on deep neural networks
CN108573287B (en) * 2018-05-11 2021-10-29 浙江工业大学 Deep neural network-based image codec training method
CN108805258A (en) * 2018-05-23 北京图森未来科技有限公司 Neural network training method and device, and computer server
CN108876774A (en) * 2018-06-07 浙江大学 People counting method based on convolutional neural networks
CN110580523A (en) * 2018-06-07 2019-12-17 清华大学 Error calibration method and device for analog neural network processor
CN109063835B (en) * 2018-07-11 2021-07-09 中国科学技术大学 Neural network compression device and method
CN109063835A (en) * 2018-07-11 中国科学技术大学 Neural network compression device and method
WO2020019102A1 (en) * 2018-07-23 2020-01-30 Intel Corporation Methods, systems, articles of manufacture and apparatus to train a neural network
CN109272118A (en) * 2018-08-10 2019-01-25 北京达佳互联信息技术有限公司 Data training method, device, equipment and storage medium
CN110858253A (en) * 2018-08-17 2020-03-03 第四范式(北京)技术有限公司 Method and system for executing machine learning under data privacy protection
TWI700647B (en) * 2018-09-11 2020-08-01 國立清華大學 Electronic apparatus and compression method for artificial neural network
US11270207B2 (en) 2018-09-11 2022-03-08 National Tsing Hua University Electronic apparatus and compression method for artificial neural network
CN110929839A (en) * 2018-09-20 2020-03-27 深圳市商汤科技有限公司 Method and apparatus for training neural network, electronic device, and computer storage medium
CN109376615A (en) * 2018-09-29 苏州科达科技股份有限公司 Method, apparatus and storage medium for improving the prediction performance of deep learning networks
CN109376615B (en) * 2018-09-29 2020-12-18 苏州科达科技股份有限公司 Method, device and storage medium for improving prediction performance of deep learning network
CN109492754A (en) * 2018-11-06 深圳市友杰智新科技有限公司 Deep neural network model compression and acceleration method
CN109522949A (en) * 2018-11-07 北京交通大学 Target recognition model establishing method and device
CN109522949B (en) * 2018-11-07 2021-01-26 北京交通大学 Target recognition model establishing method and device
CN109635288B (en) * 2018-11-29 2023-05-23 东莞理工学院 Resume extraction method based on deep neural network
CN109635288A (en) * 2018-11-29 东莞理工学院 Resume extraction method based on deep neural networks
CN111310520A (en) * 2018-12-11 2020-06-19 阿里巴巴集团控股有限公司 Dish identification method, cash registering method, dish order prompting method and related device
CN109685120A (en) * 2018-12-11 中科恒运股份有限公司 Fast training method and terminal device for classification models with limited data
CN111310520B (en) * 2018-12-11 2023-11-21 阿里巴巴集团控股有限公司 Dish identification method, cashing method, dish ordering method and related devices
CN109725531B (en) * 2018-12-13 2021-09-21 中南大学 Continuous learning method based on gating mechanism
CN109725531A (en) * 2018-12-13 中南大学 Continuous learning method based on gating mechanism
CN109407654B (en) * 2018-12-20 2020-08-04 浙江大学 Industrial data nonlinear causal analysis method based on sparse deep neural network
CN109615858A (en) * 2018-12-21 深圳信路通智能技术有限公司 Intelligent parking behavior judgment method based on deep learning
CN109815864A (en) * 2019-01-11 浙江工业大学 Facial image age recognition method based on transfer learning
CN109815864B (en) * 2019-01-11 2021-01-01 浙江工业大学 Facial image age identification method based on transfer learning
CN110084365A (en) * 2019-03-13 西安电子科技大学 Service providing system and method based on deep learning
CN110084365B (en) * 2019-03-13 2023-08-11 西安电子科技大学 Service providing system and method based on deep learning
CN110059717A (en) * 2019-03-13 山东大学 Automatic convolutional neural network segmentation method and system for breast molybdenum target (mammography) datasets
CN110008854A (en) * 2019-03-18 中交第二公路勘察设计研究院有限公司 Unmanned aerial vehicle image highway geological disaster identification method based on pre-trained DCNN
CN110008854B (en) * 2019-03-18 2021-04-30 中交第二公路勘察设计研究院有限公司 Unmanned aerial vehicle image highway geological disaster identification method based on pre-training DCNN
CN110008880A (en) * 2019-03-27 深圳前海微众银行股份有限公司 Model compression method and device
CN110008880B (en) * 2019-03-27 2023-09-29 深圳前海微众银行股份有限公司 Model compression method and device
CN110096976A (en) * 2019-04-18 中国人民解放军国防科技大学 Human behavior micro-Doppler classification method based on sparse transfer networks
CN110245587A (en) * 2019-05-29 西安交通大学 Remote sensing image object detection method based on Bayesian transfer learning
CN110348422B (en) * 2019-07-18 2021-11-09 北京地平线机器人技术研发有限公司 Image processing method, image processing device, computer-readable storage medium and electronic equipment
CN110348422A (en) * 2019-07-18 2019-10-18 北京地平线机器人技术研发有限公司 Image processing method, device, computer readable storage medium and electronic equipment
US11347308B2 (en) 2019-07-26 2022-05-31 Samsung Electronics Co., Ltd. Method and apparatus with gaze tracking
CN110647977A (en) * 2019-08-26 2020-01-03 北京空间机电研究所 Method for optimizing Tiny-YOLO network for detecting ship target on satellite
CN110648531A (en) * 2019-09-19 2020-01-03 军事科学院系统工程研究院网络信息研究所 Node mobility prediction method based on deep learning in vehicle-mounted self-organizing network
CN110648531B (en) * 2019-09-19 2020-12-04 军事科学院系统工程研究院网络信息研究所 Node mobility prediction method based on deep learning in vehicle-mounted self-organizing network
CN111091177A (en) * 2019-11-12 2020-05-01 腾讯科技(深圳)有限公司 Model compression method and device, electronic equipment and storage medium
CN111134662A (en) * 2020-02-17 武汉大学 Abnormal ECG signal identification method and device based on transfer learning and confidence selection
CN111291841A (en) * 2020-05-13 2020-06-16 腾讯科技(深圳)有限公司 Image recognition model training method and device, computer equipment and storage medium
CN112001477A (en) * 2020-06-19 南京理工大学 Deep learning-based model optimization algorithm for the YOLOv3 target detector
CN111931698A (en) * 2020-09-08 2020-11-13 平安国际智慧城市科技股份有限公司 Image deep learning network construction method and device based on small training set
WO2022116819A1 (en) * 2020-12-04 2022-06-09 北京有竹居网络技术有限公司 Model training method and apparatus, machine translation method and apparatus, and device and storage medium
WO2022127907A1 (en) * 2020-12-17 2022-06-23 Moffett Technologies Co., Limited System and method for domain specific neural network pruning
CN116438544A (en) * 2020-12-17 2023-07-14 墨芯国际有限公司 System and method for domain-specific neural network pruning
CN112329931A (en) * 2021-01-04 北京智源人工智能研究院 Adversarial sample generation method and device based on a proxy model
CN112329931B (en) * 2021-01-04 2021-05-07 北京智源人工智能研究院 Adversarial sample generation method and device based on a proxy model
CN112819157A (en) * 2021-01-29 2021-05-18 商汤集团有限公司 Neural network training method and device and intelligent driving control method and device
CN113222976B (en) * 2021-05-31 2022-08-05 河海大学 Space-time image texture direction detection method and system based on DCNN and transfer learning
CN113222976A (en) * 2021-05-31 2021-08-06 河海大学 Space-time image texture direction detection method and system based on DCNN and transfer learning
CN113837376B (en) * 2021-08-30 2023-09-15 厦门大学 Neural network pruning method based on dynamic coding convolution kernel fusion
CN113837376A (en) * 2021-08-30 2021-12-24 厦门大学 Neural network pruning method based on dynamic coding convolution kernel fusion
CN113780535A (en) * 2021-09-27 华中科技大学 Model training method and system applied to edge devices

Similar Documents

Publication Publication Date Title
CN106355248A (en) Deep convolution neural network training method and device
CN109598269A (en) Semantic segmentation method based on multi-resolution input and pyramid dilated convolution
CN108921294A (en) Progressive block-wise knowledge distillation method for neural network acceleration
CN110222140A (en) Cross-modal retrieval method based on adversarial learning and asymmetric hashing
CN109543502A (en) Semantic segmentation method based on deep multi-scale neural networks
CN110378208B (en) Behavior recognition method based on deep residual networks
CN105772407A (en) Waste classification robot based on image recognition technology
CN106203363A (en) Activity recognition method for human skeleton motion sequences
CN111709321B (en) Human behavior recognition method based on graph convolutional neural networks
CN110222717A (en) Image processing method and device
CN107657204A (en) Construction method of a deep network model, and facial expression recognition method and system
CN110222634A (en) Human posture recognition method based on convolutional neural networks
CN107145893A (en) Image recognition algorithm and system based on deep convolutional networks
CN111709289B (en) Multi-task deep learning model for improving human body analysis performance
CN107203752A (en) Face recognition method combining deep learning and a feature L2-norm constraint
CN109284741A (en) Large-scale remote sensing image retrieval method and system based on deep hashing networks
CN109871892A (en) Robot vision cognition system based on small-sample metric learning
CN110321997A (en) Highly parallel computing platform, system, and computation implementation method
CN108564166A (en) Semi-supervised feature learning method based on convolutional neural networks with symmetric parallel connections
CN112651360B (en) Skeleton action recognition method under small-sample conditions
CN109102475A (en) Image rain removal method and device
CN113052254A (en) Multi-attention ghost residual fusion classification model and classification method thereof
CN113128424A (en) Action recognition method based on attention-mechanism graph convolutional neural networks
Zhang et al. Skip-attention encoder–decoder framework for human motion prediction
CN111612046B (en) Feature pyramid graph convolution neural network and application thereof in 3D point cloud classification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20170125)