CN107506775A - model training method and device - Google Patents

Model training method and device

Info

Publication number
CN107506775A
CN107506775A (application CN201610421438.6A)
Authority
CN
China
Prior art keywords
learning model
deep learning
model
training
target domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610421438.6A
Other languages
Chinese (zh)
Inventor
张默
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Moshanghua Technology Co Ltd
Original Assignee
Beijing Moshanghua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Moshanghua Technology Co Ltd filed Critical Beijing Moshanghua Technology Co Ltd
Priority to CN201610421438.6A
Publication of CN107506775A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 — Classification techniques
    • G06F 18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 — Classification techniques relating to the classification model based on distances to training or reference patterns
    • G06F 18/24147 — Distances to closest patterns, e.g. nearest neighbour classification

Abstract

This application discloses a model training method and device. The method includes: obtaining a source-domain deep learning model; obtaining training data for a target domain; adjusting the weight parameters of the source-domain deep learning model using the target-domain training data to obtain a target-domain deep learning model; extracting data features of the training data using the target-domain deep learning model; and training a K-nearest-neighbor classification model on the data features to obtain a recognition model. The embodiments of the present application reduce model training cost, guarantee model accuracy, and reduce the risk of overfitting.

Description

Model training method and device
Technical field
The present application belongs to the technical field of data recognition and, in particular, relates to a model training method and device based on deep learning and K-nearest-neighbor classification.
Background technology
In practical applications, data such as images, sound, and text often need to be recognized so that corresponding operations can be performed according to the recognition result. For example, image recognition is performed on image data to identify image categories and classify images, and voice recognition is performed on sound data to determine a user's age, gender, and so on.
At present, recognition of data such as images, sound, and text is typically achieved with a recognition model, so a recognition model must first be trained.
Taking image recognition as an example, the recognition model is an image classifier. To train an image classifier, sample images must be obtained and their image features extracted; the image classifier is then trained on those image features. To improve the comprehensiveness and accuracy of the image feature representation, image features are usually extracted with a deep learning model, so obtaining an image classifier requires training a deep learning model first. However, training a deep learning model usually requires a large amount of training data, and collecting a large amount of training data is typically time-consuming and expensive. If only a small amount of data is used instead, the trained deep learning model is inaccurate, the image classifier built on it is also inaccurate, and there is a risk of overfitting.
Summary of the invention
In view of this, the technical problem to be solved by the present application is to provide a model training method and device that solve the prior-art problems of inaccurate trained models and the risk of overfitting.
To solve the above technical problem, the application discloses a model training method, including:
obtaining a source-domain deep learning model;
obtaining training data for a target domain;
adjusting the weight parameters of the source-domain deep learning model using the target-domain training data to obtain a target-domain deep learning model;
extracting data features of the training data using the target-domain deep learning model;
training a K-nearest-neighbor classification model on the data features to obtain a recognition model.
Preferably, training the K-nearest-neighbor classification model on the data features to obtain a recognition model includes:
performing feature dimensionality reduction on the data features to obtain low-dimensional features;
training the K-nearest-neighbor classification model on the low-dimensional features to obtain a recognition model.
Preferably, adjusting the weight parameters of the source-domain deep learning model using the target-domain training data to obtain a target-domain deep learning model includes:
taking the weight parameters of the source-domain deep learning model as initial parameters;
setting a learning rate lower than a preset rate;
adjusting the weight parameters of the source-domain deep learning model using the target-domain training data and the learning rate to obtain a target-domain deep learning model.
Preferably, adjusting the weight parameters of the source-domain deep learning model using the target-domain training data and the learning rate to obtain a target-domain deep learning model includes:
repeatedly adjusting the weight parameters of the source-domain deep learning model using the target-domain training data and the learning rate, with the number of adjustments below a preset count, to obtain a target-domain deep learning model.
Preferably, obtaining the source-domain deep learning model includes:
obtaining a source-domain deep learning model whose category matches that of the target domain.
A model training device, including:
a model obtaining module, for obtaining a source-domain deep learning model;
a data obtaining module, for obtaining training data for a target domain;
a learning model training module, for adjusting the weight parameters of the source-domain deep learning model using the target-domain training data to obtain a target-domain deep learning model;
a feature extraction module, for extracting data features of the training data using the target-domain deep learning model;
a recognition model training module, for training a K-nearest-neighbor classification model on the data features to obtain a recognition model.
Preferably, the recognition model training module includes:
a dimensionality reduction unit, for performing feature dimensionality reduction on the data features to obtain low-dimensional features;
a recognition model training unit, for training the K-nearest-neighbor classification model on the low-dimensional features to obtain a recognition model.
Preferably, the learning model training module includes:
a parameter setting unit, for taking the weight parameters of the source-domain deep learning model as initial parameters and setting a learning rate lower than a preset rate;
a learning model training unit, for adjusting the weight parameters of the source-domain deep learning model using the target-domain training data and the learning rate to obtain a target-domain deep learning model.
Preferably, the learning model training unit is specifically used for repeatedly adjusting the weight parameters of the source-domain deep learning model using the target-domain training data and the learning rate, with the number of adjustments below a preset count, to obtain a target-domain deep learning model.
Preferably, the model obtaining module is specifically used for obtaining a source-domain deep learning model whose category matches that of the target domain.
Compared with the prior art, the application can achieve the following technical effects:
Using a source-domain deep learning model and target-domain training data, a target-domain deep learning model is obtained through transfer learning. The source-domain deep learning model has been fully trained, so through transfer learning a fully trained target-domain deep learning model can be obtained merely by adjusting the weight parameters of the source-domain deep learning model, even when only a small amount of target-domain training data is selected. Data features are then extracted with the target-domain deep learning model and used to train the recognition model; selecting a K-nearest-neighbor classification model as the recognition model reduces the risk of overfitting. The embodiments of the present application therefore realize model training based on limited training data, reducing model training cost while guaranteeing model accuracy and reducing overfitting risk.
Of course, a product implementing the application need not achieve all of the above technical effects at the same time.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the present application and form a part of it. The schematic embodiments of the application and their descriptions explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flow chart of one embodiment of a model training method according to an embodiment of the present application;
Fig. 2 is a flow chart of one embodiment of a deep learning model training process according to an embodiment of the present application;
Fig. 3 is a flow chart of one embodiment of a model training device according to an embodiment of the present application.
Detailed description of the embodiments
Embodiments of the present application are described in detail below with reference to the drawings and examples, so that the implementation process by which the application applies technical means to solve technical problems and achieve technical effects can be fully understood and carried out.
The technical solution of the embodiments of the present application is mainly applied to data recognition, particularly image recognition and voice recognition. To recognize data, a recognition model must first be trained, and the extracted data features are then input into the recognition model for data recognition.
To reduce the cost of collecting training data and alleviate overfitting of the recognition model, the inventor found that a fully trained source-domain deep learning model together with a small amount of training data can, through transfer learning, yield a fully trained target-domain deep learning model for extracting data features. This solves the costly and time-consuming problem of acquiring a large amount of training data. The data features are then input into a K-nearest-neighbor classification model, which is trained to obtain a recognition model, alleviating the overfitting problem of the recognition model.
The inventor therefore proposes the technical solution of the present application. In the embodiments of the present application, a source-domain deep learning model and target-domain training data are obtained; the weight parameters of the source-domain deep learning model are adjusted using the target-domain training data to obtain a target-domain deep learning model; data features of the training data are extracted with the target-domain deep learning model; and a K-nearest-neighbor classification model is trained on the data features to obtain a recognition model. Through transfer learning, the target-domain deep learning model can be trained using only a small amount of target-domain training data together with the source-domain deep learning model, so that accurate deep learning and recognition models are obtained even with limited training data, reducing the cost of collecting data while ensuring model accuracy. Using the K-nearest-neighbor classification model further avoids the overfitting problem caused by training a deep learning model on a small amount of data.
The technical solution of the present application is described in detail below with reference to the accompanying drawings.
Fig. 1 is a flow chart of one embodiment of a model training method according to an embodiment of the present application. The method may include the following steps:
101: Obtain a source-domain deep learning model.
The source-domain deep learning model can be trained in a variety of ways; for example, a deep convolutional neural network, an AutoEncoder (an unsupervised learning algorithm), or a DBM (Deep Boltzmann Machine) may be selected.
The source-domain deep learning model is illustrated below taking a deep convolutional neural network as an example.
Suppose the source-domain deep convolutional neural network is configured as shown in Fig. 2, mainly including 2 convolution layers: convolution1 and convolution2; 5 pooling layers: pooling1–pooling5; 9 Inception layers: inception1–inception9; 3 fully-connected (full-connection) layers: full-connection1–full-connection3; 3 softmax layers: softmax1–softmax3; and 1 dropout layer: dropout1. The softmax1 and softmax2 layers and the full-connection1 and full-connection2 layers are added mainly to prevent the gradient from vanishing during BP (Back Propagation) training.
To obtain a more accurate deep learning model, the parameter weights can be initialized with random numbers and the LearningRate (learning rate) set to a small number, such as LearningRate = 0.01, so that the model converges faster. When the classification accuracy becomes stable, the LearningRate is turned down and training continues until the model converges to a good value. The weight parameters of the deep convolutional neural network obtained after training constitute the deep learning model.
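The schedule just described — a small initial learning rate, turned down when progress plateaus — can be sketched with a toy gradient-descent loop. This is a minimal illustration in pure Python, not the actual network of Fig. 2: the quadratic "loss" and all names here are hypothetical stand-ins.

```python
# Toy sketch of the training schedule described above: start with a small
# learning rate (e.g. 0.01) and reduce it when improvement plateaus.
# The quadratic loss stands in for the real network loss (illustrative only).

def train_with_decay(w, lr=0.01, decay=0.1, patience=5, steps=200):
    """Gradient descent on loss(w) = (w - 3)^2 with plateau-based LR decay."""
    grad = lambda w: 2.0 * (w - 3.0)          # d/dw of (w - 3)^2
    loss = lambda w: (w - 3.0) ** 2
    best, stale = loss(w), 0
    for _ in range(steps):
        w -= lr * grad(w)                     # weight -= learning_rate * gradient
        cur = loss(w)
        if cur < best - 1e-12:
            best, stale = cur, 0
        else:
            stale += 1
            if stale >= patience:             # progress has become stable:
                lr *= decay                   # turn the learning rate down
                stale = 0
    return w

final_w = train_with_decay(w=0.0)
print(final_w)  # approaches the optimum w = 3
```

The decay-on-plateau rule mirrors the text's "when classification accuracy is stable, turn LearningRate down and continue training."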
It should be noted that Fig. 2 is only one possible deep neural network, and the present application is not limited to it. Any deep neural network capable of extracting data features falls within the protection scope of the present application.
102: Obtain training data for the target domain.
Because the source-domain deep learning model is fully trained on a large amount of source-domain data, the embodiments of the present application can perform transfer learning based on the source-domain deep learning model and select only a small amount of target-domain data. For example, the source-domain training data may be an annotated image dataset of 1,000,000 images covering 1,000 categories, with annotations such as the age and gender of a face, while the target-domain training data may simply be 10,000 annotated images covering 7 categories.
103: Adjust the weight parameters of the source-domain deep learning model using the target-domain training data to obtain a target-domain deep learning model.
The weight parameters of the source-domain deep learning model can serve as the initial parameters of the target-domain deep learning model; the initial parameters are then adjusted using the target-domain training data to obtain the target-domain deep learning model.
The weight parameters of the source-domain deep learning model can be adjusted by way of an adjusted learning rate: a small learning rate, for example one lower than a preset rate, is set, and each weight parameter is decreased or increased by an amount proportional to the learning rate at each step, until the model converges or a preset adjustment condition is met.
Alternatively, an adjustment parameter can be set, and the weight parameters increased or decreased by that adjustment parameter each time, until the model converges or a preset adjustment condition is met.
Each adjustment of the weight parameters is made on the basis of the weight parameters obtained in the previous adjustment.
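As a toy illustration of step 103 and the adjustment rule above — initializing from the source-domain weights, using a small learning rate, capping the number of adjustments, and building each adjustment on the previous one — consider fitting a single linear "model". All names and data here are hypothetical; the real procedure operates on every layer of the deep network.

```python
# Sketch of transfer-style fine-tuning as described above (illustrative only):
# the source-domain weight is the initial parameter, a small learning rate is
# set, and each adjustment builds on the previously adjusted weight, with the
# number of adjustments capped below a preset count.

def fine_tune(source_weight, data, lr=0.05, max_adjustments=100):
    """Fit y = w * x by gradient descent, starting from the source weight."""
    w = source_weight                          # initial parameter from the source domain
    for _ in range(max_adjustments):           # adjustment count below a preset limit
        # gradient of mean squared error over the (small) target-domain data
        g = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * g                            # adjust on top of the previous value
    return w

# Source domain learned roughly y = 2x; the target domain differs slightly (y = 2.5x).
target_data = [(1.0, 2.5), (2.0, 5.0), (3.0, 7.5)]
w_target = fine_tune(source_weight=2.0, data=target_data)
print(round(w_target, 2))  # → 2.5
```

Because the starting point is already close to the target-domain optimum, few adjustments and a small learning rate suffice — the intuition behind the transfer-learning step.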
104: Extract data features of the training data using the target-domain deep learning model.
The training data is input into the target-domain deep learning model, and each piece of training data passes through multiple convolution layers and multiple pooling layers, undergoing multi-layer convolution computation and multi-layer pooling to extract its data features. The extracted data features have strong robustness.
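The multi-layer "convolution then pooling" of step 104 can be illustrated on a 1-D signal. This is a deliberately minimal pure-Python sketch with a fixed, hand-picked kernel; in the method itself the kernels are learned weight parameters of the target-domain model and operate on images.

```python
# Minimal sketch of convolution followed by pooling as a feature extractor
# (1-D case, fixed kernel — illustrative only, not the network of Fig. 2).

def conv1d(signal, kernel):
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool(signal, size=2):
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]

def extract_features(signal):
    edge_kernel = [1.0, -1.0]                # responds to local changes
    x = conv1d(signal, edge_kernel)          # convolution layer
    x = max_pool(x)                          # pooling layer
    x = conv1d(x, edge_kernel)               # second convolution layer
    x = max_pool(x)                          # second pooling layer
    return x

features = extract_features([0, 0, 1, 1, 0, 0, 2, 2, 0])
print(features)  # → [1.0]
```

Each convolution/pooling stage shrinks the representation while keeping the most salient responses, which is what gives the extracted features their robustness.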
105: Train a K-nearest-neighbor (k-NearestNeighbor, KNN) classification model on the data features to obtain a recognition model.
The data features extracted by the target-domain deep learning model are input into the K-nearest-neighbor classification model for training, yielding a recognition model. Because the K-nearest-neighbor classification model is a nonparametric model, training with it can reduce the overfitting risk of the recognition model.
To obtain the recognition model, besides the K-nearest-neighbor classification model itself, other variant models based on the K-nearest-neighbor classification model can also be used.
The basic idea of the K-nearest-neighbor classification model is: if most of the k samples most similar to a sample (that is, its nearest neighbors in feature space) belong to a certain category, the sample also belongs to that category. In the K-nearest-neighbor classification model, the selected neighbors are objects that have been correctly classified, and the classification decision depends only on the categories of the nearest one or several samples. The model is simple, easy to understand, and easy to implement, and requires no parameter estimation.
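The basic idea above can be written down directly as a minimal pure-Python KNN. In the embodiments the training samples would be the extracted data features with their annotations; the toy 2-D features and labels below are hypothetical.

```python
# Minimal K-nearest-neighbor classifier implementing the idea described above:
# a sample is assigned the majority category among its k closest training
# samples. No parameters are estimated — the model simply stores the samples.

from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label); query: feature_vector."""
    dist = lambda a, b: math.dist(a, b)       # Euclidean distance in feature space
    neighbors = sorted(train, key=lambda s: dist(s[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [((0.0, 0.0), "cat"), ((0.1, 0.2), "cat"), ((0.2, 0.1), "cat"),
         ((5.0, 5.0), "dog"), ((5.1, 4.9), "dog"), ((4.9, 5.2), "dog")]
print(knn_predict(train, (0.3, 0.3)))  # → cat
print(knn_predict(train, (4.8, 5.1)))  # → dog
```

Because there are no fitted parameters, the classifier cannot overfit the way a high-capacity parametric model trained on few samples can — the property the embodiments rely on.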
In this embodiment, on the basis of a fully trained source-domain deep learning model, a fully trained target-domain deep learning model can be obtained using only a small amount of target-domain training data. Even with limited training data, an accurate target-domain deep learning model can thus be obtained through transfer learning, improving the accuracy of the target-domain deep learning model and reducing the cost of collecting training data; combining it with the K-nearest-neighbor classification model to obtain the recognition model alleviates the overfitting problem of the model.
Because the dimensionality of the data features extracted with the target-domain deep learning model is high, the computation required for recognition-model training is large, training is slow, and good performance is difficult to obtain. To reduce the redundancy of the data features and further improve the training accuracy, in another embodiment, training the K-nearest-neighbor classification model on the data features to obtain a recognition model may include:
performing feature dimensionality reduction on the data features to obtain low-dimensional features;
training the K-nearest-neighbor classification model on the low-dimensional features to obtain a recognition model.
The data features may be reduced in dimensionality using methods such as principal component analysis or linear discriminant analysis to obtain low-dimensional features.
The low-dimensional features and the corresponding data annotations are input into the K-nearest-neighbor classification model for training, yielding a recognition model.
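The dimensionality-reduction step can be sketched as principal component analysis on 2-D features: project each feature vector onto the direction of greatest variance, yielding a 1-D feature. This toy pure-Python version (closed-form eigenvector of a 2×2 covariance matrix, assuming correlated features) is illustrative only; real extracted features are far higher-dimensional and would typically use a library implementation.

```python
# Toy principal component analysis for 2-D features: project onto the
# direction of greatest variance. Illustrative sketch only; assumes the
# two feature dimensions are correlated (off-diagonal covariance nonzero).

import math

def pca_1d(features):
    n = len(features)
    mx = sum(x for x, _ in features) / n
    my = sum(y for _, y in features) / n
    # entries of the 2x2 covariance matrix
    cxx = sum((x - mx) ** 2 for x, _ in features) / n
    cyy = sum((y - my) ** 2 for _, y in features) / n
    cxy = sum((x - mx) * (y - my) for x, y in features) / n
    # largest eigenvalue of [[cxx, cxy], [cxy, cyy]] and its eigenvector
    lam = (cxx + cyy) / 2 + math.sqrt(((cxx - cyy) / 2) ** 2 + cxy ** 2)
    vx, vy = lam - cyy, cxy                   # unnormalized principal direction
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    # project each centered feature vector onto the principal direction
    return [(x - mx) * vx + (y - my) * vy for x, y in features]

low_dim = pca_1d([(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)])
print([round(v, 2) for v in low_dim])
```

The 1-D projections preserve the ordering of the samples along their main axis of variation, so nearest-neighbor distances in the reduced space remain meaningful while the training computation shrinks.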
In this embodiment, the high-dimensional data features are reduced to low-dimensional features, improving the training speed of the recognition model.
In addition, in another embodiment, adjusting the weight parameters of the source-domain deep learning model using the target-domain training data to obtain a target-domain deep learning model may include:
taking the weight parameters of the source-domain deep learning model as initial parameters;
setting a learning rate lower than a preset rate;
adjusting the weight parameters of the source-domain deep learning model using the target-domain training data and the learning rate to obtain a target-domain deep learning model.
The weight parameters of the source-domain deep learning model are adjusted until the model converges or a preset adjustment condition is met, at which point the target-domain deep learning model is obtained.
Because too many adjustments would also raise the overfitting risk, to further reduce that risk the preset adjustment condition can be that the number of adjustments is below a preset count.
Therefore, in another embodiment, adjusting the weight parameters of the source-domain deep learning model using the target-domain training data and the learning rate to obtain a target-domain deep learning model may include:
repeatedly adjusting the weight parameters of the source-domain deep learning model using the target-domain training data and the learning rate, with the number of adjustments below a preset count, to obtain a target-domain deep learning model.
In addition, to ensure the accuracy of the target-domain deep learning model, the source domain and target domain can be two category-matched fields, such as cats and dogs, or flowers and trees.
Therefore, obtaining the source-domain deep learning model can be: obtaining a source-domain deep learning model from a source domain whose category matches that of the target domain.
In practical applications, the technical solution of the embodiments of the present application can be used, for example, to perform image recognition on image data to identify image categories and classify images, or to perform voice recognition on sound data to determine a user's age, gender, and so on.
Fig. 3 is a structural schematic diagram of one embodiment of a model training device provided by an embodiment of the present application. The device may include:
a model obtaining module 301, for obtaining a source-domain deep learning model;
a data obtaining module 302, for obtaining training data for a target domain.
Because the source-domain deep learning model is fully trained on a large amount of source-domain data, the embodiments of the present application can select only a small amount of target-domain data and perform transfer learning based on the source-domain deep learning model.
The device further includes a learning model training module 303, for adjusting the weight parameters of the source-domain deep learning model using the target-domain training data to obtain a target-domain deep learning model.
The learning model training module takes the weight parameters of the source-domain deep learning model as the initial parameters of the target-domain deep learning model and then adjusts the initial parameters using the target-domain training data.
The learning model training module can also set an adjustment parameter and increase or decrease the weight parameters by that adjustment parameter each time, until the model converges or a preset adjustment condition is met, at which point the target-domain deep learning model is obtained.
Each adjustment of the weight parameters is made on the basis of the weight parameters obtained in the previous adjustment.
The device further includes a feature extraction module 304, for extracting data features of the training data using the target-domain deep learning model.
The training data is input into the target-domain deep learning model and undergoes multi-layer convolution computation and multi-layer pooling; the feature extraction module extracts the data features of the training data.
The device further includes a recognition model training module 305, for training a K-nearest-neighbor classification model on the data features to obtain a recognition model.
The data features extracted by the feature extraction module are input into the recognition model training module and trained with the K-nearest-neighbor classification model to obtain a recognition model. Using the K-nearest-neighbor classification model can reduce the overfitting risk of the recognition model.
Besides the K-nearest-neighbor classification model itself, variant models based on the K-nearest-neighbor classification model can also be used to train the recognition model.
In this embodiment, on the basis of a fully trained source-domain deep learning model, a target-domain deep learning model can be obtained using only a small amount of target-domain training data. Even with limited training data, an accurate target-domain deep learning model can be obtained through transfer learning, improving the accuracy of the target-domain deep learning model and reducing the cost of collecting training data; combining it with the K-nearest-neighbor classification model to obtain the data recognition model alleviates the overfitting problem of the model.
Because the dimensionality of the data features extracted with the target-domain deep learning model is high, in order to reduce feature redundancy and further improve training speed and accuracy, in another embodiment the recognition model training module includes:
a dimensionality reduction unit, for performing feature dimensionality reduction on the data features to obtain low-dimensional features;
a recognition model training unit, for training the K-nearest-neighbor classification model on the low-dimensional features to obtain a recognition model.
The dimensionality reduction unit can specifically use methods such as principal component analysis or linear discriminant analysis to reduce the dimensionality of the data features and obtain low-dimensional features.
The data features extracted by the feature extraction module are input into the dimensionality reduction unit of the recognition model training module to obtain low-dimensional features; the low-dimensional features and the corresponding data annotations are then input into the recognition model training unit and trained with the K-nearest-neighbor classification model to obtain a recognition model.
In this embodiment, obtaining low-dimensional features with the dimensionality reduction unit improves the training speed of the recognition model.
In addition, in another embodiment, the target-domain training data is input into the learning model training module for training to obtain the target-domain deep learning model, and the learning model training module includes:
a parameter setting unit, for taking the weight parameters of the source-domain deep learning model as initial parameters and setting a learning rate lower than a preset rate;
a learning model training unit, for adjusting the weight parameters of the source-domain deep learning model using the target-domain training data and the learning rate to obtain a target-domain deep learning model.
After the parameter setting unit sets the initial weight parameters of the target-domain deep learning model, the learning model training unit adjusts the weight parameters of the source-domain deep learning model until the model converges or a preset adjustment condition is met, at which point the target-domain deep learning model is obtained.
To reduce overfitting risk, the number of weight-parameter adjustments can be controlled; the preset adjustment condition can be that the number of adjustments is below a preset count.
Therefore, in another embodiment, the learning model training unit is specifically used for repeatedly adjusting the weight parameters of the source-domain deep learning model using the target-domain training data and the learning rate, with the number of adjustments below a preset count, to obtain a target-domain deep learning model.
In addition, when the source domain and target domain have a category-matching relationship, such as cats and dogs, or flowers and trees, the resulting target-domain deep learning model has better accuracy.
Therefore, the model obtaining module is specifically used for obtaining a source-domain deep learning model whose category matches that of the target domain.
In practical applications, the technical solution of the embodiments of the present application can be used, for example, to perform image recognition on image data to identify image categories and classify images, or to perform voice recognition on sound data to determine a user's age, gender, and so on.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include computer-readable media in the form of volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
Certain terms are used throughout the specification and claims to refer to particular components. Those skilled in the art will appreciate that hardware manufacturers may refer to the same component by different names. This specification and the claims do not distinguish between components by differences in name, but by differences in function. The term "comprising" used throughout the specification and claims is an open term and should therefore be interpreted as "including but not limited to". "Substantially" means that, within an acceptable error range, a person skilled in the art can solve the stated technical problem and substantially achieve the stated technical effect. Furthermore, the term "coupled" as used herein encompasses any means of direct or indirect electrical coupling. Therefore, if a first device is described as being coupled to a second device, the first device may be directly electrically coupled to the second device, or indirectly electrically coupled to the second device through other devices or coupling means. The subsequent description of the specification presents preferred embodiments for implementing the application; the description is intended to illustrate the general principles of the application and is not intended to limit its scope. The scope of protection of the application is defined by the appended claims.
It should also be noted that the terms "comprise", "include", or any other variants thereof are intended to cover a non-exclusive inclusion, so that a product or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a product or system. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the product or system that includes the element.
Some preferred embodiments of the application have been shown and described above, but, as noted, it should be understood that the application is not limited to the forms disclosed herein. These embodiments should not be regarded as excluding other embodiments; the application can be used in various other combinations, modifications, and environments, and can be modified within the scope of the inventive concept described herein through the above teachings or through the skill or knowledge of the related art. Changes and modifications made by those skilled in the art that do not depart from the spirit and scope of the application shall fall within the scope of protection of the appended claims.

Claims (10)

  1. A model training method, characterized by comprising:
    obtaining a source-domain deep learning model;
    obtaining training data of a target domain;
    adjusting weight parameters of the source-domain deep learning model using the training data of the target domain, to obtain a target-domain deep learning model;
    extracting data features of the training data using the target-domain deep learning model;
    training a K-nearest-neighbor classification model using the data features, to obtain a recognition model.
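The method of claim 1 can be sketched end to end. Everything below is an illustrative stand-in rather than part of the patent: a single linear layer plays the role of the "deep learning model", the target-domain data are synthetic two-class samples, and the fine-tuning loss (pulling each sample's features toward its class mean) is one possible choice of training objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Source-domain deep learning model": a single linear layer W, assumed
# pretrained on a source domain (illustrative stand-in).
W_source = rng.normal(size=(4, 3))          # maps 4-D input -> 3-D features

# Training data of the target domain: 20 samples, 2 classes.
X = rng.normal(size=(20, 4))
y = np.array([0, 1] * 10)
X[y == 1] += 2.0                            # shift class 1 so classes separate

# Adjust the source weights on target-domain data (toy loss pulling each
# sample's features toward its class mean), yielding the "target-domain model".
W = W_source.copy()
lr, max_iters = 1e-3, 50                    # low learning rate, capped steps
for _ in range(max_iters):
    F = X @ W                               # forward pass: data features
    centers = np.stack([F[y == c].mean(axis=0) for c in (0, 1)])
    grad = X.T @ (F - centers[y]) / len(X)  # gradient of 0.5*||F - center||^2
    W -= lr * grad

# Extract data features with the adjusted model, then train a
# K-nearest-neighbor classifier on them: the "recognition model".
features = X @ W

def knn_predict(train_f, train_y, query_f, k=3):
    """Classify each query by majority vote among its k nearest training points."""
    d = np.linalg.norm(train_f[None, :, :] - query_f[:, None, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(train_y[idx]).argmax() for idx in nearest])

pred = knn_predict(features, y, features, k=3)
accuracy = (pred == y).mean()
print(f"training accuracy of the KNN recognition model: {accuracy:.2f}")
```

The division of labor mirrors the claim: the fine-tuned network is used only as a feature extractor, and the final recognition decision is made by the non-parametric KNN model trained on those features.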
  2. The method according to claim 1, characterized in that training the K-nearest-neighbor classification model using the data features to obtain the recognition model comprises:
    performing feature dimensionality reduction on the data features, to obtain low-dimensional features;
    training the K-nearest-neighbor classification model using the low-dimensional features, to obtain the recognition model.
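The dimensionality-reduction step of claim 2 can be sketched with PCA, one common reduction technique (the claim does not name a specific one). The feature matrix, its dimensions, and the placement of the class signal below are all synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "data features" from a deep model: 30 samples x 64 dims,
# where only the first two dimensions carry class signal (illustrative).
y = np.array([0, 1] * 15)
features = rng.normal(size=(30, 64)) * 0.1
features[:, :2] += np.where(y[:, None] == 1, 3.0, -3.0)

def pca_reduce(F, k):
    """Project centered features onto their top-k principal components."""
    centered = F - F.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)  # rows of Vt = PCs
    return centered @ Vt[:k].T

low_dim = pca_reduce(features, k=2)         # 64-D features -> 2-D features

def knn_predict(train_f, train_y, query_f, k=3):
    """Majority vote among the k nearest training points."""
    d = np.linalg.norm(train_f[None, :, :] - query_f[:, None, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(train_y[idx]).argmax() for idx in nearest])

pred = knn_predict(low_dim, y, low_dim, k=3)
print("low-dim shape:", low_dim.shape, " accuracy:", (pred == y).mean())
```

Reducing dimensionality before KNN is the usual remedy for the distance-concentration problem: in the low-dimensional space, nearest-neighbor distances are dominated by the informative directions rather than by accumulated noise.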
  3. The method according to claim 1, characterized in that adjusting the weight parameters of the source-domain deep learning model using the training data of the target domain to obtain the target-domain deep learning model comprises:
    taking the weight parameters of the source-domain deep learning model as initial parameters;
    setting a learning rate lower than a preset rate;
    adjusting the weight parameters of the source-domain deep learning model using the training data of the target domain and the learning rate, to obtain the target-domain deep learning model.
  4. The method according to claim 3, characterized in that adjusting the weight parameters of the source-domain deep learning model using the training data of the target domain and the learning rate to obtain the target-domain deep learning model comprises:
    repeatedly adjusting the weight parameters of the source-domain deep learning model using the training data of the target domain and the learning rate, with the number of adjustments kept below a preset number, to obtain the target-domain deep learning model.
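Claims 3 and 4 together describe fine-tuning from source-domain weights with a deliberately low learning rate and a capped number of weight adjustments. A minimal sketch on a linear least-squares stand-in (all weights, data, and hyperparameter values below are illustrative) shows the intended effect: the capped run still lowers target-domain loss while drifting less from the source weights than an uncapped run would.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative stand-in: a linear model whose "source-domain weights" are
# fine-tuned on target-domain data drawn from slightly shifted true weights.
w_source = np.array([1.0, -1.0, 0.5])
w_target_true = w_source + np.array([0.2, -0.1, 0.1])
X = rng.normal(size=(100, 3))
t = X @ w_target_true

def mse(w):
    """Target-domain mean-squared-error loss for weights w."""
    return float(np.mean((X @ w - t) ** 2))

def finetune(w_init, lr, max_iters):
    """Repeat small weight adjustments, capped at max_iters steps."""
    w = w_init.copy()
    for _ in range(max_iters):
        grad = X.T @ (X @ w - t) / len(X)   # MSE gradient
        w -= lr * grad
    return w

w_capped = finetune(w_source, lr=0.01, max_iters=100)    # low rate, few steps
w_long   = finetune(w_source, lr=0.01, max_iters=10000)  # same rate, many steps

drift_capped = float(np.linalg.norm(w_capped - w_source))
drift_long   = float(np.linalg.norm(w_long - w_source))
print(f"loss: {mse(w_source):.4f} -> {mse(w_capped):.4f}; "
      f"drift capped {drift_capped:.3f} vs long {drift_long:.3f}")
```

Comparing `drift_capped` with `drift_long` illustrates the rationale for bounding the adjustment count: a low rate and a capped number of steps adapt the model to the target domain while preserving more of the knowledge encoded in the source-domain weights.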
  5. The method according to claim 1, characterized in that obtaining the source-domain deep learning model comprises:
    obtaining a source-domain deep learning model matching the category of the target domain.
  6. A model training apparatus, characterized by comprising:
    a model-obtaining module, configured to obtain a source-domain deep learning model;
    a data-obtaining module, configured to obtain training data of a target domain;
    a learning-model training module, configured to adjust weight parameters of the source-domain deep learning model using the training data of the target domain, to obtain a target-domain deep learning model;
    a feature extraction module, configured to extract data features of the training data using the target-domain deep learning model;
    a recognition-model training module, configured to train a K-nearest-neighbor classification model using the data features, to obtain a recognition model.
  7. The apparatus according to claim 6, characterized in that the recognition-model training module comprises:
    a dimensionality reduction unit, configured to perform feature dimensionality reduction on the data features, to obtain low-dimensional features;
    a recognition-model training unit, configured to train the K-nearest-neighbor classification model using the low-dimensional features, to obtain the recognition model.
  8. The apparatus according to claim 6, characterized in that the learning-model training module comprises:
    a parameter setting unit, configured to take the weight parameters of the source-domain deep learning model as initial parameters and to set a learning rate lower than a preset rate;
    a learning-model training unit, configured to adjust the weight parameters of the source-domain deep learning model using the training data of the target domain and the learning rate, to obtain the target-domain deep learning model.
  9. The apparatus according to claim 8, characterized in that the learning-model training unit is specifically configured to repeatedly adjust the weight parameters of the source-domain deep learning model using the training data of the target domain and the learning rate, with the number of adjustments kept below a preset number, to obtain the target-domain deep learning model.
  10. The apparatus according to claim 6, characterized in that the model-obtaining module is specifically configured to obtain a source-domain deep learning model matching the category of the target domain.
CN201610421438.6A 2016-06-14 2016-06-14 model training method and device Pending CN107506775A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610421438.6A CN107506775A (en) 2016-06-14 2016-06-14 model training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610421438.6A CN107506775A (en) 2016-06-14 2016-06-14 model training method and device

Publications (1)

Publication Number Publication Date
CN107506775A 2017-12-22

Family

ID=60679042

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610421438.6A Pending CN107506775A (en) 2016-06-14 2016-06-14 model training method and device

Country Status (1)

Country Link
CN (1) CN107506775A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101794396A (en) * 2010-03-25 2010-08-04 西安电子科技大学 System and method for recognizing remote sensing image target based on migration network learning
CN104239554A (en) * 2014-09-24 2014-12-24 南开大学 Cross-domain and cross-category news commentary emotion prediction method
CN105069472A (en) * 2015-08-03 2015-11-18 电子科技大学 Vehicle detection method based on convolutional neural network self-adaption
US20160078359A1 (en) * 2014-09-12 2016-03-17 Xerox Corporation System for domain adaptation with a domain-specific class means classifier

Patent Citations (4)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108355A (en) * 2017-12-25 2018-06-01 北京牡丹电子集团有限责任公司数字电视技术中心 Text emotion analysis method and system based on deep learning
CN108268362A (en) * 2018-02-27 2018-07-10 郑州云海信息技术有限公司 A kind of method and device that curve graph is drawn under NVcaffe frames
CN109376766B (en) * 2018-09-18 2023-10-24 平安科技(深圳)有限公司 Portrait prediction classification method, device and equipment
CN109376766A (en) * 2018-09-18 2019-02-22 平安科技(深圳)有限公司 A kind of portrait prediction classification method, device and equipment
CN110942323A (en) * 2018-09-25 2020-03-31 优估(上海)信息科技有限公司 Evaluation model construction method, device and system
CN109558873A (en) * 2018-12-03 2019-04-02 哈尔滨工业大学 A kind of mode identification method based on this stack autoencoder network that changes
CN109558873B (en) * 2018-12-03 2019-11-05 哈尔滨工业大学 A kind of mode identification method based on this stack autoencoder network that changes
CN111310519B (en) * 2018-12-11 2024-01-05 成都智叟智能科技有限公司 Goods deep learning training method based on machine vision and data sampling
CN111310519A (en) * 2018-12-11 2020-06-19 成都智叟智能科技有限公司 Goods deep learning training method based on machine vision and data sampling
CN109840274A (en) * 2018-12-28 2019-06-04 北京百度网讯科技有限公司 Data processing method and device, storage medium
CN111797870A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Optimization method and device of algorithm model, storage medium and electronic equipment
WO2020232874A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Modeling method and apparatus based on transfer learning, and computer device and storage medium
CN110458572B (en) * 2019-07-08 2023-11-24 创新先进技术有限公司 User risk determining method and target risk recognition model establishing method
CN110458572A (en) * 2019-07-08 2019-11-15 阿里巴巴集团控股有限公司 The determination method of consumer's risk and the method for building up of target risk identification model
CN112580408A (en) * 2019-09-30 2021-03-30 杭州海康威视数字技术股份有限公司 Deep learning model training method and device and electronic equipment
CN112580408B (en) * 2019-09-30 2024-03-12 杭州海康威视数字技术股份有限公司 Deep learning model training method and device and electronic equipment
CN111553429A (en) * 2020-04-30 2020-08-18 中国银行股份有限公司 Fingerprint identification model migration method, device and system
CN111553429B (en) * 2020-04-30 2023-11-03 中国银行股份有限公司 Fingerprint identification model migration method, device and system
CN111601418A (en) * 2020-05-25 2020-08-28 博彦集智科技有限公司 Color temperature adjusting method and device, storage medium and processor


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20171222

Assignee: Apple R&D (Beijing) Co., Ltd.

Assignor: BEIJING MOSHANGHUA TECHNOLOGY CO., LTD.

Contract record no.: 2019990000054

Denomination of invention: Method and device for recognizing class of social contact short texts and method and device for training classification models

License type: Exclusive License

Record date: 20190211

RJ01 Rejection of invention patent application after publication

Application publication date: 20171222