CN109409390A - Deep learning classification method and device - Google Patents

Deep learning classification method and device

Info

Publication number
CN109409390A
Authority
CN
China
Prior art keywords
classification
data
auxiliary classifier
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710705331.9A
Other languages
Chinese (zh)
Inventor
侯国梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Potevio Information Technology Co Ltd
Original Assignee
Potevio Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Potevio Information Technology Co Ltd filed Critical Potevio Information Technology Co Ltd
Priority to CN201710705331.9A priority Critical patent/CN109409390A/en
Publication of CN109409390A publication Critical patent/CN109409390A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/259 Fusion by voting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a deep learning classification method and device. The deep learning classification method includes: step 100: based on the labeled data set D_label, selecting the auxiliary classifiers with the highest recognition accuracy as the stand-by auxiliary classifiers, denoted C'_1, C'_2, …, C'_M; step 200: taking b samples from the unlabeled data set D_unlabel to form D_buff, and having C_0 and the stand-by auxiliary classifiers recognize D_buff; when the recognition result of C_0 agrees with that of any stand-by auxiliary classifier and the confidence exceeds a preset value, adding the recognition result of C_0 for the datum, together with the datum itself, to the labeled data set D_label, and removing the datum from the unlabeled data set D_unlabel. The present invention thus provides a deep learning classification method and classifier that achieve automatic, high-quality labeling of data and, fueled by that data, continuously improve the learning performance of the deep-learning classifier.

Description

Deep learning classification method and device
Technical field
The present invention relates to the computer field, and in particular to a deep learning classification method and device.
Background art
With the arrival of the big-data era, more and more data can be supplied to machines for learning, and deep learning has become the new sharp tool of artificial-intelligence development in this era. Compared with conventional machine-learning methods, deep learning has greatly raised the level of computer vision, speech recognition, and natural-language processing, lifting these problems from unusable to a level that can be generalized commercially.
Classification is one of the most basic problems of deep learning; many deep-learning algorithms require classification-based feature extraction before other operations can be carried out, and the classification performance of traditional methods falls far short of deep learning. Driven by the "fuel" of big data, deep-learning methods hit bottlenecks markedly later than other learners and their performance keeps rising, whereas other conventional classification methods plateau at a certain performance level no matter how much further they learn.
In the big-data era, however, the cost of obtaining high-quality labeled samples is very high. In many professional domains, such as medical image reading or translation, manual labeling requires a very professional level of expertise; low-quality labels not only fail to improve the machine-learning result, they may actually degrade the quality of learning.
In order to reduce the cost of fully manual labeling and improve its efficiency, the following methods are generally used:
1. developing dedicated labeling software to improve labeling efficiency;
2. pre-labeling with a traditional learner based on manual features, followed by manual verification.
Although the above methods can effectively reduce the workload of manual labeling, they still require human participation. Considering the data-volume explosion under present-day big-data conditions, their efficiency is insufficient to supply deep learning with enough "fuel". It is therefore necessary to solve the problem of automatic data labeling under big-data conditions, that is, the unsupervised-learning problem of deep-learning methods.
Summary of the invention
In view of this, the present invention provides a deep learning classification method and classifier that achieve automatic, high-quality labeling of data and, fueled by that data, continuously improve the learning performance of the deep-learning classifier.
The present invention provides a deep learning classification method whose input is a data set D composed of a labeled data set D_label and an unlabeled data set D_unlabel, where D_label is in turn composed of a training set D_ltrn and a test set D_ltst; the true label of an input datum x_i of D_label is y_i, satisfying y_i ∈ {t_0, t_1, …, t_(N-1)}, and any class of {t_0, t_1, …, t_(N-1)} is denoted t_n (n = 0, 1, …, N-1). The deep learning classification method of the invention includes at least an auxiliary-classifier screening step 100 and a classifier unsupervised-learning step 200.
The auxiliary-classifier screening step 100 includes:
Step 101: each auxiliary classifier establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn; the auxiliary classifiers learn their respective output models from D_ltrn, i.e. C_k(D_ltrn) → h_k (k = 1, 2, …, K denotes the auxiliary-classifier index);
Step 102: for any datum (x_i, y_i) in the test set D_ltst with y_i = t_n, x_i is recognized by h_k to give ŷ_i, i.e. ŷ_i = h_k(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)};
Step 103: compare y_i with ŷ_i; if they agree, increment the count R_k,n, otherwise increment the count W_k,n; R_k,n accumulates the number of times h_k recognizes class t_n correctly, and W_k,n accumulates the number of times h_k recognizes class t_n incorrectly; the initial values of R_k,n and W_k,n are 0;
Step 104: repeat steps 102 and 103 until every h_k (k = 1, 2, …, K) has been tested on every datum in D_ltst;
Step 105: compute the accuracy of model h_k on each class t_n: G_k,n = R_k,n / (R_k,n + W_k,n);
Step 106: for each class t_n (n = 0, 1, …, N-1), select the C_k corresponding to the highest accuracy G_k,n (k = 1, 2, …, K) as the stand-by auxiliary classifier for that class, denoted C'_1, C'_2, …, C'_M (M ≤ K); record, for each stand-by auxiliary classifier, the t_n corresponding to the G_k,n on which it was selected as its recognition-advantage class; for any stand-by auxiliary classifier, the classes of {t_0, t_1, …, t_(N-1)} other than its advantage classes are the recognition-disadvantage classes of that auxiliary classifier;
Step 107: based on the advantage and disadvantage classes recognized by the stand-by auxiliary classifiers, rebuild the mapping between the model input and output of the stand-by auxiliary classifiers, and execute the classifier unsupervised-learning step 200.
The classifier unsupervised-learning step 200 includes:
Step 201: the deep-learning classifier C_0 establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn;
Step 202: C_0 learns a model h_0 from the training set D_ltrn, i.e. C_0(D_ltrn) → h_0;
Step 203: take b samples from the unlabeled data set D_unlabel to form D_buff; for all data x_i (i = 1, 2, …, b) in D_buff, recognize x_i with model h_0 to obtain the recognition result ŷ_i, i.e. ŷ_i = h_0(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}, and record the classification confidence of ŷ_i; take the L data (x_i, ŷ_i) (i = 1, 2, …, L) with the highest recognition confidence to form D'_buff;
Step 204: the stand-by auxiliary classifiers C'_1, C'_2, …, C'_M (M ≤ K) learn their respective models h'_m from the training set D_ltrn and D'_buff, i.e. C'_m(D_ltrn + D'_buff) → h'_m (m = 1, 2, …, M);
Step 205: recognize the L data in D'_buff with h'_m (m = 1, 2, …, M), i.e. ŷ_i,m = h'_m(x_i) (i = 1, 2, …, L; m = 1, 2, …, M), and record the classification confidence of ŷ_i,m;
Step 206: let D_add be the empty set; for any ŷ_i,m (i = 1, 2, …, L; m = 1, 2, …, M), if ŷ_i,m = ŷ_i and the classification confidence of ŷ_i,m is greater than a preset value, add (x_i, ŷ_i) to the set D_add;
Step 207: merge D_add into D_ltrn, and at the same time remove the x_i of D_add from D_unlabel, then return to step 202.
The present invention also provides a deep learning classification device, comprising: a data module, an auxiliary-classifier screening module, and a classifier unsupervised-learning module.
Data module: contains the input data set D, composed of a labeled data set D_label and an unlabeled data set D_unlabel, where D_label is in turn composed of a training set D_ltrn and a test set D_ltst; the true label of an input datum x_i of D_label is y_i, satisfying y_i ∈ {t_0, t_1, …, t_(N-1)}, and any class of {t_0, t_1, …, t_(N-1)} is denoted t_n (n = 0, 1, …, N-1).
The auxiliary-classifier screening module includes:
Auxiliary-classifier learning module: each auxiliary classifier establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn; the auxiliary classifiers learn their respective output models from D_ltrn, i.e. C_k(D_ltrn) → h_k (k = 1, 2, …, K denotes the auxiliary-classifier index).
Auxiliary-classifier recognition module: for any datum (x_i, y_i) in the test set D_ltst with y_i = t_n, x_i is recognized by h_k to give ŷ_i, i.e. ŷ_i = h_k(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}.
Recognition-result counting module: compare y_i with ŷ_i; if they agree, increment R_k,n, otherwise increment W_k,n; R_k,n accumulates the number of times h_k recognizes class t_n correctly, and W_k,n accumulates the number of times h_k recognizes class t_n incorrectly; the initial values of R_k,n and W_k,n are 0.
Judgment module: repeat the auxiliary-classifier recognition module and the recognition-result counting module until every h_k (k = 1, 2, …, K) has been tested on every datum in D_ltst.
Accuracy computing module: compute the accuracy of model h_k on each class t_n: G_k,n = R_k,n / (R_k,n + W_k,n).
Stand-by auxiliary-classifier determining module: for each class t_n (n = 0, 1, …, N-1), select the C_k corresponding to the highest accuracy G_k,n (k = 1, 2, …, K) as the stand-by auxiliary classifier for that class, denoted C'_1, C'_2, …, C'_M (M ≤ K), and record for each stand-by auxiliary classifier the t_n corresponding to the G_k,n on which it was selected as its recognition-advantage class; for any stand-by auxiliary classifier, the classes of {t_0, t_1, …, t_(N-1)} other than its advantage classes are the recognition-disadvantage classes of that auxiliary classifier.
Stand-by auxiliary-classifier mapping-reconstruction module: based on the advantage and disadvantage classes recognized by the stand-by auxiliary classifiers, rebuild the mapping between the model input and output of the stand-by auxiliary classifiers, and execute the classifier unsupervised-learning module.
The classifier unsupervised-learning module includes:
Main-classifier mapping module: the deep-learning classifier C_0 establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn.
Main-classifier learning module: C_0 learns a model h_0 from the training set D_ltrn, i.e. C_0(D_ltrn) → h_0.
Main-classifier recognition module: take b samples from the unlabeled data set D_unlabel to form D_buff; for all data x_i (i = 1, 2, …, b) in D_buff, recognize x_i with model h_0 to obtain the recognition result ŷ_i, i.e. ŷ_i = h_0(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}, and record the classification confidence of ŷ_i; take the L data (x_i, ŷ_i) (i = 1, 2, …, L) with the highest recognition confidence to form D'_buff.
Stand-by auxiliary-classifier learning module: the stand-by auxiliary classifiers C'_1, C'_2, …, C'_M (M ≤ K) learn their respective models h'_m from the training set D_ltrn and D'_buff, i.e. C'_m(D_ltrn + D'_buff) → h'_m (m = 1, 2, …, M).
Stand-by auxiliary-classifier recognition module: recognize the L data in D'_buff with h'_m (m = 1, 2, …, M), i.e. ŷ_i,m = h'_m(x_i) (i = 1, 2, …, L; m = 1, 2, …, M), and record the classification confidence of ŷ_i,m.
Comparison module: let D_add be the empty set; for any ŷ_i,m (i = 1, 2, …, L; m = 1, 2, …, M), if ŷ_i,m = ŷ_i and the classification confidence of ŷ_i,m is greater than a preset value, add (x_i, ŷ_i) to the set D_add.
Data-set changing module: merge D_add into D_ltrn, and at the same time remove the x_i of D_add from D_unlabel, then return to execute the main-classifier learning module.
The present invention uses the deep-learning classifier as the main classifier and other classifiers as auxiliary classifiers; the auxiliary classifiers help the main classifier verify labeled data, thereby achieving automatic high-quality labeling of data and continuously accumulating a large volume of high-quality labeled data. At the same time, through continuous learning on the labeled data, the main classifier and the auxiliary classifiers improve their learning ability together.
Brief description of the drawings
Fig. 1 is a flowchart of the deep learning classification method of the present invention;
Fig. 2 is a flowchart of S100 in Fig. 1;
Fig. 3 is a flowchart of S200 in Fig. 1;
Fig. 4 is a structural schematic diagram of the deep learning classification device of the present invention;
Fig. 5 is a structural schematic diagram of module 500 in Fig. 4;
Fig. 6 is a structural schematic diagram of module 600 in Fig. 4.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, right in the following with reference to the drawings and specific embodiments The present invention is described in detail.
It include input data set D, deep learning classifier C the present invention relates to object0And multiple subsidiary classification devices are C1、 C2……CK
Input data set D is divided into the D of labellabelWith unlabelled Dunlabel, and meet Dlabel<<Dunlabel, wherein DlabelIt is divided into training set D againltrnWith test set Dltst, meet Dltrn:Dltst>=9:1, and maintain test set DltstIt is constant.
For DlabelIn any data (xi,yi), wherein subscript i represents data sequence number, yiFor xiAuthentic signature knot Fruit marks the collection of result to be combined into { t0,t1…tN-1, including N kind category label is as a result, meet yi ∈ { t0,t1…tN-1, { t0, t1…tN-1In any classification be denoted as tn(n=0,1 ... N-1).
Correspondingly, DunlabelIn data be not yet marked, only xi, without marking result y accordinglyi, DunlabelIn Data can be used classifier and be identified, meet recognition result ∈ { t0,t1…tN-1}。
Multiple subsidiary classification devices are C1、C2…CK, it is except deep learning classifier C0Except other classifiers, including line Property recurrence, decision tree, supporting vector, bayes method, neural network etc..
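By way of illustration only, the following Python sketch (not part of the patent; the library choice and all names are assumptions) sets up the labeled split and a pool of candidate auxiliary classifiers from the families just listed, using scikit-learn:

```python
# A minimal sketch, assuming scikit-learn; the split respects
# D_ltrn : D_ltst >= 9:1 and the candidate families follow the text.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression   # linear family
from sklearn.tree import DecisionTreeClassifier       # decision tree
from sklearn.svm import SVC                           # support vector
from sklearn.naive_bayes import GaussianNB            # Bayesian method
from sklearn.neural_network import MLPClassifier      # (shallow) neural network

def split_labeled(X_label, y_label, test_frac=0.1, seed=0):
    """Split D_label into D_ltrn and D_ltst; the 10% test set stays fixed."""
    return train_test_split(X_label, y_label, test_size=test_frac,
                            random_state=seed, stratify=y_label)

# Auxiliary classifiers C_1 ... C_K; the main classifier C_0 (a deep
# network) is kept separate and is not part of this pool.
aux_pool = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(),
    SVC(probability=True),   # probability=True so confidences are available
    GaussianNB(),
    MLPClassifier(max_iter=1000),
]
```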
The deep learning classification method of the invention, as shown in Fig. 1, comprises step 100: auxiliary-classifier screening, and step 200: classifier unsupervised learning.
As shown in Fig. 2, step 100, the auxiliary-classifier screening step, includes steps 101 to 107.
Step 101: each auxiliary classifier establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn; the auxiliary classifiers learn their respective output models from D_ltrn, i.e. C_k(D_ltrn) → h_k (k = 1, 2, …, K denotes the auxiliary-classifier index).
Step 102: for any datum (x_i, y_i) in the test set D_ltst with y_i = t_n ∈ {t_0, t_1, …, t_(N-1)}, x_i is recognized by h_k to give ŷ_i, i.e. ŷ_i = h_k(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}.
Step 103: compare y_i with ŷ_i; if they agree, increment the count R_k,n, otherwise increment the count W_k,n; R_k,n accumulates the number of times h_k recognizes class t_n correctly, and W_k,n accumulates the number of times h_k recognizes class t_n incorrectly; the initial values of R_k,n and W_k,n are 0.
Step 104: repeat steps 102 and 103 until every h_k (k = 1, 2, …, K) has been tested on every datum in D_ltst.
Step 105: compute the accuracy of model h_k on each class t_n: G_k,n = R_k,n / (R_k,n + W_k,n).
Step 106: for each class t_n (n = 0, 1, …, N-1), select the C_k corresponding to the highest accuracy G_k,n (k = 1, 2, …, K) as the stand-by auxiliary classifier for that class, denoted C'_1, C'_2, …, C'_M (M ≤ K); record, for each stand-by auxiliary classifier, the t_n corresponding to the G_k,n on which it was selected as its recognition-advantage class; for any stand-by auxiliary classifier, the classes of {t_0, t_1, …, t_(N-1)} other than its advantage classes are the recognition-disadvantage classes of that auxiliary classifier.
For example, auxiliary classifier C_1 has the set of prediction accuracies {G_1,0, G_1,1, G_1,2, …, G_1,N-1} over the N class labels; auxiliary classifier C_2 has {G_2,0, G_2,1, G_2,2, …, G_2,N-1}; and so on up to auxiliary classifier C_K with {G_K,0, G_K,1, G_K,2, …, G_K,N-1}.
For t_0, the classifier corresponding to the maximum of {G_1,0, G_2,0, …, G_K,0} becomes the stand-by auxiliary classifier for t_0. Corresponding to the N classes of {t_0, t_1, …, t_(N-1)} there are N such selections. Of {C_1, C_2, …, C_K}, those never chosen as a stand-by auxiliary classifier are eliminated and not used; denoting the number of unselected auxiliary classifiers K_0, the number of stand-by auxiliary classifiers is M = K - K_0, and the set of stand-by auxiliary classifiers is {C'_1, C'_2, …, C'_M}. Each C'_m becomes an independent multi-class or binary classifier over N_m classes.
Suppose classifier C_1's accuracy on class t_0 is the maximum of all G_i,0, i.e. G_1,0 = max{G_i,0} (i = 1, 2, …, K), and t_0 is the only class on which C_1 completely beats the other classifiers; then C'_1 is formed as a new binary classifier (N_m = 2) that can distinguish t_0 from all other classes {t_1, t_2, …, t_(N-1)}.
Or suppose classifier C_1's accuracy on class t_0 satisfies G_1,0 = max{G_i,0} (i = 1, 2, …, K), and its accuracy on class t_1 satisfies G_1,1 = max{G_i,1} (i = 1, 2, …, K); then C'_1 is formed as a new multi-class classifier (N_m = 3) that can distinguish classes t_0 and t_1 and all other classes {t_2, t_3, …, t_(N-1)}, where {t_0, t_1} are the advantage classes of C'_1 and the other classes {t_2, t_3, …, t_(N-1)} are its disadvantage classes.
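The selection just illustrated amounts to building a K × N accuracy table G and taking column-wise maxima. A minimal sketch, assuming every candidate follows the scikit-learn fit/predict interface (function and variable names are hypothetical):

```python
import numpy as np

def screen_auxiliaries(aux_pool, X_ltrn, y_ltrn, X_ltst, y_ltst, classes):
    """Steps 101-106 sketch: learn h_k = C_k(D_ltrn), tabulate per-class
    accuracy G_k,n = R_k,n / (R_k,n + W_k,n) on D_ltst, and keep, per class
    t_n, the classifier with the highest G_k,n as a stand-by auxiliary."""
    G = np.zeros((len(aux_pool), len(classes)))
    for k, clf in enumerate(aux_pool):
        clf.fit(X_ltrn, y_ltrn)              # step 101: C_k(D_ltrn) -> h_k
        y_hat = clf.predict(X_ltst)          # step 102: recognize D_ltst
        for n, t_n in enumerate(classes):
            mask = (y_ltst == t_n)           # steps 103-104: implicit R/W counts
            if mask.any():
                G[k, n] = np.mean(y_hat[mask] == t_n)   # step 105: G_k,n
    # Step 106: column-wise winners; the classes a classifier wins are its
    # advantage classes, all remaining classes are its disadvantage classes.
    winners = {}
    for n, t_n in enumerate(classes):
        winners.setdefault(int(np.argmax(G[:, n])), []).append(t_n)
    return G, winners   # winners: {pool index -> list of advantage classes}
```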
Step 107: based on the advantage and disadvantage classes recognized by the stand-by auxiliary classifiers, rebuild the mapping between the model input and output of the stand-by auxiliary classifiers, and execute the classifier unsupervised-learning step 200.
Suppose C'_1 is a newly screened multi-class classifier (N_m = 3) that can distinguish classes t_0 and t_1 from all other classes {t_2, t_3, …, t_(N-1)}; step 107 divides its output into the 3 classes {t_0, t_1, {t_2, t_3, …, t_(N-1)}} and rebuilds the mapping between the model input and output of C'_1 accordingly.
Because step 107 rebuilds the mapping between model input and output, the resulting model differs from the one learned in step 101: even if the algorithm is identical, the output classes differ greatly. C'_m can be regarded as the corresponding C_k with recognition of its advantage classes strengthened and recognition of its disadvantage classes reduced.
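One possible realization of this rebuilt mapping, offered only as a sketch under the assumption that all disadvantage classes are merged into a single "other" label before retraining:

```python
import numpy as np

OTHER = "__other__"   # hypothetical sentinel for the merged disadvantage classes

def remap_labels(y, advantage_classes):
    """Step 107 sketch: keep a stand-by auxiliary classifier's advantage
    classes as-is and collapse every disadvantage class into one label,
    turning C_k into an N_m-way classifier C'_m (N_m = len(advantage) + 1)."""
    return np.array([t if t in advantage_classes else OTHER for t in y])

# Retraining on the remapped labels rebuilds the input-output mapping, e.g.:
#   standby_clf.fit(X_ltrn, remap_labels(y_ltrn, {"t0", "t1"}))
```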
As shown in Fig. 3, step 200 of Fig. 1 may include steps 201 to 207.
Step 201: the deep-learning classifier C_0 establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn.
Step 202: C_0 learns a model h_0 from the training set D_ltrn, i.e. C_0(D_ltrn) → h_0.
Step 202 may further include testing the recognition accuracy of h_0 with the labeled data of the test set D_ltst, denoted G_0; when G_0 falls below a preset value, or G_0 has not improved within T runs of step 200, the classifier unsupervised-learning step 200 is terminated.
The value of T can be set empirically, or set according to regularities observed in the actual running results of step 200.
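A minimal sketch of this optional stopping rule; reading "G_0 is not improved within T runs" as a patience counter is an assumption:

```python
class StopRule:
    """Stop when G_0 drops below a floor or fails to improve for T rounds."""
    def __init__(self, patience_T, floor=None):
        self.T, self.floor = patience_T, floor
        self.best, self.stale = -1.0, 0

    def should_stop(self, g0):
        if self.floor is not None and g0 < self.floor:
            return True                      # G_0 below the preset value
        if g0 > self.best:
            self.best, self.stale = g0, 0    # improvement: reset the counter
        else:
            self.stale += 1                  # no improvement this round
        return self.stale >= self.T          # T rounds without improvement
```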
Step 203: take b samples from the unlabeled data set D_unlabel to form D_buff; for all data x_i (i = 1, 2, …, b) in D_buff, recognize x_i with model h_0 to obtain the recognition result ŷ_i, i.e. ŷ_i = h_0(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}, and record the classification confidence of ŷ_i; take the L data (x_i, ŷ_i) (i = 1, 2, …, L) with the highest recognition confidence to form D'_buff.
Step 204: the stand-by auxiliary classifiers C'_1, C'_2, …, C'_M (M ≤ K) learn their respective models h'_m from the training set D_ltrn and D'_buff, i.e. C'_m(D_ltrn + D'_buff) → h'_m (m = 1, 2, …, M).
Step 205: recognize the L data in D'_buff with h'_m (m = 1, 2, …, M), i.e. ŷ_i,m = h'_m(x_i) (i = 1, 2, …, L; m = 1, 2, …, M), and record the classification confidence of ŷ_i,m.
Step 206: let D_add be the empty set; for any ŷ_i,m (i = 1, 2, …, L; m = 1, 2, …, M), if ŷ_i,m = ŷ_i and the classification confidence of ŷ_i,m is greater than or equal to the preset value, add (x_i, ŷ_i) to the set D_add.
The preset value can be set by the user empirically or according to the target recognition accuracy of C_0; it can also be G_0, the recognition accuracy obtained by testing h_0 on the labeled data of the test set D_ltst. The criterion for setting the preset value is that the accuracy of the data in D_add should be greater than it, so as to ensure that C_0 is supplied with the high-quality data "fuel" on which it can keep evolving and growing.
Step 206 amounts to having the main classifier and the stand-by auxiliary classifiers vote for the recognition results with the best assurance.
Step 207: merge D_add into D_ltrn, and at the same time remove the x_i of D_add from D_unlabel, then return to step 202.
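Steps 203 to 207 make up one round of the loop. The sketch below is one possible reading, reusing the hypothetical remap_labels helper above; the sample size b, buffer size L, and preset confidence are illustrative values, and predict_proba stands in for the classification confidences:

```python
import numpy as np

def unsupervised_round(main_clf, standby, X_ltrn, y_ltrn, X_unlabel,
                       b=1000, L=200, preset=0.95, seed=0):
    """One round of steps 203-207; standby is a list of (classifier,
    remap_fn) pairs produced by the step-107 remapping."""
    rng = np.random.default_rng(seed)
    main_clf.fit(X_ltrn, y_ltrn)                      # step 202: C_0(D_ltrn) -> h_0
    idx = rng.choice(len(X_unlabel), size=min(b, len(X_unlabel)), replace=False)
    X_buff = X_unlabel[idx]                           # step 203: D_buff
    proba = main_clf.predict_proba(X_buff)
    y_hat = main_clf.classes_[proba.argmax(axis=1)]   # h_0's recognition results
    conf = proba.max(axis=1)
    top = np.argsort(conf)[-L:]                       # top-L confidences -> D'_buff
    X_top, y_top = X_buff[top], y_hat[top]
    keep = np.zeros(len(X_top), dtype=bool)           # membership in D_add
    for clf, remap in standby:
        # step 204: C'_m(D_ltrn + D'_buff) -> h'_m
        clf.fit(np.vstack([X_ltrn, X_top]),
                np.concatenate([remap(y_ltrn), remap(y_top)]))
        p = clf.predict_proba(X_top)                  # step 205: recognize D'_buff
        agree = clf.classes_[p.argmax(axis=1)] == remap(y_top)
        keep |= agree & (p.max(axis=1) >= preset)     # step 206: the vote
    # step 207: merge D_add into D_ltrn and remove it from D_unlabel
    X_ltrn_new = np.vstack([X_ltrn, X_top[keep]])
    y_ltrn_new = np.concatenate([y_ltrn, y_top[keep]])
    X_unlabel_new = np.delete(X_unlabel, idx[top][keep], axis=0)
    return X_ltrn_new, y_ltrn_new, X_unlabel_new
```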
Each executed round of the steps of Fig. 3 adds the new data D_add to the training set D_ltrn, so that in the next round the main classifier and the stand-by auxiliary classifiers learn from the new D_ltrn and the models keep evolving and growing. To guarantee the quality of the newly added data D_add, the method of this application uses secondary verification by the main classifier and the stand-by auxiliary classifiers to ensure the data quality of D_add; and the auxiliary classifiers, having passed the screening of step 100, ensure the accuracy of their output results.
The present invention uses the deep-learning classifier as the main classifier and other classifiers as auxiliary classifiers; the auxiliary classifiers help the main classifier verify labeled data, thereby achieving automatic high-quality labeling of data and accumulating a large volume of high-quality labeled data; at the same time, through continuous learning on the labeled data, the main classifier and the auxiliary classifiers improve their learning ability together.
Through this positive-feedback unsupervised-learning process, the present invention can continuously improve the learning performance of the main classifier.
The deep learning classification method of the invention has the following characteristics. Overall flow: the flow is divided into two parts, auxiliary-classifier screening and classifier unsupervised learning. Screening method: the auxiliary-learner screening principle is to pre-train first and find the classifier best suited to a given class. Advantage-class determination: the optimal classifier is determined by ranking accuracy using the pre-training results. Stand-by auxiliary-classifier remapping: after selection, the stand-by auxiliary classifier is retrained over fewer classes so that it concentrates on classifying a small number of characteristic classes. Main/auxiliary-classifier voting confirmation: data on which the main and auxiliary classifiers agree are confirmed and permanently added to the labeled data set, while the unlabeled data set never discards data, which guarantees that no selective-learning phenomenon arises during the learning process. Using the main classifier (the deep-learning classifier C_0) at application time reduces complexity: the auxiliary classifiers assist the learning of the main classifier, and the main classifier alone finally serves the application or production environment.
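Putting the two phases together, a minimal end-to-end driver might look as follows; every function name here is one of the hypothetical helpers sketched above, not patent terminology:

```python
def deep_learning_classification(main_clf, aux_pool, X_label, y_label,
                                 X_unlabel, classes, T=5, rounds=100):
    """End-to-end sketch: step 100 (screening), then repeated step-200 rounds."""
    X_ltrn, X_ltst, y_ltrn, y_ltst = split_labeled(X_label, y_label)
    G, winners = screen_auxiliaries(aux_pool, X_ltrn, y_ltrn,
                                    X_ltst, y_ltst, classes)     # step 100
    standby = [(aux_pool[k], lambda y, adv=frozenset(adv): remap_labels(y, adv))
               for k, adv in winners.items()]                    # step 107
    stop = StopRule(patience_T=T)
    for _ in range(rounds):                                      # step 200 loop
        X_ltrn, y_ltrn, X_unlabel = unsupervised_round(
            main_clf, standby, X_ltrn, y_ltrn, X_unlabel)
        g0 = main_clf.score(X_ltst, y_ltst)   # G_0 on the fixed test set D_ltst
        if stop.should_stop(g0):
            break
    return main_clf   # only the main classifier is carried into production
```

Note that the auxiliary classifiers exist solely to vet new labels during training; as the text states, the main classifier alone serves the application or production environment.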
The above is the explanation of the deep learning classification method of the present invention.
The invention also includes a deep learning classification device; its principle is the same as that of the deep learning classification method of Fig. 1, and related passages may be read with reference to each other.
As shown in Fig. 4, the deep learning classification device includes: a data module 400, an auxiliary-classifier screening module 500, and a classifier unsupervised-learning module 600.
Data module 400: contains the input data set D, composed of the labeled data set D_label and the unlabeled data set D_unlabel, satisfying D_label << D_unlabel, where D_label is in turn composed of the training set D_ltrn and the test set D_ltst, satisfying D_ltrn : D_ltst ≥ 9:1; the true label of an input datum x_i of D_label is y_i, satisfying y_i ∈ {t_0, t_1, …, t_(N-1)}, and any class of {t_0, t_1, …, t_(N-1)} is denoted t_n (n = 0, 1, …, N-1).
As shown in Fig. 5, the auxiliary-classifier screening module 500 includes:
Auxiliary-classifier learning module 501: each auxiliary classifier establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn; the auxiliary classifiers learn their respective output models from D_ltrn, i.e. C_k(D_ltrn) → h_k (k = 1, 2, …, K denotes the auxiliary-classifier index).
Auxiliary-classifier recognition module 502: for any datum (x_i, y_i) in the test set D_ltst with y_i = t_n ∈ {t_0, t_1, …, t_(N-1)}, x_i is recognized by h_k to give ŷ_i, i.e. ŷ_i = h_k(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}.
Recognition-result counting module 503: compare y_i with ŷ_i; if they agree, increment R_k,n, otherwise increment W_k,n; R_k,n accumulates the number of times h_k recognizes any class t_n of {t_0, t_1, …, t_(N-1)} correctly, and W_k,n accumulates the number of times h_k recognizes any class t_n of {t_0, t_1, …, t_(N-1)} incorrectly; the initial values of R_k,n and W_k,n are 0.
Judgment module 504: repeat the auxiliary-classifier recognition module and the recognition-result counting module until every h_k (k = 1, 2, …, K) has been tested on every datum in D_ltst.
Accuracy computing module 505: compute the accuracy of model h_k on each class t_n: G_k,n = R_k,n / (R_k,n + W_k,n).
Stand-by auxiliary-classifier determining module 506: for each class t_n (n = 0, 1, …, N-1), select the C_k corresponding to the highest accuracy G_k,n (k = 1, 2, …, K) as the stand-by auxiliary classifier for that class, denoted C'_1, C'_2, …, C'_M (M ≤ K), and record for each stand-by auxiliary classifier the t_n corresponding to the G_k,n on which it was selected as its recognition-advantage class; for any stand-by auxiliary classifier, the classes of {t_0, t_1, …, t_(N-1)} other than its advantage classes are the recognition-disadvantage classes of that auxiliary classifier.
Stand-by auxiliary-classifier mapping-reconstruction module 507: based on the advantage and disadvantage classes recognized by the stand-by auxiliary classifiers, rebuild the mapping between the model input and output of the stand-by auxiliary classifiers, and execute the classifier unsupervised-learning module.
As shown in Fig. 6, the classifier unsupervised-learning module includes:
Main-classifier mapping module 601: the deep-learning classifier C_0 establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn.
Main-classifier learning module 602: C_0 learns a model h_0 from the training set D_ltrn, i.e. C_0(D_ltrn) → h_0.
Main-classifier learning module 602 may further include testing the recognition accuracy of h_0 with the labeled data of the test set D_ltst, denoted G_0; when G_0 falls below a preset value, or G_0 has not improved within T runs of the classifier unsupervised-learning module, the classifier unsupervised-learning module is exited.
The value of T can be set empirically, or set according to regularities observed in the actual running results of the classifier unsupervised-learning module.
Main-classifier recognition module 603: take b samples from the unlabeled data set D_unlabel to form D_buff; for all data x_i (i = 1, 2, …, b) in D_buff, recognize x_i with model h_0 to obtain the recognition result ŷ_i, i.e. ŷ_i = h_0(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}, and record the classification confidence of ŷ_i; take the L data (x_i, ŷ_i) (i = 1, 2, …, L) with the highest recognition confidence to form D'_buff.
Stand-by auxiliary-classifier learning module 604: the stand-by auxiliary classifiers C'_1, C'_2, …, C'_M (M ≤ K) learn their respective models h'_m from the training set D_ltrn and D'_buff, i.e. C'_m(D_ltrn + D'_buff) → h'_m (m = 1, 2, …, M).
Stand-by auxiliary-classifier recognition module 605: recognize the L data in D'_buff with h'_m (m = 1, 2, …, M), i.e. ŷ_i,m = h'_m(x_i) (i = 1, 2, …, L; m = 1, 2, …, M), and record the classification confidence of ŷ_i,m.
Comparison module 606: let D_add be the empty set; for any ŷ_i,m (i = 1, 2, …, L; m = 1, 2, …, M), if ŷ_i,m = ŷ_i and the classification confidence of ŷ_i,m is greater than or equal to the preset value, add (x_i, ŷ_i) to the set D_add.
The preset value can be a value set by the user empirically or according to the target recognition accuracy of C_0; it can also be G_0, the recognition accuracy obtained by testing h_0 on the labeled data of the test set D_ltst.
Data-set changing module 607: merge D_add into D_ltrn, and at the same time remove the x_i of D_add from D_unlabel, then return to execute main-classifier learning module 602.
In the device of the present application, the main-classifier accuracy computation module further includes: when G_0 falls below the preset value, or G_0 has not improved within T runs of the classifier unsupervised-learning module, exiting the classifier unsupervised-learning module.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the protection scope of the present invention; any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the technical solution of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A deep learning classification method, the input being a data set D, the D being composed of a labeled data set D_label and an unlabeled data set D_unlabel, wherein the D_label is in turn composed of a training set D_ltrn and a test set D_ltst, the true label of an input datum x_i of the D_label being y_i and satisfying y_i ∈ {t_0, t_1, …, t_(N-1)}, any class of the {t_0, t_1, …, t_(N-1)} being denoted t_n (n = 0, 1, …, N-1); characterized in that the method comprises at least an auxiliary-classifier screening step 100 and a classifier unsupervised-learning step 200;
The auxiliary-classifier screening step 100 comprises:
Step 101: each auxiliary classifier establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn; the auxiliary classifiers learn their respective output models from the training set D_ltrn, i.e. C_k(D_ltrn) → h_k (k = 1, 2, …, K denotes the auxiliary-classifier index);
Step 102: for any datum (x_i, y_i) in the test set D_ltst with y_i = t_n, the x_i is recognized by h_k to give ŷ_i, i.e. ŷ_i = h_k(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)};
Step 103: compare the y_i with ŷ_i; if they agree, increment the count R_k,n, otherwise increment the count W_k,n; the R_k,n accumulates the number of times h_k recognizes any class t_n correctly, and the W_k,n accumulates the number of times h_k recognizes any class t_n incorrectly; the initial values of R_k,n and W_k,n are 0;
Step 104: repeat steps 102 and 103 until every h_k (k = 1, 2, …, K) has been tested on every datum in D_ltst;
Step 105: compute the accuracy of model h_k on each class t_n: G_k,n = R_k,n / (R_k,n + W_k,n);
Step 106: for each class t_n (n = 0, 1, …, N-1), select the C_k corresponding to the highest accuracy G_k,n (k = 1, 2, …, K) as the stand-by auxiliary classifier for that class, denoted C'_1, C'_2, …, C'_M (M ≤ K); record, for each stand-by auxiliary classifier, the t_n corresponding to the G_k,n on which it was selected as its recognition-advantage class; for any stand-by auxiliary classifier, the classes of {t_0, t_1, …, t_(N-1)} other than its advantage classes are the recognition-disadvantage classes of that auxiliary classifier;
Step 107: based on the advantage and disadvantage classes recognized by the stand-by auxiliary classifiers, rebuild the mapping between the model input and output of the stand-by auxiliary classifiers, and execute the classifier unsupervised-learning step 200;
The classifier unsupervised-learning step 200 comprises:
Step 201: the deep-learning classifier C_0 establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn;
Step 202: the C_0 learns a model h_0 from the training set D_ltrn, i.e. C_0(D_ltrn) → h_0;
Step 203: take b samples from the unlabeled data set D_unlabel to form D_buff; for all data x_i (i = 1, 2, …, b) in D_buff, recognize x_i with model h_0 to obtain the recognition result ŷ_i, i.e. ŷ_i = h_0(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}, and record the classification confidence of ŷ_i; take the L data (x_i, ŷ_i) (i = 1, 2, …, L) with the highest recognition confidence to form D'_buff;
Step 204: the stand-by auxiliary classifiers C'_1, C'_2, …, C'_M (M ≤ K) learn their respective models h'_m from the training set D_ltrn and D'_buff, i.e. C'_m(D_ltrn + D'_buff) → h'_m (m = 1, 2, …, M);
Step 205: recognize the L data in D'_buff with the h'_m (m = 1, 2, …, M), i.e. ŷ_i,m = h'_m(x_i) (i = 1, 2, …, L; m = 1, 2, …, M), and record the classification confidence of ŷ_i,m;
Step 206: let D_add be the empty set; for any ŷ_i,m (i = 1, 2, …, L; m = 1, 2, …, M), if ŷ_i,m = ŷ_i and the classification confidence of ŷ_i,m is greater than a preset value, add (x_i, ŷ_i) to the set D_add;
Step 207: merge D_add into D_ltrn, and at the same time remove the x_i of D_add from D_unlabel, then return to step 202.
2. The classification method according to claim 1, characterized in that step 202 further comprises testing the recognition accuracy of h_0 with the labeled data of the test set D_ltst, denoted G_0;
the preset value in step 206 is G_0.
3. The classification method according to claim 1, characterized in that D satisfies D_label << D_unlabel.
4. The classification method according to claim 1, characterized in that D_label satisfies D_ltrn : D_ltst ≥ 9:1.
5. The classification method according to claim 2, characterized in that step 202 further comprises: when G_0 falls below the preset value, or G_0 has not improved within T runs of step 200, terminating the classifier unsupervised-learning step 200.
6. A deep learning classification device, characterized in that the device comprises at least: a data module, an auxiliary-classifier screening module, and a classifier unsupervised-learning module;
Data module: contains the input data set D, the D being composed of a labeled data set D_label and an unlabeled data set D_unlabel, wherein the D_label is in turn composed of a training set D_ltrn and a test set D_ltst; the true label of an input datum x_i of the D_label is y_i, satisfying y_i ∈ {t_0, t_1, …, t_(N-1)}; any class of the {t_0, t_1, …, t_(N-1)} is denoted t_n (n = 0, 1, …, N-1);
The auxiliary-classifier screening module comprises:
Auxiliary-classifier learning module: each auxiliary classifier establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn; the auxiliary classifiers learn their respective output models from the training set D_ltrn, i.e. C_k(D_ltrn) → h_k (k = 1, 2, …, K denotes the auxiliary-classifier index);
Auxiliary-classifier recognition module: for any datum (x_i, y_i) in the test set D_ltst with y_i = t_n, the x_i is recognized by h_k to give ŷ_i, i.e. ŷ_i = h_k(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)};
Recognition-result counting module: compare the y_i with ŷ_i; if they agree, increment R_k,n, otherwise increment W_k,n; the R_k,n accumulates the number of times h_k recognizes any class t_n correctly, and the W_k,n accumulates the number of times h_k recognizes any class t_n incorrectly; the initial values of R_k,n and W_k,n are 0;
Judgment module: repeat the auxiliary-classifier recognition module and the recognition-result counting module until every h_k (k = 1, 2, …, K) has been tested on every datum in D_ltst;
Accuracy computing module: compute the accuracy of model h_k on each class t_n: G_k,n = R_k,n / (R_k,n + W_k,n);
Stand-by auxiliary-classifier determining module: for each class t_n (n = 0, 1, …, N-1), select the C_k corresponding to the highest accuracy G_k,n (k = 1, 2, …, K) as the stand-by auxiliary classifier for that class, denoted C'_1, C'_2, …, C'_M (M ≤ K), and record for each stand-by auxiliary classifier the t_n corresponding to the G_k,n on which it was selected as its recognition-advantage class; for any stand-by auxiliary classifier, the classes of {t_0, t_1, …, t_(N-1)} other than its advantage classes are the recognition-disadvantage classes of that auxiliary classifier;
Stand-by auxiliary-classifier mapping-reconstruction module: based on the advantage and disadvantage classes recognized by the stand-by auxiliary classifiers, rebuild the mapping between the model input and output of the stand-by auxiliary classifiers, and execute the classifier unsupervised-learning module;
The classifier unsupervised-learning module comprises:
Main-classifier mapping module: the deep-learning classifier C_0 establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn;
Main-classifier learning module: the C_0 learns a model h_0 from the training set D_ltrn, i.e. C_0(D_ltrn) → h_0;
Main-classifier recognition module: take b samples from the unlabeled data set D_unlabel to form D_buff; for all data x_i (i = 1, 2, …, b) in D_buff, recognize x_i with model h_0 to obtain the recognition result ŷ_i, i.e. ŷ_i = h_0(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}, and record the classification confidence of ŷ_i; take the L data (x_i, ŷ_i) (i = 1, 2, …, L) with the highest recognition confidence to form D'_buff;
Stand-by auxiliary-classifier learning module: the stand-by auxiliary classifiers C'_1, C'_2, …, C'_M (M ≤ K) learn their respective models h'_m from the training set D_ltrn and D'_buff, i.e. C'_m(D_ltrn + D'_buff) → h'_m (m = 1, 2, …, M);
Stand-by auxiliary-classifier recognition module: recognize the L data in D'_buff with the h'_m (m = 1, 2, …, M), i.e. ŷ_i,m = h'_m(x_i) (i = 1, 2, …, L; m = 1, 2, …, M), and record the classification confidence of ŷ_i,m;
Comparison module: let D_add be the empty set; for any ŷ_i,m (i = 1, 2, …, L; m = 1, 2, …, M), if ŷ_i,m = ŷ_i and the classification confidence of ŷ_i,m is greater than a preset value, add (x_i, ŷ_i) to the set D_add;
Data-set changing module: merge D_add into D_ltrn, and at the same time remove the x_i of D_add from D_unlabel, then return to execute the main-classifier learning module.
7. The device according to claim 6, characterized in that the main-classifier learning module further comprises testing the recognition accuracy of h_0 with the labeled data of the test set D_ltst, denoted G_0;
the preset value in the comparison module is G_0.
8. The device according to claim 6, characterized in that D satisfies D_label << D_unlabel.
9. The device according to claim 6, characterized in that D_label satisfies D_ltrn : D_ltst ≥ 9:1.
10. The device according to claim 7, characterized in that the main-classifier learning module further comprises: when G_0 falls below the preset value, or G_0 has not improved within T runs of the classifier unsupervised-learning module, exiting the classifier unsupervised-learning module.
CN201710705331.9A 2017-08-17 2017-08-17 Deep learning classification method and device Pending CN109409390A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710705331.9A CN109409390A (en) 2017-08-17 2017-08-17 Deep learning classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710705331.9A CN109409390A (en) 2017-08-17 2017-08-17 Deep learning classification method and device

Publications (1)

Publication Number Publication Date
CN109409390A (en) 2019-03-01

Family

ID=65454795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710705331.9A Pending CN109409390A (en) 2017-08-17 2017-08-17 Deep learning classification method and device

Country Status (1)

Country Link
CN (1) CN109409390A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114385890A (en) * 2022-03-22 2022-04-22 深圳市世纪联想广告有限公司 Internet public opinion monitoring system
CN114385890B (en) * 2022-03-22 2022-05-20 深圳市世纪联想广告有限公司 Internet public opinion monitoring system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190301