CN109409390A - Deep learning classification method and device - Google Patents
Deep learning classification method and device
- Publication number: CN109409390A (application CN201710705331.9A)
- Authority: CN (China)
- Prior art keywords: classification, data, subsidiary, classification device, module
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting (G — Physics; G06F — Electric digital data processing; G06F18/00 — Pattern recognition; G06F18/20 — Analysing; G06F18/21 — Design or setup of recognition systems or techniques)
- G06F18/25 — Fusion techniques (G06F18/00 — Pattern recognition; G06F18/20 — Analysing)
- G06N3/084 — Backpropagation, e.g. using gradient descent (G06N — Computing arrangements based on specific computational models; G06N3/02 — Neural networks; G06N3/08 — Learning methods)
- G06F18/259 — Fusion by voting (G06F18/25 — Fusion techniques)
Landscapes: Engineering & Computer Science; Theoretical Computer Science; Data Mining & Analysis; Physics & Mathematics; Life Sciences & Earth Sciences; Artificial Intelligence; General Physics & Mathematics; General Engineering & Computer Science; Evolutionary Computation; Computer Vision & Pattern Recognition; Evolutionary Biology; Bioinformatics & Computational Biology; Bioinformatics & Cheminformatics; Computational Linguistics; Biomedical Technology; Biophysics; Health & Medical Sciences; General Health & Medical Sciences; Molecular Biology; Computing Systems; Mathematical Physics; Software Systems; Information Retrieval, Db Structures And Fs Structures Therefor; Image Analysis
Abstract
The present invention provides a deep learning classification method and device. The deep learning classification method includes: step 100: based on the labeled data set D_label, select the auxiliary classifiers with the highest recognition accuracy as stand-by auxiliary classifiers, denoted C'_1, C'_2 … C'_M; step 200: take b samples from the unlabeled data set D_unlabel to form D_buff; C_0 and the stand-by auxiliary classifiers recognize D_buff, and when the recognition result of C_0 agrees with that of any stand-by auxiliary classifier and the confidence exceeds a predetermined value, C_0's recognition result for the data, together with the data itself, is added to the labeled data set D_label, and the data is removed from the unlabeled data set D_unlabel. The present invention provides a deep learning classification method and classifier that achieve automatic, high-quality labeling of data, so that, based on the data "fuel", the learning effect of the deep learning classifier is continuously improved.
Description
Technical field
The present invention relates to the field of computers, and in particular to a deep learning classification method and device.
Background art
With the arrival of the big data era, more and more data can be supplied to machines for learning, and deep learning has become the new sharp tool of artificial intelligence development in the big data era. Compared with conventional machine learning methods, deep learning has greatly improved the level of artificial intelligence in computer vision, speech recognition and natural language processing, raising these problems from unusable to a level that can be generalized to commercial use.
Classification is one of the most basic problems of deep learning, and many deep learning algorithms require classification-based feature extraction before other operations can be performed. The classification performance of traditional methods falls far short of deep learning: driven by the "fuel" of big data, deep-learning-based methods encounter bottlenecks far less than other learners and their performance keeps rising, whereas other conventional classification methods, after reaching a certain performance, no longer improve no matter how they continue to learn.
However, in the big data era, the cost of obtaining high-quality labeled samples is very high. In many professional domains, such as reading medical images or translation, manual labeling requires a very professional level of expertise; low-quality labels not only fail to improve the machine learning effect, but may even harm the quality of learning.
In order to reduce the cost of fully manual labeling and improve manual labeling efficiency, the following methods are generally used:
1. Develop dedicated labeling software to improve labeling efficiency.
2. Perform preliminary labeling with a traditional learner based on manual features, then verify manually.
Although the above methods can effectively reduce the workload of manual labeling, they still require human participation. Considering the explosion of data volume under present-day big data conditions, their efficiency is insufficient to provide enough "fuel" for deep learning; it is therefore necessary to solve the problem of automatic data labeling under big data conditions and the unsupervised learning problem of deep learning methods.
Summary of the invention
In view of this, the present invention provides a deep learning classification method and classifier that achieve automatic, high-quality labeling of data, so that, based on the data "fuel", the learning effect of the deep learning classifier is continuously improved.
The present invention provides a deep learning classification method, comprising an input data set D, where D consists of a labeled data set D_label and an unlabeled data set D_unlabel; D_label in turn consists of a training set D_ltrn and a test set D_ltst; the true label of input data x_i in D_label is y_i, satisfying y_i ∈ {t_0, t_1 … t_{N-1}}, and any class in {t_0, t_1 … t_{N-1}} is denoted t_n (n = 0, 1 … N-1). The deep learning classification method of the invention includes at least auxiliary classifier screening step 100 and classifier unsupervised learning step 200.
Auxiliary classifier screening step 100 includes:
Step 101: each auxiliary classifier establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn; the auxiliary classifiers learn their respective output models from D_ltrn, i.e. C_k(D_ltrn) → h_k (k = 1, 2 … K denotes the auxiliary classifier number).
Step 102: for any data (x_i, y_i) in the test set D_ltst with y_i = t_n, x_i is recognized by h_k to obtain ŷ_i, i.e. h_k(x_i) → ŷ_i, satisfying ŷ_i ∈ {t_0, t_1 … t_{N-1}}.
Step 103: compare y_i with ŷ_i; if they agree, increment R_{k,n}, otherwise increment W_{k,n}; R_{k,n} accumulates the number of times h_k recognizes class t_n correctly, and W_{k,n} the number of times it recognizes t_n incorrectly; the initial values of R_{k,n} and W_{k,n} are 0.
Step 104: repeat steps 102 and 103 until each h_k (k = 1, 2 … K) has been tested on every data item in D_ltst.
Step 105: compute the accuracy of model h_k on each class t_n: G_{k,n} = R_{k,n} / (R_{k,n} + W_{k,n}).
Step 106: for each class t_n (n = 0, 1 … N-1), select the C_k with the highest accuracy G_{k,n} (k = 1, 2 … K) as the stand-by auxiliary classifier for that class, denoted C'_1, C'_2 … C'_M (M ≤ K); for each stand-by auxiliary classifier, record the t_n corresponding to its selection basis G_{k,n} as its advantage class; for any stand-by auxiliary classifier, the classes of {t_0, t_1 … t_{N-1}} remaining after removing its advantage classes are its disadvantage classes.
Step 107: based on the advantage and disadvantage classes of the stand-by auxiliary classifiers, rebuild the input-output mapping of each stand-by auxiliary classifier model, then execute classifier unsupervised learning step 200.
Classifier unsupervised learning step 200 includes:
Step 201: the deep learning classifier C_0 establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn.
Step 202: C_0 learns the model h_0 from the training set D_ltrn, i.e. C_0(D_ltrn) → h_0.
Step 203: take b samples from the unlabeled data set D_unlabel to form D_buff; for all data x_i (i = 1, 2 … b) in D_buff, recognize x_i with model h_0 to obtain the recognition result ŷ_i, i.e. h_0(x_i) → ŷ_i, satisfying ŷ_i ∈ {t_0, t_1 … t_{N-1}}, and record the classification confidence of each ŷ_i; take the L data with the highest recognition confidence (i = 1, 2 … L) to form D'_buff.
Step 204: the stand-by auxiliary classifiers C'_1, C'_2 … C'_M (M ≤ K) learn their respective models h'_m from the training set D_ltrn plus D'_buff, i.e. C'_m(D_ltrn + D'_buff) → h'_m (m = 1, 2 … M).
Step 205: recognize the L data in D'_buff with h'_m (m = 1, 2 … M), i.e. h'_m(x_i) → ŷ_{i,m} (i = 1, 2 … L, m = 1, 2 … M), and record the classification confidence of each ŷ_{i,m}.
Step 206: let D_add be the empty set; for any ŷ_{i,m} (i = 1, 2 … L, m = 1, 2 … M), if ŷ_{i,m} = ŷ_i and the classification confidence of ŷ_{i,m} is greater than a preset value, add (x_i, ŷ_i) to the set D_add.
Step 207: merge the set D_add into D_ltrn, remove the x_i of D_add from D_unlabel, and return to step 202.
The present invention also provides a deep learning classification device, comprising: a data module, an auxiliary classifier screening module, and a classifier unsupervised learning module.
Data module: contains the input data set D, where D consists of the labeled data set D_label and the unlabeled data set D_unlabel; D_label in turn consists of the training set D_ltrn and the test set D_ltst; the true label of input data x_i in D_label is y_i, satisfying y_i ∈ {t_0, t_1 … t_{N-1}}, and any class in {t_0, t_1 … t_{N-1}} is denoted t_n (n = 0, 1 … N-1).
The auxiliary classifier screening module includes:
Auxiliary classifier learning module: each auxiliary classifier establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn; the auxiliary classifiers learn their respective output models from D_ltrn, i.e. C_k(D_ltrn) → h_k (k = 1, 2 … K denotes the auxiliary classifier number).
Auxiliary classifier recognition module: for any data (x_i, y_i) in the test set D_ltst with y_i = t_n, x_i is recognized by h_k to obtain ŷ_i, i.e. h_k(x_i) → ŷ_i, satisfying ŷ_i ∈ {t_0, t_1 … t_{N-1}}.
Recognition result counting module: compare y_i with ŷ_i; if they agree, increment R_{k,n}, otherwise increment W_{k,n}; R_{k,n} accumulates the number of times h_k recognizes class t_n correctly, and W_{k,n} the number of times it recognizes t_n incorrectly; the initial values of R_{k,n} and W_{k,n} are 0.
Judgment module: repeat the auxiliary classifier recognition module and the recognition result counting module until each h_k (k = 1, 2 … K) has been tested on every data item in D_ltst.
Accuracy computing module: compute the accuracy of model h_k on each class t_n: G_{k,n} = R_{k,n} / (R_{k,n} + W_{k,n}).
Stand-by auxiliary classifier determining module: for each class t_n (n = 0, 1 … N-1), select the C_k with the highest accuracy G_{k,n} (k = 1, 2 … K) as the stand-by auxiliary classifier for that class, denoted C'_1, C'_2 … C'_M (M ≤ K), and for each stand-by auxiliary classifier record the t_n corresponding to its selection basis G_{k,n} as its advantage class; for any stand-by auxiliary classifier, the classes of {t_0, t_1 … t_{N-1}} remaining after removing its advantage classes are its disadvantage classes.
Stand-by auxiliary classifier mapping reconstruction module: based on the advantage and disadvantage classes of the stand-by auxiliary classifiers, rebuild the input-output mapping of each stand-by auxiliary classifier model, then execute the classifier unsupervised learning module.
The classifier unsupervised learning module includes:
Main classifier mapping module: the deep learning classifier C_0 establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn.
Main classifier learning module: C_0 learns the model h_0 from the training set D_ltrn, i.e. C_0(D_ltrn) → h_0.
Main classifier recognition module: take b samples from the unlabeled data set D_unlabel to form D_buff; for all data x_i (i = 1, 2 … b) in D_buff, recognize x_i with model h_0 to obtain the recognition result ŷ_i, i.e. h_0(x_i) → ŷ_i, satisfying ŷ_i ∈ {t_0, t_1 … t_{N-1}}, and record the classification confidence of each ŷ_i; take the L data with the highest recognition confidence (i = 1, 2 … L) to form D'_buff.
Stand-by auxiliary classifier learning module: the stand-by auxiliary classifiers C'_1, C'_2 … C'_M (M ≤ K) learn their respective models h'_m from the training set D_ltrn plus D'_buff, i.e. C'_m(D_ltrn + D'_buff) → h'_m (m = 1, 2 … M).
Stand-by auxiliary classifier recognition module: recognize the L data in D'_buff with h'_m (m = 1, 2 … M), i.e. h'_m(x_i) → ŷ_{i,m} (i = 1, 2 … L, m = 1, 2 … M), and record the classification confidence of each ŷ_{i,m}.
Comparison module: let D_add be the empty set; for any ŷ_{i,m} (i = 1, 2 … L, m = 1, 2 … M), if ŷ_{i,m} = ŷ_i and the classification confidence of ŷ_{i,m} is greater than a preset value, add (x_i, ŷ_i) to the set D_add.
Data set change module: merge the set D_add into D_ltrn, remove the x_i of D_add from D_unlabel, and return to execute the main classifier learning module.
The present invention uses the deep learning classifier as the main classifier and other classifiers as auxiliary classifiers; the auxiliary classifiers assist the main classifier in verifying labeled data, so as to achieve automatic high-quality labeling of data and to continuously accumulate a large amount of high-quality labeled data; at the same time, through continuous learning of the labeled data, the main classifier and auxiliary classifiers can jointly improve their learning ability.
Detailed description of the invention
Fig. 1 is a flow chart of the deep learning classification method of the present invention;
Fig. 2 is a flow chart of S100 in Fig. 1;
Fig. 3 is a flow chart of S200 in Fig. 1;
Fig. 4 is a structural schematic diagram of the deep learning classification device of the present invention;
Fig. 5 is a structural schematic diagram of module 500 in Fig. 4;
Fig. 6 is a structural schematic diagram of module 600 in Fig. 4.
Specific embodiment
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in detail below with reference to the drawings and specific embodiments.
The objects involved in the present invention include the input data set D, the deep learning classifier C_0, and multiple auxiliary classifiers C_1, C_2 … C_K.
The input data set D is divided into the labeled D_label and the unlabeled D_unlabel, satisfying D_label << D_unlabel; D_label is in turn divided into a training set D_ltrn and a test set D_ltst, satisfying D_ltrn : D_ltst ≥ 9:1, and the test set D_ltst is kept unchanged.
For any data (x_i, y_i) in D_label, the subscript i is the data sequence number and y_i is the true label of x_i; the set of label results is {t_0, t_1 … t_{N-1}}, containing N kinds of class labels, satisfying y_i ∈ {t_0, t_1 … t_{N-1}}; any class in {t_0, t_1 … t_{N-1}} is denoted t_n (n = 0, 1 … N-1).
Correspondingly, the data in D_unlabel has not yet been labeled: there is only x_i, without a corresponding label y_i; the data in D_unlabel can be recognized with a classifier, with recognition result ∈ {t_0, t_1 … t_{N-1}}.
The multiple auxiliary classifiers C_1, C_2 … C_K are classifiers other than the deep learning classifier C_0, including linear regression, decision trees, support vector machines, Bayesian methods, neural networks, and so on.
The deep learning classification method of the invention, as shown in Fig. 1, comprises step 100, the auxiliary classifier screening step, and step 200, the classifier unsupervised learning step.
As shown in Fig. 2, step 100, the auxiliary classifier screening step, includes steps 101 to 107.
Step 101: each auxiliary classifier establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn; the auxiliary classifiers learn their respective output models from D_ltrn, i.e. C_k(D_ltrn) → h_k (k = 1, 2 … K denotes the auxiliary classifier number).
Step 102: for any data (x_i, y_i) in the test set D_ltst with y_i = t_n ∈ {t_0, t_1 … t_{N-1}}, x_i is recognized by h_k to obtain ŷ_i, i.e. h_k(x_i) → ŷ_i, satisfying ŷ_i ∈ {t_0, t_1 … t_{N-1}}.
Step 103: compare y_i with ŷ_i; if they agree, increment R_{k,n}, otherwise increment W_{k,n}; R_{k,n} accumulates the number of times h_k recognizes class t_n correctly, and W_{k,n} the number of times it recognizes t_n incorrectly; the initial values of R_{k,n} and W_{k,n} are 0.
Step 104: repeat steps 102 and 103 until each h_k (k = 1, 2 … K) has been tested on every data item in D_ltst.
Step 105: compute the accuracy of model h_k on each class t_n: G_{k,n} = R_{k,n} / (R_{k,n} + W_{k,n}).
Step 106: for each class t_n (n = 0, 1 … N-1), select the C_k with the highest accuracy G_{k,n} (k = 1, 2 … K) as the stand-by auxiliary classifier for that class, denoted C'_1, C'_2 … C'_M (M ≤ K); for each stand-by auxiliary classifier, record the t_n corresponding to its selection basis G_{k,n} as its advantage class; for any stand-by auxiliary classifier, the classes of {t_0, t_1 … t_{N-1}} remaining after removing its advantage classes are its disadvantage classes.
For example, for auxiliary classifier C_1, the set of prediction accuracies over the N class labels is {G_{1,0}, G_{1,1}, G_{1,2} … G_{1,N-1}}; for auxiliary classifier C_2, it is {G_{2,0}, G_{2,1}, G_{2,2} … G_{2,N-1}}; …; and for auxiliary classifier C_K, it is {G_{K,0}, G_{K,1}, G_{K,2} … G_{K,N-1}}.
For t_0, the auxiliary classifier corresponding to the maximum value in {G_{1,0}, G_{2,0} … G_{K,0}} becomes the stand-by auxiliary classifier for t_0. Corresponding to the N classes in {t_0, t_1 … t_{N-1}}, N such selections are made from {C_1, C_2 … C_K}; any auxiliary classifier never selected as a stand-by auxiliary classifier is eliminated and not used. If the number of unselected auxiliary classifiers is K_0, then the number of stand-by auxiliary classifiers is M = K - K_0, and the stand-by auxiliary classifier set is {C'_1, C'_2 … C'_M}; each C'_m becomes an independent multi-class or binary classifier over N_m classes.
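The screening of steps 101 to 106 can be sketched in plain Python. This is a minimal illustration with hand-rolled counters; the names `models`, `test_set` and `classes`, and the toy classifiers, are illustrative assumptions, not from the patent.

```python
# Minimal sketch of steps 101-106: count per-class hits R[(k, n)] and
# misses W[(k, n)] on the test set, compute G[(k, n)] = R / (R + W),
# and pick the most accurate classifier for every class.
from collections import defaultdict

def screen_auxiliary(models, test_set, classes):
    R = defaultdict(int)  # R[(k, n)]: correct recognitions of class n by h_k
    W = defaultdict(int)  # W[(k, n)]: wrong recognitions of class n by h_k
    for k, h_k in models.items():              # steps 102-104
        for x_i, y_i in test_set:
            n = classes.index(y_i)
            if h_k(x_i) == y_i:
                R[(k, n)] += 1
            else:
                W[(k, n)] += 1
    # step 105: per-class accuracy G_{k,n}
    G = {(k, n): R[(k, n)] / (R[(k, n)] + W[(k, n)])
         for k in models for n in range(len(classes))
         if R[(k, n)] + W[(k, n)] > 0}
    # step 106: for each class, the classifier with the highest G_{k,n}
    best = {classes[n]: max(models, key=lambda k: G.get((k, n), 0.0))
            for n in range(len(classes))}
    return G, best

# Toy example: two "classifiers" on a two-class problem
m = {1: lambda x: 'a',                        # always predicts 'a'
     2: lambda x: 'a' if x < 0 else 'b'}      # thresholds at zero
tests = [(-1, 'a'), (-2, 'a'), (1, 'b'), (2, 'b')]
G, best = screen_auxiliary(m, tests, ['a', 'b'])
# best['b'] == 2: only classifier 2 ever gets class 'b' right
```

The per-class argmax is the whole point of step 106: a classifier that is mediocre overall can still win a single class and survive the screening.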
Suppose classifier C_1's accuracy on class t_0, G_{1,0}, is the maximum among all G_{i,0}, i.e. G_{1,0} = max{G_{i,0}} (i = 1, 2 … K), and t_0 is the only class on which C_1 beats all other classifiers. Then C'_1 is formed as a new binary classifier (N_m = 2) that can distinguish t_0 from all other classes {t_1, t_2 … t_{N-1}}.
Alternatively, suppose classifier C_1's accuracy on class t_0, G_{1,0}, is the maximum among all G_{i,0}, i.e. G_{1,0} = max{G_{i,0}} (i = 1, 2 … K), and its accuracy on class t_1, G_{1,1}, is the maximum among all G_{i,1}, i.e. G_{1,1} = max{G_{i,1}} (i = 1, 2 … K). Then C'_1 is formed as a new multi-class classifier (N_m = 3) that can distinguish the classes t_0 and t_1 and all other classes {t_2, t_3 … t_{N-1}}, where {t_0, t_1} are C'_1's advantage classes and the other classes {t_2, t_3 … t_{N-1}} are C'_1's disadvantage classes.
Step 107: based on the advantage and disadvantage classes of the stand-by auxiliary classifiers, rebuild the input-output mapping of each stand-by auxiliary classifier model, then execute classifier unsupervised learning step 200.
Suppose C'_1 has been screened out as a new multi-class classifier (N_m = 3) that can distinguish the classes t_0, t_1 and all other classes {t_2, t_3 … t_{N-1}}; in step 107, its output is divided into the three classes t_0, t_1 and {t_2, t_3 … t_{N-1}}, and the input-output mapping of C'_1's model is rebuilt accordingly.
Because step 107 reconstructs the input-output mapping of the model, the model obtained differs from the one learned in step 101: even if the algorithm is identical, the classes of the output differ substantially. C'_m can be regarded as C_k with its recognition of advantage classes strengthened and its recognition of disadvantage classes reduced.
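The remapping of step 107 can be illustrated as a label transformation: the stand-by classifier keeps its advantage classes as-is and collapses every disadvantage class into a single "other" bucket before retraining. A minimal sketch follows; the `OTHER` sentinel and the function name are illustrative assumptions.

```python
# Sketch of step 107: rebuild a stand-by classifier's output space so it
# distinguishes its advantage classes from one collapsed "other" class.
OTHER = 'other'

def remap_labels(labels, advantage_classes):
    """Collapse every disadvantage class into OTHER, keeping advantage
    classes intact, so the classifier becomes an N_m-way classifier with
    N_m = len(advantage_classes) + 1."""
    adv = set(advantage_classes)
    return [y if y in adv else OTHER for y in labels]

# C'_1 with advantage classes {t0, t1} on a 5-class problem -> 3-way task
ys = ['t0', 't1', 't2', 't3', 't4', 't0']
remapped = remap_labels(ys, ['t0', 't1'])
# remapped == ['t0', 't1', 'other', 'other', 'other', 't0']
```

Retraining on the remapped labels is what lets C'_m concentrate its capacity on the classes it already recognizes best.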
As shown in Fig. 3, step 200 in Fig. 1 may include steps 201 to 207.
Step 201: the deep learning classifier C_0 establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn.
Step 202: C_0 learns the model h_0 from the training set D_ltrn, i.e. C_0(D_ltrn) → h_0.
Step 202 can also include testing the recognition accuracy of h_0 with the labeled data of the test set D_ltst, denoted G_0; when G_0 falls below a preset value, or G_0 has not improved over T runs of step 200, classifier unsupervised learning step 200 is terminated. The value of T can be set empirically, or according to patterns in the actual running results of step 200.
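The stopping rule described for step 202 (quit when G_0 drops below a preset floor, or fails to improve within T rounds) might be tracked as below. This is a sketch under stated assumptions: the class name, parameter names, and toy accuracy history are all illustrative.

```python
# Sketch of the stopping rule for step 200: stop when the measured
# accuracy G0 falls below a floor, or has not improved for T rounds.
class StopRule:
    def __init__(self, floor, patience_T):
        self.floor = floor          # minimum acceptable G0
        self.T = patience_T         # rounds without improvement allowed
        self.best = float('-inf')   # best G0 seen so far
        self.stale = 0              # consecutive rounds without improvement

    def should_stop(self, g0):
        if g0 < self.floor:
            return True
        if g0 > self.best:
            self.best, self.stale = g0, 0
        else:
            self.stale += 1
        return self.stale >= self.T

rule = StopRule(floor=0.5, patience_T=3)
history = [0.80, 0.82, 0.82, 0.82, 0.82]
stops = [rule.should_stop(g) for g in history]
# stops == [False, False, False, False, True]
```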
Step 203: take b samples from the unlabeled data set D_unlabel to form D_buff; for all data x_i (i = 1, 2 … b) in D_buff, recognize x_i with model h_0 to obtain the recognition result ŷ_i, i.e. h_0(x_i) → ŷ_i, satisfying ŷ_i ∈ {t_0, t_1 … t_{N-1}}, and record the classification confidence of each ŷ_i; take the L data with the highest recognition confidence (i = 1, 2 … L) to form D'_buff.
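Step 203's selection of the L most confident recognitions can be sketched as follows. Here `h0` is assumed to return a `(label, confidence)` pair; the names are illustrative, not the patent's.

```python
# Sketch of step 203: recognize every sample in D_buff with h0, keep the
# L results with the highest confidence as D'_buff.
def select_top_L(h0, D_buff, L):
    scored = [(x_i,) + h0(x_i) for x_i in D_buff]   # (x_i, y_hat, conf)
    scored.sort(key=lambda t: t[2], reverse=True)   # most confident first
    return scored[:L]

# Toy h0: "confidence" is distance from a decision boundary at 0
h0 = lambda x: ('pos' if x > 0 else 'neg', abs(x))
D_buff = [0.1, -3.0, 2.0, -0.2]
D_prime = select_top_L(h0, D_buff, 2)
# D_prime == [(-3.0, 'neg', 3.0), (2.0, 'pos', 2.0)]
```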
Step 204: the stand-by auxiliary classifiers C'_1, C'_2 … C'_M (M ≤ K) learn their respective models h'_m from the training set D_ltrn plus D'_buff, i.e. C'_m(D_ltrn + D'_buff) → h'_m (m = 1, 2 … M).
Step 205: recognize the L data in D'_buff with h'_m (m = 1, 2 … M), i.e. h'_m(x_i) → ŷ_{i,m} (i = 1, 2 … L, m = 1, 2 … M), and record the classification confidence of each ŷ_{i,m}.
Step 206: let D_add be the empty set; for any ŷ_{i,m} (i = 1, 2 … L, m = 1, 2 … M), if ŷ_{i,m} = ŷ_i and the classification confidence of ŷ_{i,m} is greater than or equal to the preset value, add (x_i, ŷ_i) to the set D_add.
The preset value can be set by the user empirically or from the target recognition precision of C_0; it can also be G_0, the recognition accuracy obtained by testing h_0 with the labeled data of the test set D_ltst. The criterion for setting the preset value is that the precision of the data in D_add be greater than it, to ensure that C_0 has the high-quality data fuel that sustains its continuous evolution and growth.
Step 206 uses the votes of the main classifier and the stand-by auxiliary classifiers to select the recognition results with the best assurance.
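Steps 205 and 206 amount to a main/auxiliary agreement vote. A minimal sketch follows; `aux_models` are assumed to return `(label, confidence)` pairs, and all names are illustrative assumptions.

```python
# Sketch of steps 205-206: keep (x_i, y_hat_i) for D_add only when at
# least one stand-by auxiliary model agrees with the main model's label
# AND does so with confidence at or above the preset threshold.
def vote_confirm(D_prime_buff, aux_models, preset):
    """D_prime_buff: list of (x_i, y_hat_i, conf_i) from the main model."""
    D_add = []
    for x_i, y_hat_i, _ in D_prime_buff:
        for h_m in aux_models:
            label_m, conf_m = h_m(x_i)
            if label_m == y_hat_i and conf_m >= preset:
                D_add.append((x_i, y_hat_i))
                break                    # one confident agreement suffices
    return D_add

aux = [lambda x: ('pos' if x > 0 else 'neg', 0.9)]
picked = vote_confirm([(2.0, 'pos', 0.8), (1.0, 'neg', 0.7)], aux, 0.85)
# picked == [(2.0, 'pos')] -- the second sample's auxiliary vote disagrees
```

Requiring agreement rather than the main model's confidence alone is what keeps low-quality pseudo-labels out of D_add.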
Step 207: merge the set D_add into D_ltrn, remove the x_i of D_add from D_unlabel, and return to step 202.
Each time the steps of Fig. 3 execute one round, new data D_add is added to the training set D_ltrn, so that in the next round, after the main classifier and stand-by auxiliary classifiers learn from the new D_ltrn, the models keep evolving and growing. To guarantee the quality of the newly added data D_add, the method of the present application uses secondary verification by the main classifier and the stand-by auxiliary classifiers to ensure the data quality of D_add; and the auxiliary classifiers, having passed the screening of step 100, ensure the accuracy of their output results.
The present invention uses the deep learning classifier as the main classifier and other classifiers as auxiliary classifiers; the auxiliary classifiers assist the main classifier in verifying labeled data, so as to achieve automatic high-quality labeling of data and accumulate a large amount of high-quality labeled data; at the same time, through continuous learning of the labeled data, the main classifier and auxiliary classifiers can jointly improve their learning ability.
Through this positive-feedback unsupervised learning process, the present invention can continuously improve the learning performance of the main classifier.
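The positive-feedback loop of Fig. 3 can be summarized in one routine. This is a structural sketch with function-valued stand-ins for the classifiers; every name, and the toy threshold "models" that ignore their training data, are illustrative assumptions rather than the patent's implementation.

```python
# Structural sketch of step 200's loop: train h0, pseudo-label the most
# confident unlabeled samples, let auxiliaries confirm, fold confirmed
# samples into the training set, repeat.
def unsupervised_rounds(train_h0, train_aux, D_ltrn, D_unlabel,
                        b, L, preset, rounds):
    for _ in range(rounds):
        h0 = train_h0(D_ltrn)                              # step 202
        D_buff, D_unlabel = D_unlabel[:b], D_unlabel[b:]   # step 203
        scored = sorted(((x,) + h0(x) for x in D_buff),
                        key=lambda t: t[2], reverse=True)[:L]
        aux = train_aux(D_ltrn + [(x, y) for x, y, _ in scored])  # 204
        D_add = [(x, y) for x, y, _ in scored              # steps 205-206
                 if any(h(x)[0] == y and h(x)[1] >= preset for h in aux)]
        D_ltrn = D_ltrn + D_add                            # step 207
        # unconfirmed samples return to the unlabeled pool (no data lost)
        kept = {xa for xa, _ in D_add}
        D_unlabel += [x for x in D_buff if x not in kept]
    return D_ltrn, D_unlabel

# Toy instantiation: threshold "models" that ignore the training data
mk = lambda D: (lambda x: ('pos' if x > 0 else 'neg', abs(x)))
D_l, D_u = unsupervised_rounds(mk, lambda D: [mk(D)],
                               [(1.0, 'pos')], [3.0, -2.0, 0.1],
                               b=2, L=1, preset=0.5, rounds=1)
# D_l == [(1.0, 'pos'), (3.0, 'pos')]; -2.0 returns to the unlabeled pool
```

Note how unconfirmed samples are returned to D_unlabel rather than discarded, matching the "no data abandoned" property claimed for the method.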
The deep learning classification method of the invention has the following characteristics. The overall flow is divided into two parts: auxiliary classifier screening and classifier unsupervised learning. Screening method: the auxiliary learner screening principle is to pre-train first and find the classifier best suited to each class. Advantage class determination: the optimal classifier is determined from the pre-training results by ranking accuracy. Stand-by auxiliary classifier remapping: after screening, the selected auxiliary classifiers are retrained on fewer classes so that each can focus on classifying a small number of characteristic classes. Main/auxiliary classifier vote confirmation: data on which the main and auxiliary classifiers classify consistently is confirmed and permanently added to the labeled data set; the unlabeled data set does not discard data, which guarantees that no selective-learning phenomenon arises in the learning process. Using the main classifier (deep learning classifier C_0) at application time reduces complexity: the auxiliary classifiers assist the main classifier's learning, and in the end the main classifier alone serves the application or production environment.
The above is the explanation of the deep learning classification method of the present invention.
The invention also includes a deep learning classification device; its principle is the same as that of the deep learning classification method of Fig. 1, and related passages can be cross-referenced.
As shown in Fig. 4, the deep learning classification device includes: a data module 400, an auxiliary classifier screening module 500, and a classifier unsupervised learning module 600.
Data module 400: contains the input data set D, where D consists of the labeled data set D_label and the unlabeled data set D_unlabel, satisfying D_label << D_unlabel; D_label in turn consists of the training set D_ltrn and the test set D_ltst, satisfying D_ltrn : D_ltst ≥ 9:1. The true label of input data x_i in D_label is y_i, satisfying y_i ∈ {t_0, t_1 … t_{N-1}}, and any class in {t_0, t_1 … t_{N-1}} is denoted t_n (n = 0, 1 … N-1).
As shown in Fig. 5, the auxiliary classifier screening module 500 includes:
Auxiliary classifier learning module 501: each auxiliary classifier establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn; the auxiliary classifiers learn their respective output models from D_ltrn, i.e. C_k(D_ltrn) → h_k (k = 1, 2 … K denotes the auxiliary classifier number).
Auxiliary classifier recognition module 502: for any data (x_i, y_i) in the test set D_ltst with y_i = t_n ∈ {t_0, t_1 … t_{N-1}}, x_i is recognized by h_k to obtain ŷ_i, i.e. h_k(x_i) → ŷ_i, satisfying ŷ_i ∈ {t_0, t_1 … t_{N-1}}.
Recognition result counting module 503: compare y_i with ŷ_i; if they agree, increment R_{k,n}, otherwise increment W_{k,n}; R_{k,n} accumulates the number of times h_k recognizes any class t_n in {t_0, t_1 … t_{N-1}} correctly, and W_{k,n} the number of times it recognizes t_n incorrectly; the initial values of R_{k,n} and W_{k,n} are 0.
Judgment module 504: repeat the auxiliary classifier recognition module and the recognition result counting module until each h_k (k = 1, 2 … K) has been tested on every data item in D_ltst.
Accuracy computing module 505: compute the accuracy of model h_k on each class t_n: G_{k,n} = R_{k,n} / (R_{k,n} + W_{k,n}).
Stand-by auxiliary classifier determining module 506: for each class t_n (n = 0, 1 … N-1), select the C_k with the highest accuracy G_{k,n} (k = 1, 2 … K) as the stand-by auxiliary classifier for that class, denoted C'_1, C'_2 … C'_M (M ≤ K), and for each stand-by auxiliary classifier record the t_n corresponding to its selection basis G_{k,n} as its advantage class; for any stand-by auxiliary classifier, the classes of {t_0, t_1 … t_{N-1}} remaining after removing its advantage classes are its disadvantage classes.
Stand-by auxiliary classifier mapping reconstruction module 507: based on the advantage and disadvantage classes of the stand-by auxiliary classifiers, rebuild the input-output mapping of each stand-by auxiliary classifier model, then execute the classifier unsupervised learning module.
As shown in fig. 6, classifier unsupervised learning module includes:
Main classification device mapping block 601: deep learning classifier C0With training set DltrnIn data (xi, yi) establish mould
The mapping relations of type input and output.
Main classification device study module 602:C0Pass through training set DltrnLearning obtained model is h0, i.e. C0(Dltrn)→h0。
Main classification device study module 602 can also include utilizing test set DltstFlag data test h0Identification essence
Degree, is denoted as G0;Work as G0When lower than preset value, or the G in operation T times of classifier unsupervised learning module0It is not promoted, is exited
Classifier unsupervised learning module.
The value of T can be rule of thumb arranged, or be set according to the rule of classifier unsupervised learning module the actual running results
It sets.
Main classification device identification module 603: from unlabelled DunlabelB sample is taken out in data set constitutes Dbuff;For
DbuffIn all data xi(i=1,2 ... b), passes through model h0Identify xiObtain recognition resultI.e.MeetAnd record recognition resultClassification confidence, take the wherein highest L number of recognition result confidence level
According to(i=1,2 ... L) forms D 'buff。
Stand-by subsidiary classification device study module 604: stand-by subsidiary classification device C '1、C’2…C’M(M≤K) passes through training set
DltrnAnd D 'buffLearning obtained respective model is h 'm, i.e. C 'm(Dltrn+D’buff)→h’m(m=1,2 ... M).
Stand-by subsidiary classification device identification module 605: pass through h 'm(m=1,2 ... M) identifies D 'buffIn L data, i.e.,(i=1,2 ... L, m=1,2 ... M), and recordClassification confidence.
Comparison module 606: D is enabledaddFor empty set, for any(i=1,2 ... L, m=1,2 ... M), if metAndClassification confidence be more than or equal to preset value, then willD is addedaddSet.
The preset value can be set empirically by the user or from the target recognition accuracy of C_0; it can also be G_0, the recognition accuracy obtained by testing h_0 with the labeled data of the test set D_ltst.
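Module 606's filter — accept a pseudo-label only if some stand-by subsidiary model reproduces it with confidence at or above the preset value — can be sketched as follows (the names and the (label, confidence) pair interface are assumptions):

```python
def build_d_add(d_buff_prime, aux_predictions, preset):
    """Filter D'_buff into D_add per comparison module 606.

    d_buff_prime: list of (x, y_hat) pairs pseudo-labeled by h_0.
    aux_predictions[m][i]: the (label, confidence) prediction of
    auxiliary model h'_m on the i-th element of d_buff_prime.
    A pair enters D_add if any auxiliary model predicts the same
    label as h_0 with confidence >= preset.
    """
    d_add = []
    for i, (x, y_hat) in enumerate(d_buff_prime):
        for preds in aux_predictions:       # one list per model h'_m
            label, conf = preds[i]
            if label == y_hat and conf >= preset:
                d_add.append((x, y_hat))
                break                       # avoid duplicate entries
    return d_add
```

Note the claim language only requires agreement for "any" ŷ_(i,m); the `break` above just prevents the same (x_i, ŷ_i) from being added once per agreeing model.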
Data-set modification module 607: D_add is merged into D_ltrn, and at the same time the x_i of D_add are removed from D_unlabel; execution then returns to main classifier learning module 602.
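Module 607's bookkeeping is plain set arithmetic; a sketch with illustrative names (the patent does not prescribe data structures):

```python
def apply_d_add(d_add, d_ltrn, d_unlabel):
    """Merge D_add into D_ltrn and drop its inputs from D_unlabel.

    d_add: list of (x, y_hat) pairs accepted by the comparison module.
    d_ltrn: list of (x, y) training pairs (extended in place).
    d_unlabel: list of unlabeled inputs x; a filtered copy is returned.
    """
    d_ltrn.extend(d_add)
    added_inputs = {x for x, _ in d_add}
    return [x for x in d_unlabel if x not in added_inputs]
```

After this call, control would return to module 602, which re-learns h_0 on the enlarged D_ltrn.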
In the device of the present application, the main classifier accuracy computing module further includes: exiting the classifier unsupervised-learning module when G_0 is lower than the preset value, or when G_0 has not improved over T runs of the classifier unsupervised-learning module.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit its scope; any modification, equivalent substitution or improvement made within the spirit and principles of the technical solution of the present invention shall fall within the protection scope of the present invention.
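As an illustrative sketch outside the claims, the per-class screening of step 100 below — accumulating hits R_(k,n) and misses W_(k,n), computing G_(k,n) = R_(k,n) / (R_(k,n) + W_(k,n)), and selecting the most accurate classifier per class — might read as follows (all names are hypothetical):

```python
def screen_subsidiary_classifiers(models, d_ltst):
    """Select, for each class, the model with the best per-class accuracy.

    models: dict {k: h_k}, each h_k a callable x -> predicted class.
    d_ltst: list of labeled (x, y) test pairs.
    Returns {class t_n: index k of the model with the highest G_k,n}.
    """
    correct, wrong = {}, {}              # (k, t_n) -> counts R and W
    for k, h_k in models.items():
        for x, y in d_ltst:
            key = (k, y)
            if h_k(x) == y:
                correct[key] = correct.get(key, 0) + 1
            else:
                wrong[key] = wrong.get(key, 0) + 1

    classes = {y for _, y in d_ltst}
    best = {}
    for t in classes:
        def g(k):                        # G_k,n = R / (R + W)
            r = correct.get((k, t), 0)
            w = wrong.get((k, t), 0)
            return r / (r + w) if r + w else 0.0
        best[t] = max(models, key=g)     # advantage class winner for t
    return best
```

The winner for class t_n becomes the stand-by subsidiary classifier whose advantage class is t_n; all other classes are its disadvantage classes.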
Claims (10)
1. A deep learning classification method, comprising an input data set D, the D consisting of a labeled data set D_label and an unlabeled data set D_unlabel, wherein the D_label in turn consists of a training set D_ltrn and a test set D_ltst; the true label of input data x_i of the D_label is y_i, satisfying y_i ∈ {t_0, t_1, …, t_(N-1)}, any class in {t_0, t_1, …, t_(N-1)} being denoted t_n (n = 0, 1, …, N-1); characterized in that the method comprises at least a subsidiary classifier screening step 100 and a classifier unsupervised-learning step 200;
The subsidiary classifier screening step 100 comprises:
Step 101: the subsidiary classifiers establish the mapping between model input and output from the data (x_i, y_i) in the training set D_ltrn; each subsidiary classifier learns its respective output model from D_ltrn, i.e. C_k(D_ltrn) → h_k (k = 1, 2, …, K denoting the subsidiary classifier index);
Step 102: for any data (x_i, y_i) in the test set D_ltst with y_i = t_n, the x_i is recognized by h_k to obtain ŷ_i, i.e. ŷ_i = h_k(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)};
Step 103: if y_i and ŷ_i agree, the count R_(k,n) is incremented by one; otherwise the count W_(k,n) is incremented by one; R_(k,n) accumulates the number of times h_k recognizes class t_n correctly, and W_(k,n) the number of times h_k recognizes class t_n incorrectly; the initial values of R_(k,n) and W_(k,n) are 0;
Step 104: steps 102 and 103 are repeated until every h_k (k = 1, 2, …, K) has been tested on every datum in D_ltst;
Step 105: the accuracy of the recognition results of model h_k on each class t_n is computed as G_(k,n) = R_(k,n) / (R_(k,n) + W_(k,n));
Step 106: for each class t_n (n = 0, 1, …, N-1), the C_k corresponding to the highest accuracy G_(k,n) (k = 1, 2, …, K) is selected as the stand-by subsidiary classifier of that class, denoted C'_1, C'_2, …, C'_M (M ≤ K); for each stand-by subsidiary classifier, the t_n corresponding to the G_(k,n) on which it was selected is recorded as its advantage class, and, for any stand-by subsidiary classifier, the other classes of {t_0, t_1, …, t_(N-1)} remaining after removing its advantage class are the disadvantage classes of that subsidiary classifier;
Step 107: based on the advantage and disadvantage classes recognized by the stand-by subsidiary classifiers, the input-output mapping of the stand-by subsidiary classifier models is rebuilt, and the classifier unsupervised-learning step 200 is executed;
The classifier unsupervised-learning step 200 comprises:
Step 201: the deep learning classifier C_0 establishes the mapping between model input and output from the data (x_i, y_i) in the training set D_ltrn;
Step 202: the C_0 learns a model h_0 from the training set D_ltrn, i.e. C_0(D_ltrn) → h_0;
Step 203: b samples are drawn from the unlabeled data set D_unlabel to form D_buff; for every datum x_i (i = 1, 2, …, b) in D_buff, the model h_0 recognizes x_i to obtain ŷ_i, i.e. ŷ_i = h_0(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}; the classification confidence of each ŷ_i is recorded, and the L data (x_i, ŷ_i) with the highest recognition confidence form D'_buff;
Step 204: the stand-by subsidiary classifiers C'_1, C'_2, …, C'_M (M ≤ K) each learn their respective model h'_m from the training set D_ltrn together with D'_buff, i.e. C'_m(D_ltrn + D'_buff) → h'_m (m = 1, 2, …, M);
Step 205: each h'_m (m = 1, 2, …, M) recognizes the L data in D'_buff, i.e. ŷ_(i,m) = h'_m(x_i) (i = 1, 2, …, L; m = 1, 2, …, M), and the classification confidence of each ŷ_(i,m) is recorded;
Step 206: let D_add be the empty set; for any ŷ_(i,m), if ŷ_(i,m) = ŷ_i and the classification confidence of ŷ_(i,m) is greater than the preset value, then (x_i, ŷ_i) is added to the set D_add;
Step 207: D_add is merged into D_ltrn, and at the same time the x_i of D_add are removed from D_unlabel; return to step 202.
2. The classification method according to claim 1, characterized in that step 202 further comprises testing the recognition accuracy of h_0 with the labeled data of the test set D_ltst, denoted G_0; the preset value in step 206 is G_0.
3. The classification method according to claim 1, characterized in that the D satisfies D_label << D_unlabel.
4. The classification method according to claim 1, characterized in that the D_label satisfies D_ltrn : D_ltst ≥ 9 : 1.
5. The classification method according to claim 2, characterized in that step 202 further comprises terminating the classifier unsupervised-learning step 200 when G_0 is lower than the preset value, or when G_0 has not improved over T runs of the step 200.
6. A deep learning classification device, characterized in that the device comprises at least: a data module, a subsidiary classifier screening module, and a classifier unsupervised-learning module;
The data module comprises an input data set D, the D consisting of a labeled data set D_label and an unlabeled data set D_unlabel, wherein the D_label in turn consists of a training set D_ltrn and a test set D_ltst; the true label of input data x_i of the D_label is y_i, satisfying y_i ∈ {t_0, t_1, …, t_(N-1)}, any class in {t_0, t_1, …, t_(N-1)} being denoted t_n (n = 0, 1, …, N-1);
The subsidiary classifier screening module comprises:
A subsidiary classifier learning module: the subsidiary classifiers establish the mapping between model input and output from the data (x_i, y_i) in the training set D_ltrn; each subsidiary classifier learns its respective output model from D_ltrn, i.e. C_k(D_ltrn) → h_k (k = 1, 2, …, K denoting the subsidiary classifier index);
A subsidiary classifier recognition module: for any data (x_i, y_i) in the test set D_ltst with y_i = t_n, the x_i is recognized by h_k to obtain ŷ_i, i.e. ŷ_i = h_k(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)};
A recognition result counting module: if y_i and ŷ_i agree, the count R_(k,n) is incremented by one; otherwise the count W_(k,n) is incremented by one; R_(k,n) accumulates the number of times h_k recognizes class t_n correctly, and W_(k,n) the number of times h_k recognizes class t_n incorrectly; the initial values of R_(k,n) and W_(k,n) are 0;
A judgment module: the subsidiary classifier recognition module and the recognition result counting module are repeated until every h_k (k = 1, 2, …, K) has been tested on every datum in D_ltst;
An accuracy computing module: the accuracy of the recognition results of model h_k on each class t_n is computed as G_(k,n) = R_(k,n) / (R_(k,n) + W_(k,n));
A stand-by subsidiary classifier determining module: for each class t_n (n = 0, 1, …, N-1), the C_k corresponding to the highest accuracy G_(k,n) (k = 1, 2, …, K) is selected as the stand-by subsidiary classifier of that class, denoted C'_1, C'_2, …, C'_M (M ≤ K); for each stand-by subsidiary classifier, the t_n corresponding to the G_(k,n) on which it was selected is recorded as its advantage class, and, for any stand-by subsidiary classifier, the other classes of {t_0, t_1, …, t_(N-1)} remaining after removing its advantage class are the disadvantage classes of that subsidiary classifier;
A stand-by subsidiary classifier mapping reconstruction module: based on the advantage and disadvantage classes recognized by the stand-by subsidiary classifiers, the input-output mapping of the stand-by subsidiary classifier models is rebuilt, and the classifier unsupervised-learning module is executed;
The classifier unsupervised-learning module comprises:
A main classifier mapping module: the deep learning classifier C_0 establishes the mapping between model input and output from the data (x_i, y_i) in the training set D_ltrn;
A main classifier learning module: the C_0 learns a model h_0 from the training set D_ltrn, i.e. C_0(D_ltrn) → h_0;
A main classifier recognition module: b samples are drawn from the unlabeled data set D_unlabel to form D_buff; for every datum x_i (i = 1, 2, …, b) in D_buff, the model h_0 recognizes x_i to obtain ŷ_i, i.e. ŷ_i = h_0(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}; the classification confidence of each ŷ_i is recorded, and the L data (x_i, ŷ_i) with the highest recognition confidence form D'_buff;
A stand-by subsidiary classifier learning module: the stand-by subsidiary classifiers C'_1, C'_2, …, C'_M (M ≤ K) each learn their respective model h'_m from the training set D_ltrn together with D'_buff, i.e. C'_m(D_ltrn + D'_buff) → h'_m (m = 1, 2, …, M);
A stand-by subsidiary classifier recognition module: each h'_m (m = 1, 2, …, M) recognizes the L data in D'_buff, i.e. ŷ_(i,m) = h'_m(x_i) (i = 1, 2, …, L; m = 1, 2, …, M), and the classification confidence of each ŷ_(i,m) is recorded;
A comparison module: let D_add be the empty set; for any ŷ_(i,m), if ŷ_(i,m) = ŷ_i and the classification confidence of ŷ_(i,m) is greater than the preset value, then (x_i, ŷ_i) is added to the set D_add;
A data-set modification module: D_add is merged into D_ltrn, and at the same time the x_i of D_add are removed from D_unlabel; execution then returns to the main classifier learning module.
7. The device according to claim 6, characterized in that the main classifier learning module further comprises testing the recognition accuracy of h_0 with the labeled data of the test set D_ltst, denoted G_0; the preset value in the comparison module is G_0.
8. The device according to claim 6, characterized in that the D satisfies D_label << D_unlabel.
9. The device according to claim 6, characterized in that the D_label satisfies D_ltrn : D_ltst ≥ 9 : 1.
10. The device according to claim 7, characterized in that the main classifier learning module further comprises exiting the classifier unsupervised-learning module when G_0 is lower than the preset value, or when G_0 has not improved over T runs of the classifier unsupervised-learning module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710705331.9A CN109409390A (en) | 2017-08-17 | 2017-08-17 | Deep learning classification method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109409390A true CN109409390A (en) | 2019-03-01 |
Family
ID=65454795
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710705331.9A Pending CN109409390A (en) | 2017-08-17 | 2017-08-17 | Deep learning classification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109409390A (en) |
2017-08-17: CN patent application CN201710705331.9A filed; status Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114385890A (en) * | 2022-03-22 | 2022-04-22 | 深圳市世纪联想广告有限公司 | Internet public opinion monitoring system |
CN114385890B (en) * | 2022-03-22 | 2022-05-20 | 深圳市世纪联想广告有限公司 | Internet public opinion monitoring system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Stock et al. | Convnets and imagenet beyond accuracy: Understanding mistakes and uncovering biases | |
CN112214610B (en) | Entity relationship joint extraction method based on span and knowledge enhancement | |
CN109635108B (en) | Man-machine interaction based remote supervision entity relationship extraction method | |
CN108229588B (en) | Machine learning identification method based on deep learning | |
CN103886330B (en) | Sorting technique based on semi-supervised SVM integrated study | |
CN109815801A (en) | Face identification method and device based on deep learning | |
CN103679160B (en) | Human-face identifying method and device | |
CN103838744B (en) | A kind of method and device of query word demand analysis | |
CN104239858A (en) | Method and device for verifying facial features | |
CN107169086B (en) | Text classification method | |
CN107169485A (en) | A kind of method for identifying mathematical formula and device | |
CN103559504A (en) | Image target category identification method and device | |
CN104834941A (en) | Offline handwriting recognition method of sparse autoencoder based on computer input | |
CN109934203A (en) | A kind of cost-sensitive increment type face identification method based on comentropy selection | |
CN104156690B (en) | A kind of gesture identification method based on image space pyramid feature bag | |
CN108596163A (en) | A kind of Coal-rock identification method based on CNN and VLAD | |
CN106845358A (en) | A kind of method and system of handwritten character characteristics of image identification | |
CN109213853A (en) | A kind of Chinese community's question and answer cross-module state search method based on CCA algorithm | |
CN107145514A (en) | Chinese sentence pattern sorting technique based on decision tree and SVM mixed models | |
CN104376308B (en) | A kind of human motion recognition method based on multi-task learning | |
CN110458600A (en) | Portrait model training method, device, computer equipment and storage medium | |
CN109656808A (en) | A kind of Software Defects Predict Methods based on hybrid active learning strategies | |
CN108345942B (en) | Machine learning identification method based on embedded code learning | |
CN108229692B (en) | Machine learning identification method based on dual contrast learning | |
CN110414622A (en) | Classifier training method and device based on semi-supervised learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2019-03-01 |