CN109409390A - Deep learning classification method and device - Google Patents

Deep learning classification method and device

Info

Publication number
CN109409390A
Authority
CN
China
Prior art keywords
classification
data
auxiliary classifier
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710705331.9A
Other languages
Chinese (zh)
Inventor
侯国梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Potevio Information Technology Co Ltd
Original Assignee
Potevio Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Potevio Information Technology Co Ltd filed Critical Potevio Information Technology Co Ltd
Priority to CN201710705331.9A priority Critical patent/CN109409390A/en
Publication of CN109409390A publication Critical patent/CN109409390A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/259 Fusion by voting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a deep learning classification method and device. The deep learning classification method includes: step 100: based on the labeled data set D_label, selecting the auxiliary classifiers with the highest recognition accuracy as the stand-by auxiliary classifiers, denoted C'_1, C'_2, …, C'_M; step 200: taking b samples from the unlabeled data set D_unlabel to form D_buff, and having C_0 and the stand-by auxiliary classifiers recognize D_buff; when the recognition result of C_0 agrees with that of any stand-by auxiliary classifier and the confidence exceeds a preset value, adding the recognition result of C_0 for the datum, together with the datum itself, to the labeled data set D_label, and removing the datum from the unlabeled data set D_unlabel. The present invention thus provides a deep learning classification method and classifier that achieve automatic, high-quality labeling of data and, fueled by that data, continuously improve the learning performance of the deep-learning classifier.

Description

Deep learning classification method and device
Technical field
The present invention relates to the computer field, and in particular to a deep learning classification method and device.
Background art
With the arrival of the big-data era, more and more data can be supplied to machines for learning, and deep learning has become the new sharp tool of artificial-intelligence development in this era. Compared with conventional machine-learning methods, deep learning has greatly raised the level of computer vision, speech recognition, and natural-language processing, lifting these problems from unusable to a level that can be generalized commercially.
Classification is one of the most basic problems of deep learning; many deep-learning algorithms require classification-based feature extraction before other operations can be carried out, and the classification performance of traditional methods falls far short of deep learning. Driven by the "fuel" of big data, deep-learning methods hit bottlenecks markedly later than other learners and their performance keeps rising, whereas other conventional classification methods plateau at a certain performance level no matter how much further they learn.
In the big-data era, however, the cost of obtaining high-quality labeled samples is very high. In many professional domains, such as medical image reading or translation, manual labeling requires a very professional level of expertise; low-quality labels not only fail to improve the machine-learning result, they may actually degrade the quality of learning.
In order to reduce the cost of fully manual labeling and improve its efficiency, the following methods are generally used:
1. developing dedicated labeling software to improve labeling efficiency;
2. pre-labeling with a traditional learner based on manual features, followed by manual verification.
Although the above methods can effectively reduce the workload of manual labeling, they still require human participation. Considering the data-volume explosion under present-day big-data conditions, their efficiency is insufficient to supply deep learning with enough "fuel". It is therefore necessary to solve the problem of automatic data labeling under big-data conditions, that is, the unsupervised-learning problem of deep-learning methods.
Summary of the invention
In view of this, the present invention provides a deep learning classification method and classifier that achieve automatic, high-quality labeling of data and, fueled by that data, continuously improve the learning performance of the deep-learning classifier.
The present invention provides a deep learning classification method whose input is a data set D composed of a labeled data set D_label and an unlabeled data set D_unlabel, where D_label is in turn composed of a training set D_ltrn and a test set D_ltst; the true label of an input datum x_i of D_label is y_i, satisfying y_i ∈ {t_0, t_1, …, t_(N-1)}, and any class of {t_0, t_1, …, t_(N-1)} is denoted t_n (n = 0, 1, …, N-1). The deep learning classification method of the invention includes at least an auxiliary-classifier screening step 100 and a classifier unsupervised-learning step 200.
The auxiliary-classifier screening step 100 includes:
Step 101: each auxiliary classifier establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn; the auxiliary classifiers learn their respective output models from D_ltrn, i.e. C_k(D_ltrn) → h_k (k = 1, 2, …, K denotes the auxiliary-classifier index);
Step 102: for any datum (x_i, y_i) in the test set D_ltst with y_i = t_n, x_i is recognized by h_k to give ŷ_i, i.e. ŷ_i = h_k(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)};
Step 103: compare y_i with ŷ_i; if they agree, increment the count R_k,n, otherwise increment the count W_k,n; R_k,n accumulates the number of times h_k recognizes class t_n correctly, and W_k,n accumulates the number of times h_k recognizes class t_n incorrectly; the initial values of R_k,n and W_k,n are 0;
Step 104: repeat steps 102 and 103 until every h_k (k = 1, 2, …, K) has been tested on every datum in D_ltst;
Step 105: compute the accuracy of model h_k on each class t_n: G_k,n = R_k,n / (R_k,n + W_k,n);
Step 106: for each class t_n (n = 0, 1, …, N-1), select the C_k corresponding to the highest accuracy G_k,n (k = 1, 2, …, K) as the stand-by auxiliary classifier for that class, denoted C'_1, C'_2, …, C'_M (M ≤ K); record, for each stand-by auxiliary classifier, the t_n corresponding to the G_k,n on which it was selected as its recognition-advantage class; for any stand-by auxiliary classifier, the classes of {t_0, t_1, …, t_(N-1)} other than its advantage classes are the recognition-disadvantage classes of that auxiliary classifier;
Step 107: based on the advantage and disadvantage classes recognized by the stand-by auxiliary classifiers, rebuild the mapping between the model input and output of the stand-by auxiliary classifiers, and execute the classifier unsupervised-learning step 200.
The classifier unsupervised-learning step 200 includes:
Step 201: the deep-learning classifier C_0 establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn;
Step 202: C_0 learns a model h_0 from the training set D_ltrn, i.e. C_0(D_ltrn) → h_0;
Step 203: take b samples from the unlabeled data set D_unlabel to form D_buff; for all data x_i (i = 1, 2, …, b) in D_buff, recognize x_i with model h_0 to obtain the recognition result ŷ_i, i.e. ŷ_i = h_0(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}, and record the classification confidence of ŷ_i; take the L data (x_i, ŷ_i) (i = 1, 2, …, L) with the highest recognition confidence to form D'_buff;
Step 204: the stand-by auxiliary classifiers C'_1, C'_2, …, C'_M (M ≤ K) learn their respective models h'_m from the training set D_ltrn and D'_buff, i.e. C'_m(D_ltrn + D'_buff) → h'_m (m = 1, 2, …, M);
Step 205: recognize the L data in D'_buff with h'_m (m = 1, 2, …, M), i.e. ŷ_i,m = h'_m(x_i) (i = 1, 2, …, L; m = 1, 2, …, M), and record the classification confidence of ŷ_i,m;
Step 206: let D_add be the empty set; for any ŷ_i,m (i = 1, 2, …, L; m = 1, 2, …, M), if ŷ_i,m = ŷ_i and the classification confidence of ŷ_i,m is greater than a preset value, add (x_i, ŷ_i) to the set D_add;
Step 207: merge D_add into D_ltrn, and at the same time remove the x_i of D_add from D_unlabel, then return to step 202.
The present invention also provides a deep learning classification device, comprising: a data module, an auxiliary-classifier screening module, and a classifier unsupervised-learning module.
Data module: contains the input data set D, composed of a labeled data set D_label and an unlabeled data set D_unlabel, where D_label is in turn composed of a training set D_ltrn and a test set D_ltst; the true label of an input datum x_i of D_label is y_i, satisfying y_i ∈ {t_0, t_1, …, t_(N-1)}, and any class of {t_0, t_1, …, t_(N-1)} is denoted t_n (n = 0, 1, …, N-1).
The auxiliary-classifier screening module includes:
Auxiliary-classifier learning module: each auxiliary classifier establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn; the auxiliary classifiers learn their respective output models from D_ltrn, i.e. C_k(D_ltrn) → h_k (k = 1, 2, …, K denotes the auxiliary-classifier index).
Auxiliary-classifier recognition module: for any datum (x_i, y_i) in the test set D_ltst with y_i = t_n, x_i is recognized by h_k to give ŷ_i, i.e. ŷ_i = h_k(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}.
Recognition-result counting module: compare y_i with ŷ_i; if they agree, increment R_k,n, otherwise increment W_k,n; R_k,n accumulates the number of times h_k recognizes class t_n correctly, and W_k,n accumulates the number of times h_k recognizes class t_n incorrectly; the initial values of R_k,n and W_k,n are 0.
Judgment module: repeat the auxiliary-classifier recognition module and the recognition-result counting module until every h_k (k = 1, 2, …, K) has been tested on every datum in D_ltst.
Accuracy computing module: compute the accuracy of model h_k on each class t_n: G_k,n = R_k,n / (R_k,n + W_k,n).
Stand-by auxiliary-classifier determining module: for each class t_n (n = 0, 1, …, N-1), select the C_k corresponding to the highest accuracy G_k,n (k = 1, 2, …, K) as the stand-by auxiliary classifier for that class, denoted C'_1, C'_2, …, C'_M (M ≤ K), and record for each stand-by auxiliary classifier the t_n corresponding to the G_k,n on which it was selected as its recognition-advantage class; for any stand-by auxiliary classifier, the classes of {t_0, t_1, …, t_(N-1)} other than its advantage classes are the recognition-disadvantage classes of that auxiliary classifier.
Stand-by auxiliary-classifier mapping-reconstruction module: based on the advantage and disadvantage classes recognized by the stand-by auxiliary classifiers, rebuild the mapping between the model input and output of the stand-by auxiliary classifiers, and execute the classifier unsupervised-learning module.
The classifier unsupervised-learning module includes:
Main-classifier mapping module: the deep-learning classifier C_0 establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn.
Main-classifier learning module: C_0 learns a model h_0 from the training set D_ltrn, i.e. C_0(D_ltrn) → h_0.
Main-classifier recognition module: take b samples from the unlabeled data set D_unlabel to form D_buff; for all data x_i (i = 1, 2, …, b) in D_buff, recognize x_i with model h_0 to obtain the recognition result ŷ_i, i.e. ŷ_i = h_0(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}, and record the classification confidence of ŷ_i; take the L data (x_i, ŷ_i) (i = 1, 2, …, L) with the highest recognition confidence to form D'_buff.
Stand-by auxiliary-classifier learning module: the stand-by auxiliary classifiers C'_1, C'_2, …, C'_M (M ≤ K) learn their respective models h'_m from the training set D_ltrn and D'_buff, i.e. C'_m(D_ltrn + D'_buff) → h'_m (m = 1, 2, …, M).
Stand-by auxiliary-classifier recognition module: recognize the L data in D'_buff with h'_m (m = 1, 2, …, M), i.e. ŷ_i,m = h'_m(x_i) (i = 1, 2, …, L; m = 1, 2, …, M), and record the classification confidence of ŷ_i,m.
Comparison module: let D_add be the empty set; for any ŷ_i,m (i = 1, 2, …, L; m = 1, 2, …, M), if ŷ_i,m = ŷ_i and the classification confidence of ŷ_i,m is greater than a preset value, add (x_i, ŷ_i) to the set D_add.
Data-set changing module: merge D_add into D_ltrn, and at the same time remove the x_i of D_add from D_unlabel, then return to execute the main-classifier learning module.
The present invention uses the deep-learning classifier as the main classifier and other classifiers as auxiliary classifiers; the auxiliary classifiers help the main classifier verify labeled data, thereby achieving automatic high-quality labeling of data and continuously accumulating a large volume of high-quality labeled data. At the same time, through continuous learning on the labeled data, the main classifier and the auxiliary classifiers improve their learning ability together.
Brief description of the drawings
Fig. 1 is a flowchart of the deep learning classification method of the present invention;
Fig. 2 is a flowchart of S100 in Fig. 1;
Fig. 3 is a flowchart of S200 in Fig. 1;
Fig. 4 is a structural schematic diagram of the deep learning classification device of the present invention;
Fig. 5 is a structural schematic diagram of module 500 in Fig. 4;
Fig. 6 is a structural schematic diagram of module 600 in Fig. 4.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, right in the following with reference to the drawings and specific embodiments The present invention is described in detail.
It include input data set D, deep learning classifier C the present invention relates to object0And multiple subsidiary classification devices are C1、 C2……CK
Input data set D is divided into the D of labellabelWith unlabelled Dunlabel, and meet Dlabel<<Dunlabel, wherein DlabelIt is divided into training set D againltrnWith test set Dltst, meet Dltrn:Dltst>=9:1, and maintain test set DltstIt is constant.
For DlabelIn any data (xi,yi), wherein subscript i represents data sequence number, yiFor xiAuthentic signature knot Fruit marks the collection of result to be combined into { t0,t1…tN-1, including N kind category label is as a result, meet yi ∈ { t0,t1…tN-1, { t0, t1…tN-1In any classification be denoted as tn(n=0,1 ... N-1).
Correspondingly, DunlabelIn data be not yet marked, only xi, without marking result y accordinglyi, DunlabelIn Data can be used classifier and be identified, meet recognition result ∈ { t0,t1…tN-1}。
Multiple subsidiary classification devices are C1、C2…CK, it is except deep learning classifier C0Except other classifiers, including line Property recurrence, decision tree, supporting vector, bayes method, neural network etc..
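By way of illustration only, the following Python sketch (not part of the patent; the library choice and all names are assumptions) sets up the labeled split and a pool of candidate auxiliary classifiers from the families just listed, using scikit-learn:

```python
# A minimal sketch, assuming scikit-learn; the split respects
# D_ltrn : D_ltst >= 9:1 and the candidate families follow the text.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression   # linear family
from sklearn.tree import DecisionTreeClassifier       # decision tree
from sklearn.svm import SVC                           # support vector
from sklearn.naive_bayes import GaussianNB            # Bayesian method
from sklearn.neural_network import MLPClassifier      # (shallow) neural network

def split_labeled(X_label, y_label, test_frac=0.1, seed=0):
    """Split D_label into D_ltrn and D_ltst; the 10% test set stays fixed."""
    return train_test_split(X_label, y_label, test_size=test_frac,
                            random_state=seed, stratify=y_label)

# Auxiliary classifiers C_1 ... C_K; the main classifier C_0 (a deep
# network) is kept separate and is not part of this pool.
aux_pool = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(),
    SVC(probability=True),   # probability=True so confidences are available
    GaussianNB(),
    MLPClassifier(max_iter=1000),
]
```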
The deep learning classification method of the invention, as shown in Fig. 1, comprises step 100: auxiliary-classifier screening, and step 200: classifier unsupervised learning.
As shown in Fig. 2, step 100, the auxiliary-classifier screening step, includes steps 101 to 107.
Step 101: each auxiliary classifier establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn; the auxiliary classifiers learn their respective output models from D_ltrn, i.e. C_k(D_ltrn) → h_k (k = 1, 2, …, K denotes the auxiliary-classifier index).
Step 102: for any datum (x_i, y_i) in the test set D_ltst with y_i = t_n ∈ {t_0, t_1, …, t_(N-1)}, x_i is recognized by h_k to give ŷ_i, i.e. ŷ_i = h_k(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}.
Step 103: compare y_i with ŷ_i; if they agree, increment the count R_k,n, otherwise increment the count W_k,n; R_k,n accumulates the number of times h_k recognizes class t_n correctly, and W_k,n accumulates the number of times h_k recognizes class t_n incorrectly; the initial values of R_k,n and W_k,n are 0.
Step 104: repeat steps 102 and 103 until every h_k (k = 1, 2, …, K) has been tested on every datum in D_ltst.
Step 105: compute the accuracy of model h_k on each class t_n: G_k,n = R_k,n / (R_k,n + W_k,n).
Step 106: for each class t_n (n = 0, 1, …, N-1), select the C_k corresponding to the highest accuracy G_k,n (k = 1, 2, …, K) as the stand-by auxiliary classifier for that class, denoted C'_1, C'_2, …, C'_M (M ≤ K); record, for each stand-by auxiliary classifier, the t_n corresponding to the G_k,n on which it was selected as its recognition-advantage class; for any stand-by auxiliary classifier, the classes of {t_0, t_1, …, t_(N-1)} other than its advantage classes are the recognition-disadvantage classes of that auxiliary classifier.
For example, auxiliary classifier C_1 has the set of prediction accuracies {G_1,0, G_1,1, G_1,2, …, G_1,N-1} over the N class labels; auxiliary classifier C_2 has {G_2,0, G_2,1, G_2,2, …, G_2,N-1}; and so on up to auxiliary classifier C_K with {G_K,0, G_K,1, G_K,2, …, G_K,N-1}.
For t_0, the classifier corresponding to the maximum of {G_1,0, G_2,0, …, G_K,0} becomes the stand-by auxiliary classifier for t_0. Corresponding to the N classes of {t_0, t_1, …, t_(N-1)} there are N such selections. Of {C_1, C_2, …, C_K}, those never chosen as a stand-by auxiliary classifier are eliminated and not used; denoting the number of unselected auxiliary classifiers K_0, the number of stand-by auxiliary classifiers is M = K - K_0, and the set of stand-by auxiliary classifiers is {C'_1, C'_2, …, C'_M}. Each C'_m becomes an independent multi-class or binary classifier over N_m classes.
Suppose classifier C_1's accuracy on class t_0 is the maximum of all G_i,0, i.e. G_1,0 = max{G_i,0} (i = 1, 2, …, K), and t_0 is the only class on which C_1 completely beats the other classifiers; then C'_1 is formed as a new binary classifier (N_m = 2) that can distinguish t_0 from all other classes {t_1, t_2, …, t_(N-1)}.
Or suppose classifier C_1's accuracy on class t_0 satisfies G_1,0 = max{G_i,0} (i = 1, 2, …, K), and its accuracy on class t_1 satisfies G_1,1 = max{G_i,1} (i = 1, 2, …, K); then C'_1 is formed as a new multi-class classifier (N_m = 3) that can distinguish classes t_0 and t_1 and all other classes {t_2, t_3, …, t_(N-1)}, where {t_0, t_1} are the advantage classes of C'_1 and the other classes {t_2, t_3, …, t_(N-1)} are its disadvantage classes.
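The selection just illustrated amounts to building a K × N accuracy table G and taking column-wise maxima. A minimal sketch, assuming every candidate follows the scikit-learn fit/predict interface (function and variable names are hypothetical):

```python
import numpy as np

def screen_auxiliaries(aux_pool, X_ltrn, y_ltrn, X_ltst, y_ltst, classes):
    """Steps 101-106 sketch: learn h_k = C_k(D_ltrn), tabulate per-class
    accuracy G_k,n = R_k,n / (R_k,n + W_k,n) on D_ltst, and keep, per class
    t_n, the classifier with the highest G_k,n as a stand-by auxiliary."""
    G = np.zeros((len(aux_pool), len(classes)))
    for k, clf in enumerate(aux_pool):
        clf.fit(X_ltrn, y_ltrn)              # step 101: C_k(D_ltrn) -> h_k
        y_hat = clf.predict(X_ltst)          # step 102: recognize D_ltst
        for n, t_n in enumerate(classes):
            mask = (y_ltst == t_n)           # steps 103-104: implicit R/W counts
            if mask.any():
                G[k, n] = np.mean(y_hat[mask] == t_n)   # step 105: G_k,n
    # Step 106: column-wise winners; the classes a classifier wins are its
    # advantage classes, all remaining classes are its disadvantage classes.
    winners = {}
    for n, t_n in enumerate(classes):
        winners.setdefault(int(np.argmax(G[:, n])), []).append(t_n)
    return G, winners   # winners: {pool index -> list of advantage classes}
```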
Step 107: based on the advantage and disadvantage classes recognized by the stand-by auxiliary classifiers, rebuild the mapping between the model input and output of the stand-by auxiliary classifiers, and execute the classifier unsupervised-learning step 200.
Suppose C'_1 is a newly screened multi-class classifier (N_m = 3) that can distinguish classes t_0 and t_1 from all other classes {t_2, t_3, …, t_(N-1)}; step 107 divides its output into the 3 classes {t_0, t_1, {t_2, t_3, …, t_(N-1)}} and rebuilds the mapping between the model input and output of C'_1 accordingly.
Because step 107 rebuilds the mapping between model input and output, the resulting model differs from the one learned in step 101: even if the algorithm is identical, the output classes differ greatly. C'_m can be regarded as the corresponding C_k with recognition of its advantage classes strengthened and recognition of its disadvantage classes reduced.
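One possible realization of this rebuilt mapping, offered only as a sketch under the assumption that all disadvantage classes are merged into a single "other" label before retraining:

```python
import numpy as np

OTHER = "__other__"   # hypothetical sentinel for the merged disadvantage classes

def remap_labels(y, advantage_classes):
    """Step 107 sketch: keep a stand-by auxiliary classifier's advantage
    classes as-is and collapse every disadvantage class into one label,
    turning C_k into an N_m-way classifier C'_m (N_m = len(advantage) + 1)."""
    return np.array([t if t in advantage_classes else OTHER for t in y])

# Retraining on the remapped labels rebuilds the input-output mapping, e.g.:
#   standby_clf.fit(X_ltrn, remap_labels(y_ltrn, {"t0", "t1"}))
```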
As shown in Fig. 3, step 200 of Fig. 1 may include steps 201 to 207.
Step 201: the deep-learning classifier C_0 establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn.
Step 202: C_0 learns a model h_0 from the training set D_ltrn, i.e. C_0(D_ltrn) → h_0.
Step 202 may further include testing the recognition accuracy of h_0 with the labeled data of the test set D_ltst, denoted G_0; when G_0 falls below a preset value, or G_0 has not improved within T runs of step 200, the classifier unsupervised-learning step 200 is terminated.
The value of T can be set empirically, or set according to regularities observed in the actual running results of step 200.
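A minimal sketch of this optional stopping rule; reading "G_0 is not improved within T runs" as a patience counter is an assumption:

```python
class StopRule:
    """Stop when G_0 drops below a floor or fails to improve for T rounds."""
    def __init__(self, patience_T, floor=None):
        self.T, self.floor = patience_T, floor
        self.best, self.stale = -1.0, 0

    def should_stop(self, g0):
        if self.floor is not None and g0 < self.floor:
            return True                      # G_0 below the preset value
        if g0 > self.best:
            self.best, self.stale = g0, 0    # improvement: reset the counter
        else:
            self.stale += 1                  # no improvement this round
        return self.stale >= self.T          # T rounds without improvement
```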
Step 203: take b samples from the unlabeled data set D_unlabel to form D_buff; for all data x_i (i = 1, 2, …, b) in D_buff, recognize x_i with model h_0 to obtain the recognition result ŷ_i, i.e. ŷ_i = h_0(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}, and record the classification confidence of ŷ_i; take the L data (x_i, ŷ_i) (i = 1, 2, …, L) with the highest recognition confidence to form D'_buff.
Step 204: the stand-by auxiliary classifiers C'_1, C'_2, …, C'_M (M ≤ K) learn their respective models h'_m from the training set D_ltrn and D'_buff, i.e. C'_m(D_ltrn + D'_buff) → h'_m (m = 1, 2, …, M).
Step 205: recognize the L data in D'_buff with h'_m (m = 1, 2, …, M), i.e. ŷ_i,m = h'_m(x_i) (i = 1, 2, …, L; m = 1, 2, …, M), and record the classification confidence of ŷ_i,m.
Step 206: let D_add be the empty set; for any ŷ_i,m (i = 1, 2, …, L; m = 1, 2, …, M), if ŷ_i,m = ŷ_i and the classification confidence of ŷ_i,m is greater than or equal to the preset value, add (x_i, ŷ_i) to the set D_add.
The preset value can be set by the user empirically or according to the target recognition accuracy of C_0; it can also be G_0, the recognition accuracy obtained by testing h_0 on the labeled data of the test set D_ltst. The criterion for setting the preset value is that the accuracy of the data in D_add should be greater than it, so as to ensure that C_0 is supplied with the high-quality data "fuel" on which it can keep evolving and growing.
Step 206 amounts to having the main classifier and the stand-by auxiliary classifiers vote for the recognition results with the best assurance.
Step 207: merge D_add into D_ltrn, and at the same time remove the x_i of D_add from D_unlabel, then return to step 202.
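Steps 203 to 207 make up one round of the loop. The sketch below is one possible reading, reusing the hypothetical remap_labels helper above; the sample size b, buffer size L, and preset confidence are illustrative values, and predict_proba stands in for the classification confidences:

```python
import numpy as np

def unsupervised_round(main_clf, standby, X_ltrn, y_ltrn, X_unlabel,
                       b=1000, L=200, preset=0.95, seed=0):
    """One round of steps 203-207; standby is a list of (classifier,
    remap_fn) pairs produced by the step-107 remapping."""
    rng = np.random.default_rng(seed)
    main_clf.fit(X_ltrn, y_ltrn)                      # step 202: C_0(D_ltrn) -> h_0
    idx = rng.choice(len(X_unlabel), size=min(b, len(X_unlabel)), replace=False)
    X_buff = X_unlabel[idx]                           # step 203: D_buff
    proba = main_clf.predict_proba(X_buff)
    y_hat = main_clf.classes_[proba.argmax(axis=1)]   # h_0's recognition results
    conf = proba.max(axis=1)
    top = np.argsort(conf)[-L:]                       # top-L confidences -> D'_buff
    X_top, y_top = X_buff[top], y_hat[top]
    keep = np.zeros(len(X_top), dtype=bool)           # membership in D_add
    for clf, remap in standby:
        # step 204: C'_m(D_ltrn + D'_buff) -> h'_m
        clf.fit(np.vstack([X_ltrn, X_top]),
                np.concatenate([remap(y_ltrn), remap(y_top)]))
        p = clf.predict_proba(X_top)                  # step 205: recognize D'_buff
        agree = clf.classes_[p.argmax(axis=1)] == remap(y_top)
        keep |= agree & (p.max(axis=1) >= preset)     # step 206: the vote
    # step 207: merge D_add into D_ltrn and remove it from D_unlabel
    X_ltrn_new = np.vstack([X_ltrn, X_top[keep]])
    y_ltrn_new = np.concatenate([y_ltrn, y_top[keep]])
    X_unlabel_new = np.delete(X_unlabel, idx[top][keep], axis=0)
    return X_ltrn_new, y_ltrn_new, X_unlabel_new
```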
Each executed round of the steps of Fig. 3 adds the new data D_add to the training set D_ltrn, so that in the next round the main classifier and the stand-by auxiliary classifiers learn from the new D_ltrn and the models keep evolving and growing. To guarantee the quality of the newly added data D_add, the method of this application uses secondary verification by the main classifier and the stand-by auxiliary classifiers to ensure the data quality of D_add; and the auxiliary classifiers, having passed the screening of step 100, ensure the accuracy of their output results.
The present invention uses the deep-learning classifier as the main classifier and other classifiers as auxiliary classifiers; the auxiliary classifiers help the main classifier verify labeled data, thereby achieving automatic high-quality labeling of data and accumulating a large volume of high-quality labeled data; at the same time, through continuous learning on the labeled data, the main classifier and the auxiliary classifiers improve their learning ability together.
Through this positive-feedback unsupervised-learning process, the present invention can continuously improve the learning performance of the main classifier.
The deep learning classification method of the invention has the following characteristics. Overall flow: the flow is divided into two parts, auxiliary-classifier screening and classifier unsupervised learning. Screening method: the auxiliary-learner screening principle is to pre-train first and find the classifier best suited to a given class. Advantage-class determination: the optimal classifier is determined by ranking accuracy using the pre-training results. Stand-by auxiliary-classifier remapping: after selection, the stand-by auxiliary classifier is retrained over fewer classes so that it concentrates on classifying a small number of characteristic classes. Main/auxiliary-classifier voting confirmation: data on which the main and auxiliary classifiers agree are confirmed and permanently added to the labeled data set, while the unlabeled data set never discards data, which guarantees that no selective-learning phenomenon arises during the learning process. Using the main classifier (the deep-learning classifier C_0) at application time reduces complexity: the auxiliary classifiers assist the learning of the main classifier, and the main classifier alone finally serves the application or production environment.
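Putting the two phases together, a minimal end-to-end driver might look as follows; every function name here is one of the hypothetical helpers sketched above, not patent terminology:

```python
def deep_learning_classification(main_clf, aux_pool, X_label, y_label,
                                 X_unlabel, classes, T=5, rounds=100):
    """End-to-end sketch: step 100 (screening), then repeated step-200 rounds."""
    X_ltrn, X_ltst, y_ltrn, y_ltst = split_labeled(X_label, y_label)
    G, winners = screen_auxiliaries(aux_pool, X_ltrn, y_ltrn,
                                    X_ltst, y_ltst, classes)     # step 100
    standby = [(aux_pool[k], lambda y, adv=frozenset(adv): remap_labels(y, adv))
               for k, adv in winners.items()]                    # step 107
    stop = StopRule(patience_T=T)
    for _ in range(rounds):                                      # step 200 loop
        X_ltrn, y_ltrn, X_unlabel = unsupervised_round(
            main_clf, standby, X_ltrn, y_ltrn, X_unlabel)
        g0 = main_clf.score(X_ltst, y_ltst)   # G_0 on the fixed test set D_ltst
        if stop.should_stop(g0):
            break
    return main_clf   # only the main classifier is carried into production
```

Note that the auxiliary classifiers exist solely to vet new labels during training; as the text states, the main classifier alone serves the application or production environment.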
The above is the explanation of the deep learning classification method of the present invention.
The invention also includes a deep learning classification device; its principle is the same as that of the deep learning classification method of Fig. 1, and related passages may be read with reference to each other.
As shown in Fig. 4, the deep learning classification device includes: a data module 400, an auxiliary-classifier screening module 500, and a classifier unsupervised-learning module 600.
Data module 400: contains the input data set D, composed of the labeled data set D_label and the unlabeled data set D_unlabel, satisfying D_label << D_unlabel, where D_label is in turn composed of the training set D_ltrn and the test set D_ltst, satisfying D_ltrn : D_ltst ≥ 9:1; the true label of an input datum x_i of D_label is y_i, satisfying y_i ∈ {t_0, t_1, …, t_(N-1)}, and any class of {t_0, t_1, …, t_(N-1)} is denoted t_n (n = 0, 1, …, N-1).
As shown in Fig. 5, the auxiliary-classifier screening module 500 includes:
Auxiliary-classifier learning module 501: each auxiliary classifier establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn; the auxiliary classifiers learn their respective output models from D_ltrn, i.e. C_k(D_ltrn) → h_k (k = 1, 2, …, K denotes the auxiliary-classifier index).
Auxiliary-classifier recognition module 502: for any datum (x_i, y_i) in the test set D_ltst with y_i = t_n ∈ {t_0, t_1, …, t_(N-1)}, x_i is recognized by h_k to give ŷ_i, i.e. ŷ_i = h_k(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}.
Recognition-result counting module 503: compare y_i with ŷ_i; if they agree, increment R_k,n, otherwise increment W_k,n; R_k,n accumulates the number of times h_k recognizes any class t_n of {t_0, t_1, …, t_(N-1)} correctly, and W_k,n accumulates the number of times h_k recognizes any class t_n of {t_0, t_1, …, t_(N-1)} incorrectly; the initial values of R_k,n and W_k,n are 0.
Judgment module 504: repeat the auxiliary-classifier recognition module and the recognition-result counting module until every h_k (k = 1, 2, …, K) has been tested on every datum in D_ltst.
Accuracy computing module 505: compute the accuracy of model h_k on each class t_n: G_k,n = R_k,n / (R_k,n + W_k,n).
Stand-by auxiliary-classifier determining module 506: for each class t_n (n = 0, 1, …, N-1), select the C_k corresponding to the highest accuracy G_k,n (k = 1, 2, …, K) as the stand-by auxiliary classifier for that class, denoted C'_1, C'_2, …, C'_M (M ≤ K), and record for each stand-by auxiliary classifier the t_n corresponding to the G_k,n on which it was selected as its recognition-advantage class; for any stand-by auxiliary classifier, the classes of {t_0, t_1, …, t_(N-1)} other than its advantage classes are the recognition-disadvantage classes of that auxiliary classifier.
Stand-by auxiliary-classifier mapping-reconstruction module 507: based on the advantage and disadvantage classes recognized by the stand-by auxiliary classifiers, rebuild the mapping between the model input and output of the stand-by auxiliary classifiers, and execute the classifier unsupervised-learning module.
As shown in Fig. 6, the classifier unsupervised-learning module includes:
Main-classifier mapping module 601: the deep-learning classifier C_0 establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn.
Main-classifier learning module 602: C_0 learns a model h_0 from the training set D_ltrn, i.e. C_0(D_ltrn) → h_0.
Main-classifier learning module 602 may further include testing the recognition accuracy of h_0 with the labeled data of the test set D_ltst, denoted G_0; when G_0 falls below a preset value, or G_0 has not improved within T runs of the classifier unsupervised-learning module, the classifier unsupervised-learning module is exited.
The value of T can be set empirically, or set according to regularities observed in the actual running results of the classifier unsupervised-learning module.
Main-classifier recognition module 603: take b samples from the unlabeled data set D_unlabel to form D_buff; for all data x_i (i = 1, 2, …, b) in D_buff, recognize x_i with model h_0 to obtain the recognition result ŷ_i, i.e. ŷ_i = h_0(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}, and record the classification confidence of ŷ_i; take the L data (x_i, ŷ_i) (i = 1, 2, …, L) with the highest recognition confidence to form D'_buff.
Stand-by auxiliary-classifier learning module 604: the stand-by auxiliary classifiers C'_1, C'_2, …, C'_M (M ≤ K) learn their respective models h'_m from the training set D_ltrn and D'_buff, i.e. C'_m(D_ltrn + D'_buff) → h'_m (m = 1, 2, …, M).
Stand-by auxiliary-classifier recognition module 605: recognize the L data in D'_buff with h'_m (m = 1, 2, …, M), i.e. ŷ_i,m = h'_m(x_i) (i = 1, 2, …, L; m = 1, 2, …, M), and record the classification confidence of ŷ_i,m.
Comparison module 606: let D_add be the empty set; for any ŷ_i,m (i = 1, 2, …, L; m = 1, 2, …, M), if ŷ_i,m = ŷ_i and the classification confidence of ŷ_i,m is greater than or equal to the preset value, add (x_i, ŷ_i) to the set D_add.
The preset value can be a value set by the user empirically or according to the target recognition accuracy of C_0; it can also be G_0, the recognition accuracy obtained by testing h_0 on the labeled data of the test set D_ltst.
Data-set changing module 607: merge D_add into D_ltrn, and at the same time remove the x_i of D_add from D_unlabel, then return to execute main-classifier learning module 602.
In the device of the present application, the main-classifier accuracy computation module further includes: when G_0 falls below the preset value, or G_0 has not improved within T runs of the classifier unsupervised-learning module, exiting the classifier unsupervised-learning module.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the protection scope of the present invention; any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the technical solution of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A deep learning classification method, the input being a data set D, the D being composed of a labeled data set D_label and an unlabeled data set D_unlabel, wherein the D_label is in turn composed of a training set D_ltrn and a test set D_ltst, the true label of an input datum x_i of the D_label being y_i and satisfying y_i ∈ {t_0, t_1, …, t_(N-1)}, any class of the {t_0, t_1, …, t_(N-1)} being denoted t_n (n = 0, 1, …, N-1); characterized in that the method comprises at least an auxiliary-classifier screening step 100 and a classifier unsupervised-learning step 200;
The auxiliary-classifier screening step 100 comprises:
Step 101: each auxiliary classifier establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn; the auxiliary classifiers learn their respective output models from the training set D_ltrn, i.e. C_k(D_ltrn) → h_k (k = 1, 2, …, K denotes the auxiliary-classifier index);
Step 102: for any datum (x_i, y_i) in the test set D_ltst with y_i = t_n, the x_i is recognized by h_k to give ŷ_i, i.e. ŷ_i = h_k(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)};
Step 103: compare the y_i with ŷ_i; if they agree, increment the count R_k,n, otherwise increment the count W_k,n; the R_k,n accumulates the number of times h_k recognizes any class t_n correctly, and the W_k,n accumulates the number of times h_k recognizes any class t_n incorrectly; the initial values of R_k,n and W_k,n are 0;
Step 104: repeat steps 102 and 103 until every h_k (k = 1, 2, …, K) has been tested on every datum in D_ltst;
Step 105: compute the accuracy of model h_k on each class t_n: G_k,n = R_k,n / (R_k,n + W_k,n);
Step 106: for each class t_n (n = 0, 1, …, N-1), select the C_k corresponding to the highest accuracy G_k,n (k = 1, 2, …, K) as the stand-by auxiliary classifier for that class, denoted C'_1, C'_2, …, C'_M (M ≤ K); record, for each stand-by auxiliary classifier, the t_n corresponding to the G_k,n on which it was selected as its recognition-advantage class; for any stand-by auxiliary classifier, the classes of {t_0, t_1, …, t_(N-1)} other than its advantage classes are the recognition-disadvantage classes of that auxiliary classifier;
Step 107: based on the advantage and disadvantage classes recognized by the stand-by auxiliary classifiers, rebuild the mapping between the model input and output of the stand-by auxiliary classifiers, and execute the classifier unsupervised-learning step 200;
The classifier unsupervised-learning step 200 comprises:
Step 201: the deep-learning classifier C_0 establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn;
Step 202: the C_0 learns a model h_0 from the training set D_ltrn, i.e. C_0(D_ltrn) → h_0;
Step 203: take b samples from the unlabeled data set D_unlabel to form D_buff; for all data x_i (i = 1, 2, …, b) in D_buff, recognize x_i with model h_0 to obtain the recognition result ŷ_i, i.e. ŷ_i = h_0(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}, and record the classification confidence of ŷ_i; take the L data (x_i, ŷ_i) (i = 1, 2, …, L) with the highest recognition confidence to form D'_buff;
Step 204: the stand-by auxiliary classifiers C'_1, C'_2, …, C'_M (M ≤ K) learn their respective models h'_m from the training set D_ltrn and D'_buff, i.e. C'_m(D_ltrn + D'_buff) → h'_m (m = 1, 2, …, M);
Step 205: recognize the L data in D'_buff with the h'_m (m = 1, 2, …, M), i.e. ŷ_i,m = h'_m(x_i) (i = 1, 2, …, L; m = 1, 2, …, M), and record the classification confidence of ŷ_i,m;
Step 206: let D_add be the empty set; for any ŷ_i,m (i = 1, 2, …, L; m = 1, 2, …, M), if ŷ_i,m = ŷ_i and the classification confidence of ŷ_i,m is greater than a preset value, add (x_i, ŷ_i) to the set D_add;
Step 207: merge D_add into D_ltrn, and at the same time remove the x_i of D_add from D_unlabel, then return to step 202.
2. The classification method according to claim 1, characterized in that step 202 further comprises testing the recognition accuracy of h_0 with the labeled data of the test set D_ltst, denoted G_0;
the preset value in step 206 is G_0.
3. The classification method according to claim 1, characterized in that D satisfies D_label << D_unlabel.
4. The classification method according to claim 1, characterized in that D_label satisfies D_ltrn : D_ltst ≥ 9:1.
5. The classification method according to claim 2, characterized in that step 202 further comprises: when G_0 falls below the preset value, or G_0 has not improved within T runs of step 200, terminating the classifier unsupervised-learning step 200.
6. A deep learning classification device, characterized in that the device comprises at least: a data module, an auxiliary-classifier screening module, and a classifier unsupervised-learning module;
Data module: contains the input data set D, the D being composed of a labeled data set D_label and an unlabeled data set D_unlabel, wherein the D_label is in turn composed of a training set D_ltrn and a test set D_ltst; the true label of an input datum x_i of the D_label is y_i, satisfying y_i ∈ {t_0, t_1, …, t_(N-1)}; any class of the {t_0, t_1, …, t_(N-1)} is denoted t_n (n = 0, 1, …, N-1);
The auxiliary-classifier screening module comprises:
Auxiliary-classifier learning module: each auxiliary classifier establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn; the auxiliary classifiers learn their respective output models from the training set D_ltrn, i.e. C_k(D_ltrn) → h_k (k = 1, 2, …, K denotes the auxiliary-classifier index);
Auxiliary-classifier recognition module: for any datum (x_i, y_i) in the test set D_ltst with y_i = t_n, the x_i is recognized by h_k to give ŷ_i, i.e. ŷ_i = h_k(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)};
Recognition-result counting module: compare the y_i with ŷ_i; if they agree, increment R_k,n, otherwise increment W_k,n; the R_k,n accumulates the number of times h_k recognizes any class t_n correctly, and the W_k,n accumulates the number of times h_k recognizes any class t_n incorrectly; the initial values of R_k,n and W_k,n are 0;
Judgment module: repeat the auxiliary-classifier recognition module and the recognition-result counting module until every h_k (k = 1, 2, …, K) has been tested on every datum in D_ltst;
Accuracy computing module: compute the accuracy of model h_k on each class t_n: G_k,n = R_k,n / (R_k,n + W_k,n);
Stand-by auxiliary-classifier determining module: for each class t_n (n = 0, 1, …, N-1), select the C_k corresponding to the highest accuracy G_k,n (k = 1, 2, …, K) as the stand-by auxiliary classifier for that class, denoted C'_1, C'_2, …, C'_M (M ≤ K), and record for each stand-by auxiliary classifier the t_n corresponding to the G_k,n on which it was selected as its recognition-advantage class; for any stand-by auxiliary classifier, the classes of {t_0, t_1, …, t_(N-1)} other than its advantage classes are the recognition-disadvantage classes of that auxiliary classifier;
Stand-by auxiliary-classifier mapping-reconstruction module: based on the advantage and disadvantage classes recognized by the stand-by auxiliary classifiers, rebuild the mapping between the model input and output of the stand-by auxiliary classifiers, and execute the classifier unsupervised-learning module;
The classifier unsupervised-learning module comprises:
Main-classifier mapping module: the deep-learning classifier C_0 establishes a mapping between model input and output with the data (x_i, y_i) in the training set D_ltrn;
Main-classifier learning module: the C_0 learns a model h_0 from the training set D_ltrn, i.e. C_0(D_ltrn) → h_0;
Main-classifier recognition module: take b samples from the unlabeled data set D_unlabel to form D_buff; for all data x_i (i = 1, 2, …, b) in D_buff, recognize x_i with model h_0 to obtain the recognition result ŷ_i, i.e. ŷ_i = h_0(x_i), satisfying ŷ_i ∈ {t_0, t_1, …, t_(N-1)}, and record the classification confidence of ŷ_i; take the L data (x_i, ŷ_i) (i = 1, 2, …, L) with the highest recognition confidence to form D'_buff;
Stand-by auxiliary-classifier learning module: the stand-by auxiliary classifiers C'_1, C'_2, …, C'_M (M ≤ K) learn their respective models h'_m from the training set D_ltrn and D'_buff, i.e. C'_m(D_ltrn + D'_buff) → h'_m (m = 1, 2, …, M);
Stand-by auxiliary-classifier recognition module: recognize the L data in D'_buff with the h'_m (m = 1, 2, …, M), i.e. ŷ_i,m = h'_m(x_i) (i = 1, 2, …, L; m = 1, 2, …, M), and record the classification confidence of ŷ_i,m;
Comparison module: let D_add be the empty set; for any ŷ_i,m (i = 1, 2, …, L; m = 1, 2, …, M), if ŷ_i,m = ŷ_i and the classification confidence of ŷ_i,m is greater than a preset value, add (x_i, ŷ_i) to the set D_add;
Data-set changing module: merge D_add into D_ltrn, and at the same time remove the x_i of D_add from D_unlabel, then return to execute the main-classifier learning module.
7. The device according to claim 6, characterized in that the main-classifier learning module further comprises testing the recognition accuracy of h_0 with the labeled data of the test set D_ltst, denoted G_0;
the preset value in the comparison module is G_0.
8. The device according to claim 6, characterized in that D satisfies D_label << D_unlabel.
9. The device according to claim 6, characterized in that D_label satisfies D_ltrn : D_ltst ≥ 9:1.
10. The device according to claim 7, characterized in that the main-classifier learning module further comprises: when G_0 falls below the preset value, or G_0 has not improved within T runs of the classifier unsupervised-learning module, exiting the classifier unsupervised-learning module.
CN201710705331.9A 2017-08-17 2017-08-17 Deep learning classification method and device Pending CN109409390A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710705331.9A CN109409390A (en) 2017-08-17 2017-08-17 Deep learning classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710705331.9A CN109409390A (en) 2017-08-17 2017-08-17 Deep learning classification method and device

Publications (1)

Publication Number Publication Date
CN109409390A (en) 2019-03-01

Family

ID=65454795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710705331.9A Pending CN109409390A (en) 2017-08-17 2017-08-17 Deep learning classification method and device

Country Status (1)

Country Link
CN (1) CN109409390A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114385890A (en) * 2022-03-22 2022-04-22 深圳市世纪联想广告有限公司 Internet public opinion monitoring system
CN114385890B (en) * 2022-03-22 2022-05-20 深圳市世纪联想广告有限公司 Internet public opinion monitoring system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190301