CN110020675A

CN110020675A - A dual-threshold AdaBoost classification method

Info

Publication number: CN110020675A
Application number: CN201910196149.4A
Authority: CN
Inventors: 张梦娇; 叶庆卫
Original assignee: Ningbo University
Current assignee: Ningbo University
Priority date: 2019-03-15
Filing date: 2019-03-15
Publication date: 2019-07-16

Abstract

The invention discloses a dual-threshold AdaBoost classification method. The method first randomly initializes a genetic sample matrix and obtains the assessed value of every chromosome in it. According to the assessed values, it eliminates low-fitness individuals and replicates high-fitness individuals, retains one of the high-fitness chromosomes, and applies crossover recombination and mutation to the remaining chromosomes to obtain a new genetic sample matrix. The new genetic sample matrix is then assessed, crossed over, and mutated in turn, and this process is iterated to obtain the optimal solution, which is the best weak classifier of each iteration. The best strong classifier of each iteration is computed from the weights of the best weak classifiers of the iterations. If the best strong classifier agrees with the sample label matrix, it is taken as the final strong classifier. The advantage is that the method shortens the time spent searching for thresholds and speeds up the threshold search, thereby reducing computational complexity.

Description

A dual-threshold AdaBoost classification method

Technical field

The present invention relates to a classification method, and more particularly to a dual-threshold AdaBoost classification method.

Background

In 1990, Schapire proposed the Boosting algorithm, an important ensemble learning technique that combines weak classifiers with poor classification performance into a strong classifier with better classification performance. However, the Boosting algorithm did not solve two major problems: how to adjust the training set, and how to combine the weak classifiers into a strong classifier. In 1995, Freund and Schapire proposed the AdaBoost algorithm, an iterative algorithm that reduces the error rate through adaptive learning and builds the strong classifier with a weighted voting mechanism. It achieves very high accuracy and can be used for both classification and regression problems, although current research and applications concentrate mainly on classification. Handwriting recognition was one of the earliest practical applications of the AdaBoost algorithm, and the algorithm also has important applications in areas such as speech recognition and face recognition.

The traditional AdaBoost algorithm uses a large number of single-threshold weak classifiers, which inevitably increases the training time of the weak classifiers. To reduce the training time, Wang Haichuan et al. proposed building the AdaBoost strong classifier from dual-threshold weak classifiers, so that a dual-threshold AdaBoost classifier can reach the same classification performance with fewer weak classifiers. At present, dual-threshold AdaBoost algorithms find the thresholds mainly by enumeration. Enumeration lengthens the threshold search and makes it slow, which gives the dual-threshold AdaBoost algorithm a high computational complexity.

Summary of the invention

The technical problem to be solved by the invention is to provide a dual-threshold AdaBoost classification method that shortens the time spent searching for thresholds and speeds up the threshold search, thereby reducing computational complexity.

The technical solution adopted by the invention to solve the above technical problem is a dual-threshold AdaBoost classification method, characterized by comprising the following steps:

Step 1: A sample set is given and written in matrix form as $Y$:

$$Y=\begin{bmatrix} y_{1,1} & \cdots & y_{m,1} & \cdots & y_{M,1}\\ \vdots & & \vdots & & \vdots\\ y_{1,n} & \cdots & y_{m,n} & \cdots & y_{M,n}\\ \vdots & & \vdots & & \vdots\\ y_{1,N} & \cdots & y_{m,N} & \cdots & y_{M,N}\end{bmatrix}$$

The sample label matrix corresponding to $Y$ is also given and denoted $G$, with $G=[g_1,\ldots,g_n,\ldots,g_N]^{T}$. Let $k$ denote the iteration number of the strong-classifier search in the dual-threshold AdaBoost classification method. Here, $Y$ has dimension $N\times M$; $M$ is the total number of features in $Y$, each column of $Y$ is one feature, $M>1$, $m$ is a positive integer with initial value 1, and $1\le m\le M$; $N$ is the total number of sample values in each feature of $Y$, each element in a column of $Y$ is one sample value, $N>1$, $n$ is a positive integer with initial value 1, and $1\le n\le N$; $G$ has dimension $N\times 1$; $y_{1,1}$, $y_{1,n}$, $y_{1,N}$ denote the 1st, $n$-th, and $N$-th sample values of the 1st feature in $Y$; $y_{m,1}$, $y_{m,n}$, $y_{m,N}$ denote the 1st, $n$-th, and $N$-th sample values of the $m$-th feature in $Y$; $y_{M,1}$, $y_{M,n}$, $y_{M,N}$ denote the 1st, $n$-th, and $N$-th sample values of the $M$-th feature in $Y$; $g_1$, $g_n$, and $g_N$ denote the class labels of all sample values in the 1st, $n$-th, and $N$-th rows of $Y$, respectively, and each takes the value 1 or -1; $k$ is a positive integer with initial value 1.

Step 2: In the $k$-th iteration, a genetic algorithm is used to search for the weak classifier, yielding the best weak classifier of the $k$-th iteration. The detailed process is as follows:

Step 2_1: Let $S$ denote the genetic sample matrix, and randomly initialize $S$ so that

$$S=\begin{bmatrix} s_{1,1} & \cdots & s_{1,m'} & \cdots & s_{1,M'}\\ \vdots & & \vdots & & \vdots\\ s_{n',1} & \cdots & s_{n',m'} & \cdots & s_{n',M'}\\ \vdots & & \vdots & & \vdots\\ s_{N',1} & \cdots & s_{N',m'} & \cdots & s_{N',M'}\end{bmatrix}$$

Here, $S$ has dimension $N'\times M'$; $N'$ is the total number of chromosomes in $S$, each row of $S$ is one chromosome, $N'\ge 1$, $n'$ is a positive integer with initial value 1, and $1\le n'\le N'$; $M'$ is the total number of genes in each chromosome of $S$, each element in a row of $S$ is one gene, $M'\ge 4$, $m'$ is a positive integer with initial value 1, and $1\le m'\le M'$; $s_{1,1}$, $s_{1,m'}$, $s_{1,M'}$ denote the 1st, $m'$-th, and $M'$-th genes of the 1st chromosome in $S$; $s_{n',1}$, $s_{n',m'}$, $s_{n',M'}$ denote the 1st, $m'$-th, and $M'$-th genes of the $n'$-th chromosome in $S$; $s_{N',1}$, $s_{N',m'}$, $s_{N',M'}$ denote the 1st, $m'$-th, and $M'$-th genes of the $N'$-th chromosome in $S$; every gene takes the value 0 or 1.

Step 2_2: Every chromosome in $S$ is assessed to obtain its assessed value. For the $n'$-th chromosome in $S$, the assessment process is:

Step 2_2a: All genes of the $n'$-th chromosome are arranged in order into a binary string of length $M'$. Then, from left to right, bits 1 to $\alpha$ of the binary string form the first group, bits $\alpha+1$ to $\alpha+\beta$ form the second group, bits $\alpha+\beta+1$ to $\alpha+\beta+\gamma$ form the third group, and the last bit forms the fourth group. The feature number, denoted $F_{n'}$, is then computed from the first group: if the decimal number obtained by converting the first group as a binary number is 0, let $F_{n'}=1$; otherwise let $F_{n'}$ = [formula not reproduced in this text]. Here, $\alpha+\beta+\gamma+1=M'$, $\alpha\ge 1$, $\beta\ge 1$, $\gamma\ge 1$, $F_{n',10}$ denotes the decimal number obtained by converting the first group as a binary number, $\lceil\ \rceil$ is the round-up (ceiling) operator, and $F_{n'}\in[1,M]$.

Step 2_2b: According to the feature number $F_{n'}$, all sample values of the $F_{n'}$-th feature are extracted from $Y$ and taken as the sample values under feature number $F_{n'}$.

Step 2_2c: The first threshold, the second threshold, and the classification direction are determined and denoted $T1_{n'}$, $T2_{n'}$, and $P_{n'}$, respectively, with $T1_{n'}$ = [formula not reproduced in this text] and $T2_{n'}$ = [formula not reproduced in this text]; if the fourth group is 0, let $P_{n'}=-1$, and if the fourth group is 1, let $P_{n'}=1$. Then, according to $T1_{n'}$, $T2_{n'}$, and $P_{n'}$, a dual-threshold AdaBoost classifier classifies all sample values under feature number $F_{n'}$: when $P_{n'}=1$, all sample values lying between $T1_{n'}$ and $T2_{n'}$ are labeled 1 and the remaining sample values are labeled -1; conversely, when $P_{n'}=-1$, all sample values lying between $T1_{n'}$ and $T2_{n'}$ are labeled -1 and the remaining sample values are labeled 1. The labels of all sample values under feature number $F_{n'}$ then form the classification label vector, denoted $H_{n'}$, with $H_{n'}=[h_{n',1},\ldots,h_{n',n},\ldots,h_{n',N}]^{T}$. Here, $T1_{n',10}$ denotes the decimal number obtained by converting the second group as a binary number, $y_{n',max}$ denotes the largest of all sample values under feature number $F_{n'}$, $T2_{n',10}$ denotes the decimal number obtained by converting the third group as a binary number, $H_{n'}$ has dimension $N\times 1$, $h_{n',1}$, $h_{n',n}$, $h_{n',N}$ denote the labels of the 1st, $n$-th, and $N$-th sample values under feature number $F_{n'}$, and each of them takes the value 1 or -1.

Step 2_2d: Let $W_{n'}^{(k)}$ denote the weight vector corresponding to $H_{n'}$ in the $k$-th iteration, with $W_{n'}^{(k)}=[w_{n',1}^{(k)},\ldots,w_{n',n}^{(k)},\ldots,w_{n',N}^{(k)}]^{T}$; when $k=1$, $W_{n'}^{(1)}$ = [formula not reproduced in this text]. Then each label in $H_{n'}$ is compared, in order, with the corresponding class label in $G$; all labels in $H_{n'}$ that differ from the corresponding class label in $G$ are extracted, and their number is counted and denoted $N_{dif}$. The error rate of the $n'$-th chromosome, denoted $E_{n'}$, is then computed from $W_{n'}^{(k)}$: $E_{n'}$ equals the sum of the weights corresponding to the $N_{dif}$ differing labels. $E_{n'}$ is taken as the assessed value of the $n'$-th chromosome. Here, $W_{n'}^{(k)}$ has dimension $N\times 1$, and $w_{n',1}^{(k)}$, $w_{n',n}^{(k)}$, $w_{n',N}^{(k)}$ denote the weights corresponding to $h_{n',1}$, $h_{n',n}$, $h_{n',N}$, respectively.

Step 2_3: Among the assessed values of all chromosomes in $S$, all minimum assessed values and all maximum assessed values are found; any one minimum assessed value is taken and denoted $E_{min}$, and any one maximum assessed value is taken and denoted $E_{max}$. The chromosome corresponding to $E_{max}$ is then replaced by the chromosome corresponding to $E_{min}$, giving a new genetic sample matrix denoted $S_{new}$, whose dimension is $N'\times M'$.

Step 2_4: Among the assessed values of all chromosomes in $S_{new}$, all minimum assessed values are found, and the chromosome corresponding to one of them is retained. Crossover recombination is then applied to the remaining $N'-1$ chromosomes in $S_{new}$, giving the genetic sample matrix after crossover recombination, denoted $S'_{new}$.

Step 2_5: A number of chromosomes [count formula not reproduced in this text] are selected at random from $S'_{new}$, and the mutation probability of all selected chromosomes is set to $pc$. Then, in every selected chromosome, a number of genes [count formula not reproduced in this text] are chosen at random and changed, giving a new genetic sample matrix denoted $S''_{new}$. Here, $q\in[10,30]$, $\lfloor\ \rfloor$ is the round-down (floor) operator, and $pc\in[0.1,0.3]$.

Step 2_6: Let $S=S''_{new}$, then return to Step 2_2 and repeat; after $K$ repetitions in total, execute Step 2_7. Here, the "=" in $S=S''_{new}$ is the assignment operator, $K$ is a positive integer, and $K\in[5,20]$.

Step 2_7: Take the $E_{min}$ obtained in the $K$-th execution of the $k$-th iteration. The feature number, first threshold, second threshold, and classification direction of the chromosome corresponding to this $E_{min}$ are taken as the best feature number, best first threshold, best second threshold, and best classification direction of the $k$-th iteration, respectively. The labels of all sample values under the best feature number form the best classification label vector of the $k$-th iteration, which has dimension $N\times 1$. The structure formed by these quantities in order is taken as the best weak classifier of the $k$-th iteration, and the $E_{min}$ obtained in the $K$-th execution of the $k$-th iteration is taken as the error rate of the best weak classifier of the $k$-th iteration, denoted $E^{(k)}$.

Step 3: The weight of the best weak classifier of the $k$-th iteration is computed [formula not reproduced in this text]. The best strong classifier of the $k$-th iteration is then computed [formula not reproduced in this text]. Here, sign() is the sign function, $k'$ is a positive integer with initial value 1, and the strong classifier combines, for each $k'$, the weight of the best weak classifier of the $k'$-th iteration with the best classification label vector of the $k'$-th iteration.

Step 4: Each element of the best strong classifier of the $k$-th iteration is compared, one by one, with the corresponding class label in $G$. If all of them are identical, this best strong classifier is taken as the final strong classifier and the iteration ends; if any differ, Step 5 is executed.

Step 5: For every chromosome in $S$, compute the weight vector that will correspond, in the $(k+1)$-th iteration, to the classification label vector formed by the labels of all sample values under the feature number of that chromosome. For the $n'$-th chromosome in $S$, this weight vector is denoted $W_{n'}^{(k+1)}$ [formula not reproduced in this text]. Then let $k=k+1$ and return to Step 2 to continue. Here, the update refers to the 1st, $n$-th, and $N$-th labels of the corresponding classification label vector, and the "=" in $k=k+1$ is the assignment operator.

In Step 2_4, the detailed process of applying crossover recombination to the remaining $N'-1$ chromosomes in $S_{new}$ is:

Step 2_4a: Arbitrarily choose two chromosomes from the remaining chromosomes in $S_{new}$.

Step 2_4b: Combine the first $\zeta$ genes of one chromosome with the last $M'-\zeta$ genes of the other chromosome to form one new chromosome; combine the remaining last $M'-\zeta$ genes of the first chromosome with the remaining first $\zeta$ genes of the other chromosome to form another new chromosome. Here, $\zeta$ is a randomly generated positive integer, $\zeta\in[1,M')$.

Step 2_4c: Replace the two chosen chromosomes with the two new chromosomes.

Step 2_4d: Process the unprocessed chromosomes in $S_{new}$ according to Steps 2_4a to 2_4c, until all chromosomes in $S_{new}$ have been processed or only one chromosome remains unprocessed.
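The crossover recombination of Steps 2_4a to 2_4d can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names, the use of a shuffled pool to pick pairs, and the seeded `random.Random` generator are assumptions introduced here.

```python
import random


def single_point_crossover(parent_a, parent_b, zeta):
    """Swap gene tails of two chromosomes at crossover point zeta (step 2_4b)."""
    child_1 = parent_a[:zeta] + parent_b[zeta:]  # first zeta of A + last M'-zeta of B
    child_2 = parent_b[:zeta] + parent_a[zeta:]  # first zeta of B + last M'-zeta of A
    return child_1, child_2


def crossover_population(chromosomes, rng):
    """Pair chromosomes at random and cross each pair (steps 2_4a to 2_4d).

    `chromosomes` holds the N'-1 chromosomes left after the elite one is kept.
    """
    pool = [list(ch) for ch in chromosomes]
    rng.shuffle(pool)  # assumption: random pairing via shuffling
    result = []
    while len(pool) >= 2:
        a, b = pool.pop(), pool.pop()
        zeta = rng.randrange(1, len(a))  # zeta is an integer in [1, M')
        result.extend(single_point_crossover(a, b, zeta))
    result.extend(pool)  # a single leftover chromosome passes through unchanged
    return result
```

Because each pair only exchanges gene tails, the number of chromosomes and the length of every chromosome are preserved.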

In Step 2_5, the detailed process of randomly changing the chosen genes in every selected chromosome is: for each chosen gene, if its value is 0, change it to 1; if its value is 1, change it to 0.
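A minimal sketch of this bit-flip mutation follows. The formulas for how many chromosomes and how many genes are selected are not reproduced in this text, so the sketch takes both counts as explicit arguments; the names `mutate`, `n_chromosomes`, and `n_genes` are hypothetical.

```python
import random


def mutate(population, n_chromosomes, n_genes, rng):
    """Flip n_genes randomly chosen genes in n_chromosomes randomly chosen rows.

    The two counts are passed in by the caller (assumption: the patent derives
    them from pc and q, but the exact expressions are not available here).
    """
    mutated = [list(ch) for ch in population]  # leave the input untouched
    for row in rng.sample(range(len(mutated)), n_chromosomes):
        for col in rng.sample(range(len(mutated[row])), n_genes):
            mutated[row][col] ^= 1  # 0 becomes 1, 1 becomes 0
    return mutated
```

Sampling without replacement guarantees that each chosen gene is flipped exactly once, so a selected chromosome always differs from its original in exactly `n_genes` positions.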

Compared with the prior art, the advantages of the present invention are as follows:

1) The method of the invention mainly addresses the long time that a dual-threshold AdaBoost classifier needs to find a weak classifier, and shortens that time. Each chromosome of the randomly initialized genetic sample matrix is decoded, according to its encoding, into a feature number, a first threshold, a second threshold, and a classification direction. The dual-threshold AdaBoost classifier then turns these four quantities into an assessed value, and the process is iterated according to the assessed values to obtain the best feature number, best first threshold, best second threshold, and best classification direction, and hence the best weak classifier. This genetic search shortens the time spent finding the first and second thresholds and speeds up the search for them, which in turn speeds up the search for the weak classifier.

2) The genetic search used in the method of the invention to find the best weak classifier is derived from the genetic algorithm. Every chromosome in the randomly initialized genetic sample matrix is assessed to obtain its assessed value, which serves as its fitness. Low-fitness individuals are eliminated and high-fitness individuals are replicated; one of the high-fitness chromosomes is retained, and the remaining chromosomes undergo crossover recombination and mutation to give a new genetic sample matrix. The new genetic sample matrix is then assessed, crossed over, and mutated, and this process is iterated to obtain the optimal solution, that is, the best weak classifier.

3) The original dual-threshold AdaBoost classification method finds thresholds mainly by enumeration, which requires a very large amount of computation when there are tens of thousands of data points. The genetic algorithm in the method of the invention mainly simulates the evolutionary process of nature: it retains high-fitness chromosomes, eliminates low-fitness chromosomes, and obtains the optimal solution through repeated crossover recombination, mutation, and updating, thereby reducing computational complexity.

Brief description of the drawings

Fig. 1 is a flow diagram of the method of the invention.

Fig. 2 is a schematic diagram of the genetic-search coding structure used in the method of the invention.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the accompanying drawings and embodiments.

The dual-threshold AdaBoost classification method proposed by the present invention, whose flow diagram is shown in Fig. 1, comprises the following steps:

Step 1: A sample set is given and written in matrix form as $Y$:

$$Y=\begin{bmatrix} y_{1,1} & \cdots & y_{m,1} & \cdots & y_{M,1}\\ \vdots & & \vdots & & \vdots\\ y_{1,n} & \cdots & y_{m,n} & \cdots & y_{M,n}\\ \vdots & & \vdots & & \vdots\\ y_{1,N} & \cdots & y_{m,N} & \cdots & y_{M,N}\end{bmatrix}$$

The sample label matrix corresponding to $Y$ is also given and denoted $G$, with $G=[g_1,\ldots,g_n,\ldots,g_N]^{T}$. Let $k$ denote the iteration number of the strong-classifier search in the dual-threshold AdaBoost classification method. Here, $Y$ has dimension $N\times M$; $M$ is the total number of features in $Y$, each column of $Y$ is one feature, $M>1$, $m$ is a positive integer with initial value 1, and $1\le m\le M$; $N$ is the total number of sample values in each feature of $Y$, each element in a column of $Y$ is one sample value, $N>1$, $n$ is a positive integer with initial value 1, and $1\le n\le N$; $G$ has dimension $N\times 1$; $y_{1,1}$, $y_{1,n}$, $y_{1,N}$ denote the 1st, $n$-th, and $N$-th sample values of the 1st feature in $Y$; $y_{m,1}$, $y_{m,n}$, $y_{m,N}$ denote the 1st, $n$-th, and $N$-th sample values of the $m$-th feature in $Y$; $y_{M,1}$, $y_{M,n}$, $y_{M,N}$ denote the 1st, $n$-th, and $N$-th sample values of the $M$-th feature in $Y$; $g_1$, $g_n$, and $g_N$ denote the class labels of all sample values in the 1st, $n$-th, and $N$-th rows of $Y$, respectively, and each takes the value 1 or -1; $k$ is a positive integer with initial value 1.

Step 2_1: Let $S$ denote the genetic sample matrix, and randomly initialize $S$ so that

$$S=\begin{bmatrix} s_{1,1} & \cdots & s_{1,m'} & \cdots & s_{1,M'}\\ \vdots & & \vdots & & \vdots\\ s_{n',1} & \cdots & s_{n',m'} & \cdots & s_{n',M'}\\ \vdots & & \vdots & & \vdots\\ s_{N',1} & \cdots & s_{N',m'} & \cdots & s_{N',M'}\end{bmatrix}$$

Here, $S$ has dimension $N'\times M'$; $N'$ is the total number of chromosomes in $S$, each row of $S$ is one chromosome, $N'\ge 1$, $n'$ is a positive integer with initial value 1, and $1\le n'\le N'$; $M'$ is the total number of genes in each chromosome of $S$, each element in a row of $S$ is one gene, $M'\ge 4$, $m'$ is a positive integer with initial value 1, and $1\le m'\le M'$; $s_{1,1}$, $s_{1,m'}$, $s_{1,M'}$ denote the 1st, $m'$-th, and $M'$-th genes of the 1st chromosome in $S$; $s_{n',1}$, $s_{n',m'}$, $s_{n',M'}$ denote the 1st, $m'$-th, and $M'$-th genes of the $n'$-th chromosome in $S$; $s_{N',1}$, $s_{N',m'}$, $s_{N',M'}$ denote the 1st, $m'$-th, and $M'$-th genes of the $N'$-th chromosome in $S$; every gene takes the value 0 or 1.

Step 2_2a: All genes of the $n'$-th chromosome are arranged in order into a binary string of length $M'$. Then, from left to right, bits 1 to $\alpha$ of the binary string form the first group, bits $\alpha+1$ to $\alpha+\beta$ form the second group, bits $\alpha+\beta+1$ to $\alpha+\beta+\gamma$ form the third group, and the last bit forms the fourth group; together they form the genetic-search coding structure shown in Fig. 2. The feature number, denoted $F_{n'}$, is then computed from the first group: if the decimal number obtained by converting the first group as a binary number is 0, let $F_{n'}=1$; otherwise let $F_{n'}$ = [formula not reproduced in this text]. Here, $\alpha+\beta+\gamma+1=M'$, $\alpha\ge 1$, $\beta\ge 1$, $\gamma\ge 1$, $F_{n',10}$ denotes the decimal number obtained by converting the first group as a binary number, $\lceil\ \rceil$ is the round-up (ceiling) operator, and $F_{n'}\in[1,M]$.

Step 2_2b: According to the feature number $F_{n'}$, all sample values of the $F_{n'}$-th feature are extracted from $Y$ and taken as the sample values under feature number $F_{n'}$.

Step 2_2c: The first threshold, the second threshold, and the classification direction are determined and denoted $T1_{n'}$, $T2_{n'}$, and $P_{n'}$, respectively, with $T1_{n'}$ = [formula not reproduced in this text] and $T2_{n'}$ = [formula not reproduced in this text]; if the fourth group is 0, let $P_{n'}=-1$, and if the fourth group is 1, let $P_{n'}=1$. Then, according to $T1_{n'}$, $T2_{n'}$, and $P_{n'}$, a dual-threshold AdaBoost classifier classifies all sample values under feature number $F_{n'}$: when $P_{n'}=1$, all sample values lying between $T1_{n'}$ and $T2_{n'}$ are labeled 1 and the remaining sample values are labeled -1; conversely, when $P_{n'}=-1$, all sample values lying between $T1_{n'}$ and $T2_{n'}$ are labeled -1 and the remaining sample values are labeled 1. The labels of all sample values under feature number $F_{n'}$ then form the classification label vector, denoted $H_{n'}$, with $H_{n'}=[h_{n',1},\ldots,h_{n',n},\ldots,h_{n',N}]^{T}$. Here, $T1_{n',10}$ denotes the decimal number obtained by converting the second group as a binary number, $y_{n',max}$ denotes the largest of all sample values under feature number $F_{n'}$, $T2_{n',10}$ denotes the decimal number obtained by converting the third group as a binary number, $H_{n'}$ has dimension $N\times 1$, $h_{n',1}$, $h_{n',n}$, $h_{n',N}$ denote the labels of the 1st, $n$-th, and $N$-th sample values under feature number $F_{n'}$, and each of them takes the value 1 or -1.
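The grouping of Step 2_2a and the labeling rule of Step 2_2c can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, "between" is read as a strict inequality, and the two thresholds are ordered with `min`/`max` before comparison, none of which is stated in the text. The scaling from group codes to actual thresholds is left out because its formula is not reproduced here.

```python
def split_chromosome(genes, alpha, beta, gamma):
    """Split a 0/1 gene list into the four groups of step 2_2a.

    Returns the decimal values of groups 1 to 3 and the direction P
    (-1 if the last gene is 0, +1 if it is 1).
    """
    if len(genes) != alpha + beta + gamma + 1:
        raise ValueError("chromosome length must equal alpha + beta + gamma + 1")

    def to_int(bits):
        return int("".join(str(b) for b in bits), 2)

    feature_code = to_int(genes[:alpha])
    t1_code = to_int(genes[alpha:alpha + beta])
    t2_code = to_int(genes[alpha + beta:alpha + beta + gamma])
    direction = 1 if genes[-1] == 1 else -1
    return feature_code, t1_code, t2_code, direction


def dual_threshold_labels(values, t1, t2, direction):
    """Label values between the thresholds with `direction`, all others with -direction.

    Assumption: the interval is open, i.e. a value equal to a threshold
    counts as outside.
    """
    lo, hi = min(t1, t2), max(t1, t2)
    return [direction if lo < v < hi else -direction for v in values]
```

For example, `split_chromosome([1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0], 2, 4, 4)` yields the group values 2, 2, 5 and the direction -1.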

Step 2_2d: Let $W_{n'}^{(k)}$ denote the weight vector corresponding to $H_{n'}$ in the $k$-th iteration, with $W_{n'}^{(k)}=[w_{n',1}^{(k)},\ldots,w_{n',n}^{(k)},\ldots,w_{n',N}^{(k)}]^{T}$; when $k=1$, $W_{n'}^{(1)}$ = [formula not reproduced in this text]. Then each label in $H_{n'}$ is compared, in order, with the corresponding class label in $G$; all labels in $H_{n'}$ that differ from the corresponding class label in $G$ are extracted, and their number is counted and denoted $N_{dif}$. The error rate of the $n'$-th chromosome, denoted $E_{n'}$, is then computed from $W_{n'}^{(k)}$: $E_{n'}$ equals the sum of the weights corresponding to the $N_{dif}$ differing labels, and a lower error rate indicates a better fitness. $E_{n'}$ is taken as the assessed value of the $n'$-th chromosome. Here, $W_{n'}^{(k)}$ has dimension $N\times 1$, and $w_{n',1}^{(k)}$, $w_{n',n}^{(k)}$, $w_{n',N}^{(k)}$ denote the weights corresponding to $h_{n',1}$, $h_{n',n}$, $h_{n',N}$, respectively. For example, if only $h_{n',1}$ differs from $g_1$ and $h_{n',N}$ differs from $g_N$, then $E_{n'}$ equals the sum of $w_{n',1}^{(k)}$ and $w_{n',N}^{(k)}$.
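The weighted error rate of Step 2_2d reduces to a single sum over the mismatching positions. The sketch below is illustrative; the function name is hypothetical, and the uniform initial weight $1/N$ used in the usage example is an assumption that happens to agree with the worked example later in this description (three mismatches out of five samples giving 0.6).

```python
def weighted_error(labels, targets, weights):
    """Sum the weights of all positions where the label differs from the target."""
    return sum(w for h, g, w in zip(labels, targets, weights) if h != g)


# Usage with hypothetical data: 5 samples, uniform weights 1/N (assumed),
# and three mismatching labels.
labels = [1, -1, 1, 1, 1]
targets = [-1, -1, -1, 1, -1]  # hypothetical class labels
weights = [1 / 5] * 5
error = weighted_error(labels, targets, weights)  # approximately 0.6
```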

Step 2_3: Among the assessed values of all chromosomes in $S$, all minimum assessed values and all maximum assessed values are found; any one minimum assessed value is taken and denoted $E_{min}$, and any one maximum assessed value is taken and denoted $E_{max}$. The chromosome corresponding to $E_{max}$ is then replaced by the chromosome corresponding to $E_{min}$, giving a new genetic sample matrix denoted $S_{new}$, whose dimension is $N'\times M'$.
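A minimal sketch of this selection step follows. The function name is hypothetical, and taking the first minimum and the first maximum is one way of satisfying "any one" when there are ties.

```python
def replace_worst_with_best(population, errors):
    """Overwrite the chromosome with the largest error by a copy of the one
    with the smallest error (step 2_3). Ties are broken by first occurrence.
    """
    best = errors.index(min(errors))
    worst = errors.index(max(errors))
    new_population = [list(ch) for ch in population]
    new_population[worst] = list(population[best])
    return new_population
```

Since the assessed value is an error rate, the chromosome with the smallest assessed value is the fittest one, which is why it is the one that gets copied.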

Step 2_4: Among the assessed values of all chromosomes in $S_{new}$, all minimum assessed values are found, and the chromosome corresponding to one of them is retained. Crossover recombination is then applied to the remaining $N'-1$ chromosomes in $S_{new}$, giving the genetic sample matrix after crossover recombination, denoted $S'_{new}$.

In this embodiment, the detailed process of applying crossover recombination to the remaining $N'-1$ chromosomes in $S_{new}$ in Step 2_4 is:

Step 2_4a: Arbitrarily choose two chromosomes from the remaining chromosomes in $S_{new}$.

Step 2_4b: Combine the first $\zeta$ genes of one chromosome with the last $M'-\zeta$ genes of the other chromosome to form one new chromosome; combine the remaining last $M'-\zeta$ genes of the first chromosome with the remaining first $\zeta$ genes of the other chromosome to form another new chromosome. Here, $\zeta$ is a randomly generated positive integer, $\zeta\in[1,M')$.

Step 2_4c: Replace the two chosen chromosomes with the two new chromosomes.

Step 2_5: A number of chromosomes [count formula not reproduced in this text] are selected at random from $S'_{new}$, and the mutation probability of all selected chromosomes is set to $pc$. Then, in every selected chromosome, a number of genes [count formula not reproduced in this text] are chosen at random and changed, giving a new genetic sample matrix denoted $S''_{new}$. Here, $q\in[10,30]$, $\lfloor\ \rfloor$ is the round-down (floor) operator, and $pc\in[0.1,0.3]$.

In this embodiment, the detailed process of randomly changing the chosen genes in every selected chromosome in Step 2_5 is: for each chosen gene, if its value is 0, change it to 1; if its value is 1, change it to 0.

Step 2_6: Let $S=S''_{new}$, then return to Step 2_2 and repeat; after $K$ repetitions in total, execute Step 2_7. Here, the "=" in $S=S''_{new}$ is the assignment operator, $K$ is a positive integer, and $K\in[5,20]$.

Step 2_7: Take the $E_{min}$ obtained in the $K$-th execution of the $k$-th iteration. The feature number, first threshold, second threshold, and classification direction of the chromosome corresponding to this $E_{min}$ are taken as the best feature number, best first threshold, best second threshold, and best classification direction of the $k$-th iteration, respectively. The labels of all sample values under the best feature number form the best classification label vector of the $k$-th iteration, which has dimension $N\times 1$. The structure formed by these quantities in order is taken as the best weak classifier of the $k$-th iteration, and the $E_{min}$ obtained in the $K$-th execution of the $k$-th iteration is taken as the error rate of the best weak classifier of the $k$-th iteration, denoted $E^{(k)}$.

Step 3: The weight of the best weak classifier of the $k$-th iteration is computed [formula not reproduced in this text]. The best strong classifier of the $k$-th iteration is then computed [formula not reproduced in this text]. Here, sign() is the sign function, $k'$ is a positive integer with initial value 1, and the strong classifier combines, for each $k'$, the weight of the best weak classifier of the $k'$-th iteration with the best classification label vector of the $k'$-th iteration.
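Because the formulas of Step 3 are not reproduced in this text, the sketch below falls back on the standard discrete AdaBoost expressions: a classifier weight of $\tfrac{1}{2}\ln\frac{1-E}{E}$ and a strong classifier equal to the sign of the weighted sum of label vectors. Treat both as assumptions rather than as the patent's exact formulas; the function names, the clipping constant `eps`, and mapping a zero score to +1 are likewise choices made here.

```python
import math


def classifier_weight(error, eps=1e-10):
    """Standard AdaBoost weight 0.5 * ln((1 - E) / E) (assumed form).

    The error is clipped to [eps, 1 - eps] so that E = 0 or E = 1 does not
    cause a division by zero or a log of zero.
    """
    error = min(max(error, eps), 1 - eps)
    return 0.5 * math.log((1 - error) / error)


def strong_classifier(weights, label_vectors):
    """Sign of the weighted vote over the best label vectors of iterations 1..k.

    Assumption: a score of exactly 0 is mapped to +1.
    """
    n = len(label_vectors[0])
    scores = [sum(a * h[i] for a, h in zip(weights, label_vectors)) for i in range(n)]
    return [1 if s >= 0 else -1 for s in scores]
```

Under this form, a weak classifier with error 0.5 gets weight 0 and so contributes nothing to the vote, while lower error rates give larger weights.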

Step 4: Each element of the best strong classifier of the $k$-th iteration is compared, one by one, with the corresponding class label in $G$. If all of them are identical, this best strong classifier is taken as the final strong classifier and the iteration ends; if any differ, Step 5 is executed.

Step 5: For every chromosome in $S$, compute the weight vector that will correspond, in the $(k+1)$-th iteration, to the classification label vector formed by the labels of all sample values under the feature number of that chromosome. For the $n'$-th chromosome in $S$, this weight vector is denoted $W_{n'}^{(k+1)}$ [formula not reproduced in this text]. Then let $k=k+1$ and return to Step 2 to continue. Here, the update refers to the 1st, $n$-th, and $N$-th labels of the corresponding classification label vector, and the "=" in $k=k+1$ is the assignment operator.
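The weight-update formula of Step 5 is not reproduced in this text. As an illustration only, the sketch below uses the standard discrete AdaBoost update, in which each weight is multiplied by $e^{-a\,g_n h_n}$ (with $a$ the weak-classifier weight) and the result is normalized to sum to 1. Whether the patent uses exactly this form is an assumption; the function name is hypothetical.

```python
import math


def update_weights(weights, labels, targets, alpha):
    """Standard AdaBoost re-weighting (assumed form).

    Correctly labeled samples (g * h = +1) are scaled by exp(-alpha),
    mislabeled samples (g * h = -1) by exp(+alpha), then all weights are
    normalized so that they sum to 1.
    """
    raw = [w * math.exp(-alpha * g * h) for w, h, g in zip(weights, labels, targets)]
    total = sum(raw)
    return [r / total for r in raw]
```

With a positive `alpha`, mislabeled samples gain weight relative to correctly labeled ones, so the next genetic search is pushed toward thresholds that fix those samples.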

To verify the feasibility and effectiveness of the method of the invention, the method was tested on the following specific data.

Step 1: Let $Y$ = [matrix not reproduced in this text] and $G$ = [matrix not reproduced in this text]. Let $k$ denote the iteration number of the strong-classifier search in the dual-threshold AdaBoost classification method. Here, $Y$ has dimension $N\times M=5\times 3$; $M=3$ is the total number of features in $Y$, and each column of $Y$ is one feature; $N=5$ is the total number of sample values in each feature of $Y$, and each element in a column of $Y$ is one sample value; $G$ has dimension $N\times 1=5\times 1$.

Step 2_1: Let $S$ denote the genetic sample matrix, and randomly initialize $S$ so that $S$ = [matrix not reproduced in this text]. Here, $S$ has dimension $N'\times M'=5\times 11$; $N'=5$ is the total number of chromosomes in $S$, and each row of $S$ is one chromosome; $M'=11$ is the total number of genes in each chromosome of $S$, and each element in a row of $S$ is one gene.

Step 2_2: Every chromosome in $S$ is assessed to obtain its assessed value. For the chromosome $n'=1$ in $S$, the assessment process is:

Step 2_2a: All genes of the chromosome $n'=1$ are arranged in order into a binary string of length $M'=11$, namely 10001001010. Then, from left to right, bits 1 to $\alpha=2$ of the binary string form the first group, bits $\alpha+1=3$ to $\alpha+\beta=6$ form the second group, bits $\alpha+\beta+1=7$ to $\alpha+\beta+\gamma=10$ form the third group, and the last bit forms the fourth group. The feature number, denoted $F_1$, is computed from the first group: if the decimal number obtained by converting the first group as a binary number is 0, let $F_1=1$; otherwise $F_1$ is computed by the formula of Step 2_2a [not reproduced in this text]. Here the first 2 bits, from the left, of the binary string of the 1st chromosome are 10; converting the binary number 10 gives the decimal number $F_{1,10}=2$, and therefore the feature number is $F_1=2$.

Step 2_2b: According to the feature number $F_1=2$, all sample values of the 2nd feature are extracted from $Y$ and taken as the sample values under feature number $F_1=2$; the sample values are (5 2 1 6 8).

Step 2_2c: Bits 3 to 6, from the left, of the binary string of the 1st chromosome are 0010; converting 0010 gives the decimal number 2, and therefore the first threshold is $T1_1=1$. Bits 7 to 10, from the left, of the binary string of the 1st chromosome are 0101; converting 0101 gives the decimal number 5, and therefore the second threshold is $T2_1=2.5$. The last bit of the binary string of the 1st chromosome is 0, so let $P_1=-1$. All sample values (5 2 1 6 8) under feature number $F_1=2$ are then classified: all sample values lying between the first threshold $T1_1=1$ and the second threshold $T2_1=2.5$ are labeled -1 and the remaining sample values are labeled 1. The labels of all sample values under feature number $F_1=2$ then form the classification label vector, denoted $H_1$ [vector not reproduced in this text].
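The numbers in this step are consistent with scaling each group value by the largest sample value divided by $2^{\text{bits}}$: with $y_{1,max}=8$ and 4-bit groups, $2\cdot 8/16=1$ and $5\cdot 8/16=2.5$. The general threshold formula is not reproduced in this text, so the sketch below treats that scaling as an inference from the example, not as the patent's stated formula; `scale_threshold` is a hypothetical name.

```python
def scale_threshold(code, n_bits, y_max):
    """Map a group's decimal value onto the feature range (inferred scaling).

    Assumption: threshold = code * y_max / 2**n_bits, which reproduces the
    thresholds 1 and 2.5 of the worked example.
    """
    return code * y_max / 2 ** n_bits


sample_values = [5, 2, 1, 6, 8]                   # feature number F_1 = 2
y_max = max(sample_values)                        # 8
t1 = scale_threshold(int("0010", 2), 4, y_max)    # second group -> 1.0
t2 = scale_threshold(int("0101", 2), 4, y_max)    # third group  -> 2.5
```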

Step 2_2d: Let W₁⁽¹⁾ denote the weight vector corresponding to H₁ in the 1st iteration (k=1). Each label in H₁ is compared, in order, with the corresponding classification label in G; all labels in H₁ that differ from the corresponding classification label in G are extracted, and the number of differing labels is counted and denoted N_dif=3. The error rate of the 1st chromosome, denoted E₁, is then calculated from W₁⁽¹⁾: E₁ equals the sum of the weights corresponding to the N_dif=3 differing labels, which is 0.6. The lower the error rate, the better the fitness. E₁ is then taken as the assessed value of the 1st chromosome.
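The error rate of step 2_2d is a weighted count of mismatches, which the following sketch illustrates. The label vectors below are hypothetical values chosen so that three entries differ, and the uniform weights of 0.2 are an assumption consistent with three differing labels summing to 0.6; neither vector is taken from the patent.

```python
def weighted_error(predicted, actual, weights):
    """Sum the weights of the samples whose predicted label differs from the true label."""
    return sum(w for p, a, w in zip(predicted, actual, weights) if p != a)


predicted = [1, -1, -1, 1, 1]    # hypothetical classification label vector
actual = [-1, -1, 1, 1, -1]      # hypothetical sample labels; three entries differ
weights = [0.2] * 5              # ASSUMPTION: uniform initial weights 1/N with N = 5

n_dif = sum(p != a for p, a in zip(predicted, actual))
error = weighted_error(predicted, actual, weights)
print(n_dif)             # 3
print(round(error, 10))  # 0.6
```

Because the assessed value is the error rate itself, a smaller assessed value corresponds to a fitter chromosome.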

The assessed value of every chromosome in S is obtained by the above process, and the assessed values of all chromosomes form a column vector.

Step 2_3: All minimum assessed values and all maximum assessed values are found among the assessed values of all chromosomes in S; any one minimum assessed value is taken and denoted E_min=0, and any one maximum assessed value is taken and denoted E_max=0.6. The chromosome corresponding to E_max=0.6, (1 0 0 0 1 0 0 1 0 1 0), is then replaced by the chromosome corresponding to E_min=0, (0 1 0 1 0 1 1 1 1 1 0), giving a new genetic sample matrix, denoted S_new, whose dimension is N'×M'=5×11.
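The replacement in step 2_3 can be sketched as below. The function name is hypothetical. Only the first two chromosomes and their assessed values 0.6 and 0 come from the worked example; the population is shortened to four chromosomes, and the assessed values 0.4 and 0.2 are invented for illustration.

```python
def replace_worst_with_best(population, errors):
    """Overwrite one chromosome with the largest error by a copy of one with the smallest."""
    best = errors.index(min(errors))    # any one minimum is acceptable; take the first
    worst = errors.index(max(errors))   # any one maximum is acceptable; take the first
    new_population = list(population)
    new_population[worst] = population[best]
    return new_population


population = ["10001001010", "01010111110", "11000110110", "01000110101"]
errors = [0.6, 0.0, 0.4, 0.2]   # the last two values are hypothetical
s_new = replace_worst_with_best(population, errors)
print(s_new[0])  # 01010111110
```

The matrix keeps its dimension because the worst chromosome is overwritten rather than removed.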

Step 2_4: All minimum assessed values are found among the assessed values of all chromosomes in S_new, and the chromosome corresponding to one of these minimum assessed values, (0 1 0 1 0 1 1 1 1 1 0), is retained. Crossover recombination is then performed on the remaining N'-1=5-1=4 chromosomes in S_new, giving the genetic sample matrix after crossover recombination, denoted S'_new.

In step 2_4, the crossover recombination of the remaining N'-1=5-1=4 chromosomes in S_new proceeds as follows:

Step 2_4a: Two chromosomes are chosen arbitrarily from the remaining chromosomes in S_new, for example chromosome (1 1 0 0 0 1 1 0 1 1 0) and chromosome (0 1 0 0 0 1 1 0 1 0 1).

Step 2_4b: The first ζ=5 genes of one chromosome and the last M'-ζ=11-5=6 genes of the other chromosome form one new chromosome; the first 5 genes of the other chromosome and the last 6 genes of the first chromosome form another new chromosome.

For example, the first 5 genes of (1 1 0 0 0 1 1 0 1 1 0) and the last 6 genes of (0 1 0 0 0 1 1 0 1 0 1) form the new chromosome (1 1 0 0 0 1 1 0 1 0 1), and the first 5 genes of (0 1 0 0 0 1 1 0 1 0 1) and the last 6 genes of (1 1 0 0 0 1 1 0 1 1 0) form the other new chromosome (0 1 0 0 0 1 1 0 1 1 0).

Step 2_4c: The two chosen chromosomes are replaced by the two new chromosomes.
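Steps 2_4a to 2_4c describe a single-point crossover, which can be sketched in a few lines of Python. The function name is hypothetical; the two parent chromosomes and the crossover point ζ=5 are those of the worked example.

```python
def single_point_crossover(parent_a, parent_b, zeta):
    """Exchange the tails of two equal-length chromosomes after position zeta."""
    if len(parent_a) != len(parent_b):
        raise ValueError("chromosomes must have the same length")
    child_1 = parent_a[:zeta] + parent_b[zeta:]   # first zeta genes of A, rest of B
    child_2 = parent_b[:zeta] + parent_a[zeta:]   # first zeta genes of B, rest of A
    return child_1, child_2


child_1, child_2 = single_point_crossover("11000110110", "01000110101", zeta=5)
print(child_1)  # 11000110101
print(child_2)  # 01000110110
```

Both children keep the length M'=11, so they can take the places of the two chosen chromosomes directly.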

Step 2_5: A number of chromosomes are selected at random from S'_new, and the mutation probability of all selected chromosomes is set to pc=0.1; then, in each selected chromosome, a number of randomly chosen genes are changed, giving a new genetic sample matrix, denoted S"_new.

In step 2_5, the randomly chosen genes in each selected chromosome are changed as follows: for a chosen gene, if its value is 0, it is changed to 1; if its value is 1, it is changed to 0.
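The mutation of step 2_5 is a bit flip on randomly chosen genes, sketched below. The parameters n_chromosomes and n_genes are hypothetical stand-ins: the patent derives both counts from formulas that are not reproduced in the text, so they are passed in directly here.

```python
import random


def mutate(population, n_chromosomes, n_genes, rng):
    """Flip n_genes randomly chosen genes in each of n_chromosomes randomly chosen rows.

    n_chromosomes and n_genes are hypothetical parameters; the patent computes
    these counts from formulas not reproduced in the text.
    """
    mutated = [list(chromosome) for chromosome in population]
    for row in rng.sample(range(len(mutated)), n_chromosomes):
        for col in rng.sample(range(len(mutated[row])), n_genes):
            mutated[row][col] ^= 1   # 0 becomes 1, 1 becomes 0
    return mutated


population = [
    [1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1],
    [0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0],
]
mutated = mutate(population, n_chromosomes=1, n_genes=1, rng=random.Random(0))
changed = sum(a != b for old, new in zip(population, mutated) for a, b in zip(old, new))
print(changed)  # 1
```

Passing a seeded random.Random instance keeps the sketch reproducible; the input population is left unmodified.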

Step 2_6: Let S=S"_new, then return to step 2_2 and repeat; after K=10 repetitions in total, step 2_7 is executed. Here the "=" in S=S"_new is the assignment operator.

Step 2_7: Consider the chromosome corresponding to E_min=0 obtained in the K=10th execution of the k=1st iteration. Its feature number is taken as the best feature number of the 1st iteration; its first threshold is taken as the best first threshold of the 1st iteration; its second threshold is taken as the best second threshold of the 1st iteration; and its classification direction is taken as the best classification direction of the 1st iteration. The labels of all sample values under the best feature number form the optimal classification label vector of the 1st iteration. The structure formed in order from these best quantities is taken as the best weak classifier of the 1st iteration, and the E_min obtained in the K=10th execution of the 1st iteration is taken as the error rate of the best weak classifier of the 1st iteration, denoted E⁽¹⁾.

Step 3: The weight of the best weak classifier of the k=1st iteration is calculated, where Inf denotes infinity. The best strong classifier of the k=1st iteration is then calculated.
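Step 3 can be illustrated with the weight formula of standard AdaBoost. This is an assumption: the patent's own formula is not reproduced in the text, and the sketch uses 0.5·ln((1-E)/E) only because it yields an infinite weight when the error rate is 0, which agrees with the mention of Inf. The function names and the convention that a score of 0 maps to 1 are likewise hypothetical.

```python
import math


def classifier_weight(error):
    # ASSUMPTION: standard AdaBoost weight 0.5 * ln((1 - E) / E); the
    # patent's own expression is not reproduced in the text.
    if error <= 0:
        return math.inf       # a perfect weak classifier gets infinite weight
    if error >= 1:
        return -math.inf
    return 0.5 * math.log((1 - error) / error)


def strong_classifier(weights, label_vectors):
    """Sign of the weighted vote of the weak classifiers, per sample."""
    n_samples = len(label_vectors[0])
    output = []
    for i in range(n_samples):
        score = sum(w * h[i] for w, h in zip(weights, label_vectors))
        output.append(1 if score >= 0 else -1)   # ASSUMPTION: a score of 0 maps to 1
    return output


alpha = classifier_weight(0.0)
prediction = strong_classifier([alpha], [[1, -1, -1, 1, 1]])
print(alpha)        # inf
print(prediction)   # [1, -1, -1, 1, 1]
```

With a single weak classifier of infinite weight, the strong classifier simply reproduces that weak classifier's labels, which is why the worked example can terminate after one iteration.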

Step 4: Each element value of the best strong classifier of the 1st iteration is compared one by one with the corresponding classification label in G. If all are identical, this best strong classifier is determined to be the final strong classifier and the iterative process terminates; if any differ, step 5 is executed. Here all elements are identical, so the final strong classifier is obtained; had any element differed, step 5 would have been executed.

Step 5: For the (k+1)-th iteration, the weight vector corresponding to the classification label vector formed by the labels of all sample values under the feature number of each chromosome in S is calculated; the weight vector for the n'-th chromosome in S in the (k+1)-th iteration is denoted W_n'⁽ᵏ⁺¹⁾. Then let k=k+1 and return to step 2 to continue. Here the calculation refers to the 1st label, the n-th label, and the N-th label of the corresponding label vector, and the "=" in k=k+1 is the assignment operator.
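The reweighting in step 5 can be illustrated with the update rule of standard AdaBoost. This is an assumption: the patent's own update formula is not reproduced in the text, and the rule below (multiply each weight by exp(-α·g·h), then normalize) is shown only as the conventional choice. All names and numeric values are hypothetical.

```python
import math


def update_weights(weights, predicted, actual, alpha):
    # ASSUMPTION: standard AdaBoost reweighting; the patent's formula is
    # not reproduced in the text. Correctly classified samples (p * a = 1)
    # shrink, misclassified samples (p * a = -1) grow.
    raw = [w * math.exp(-alpha * p * a) for w, p, a in zip(weights, predicted, actual)]
    total = sum(raw)
    return [r / total for r in raw]   # normalize so the weights sum to 1


weights = [0.2] * 5
predicted = [1, -1, -1, 1, 1]     # hypothetical weak-classifier labels
actual = [1, -1, -1, 1, -1]       # hypothetical true labels; the last sample is wrong
alpha = 0.5 * math.log((1 - 0.2) / 0.2)   # weight for a weighted error of 0.2
new_weights = update_weights(weights, predicted, actual, alpha)
print([round(w, 3) for w in new_weights])  # [0.125, 0.125, 0.125, 0.125, 0.5]
```

A finite α is used here on purpose: with the infinite weight of the worked example the iteration stops at step 4 and no reweighting takes place.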

Test part:

The final strong classifier obtained above is used for testing. This strong classifier contains 1 best weak classifier, which is characterized by its weight and by the corresponding best feature number, best first threshold, best second threshold, and best classification direction.

A first group test sample matrix and its corresponding sample labeling matrix are given, and the strong classifier obtained above is used to classify the first group test sample matrix. The first group test sample matrix is substituted into the best weak classifier, which yields an optimal classification label vector according to the best feature number, best first threshold, best second threshold, and best classification direction. The optimal classification label vector is multiplied by the corresponding weight, and the prediction is obtained through the sign function. Comparing the prediction with the sample labeling matrix shows that the classification is entirely accurate.

A second group test sample matrix and its corresponding sample labeling matrix are given, and the strong classifier obtained above is used to classify the second group test sample matrix. The second group test sample matrix is substituted into the best weak classifier, which yields an optimal classification label vector according to the best feature number, best first threshold, best second threshold, and best classification direction. The optimal classification label vector is multiplied by the corresponding weight, and the prediction is obtained through the sign function. Comparing the prediction with the sample labeling matrix shows that the classification is entirely accurate.

A third group test sample matrix and its corresponding sample labeling matrix are given, and the strong classifier obtained above is used to classify the third group test sample matrix. The third group test sample matrix is substituted into the best weak classifier, which yields an optimal classification label vector according to the best feature number, best first threshold, best second threshold, and best classification direction. The optimal classification label vector is multiplied by the corresponding weight, and the prediction is obtained through the sign function. Comparing the prediction with the sample labeling matrix shows that the classification is entirely accurate.

The classification results for all 3 groups of test sample matrices above are entirely accurate, which demonstrates the feasibility and validity of the method of the present invention.

Claims

1. A dual-threshold AdaBoost classification method, characterized by comprising the following steps:

Step 1: A sample is given and written in matrix form as Y, and the sample labeling matrix corresponding to Y is given and denoted G; let k denote the iteration count of the strong-classifier optimization in the dual-threshold AdaBoost classification method; wherein the dimension of Y is N×M; M denotes the total number of features contained in Y, each column of Y being one feature, M>1; m is a positive integer with initial value 1, 1≤m≤M; N denotes the total number of sample values contained in each feature of Y, the value of each element in each column of Y being one sample value, N>1; n is a positive integer with initial value 1, 1≤n≤N; the dimension of G is N×1; y_1,1, y_1,n, y_1,N respectively denote the 1st, n-th, and N-th sample values in the 1st feature of Y; y_m,1, y_m,n, y_m,N respectively denote the 1st, n-th, and N-th sample values in the m-th feature of Y; y_M,1, y_M,n, y_M,N respectively denote the 1st, n-th, and N-th sample values in the M-th feature of Y; g₁ denotes the classification label of all sample values in the 1st row of Y, g_n denotes the classification label of all sample values in the n-th row of Y, and g_N denotes the classification label of all sample values in the N-th row of Y; the values of g₁, g_n, and g_N are 1 or -1; k is a positive integer with initial value 1;

Step 2: In the k-th iteration, a genetic algorithm is used to search for the optimal weak classifier, giving the best weak classifier of the k-th iteration; the detailed process is as follows:

Step 2_1: Let S denote the genetic sample matrix; S is then randomly initialized; wherein the dimension of S is N'×M'; N' denotes the total number of chromosomes contained in S, each row of S being one chromosome, N'≥1; n' is a positive integer with initial value 1, 1≤n'≤N'; M' denotes the total number of genes contained in each chromosome in S, the value of each element in each row of S being one gene, M'≥4; m' is a positive integer with initial value 1, 1≤m'≤M'; s_1,1, s_1,m', s_1,M' respectively denote the 1st, m'-th, and M'-th genes in the 1st chromosome in S; s_n',1, s_n',m', s_n',M' respectively denote the 1st, m'-th, and M'-th genes in the n'-th chromosome in S; s_N',1, s_N',m', s_N',M' respectively denote the 1st, m'-th, and M'-th genes in the N'-th chromosome in S; the value of each of the above genes is 0 or 1;

Step 2_2: Every chromosome in S is assessed to obtain its assessed value; for the n'-th chromosome in S, the assessment proceeds as follows:

Step 2_2a: All genes in the n'-th chromosome are concatenated in order into a binary string of length M'; then, from left to right, bits 1 to α of the binary string form the first group, bits α+1 to α+β form the second group, bits α+β+1 to α+β+γ form the third group, and the last bit forms the fourth group; the feature number, denoted F_n', is then calculated from the first group: if the decimal number obtained by converting the first group as a binary number is 0, let F_n'=1; if the decimal number obtained by converting the first group as a binary number is not 0, F_n' is calculated from that decimal number by a rounding-up operation; wherein α+β+γ+1=M', α≥1, β≥1, γ≥1, F_n',10 denotes the decimal number obtained by converting the first group as a binary number, the symbol ⌈ ⌉ is the round-up operator, and F_n'∈[1,M];

Step 2_2b: According to the feature number F_n', all sample values in the F_n'-th feature are extracted from Y and taken as the sample values under feature number F_n';

Step 2_2c: The first threshold, the second threshold, and the classification direction are determined and denoted T1_n', T2_n', and P_n' respectively, T1_n' and T2_n' being calculated from T1_n',10, T2_n',10, and y_n',max; if the fourth group is 0, let P_n'=-1, and if the fourth group is 1, let P_n'=1; then, according to the first threshold T1_n', the second threshold T2_n', and the classification direction P_n', a dual-threshold AdaBoost classifier is used to classify all sample values under feature number F_n': when P_n'=1, all sample values under feature number F_n' that lie between the first threshold T1_n' and the second threshold T2_n' are labeled 1 and the remaining sample values are labeled -1; conversely, when P_n'=-1, all sample values under feature number F_n' that lie between the first threshold T1_n' and the second threshold T2_n' are labeled -1 and the remaining sample values are labeled 1; the labels of all sample values under feature number F_n' then form the classification label vector, denoted H_n'; wherein T1_n',10 denotes the decimal number obtained by converting the second group as a binary number, y_n',max denotes the maximum sample value among all sample values under feature number F_n', T2_n',10 denotes the decimal number obtained by converting the third group as a binary number, the dimension of H_n' is N×1, h_n',1 denotes the label of the 1st sample value under feature number F_n', h_n',n denotes the label of the n-th sample value under feature number F_n', h_n',N denotes the label of the N-th sample value under feature number F_n', and the values of h_n',1, h_n',n, and h_n',N are 1 or -1;

Step 2_3: All minimum assessed values and all maximum assessed values are found among the assessed values of all chromosomes in S; any one minimum assessed value is taken and denoted E_min, and any one maximum assessed value is taken and denoted E_max; the chromosome corresponding to E_max is then replaced by the chromosome corresponding to E_min, giving a new genetic sample matrix, denoted S_new; wherein the dimension of S_new is N'×M';

Step 2_4: All minimum assessed values are found among the assessed values of all chromosomes in S_new, and the chromosome corresponding to one of these minimum assessed values is retained; crossover recombination is then performed on the remaining N'-1 chromosomes in S_new, giving the genetic sample matrix after crossover recombination, denoted S'_new;

Step 2_5: A number of chromosomes are selected at random from S'_new, and the mutation probability of all selected chromosomes is set to pc; then, in each selected chromosome, a number of randomly chosen genes are changed, giving a new genetic sample matrix, denoted S"_new; wherein q∈[10,30], the symbol ⌊ ⌋ is the round-down operator, and pc∈[0.1,0.3];

Step 2_6: Let S=S"_new, then return to step 2_2 and repeat; after K repetitions in total, step 2_7 is executed; wherein the "=" in S=S"_new is the assignment operator, K is a positive integer, and K∈[5,20];

Step 2_7: For the chromosome corresponding to E_min obtained in the K-th execution of the k-th iteration, its feature number is taken as the best feature number of the k-th iteration, its first threshold is taken as the best first threshold of the k-th iteration, its second threshold is taken as the best second threshold of the k-th iteration, and its classification direction is taken as the best classification direction of the k-th iteration; the labels of all sample values under the best feature number form the optimal classification label vector of the k-th iteration; the structure formed in order from these best quantities is then taken as the best weak classifier of the k-th iteration, and the E_min obtained in the K-th execution of the k-th iteration is taken as the error rate of the best weak classifier of the k-th iteration, denoted E⁽ᵏ⁾; wherein the dimension of the optimal classification label vector is N×1;

Step 3: The weight of the best weak classifier of the k-th iteration is calculated; the best strong classifier of the k-th iteration is then calculated; wherein sign() is the sign function, k' is a positive integer with initial value 1, and the terms indexed by k' denote, respectively, the weight of the best weak classifier of the k'-th iteration and the optimal classification label vector of the k'-th iteration;

Step 4: Each element value of the best strong classifier of the k-th iteration is compared one by one with the corresponding classification label in G; if all are identical, this best strong classifier is determined to be the final strong classifier and the iterative process terminates; if any differ, step 5 is executed;

Step 5: For the (k+1)-th iteration, the weight vector corresponding to the classification label vector formed by the labels of all sample values under the feature number of each chromosome in S is calculated; the weight vector for the n'-th chromosome in S in the (k+1)-th iteration is denoted W_n'⁽ᵏ⁺¹⁾; then let k=k+1 and return to step 2 to continue; wherein the calculation refers to the 1st label, the n-th label, and the N-th label of the corresponding label vector, and the "=" in k=k+1 is the assignment operator.

2. The dual-threshold AdaBoost classification method according to claim 1, characterized in that, in step 2_4, the crossover recombination of the remaining N'-1 chromosomes in S_new proceeds as follows:

Step 2_4b: The first ζ genes of one of the chromosomes and the last M'-ζ genes of the other chromosome form one new chromosome; the remaining last M'-ζ genes of the former chromosome and the remaining first ζ genes of the latter chromosome form another new chromosome; wherein ζ is a randomly generated positive integer, ζ∈[1,M');

Step 2_4c: The two chosen chromosomes are replaced by the two new chromosomes.

3. The dual-threshold AdaBoost classification method according to claim 1 or 2, characterized in that, in step 2_5, the randomly chosen genes in each selected chromosome are changed as follows: for a chosen gene, if its value is 0, it is changed to 1; if its value is 1, it is changed to 0.