Background Art
Pattern classification is the processing and analysis of the various forms of information that characterize objects or phenomena, in order to describe, recognize, classify, and interpret them. It is an important part of information science and artificial intelligence. Pattern classification is applied in many fields, including character recognition, speech recognition, fingerprint recognition, data mining, remote sensing image recognition, and medical diagnosis.
There are many pattern classification methods. For example, Chinese patent application 201310060437.X, "Pattern classification method based on discriminative linear representation", disclosed on February 26, 2013, divides the training sample set into sub-training sample sets by sample class, computes the linear representation coefficients of the test sample in each sub-training sample set, then computes the reconstruction error of the test sample in each sub-training sample set, and assigns the test sample to the class whose sub-training sample set yields the smallest reconstruction error. Reducing the number of training samples used in each computation lowers the computational difficulty, but the recognition rate is not high.
Summary of the invention
The object of the present invention is to provide a classification method based on kernel discriminative linear representation that greatly improves recognition accuracy.
To achieve this object, the technical solution adopted by the present invention is as follows:
A classification method based on kernel discriminative linear representation uses a training sample set to identify the class to which a test sample belongs. First, all samples in the training sample set and the test sample are each preprocessed to obtain a new training sample set and a new test sample; the new training samples of each sample class form a new sub-training sample set. Next, the reconstruction error of the new test sample in each new sub-training sample set is computed. Finally, the test sample is assigned to the class corresponding to the new sub-training sample set with the smallest reconstruction error. The preprocessing consists of applying kernel principal component analysis (KPCA) to the samples to be processed to reduce their dimensionality, and then normalizing all samples after dimensionality reduction.
Preferably, the normalization uses the $L_2$-norm normalization method.
A face recognition method: the face images in a face database are preprocessed; training samples and test samples are chosen from the preprocessed face images; the training and test samples are digitized and converted to column vectors; and the resulting training and test sample data are used with the classification method based on kernel discriminative linear representation described above to identify the class of each test sample.
A handwritten digit recognition method: the digit images in a handwritten digit database are preprocessed; training samples and test samples are chosen from the preprocessed digit images; the training and test samples are digitized and converted to column vectors; and the resulting training and test sample data are used with the classification method based on kernel discriminative linear representation described above to identify the class of each test sample.
With the above scheme, the classification method based on kernel discriminative linear representation of the present invention, compared with the prior art, first applies kernel processing to all samples in the training sample set and to the test sample and only then performs pattern classification, which further improves recognition accuracy.
Embodiment
The technical solution of the present invention is described in further detail below.
A training sample set $X$ containing $c$ classes is used to identify the class of a test sample $y$ through the following steps.
Let $X = [X_1, X_2, \dots, X_c]$, where $X_i = [x_{i1}, x_{i2}, \dots, x_{iN_i}]$ denotes the training sample set of class $i$ and contains $N_i$ samples; $x_{ij} \in \mathbb{R}^d$ (where $\mathbb{R}^d$ denotes the set of $d$-dimensional real vectors) denotes the $j$-th training sample of class $i$ ($i = 1, 2, \dots, c$; $j = 1, 2, \dots, N_i$); $y \in \mathbb{R}^d$; $c$ is a natural number greater than 1; and $N_i$ is a natural number.
Step 1: apply kernel mapping, dimensionality reduction, and normalization to all samples in the training sample set $X$ and to the test sample $y$ to obtain a new training sample set and a new test sample. The kernel map $\phi: \mathbb{R}^d \to F$ maps all samples in $X$ and the test sample $y$ from the $d$-dimensional linear space into a high-dimensional nonlinear space $F$ (the dimension of $F$ is much larger than $d$): the training sample $x_{ij}$ is mapped to $\phi(x_{ij})$ and the test sample $y$ is mapped to $\phi(y)$. Let $\phi(X) = [\phi(X_1), \phi(X_2), \dots, \phi(X_c)]$.
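In practice the map $\phi$ is not evaluated explicitly; kernel methods work with inner products $k(x, z) = \phi(x)^{T}\phi(z)$ collected in a kernel matrix. The following is a minimal sketch of such a matrix, assuming the Gaussian RBF kernel that the experiments in this document select and NumPy as the numerical library. The function name `rbf_kernel` and the width parameter `gamma` are illustrative assumptions, not part of the original text.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian RBF kernel matrix between the columns of A (d x m) and B (d x n).

    Entry (p, q) is exp(-gamma * ||A[:, p] - B[:, q]||^2), which plays the role
    of the inner product of the two mapped samples in F.
    `gamma` is an assumed, illustrative kernel width.
    """
    sq_a = np.sum(A * A, axis=0)[:, None]       # (m, 1) squared norms
    sq_b = np.sum(B * B, axis=0)[None, :]       # (1, n) squared norms
    sq_dist = sq_a + sq_b - 2.0 * (A.T @ B)     # pairwise squared distances
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

# Three 2-dimensional samples stored as columns, as in X = [x_1, x_2, x_3].
X = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])
K = rbf_kernel(X, X)
```

Storing one sample per column mirrors the notation $X = [X_1, X_2, \dots, X_c]$ used above.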
Kernel principal component analysis (KPCA) reduces the dimension of the samples in the high-dimensional kernel space to $N-1$, where $N$ is the total number of training samples, and the samples after dimensionality reduction are $L_2$-norm normalized. KPCA is described in B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem", Neural Computation, vol. 10, no. 5, pp. 1299-1319, 1998. $L_2$-norm normalization is defined as the ratio of a sample to its 2-norm, that is, $x / \|x\|_2$. Let $\phi_{\mathrm{kpca}}(x_{ij})$ denote $\phi(x_{ij})$ after dimensionality reduction and normalization, let $\phi_{\mathrm{kpca}}(X) = [\phi_{\mathrm{kpca}}(X_1), \phi_{\mathrm{kpca}}(X_2), \dots, \phi_{\mathrm{kpca}}(X_c)]$, and let $\phi_{\mathrm{kpca}}(y)$ denote $\phi(y)$ after dimensionality reduction and normalization.
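The dimensionality reduction and normalization of Step 1 can be sketched as follows. This is one common way to carry out KPCA from kernel matrices, written under assumptions: a Gaussian RBF kernel with an illustrative width `gamma`, NumPy for the eigendecomposition, and a tolerance `tol` that discards numerically null components. The function names are hypothetical and the sketch is not claimed to be the exact procedure of the cited paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian RBF kernel between columns of A (d x m) and B (d x n)."""
    d2 = (np.sum(A * A, axis=0)[:, None] + np.sum(B * B, axis=0)[None, :]
          - 2.0 * (A.T @ B))
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kpca_project(K_train, K_test, tol=1e-10):
    """Project training and test samples onto at most N-1 kernel principal axes.

    K_train: (N, N) kernel among training samples.
    K_test:  (N, M) kernel between training samples (rows) and test samples.
    Returns (Z_train, Z_test) with one reduced sample per column.
    """
    N = K_train.shape[0]
    row_mean = K_train.mean(axis=1, keepdims=True)
    total_mean = K_train.mean()
    # Center the mapped samples in F using kernel values only.
    Kc = K_train - K_train.mean(axis=0, keepdims=True) - row_mean + total_mean
    Kt = K_test - K_test.mean(axis=0, keepdims=True) - row_mean + total_mean
    eigval, eigvec = np.linalg.eigh(Kc)             # eigenvalues ascending
    order = np.argsort(eigval)[::-1][:N - 1]        # keep the N-1 largest
    order = order[eigval[order] > tol]              # drop numerically null axes
    A = eigvec[:, order] / np.sqrt(eigval[order])   # unit-norm axes in F
    return A.T @ Kc, A.T @ Kt

def l2_normalize(Z):
    """Divide every column by its 2-norm (zero columns are left unchanged)."""
    norms = np.linalg.norm(Z, axis=0, keepdims=True)
    return Z / np.where(norms > 0, norms, 1.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))    # 6 toy training samples of dimension 4
y = rng.normal(size=(4, 2))    # 2 toy test samples
Z_train, Z_test = kpca_project(rbf_kernel(X, X), rbf_kernel(X, y))
phi_X, phi_y = l2_normalize(Z_train), l2_normalize(Z_test)
```

Centering removes one degree of freedom from the $N$ mapped training samples, which is why at most $N-1$ components carry information.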
Step 2: the new training samples $\phi_{\mathrm{kpca}}(x_{ij})$ of each sample class obtained in Step 1 form the new sub-training sample set $\phi_{\mathrm{kpca}}(X_i)$ ($i = 1, 2, \dots, c$; $j = 1, 2, \dots, N_i$).
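Grouping the new training samples by class is a simple column selection. The sketch below assumes the reduced samples are stored one per column in a NumPy array and that a parallel list of class labels is available; `split_by_class` and the toy data are illustrative names and values only.

```python
import numpy as np

def split_by_class(Z, labels):
    """Return {class label: columns of Z whose label equals that class}."""
    labels = np.asarray(labels)
    return {c.item(): Z[:, labels == c] for c in np.unique(labels)}

Z = np.arange(12).reshape(2, 6)        # 6 toy samples of dimension 2
labels = [0, 0, 1, 1, 1, 2]            # class of each column
subsets = split_by_class(Z, labels)    # one sub-training set per class
```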
Step 3: compute the linear representation coefficient of the new test sample in each new sub-training sample set and the corresponding reconstruction error. For the new sub-training sample set $\phi_{\mathrm{kpca}}(X_i)$ of class $i$, the linear representation expression is used to determine the linear representation coefficient $\beta_i$ of the new test sample $\phi_{\mathrm{kpca}}(y)$ within $\phi_{\mathrm{kpca}}(X_i)$, together with the corresponding reconstruction error $r_i(\phi_{\mathrm{kpca}}(y)) = \|\phi_{\mathrm{kpca}}(y) - \phi_{\mathrm{kpca}}(X_i)\beta_i\|_2$ ($i = 1, 2, \dots, c$).
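The following sketch illustrates Step 3 for a single class. The text here does not spell out the expression for $\beta_i$, so the sketch assumes, as one plausible choice, the ordinary least-squares solution that minimizes $\|\phi_{\mathrm{kpca}}(y) - \phi_{\mathrm{kpca}}(X_i)\beta\|_2$. The function name and the toy values are hypothetical.

```python
import numpy as np

def class_reconstruction_error(Phi_i, phi_y):
    """Return (beta_i, r_i) for one class.

    Phi_i: (m, N_i) new sub-training set of class i, one sample per column.
    phi_y: (m,) new test sample.
    ASSUMPTION: beta_i is the least-squares solution; the expression actually
    intended by the source for beta_i is not given in the text.
    """
    beta_i, *_ = np.linalg.lstsq(Phi_i, phi_y, rcond=None)
    r_i = np.linalg.norm(phi_y - Phi_i @ beta_i)   # 2-norm of the residual
    return beta_i, r_i

# Toy class whose two samples span the first two coordinate axes.
Phi_i = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])
phi_y = np.array([0.6, 0.0, 0.8])
beta_i, r_i = class_reconstruction_error(Phi_i, phi_y)
```

In this toy case the part of the test sample lying outside the span of the class samples is the third coordinate, so the residual has norm 0.8.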
Step 4: compare the reconstruction errors of the new test sample $\phi_{\mathrm{kpca}}(y)$ in the new sub-training sample sets $\phi_{\mathrm{kpca}}(X_i)$. When the reconstruction error in the new sub-training sample set $\phi_{\mathrm{kpca}}(X_k)$ of class $k$ is the smallest, that is, when $r_k(\phi_{\mathrm{kpca}}(y))$ is the minimum of $r_1(\phi_{\mathrm{kpca}}(y)), r_2(\phi_{\mathrm{kpca}}(y)), \dots, r_c(\phi_{\mathrm{kpca}}(y))$, the test sample $y$ is assigned to class $k$, where $k$ is a natural number less than or equal to $c$.
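The decision rule of Step 4 amounts to taking the class with the smallest reconstruction error. A minimal sketch, with made-up error values used purely for illustration:

```python
def classify_by_min_error(errors):
    """Return the class label whose reconstruction error is smallest.

    errors: dict mapping class label i to its reconstruction error r_i.
    """
    return min(errors, key=errors.get)

errors = {1: 0.82, 2: 0.31, 3: 0.57}   # hypothetical r_1, r_2, r_3
k = classify_by_min_error(errors)
```

If two classes tie, `min` returns the one that appears first in the dictionary; the text does not specify a tie-breaking rule, so this behavior is an assumption of the sketch.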
The classification method based on kernel discriminative linear representation of the present invention is validated below on the Yale B face database (The Yale Face Database B) and the MNIST handwritten digit database. The Yale B face database is described in A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643-660, 2001. The MNIST handwritten digit database is described in Y. Mizukami, K. Tadamura, J. Warrell, P. Li, and S. Prince, "CUDA Implementation of Deformable Pattern Recognition and Its Application to MNIST Handwritten Digit Database", Pattern Recognition, pp. 2001-2004, 2010.
The Yale B face database contains 5760 images of 10 people: 576 grayscale images of size 640 × 480 per person, covering 9 poses and 64 illumination conditions. The experiment uses all original images of the 10 people. Each selected image is aligned (so that the two eyes are level), scaled, and cropped, and each image sample retains only a face region of size 32 × 32. From the 576 images of each person, 70 images are randomly selected as training samples and 20 images as test samples. The training and test samples are digitized and converted to column vectors (each image yields a 1024-dimensional column vector) to obtain the training and test sample data. A Gaussian radial basis function (Gaussian RBF) is selected as the kernel mapping function, and the classification method based on kernel discriminative linear representation described above is used to identify the class of each test sample.
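The random selection and column vectorization described above can be sketched as follows, assuming the cropped images are already available as NumPy arrays. The placeholder array of zeros stands in for real image data, and the function name and random seed are illustrative assumptions.

```python
import numpy as np

def images_to_columns(images):
    """Stack equally sized grayscale images as columns of a d x N matrix."""
    images = np.asarray(images, dtype=np.float64)    # (N, height, width)
    return images.reshape(images.shape[0], -1).T     # (height * width, N)

# Randomly pick 70 training and 20 test images out of one person's 576.
rng = np.random.default_rng(0)
idx = rng.permutation(576)
train_idx, test_idx = idx[:70], idx[70:90]

faces = np.zeros((576, 32, 32))                      # placeholder image stack
X_person = images_to_columns(faces[train_idx])       # 1024 x 70
Y_person = images_to_columns(faces[test_idx])        # 1024 x 20
```

Drawing both index sets from a single permutation guarantees that no image is used for both training and testing.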
The MNIST handwritten digit database consists of 10000 images of the 10 handwritten Arabic numerals; each numeral has 863 to 1127 black-and-white images covering various handwriting styles. The original handwritten digit images are 20 × 20 and are converted to 28 × 28 by an anti-aliasing technique. From the images of each handwritten digit, 100 images are randomly selected as training samples and 30 images as test samples. The training and test samples are digitized and converted to column vectors (each image yields a 1024-dimensional column vector) to obtain the training and test sample data. A Gaussian RBF is selected as the kernel mapping function, and the classification method based on kernel discriminative linear representation described above is used to identify the class of each test sample.
The experiments report the average recognition rate of the pattern classification method based on discriminative linear representation and of the classification method based on kernel discriminative linear representation of the present invention; see Table 1. The average recognition rate is defined as the ratio of the number of correct identifications to the total number of identifications. As Table 1 shows, the average recognition rate of the classification method based on kernel discriminative linear representation is clearly higher than that of the pattern classification method based on discriminative linear representation.
Table 1. Average recognition rates of the classification methods based on discriminative linear representation and on kernel discriminative linear representation
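The recognition rate defined above, the number of correct identifications divided by the total number of identifications, can be computed as in the following sketch. The label lists are invented for illustration and are not experimental results.

```python
def recognition_rate(predicted, actual):
    """Fraction of test samples whose predicted class equals the true class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical labels for four test samples; three of the four match.
rate = recognition_rate([1, 2, 2, 3], [1, 2, 3, 3])
```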
The above embodiments merely illustrate the technical idea of the present invention and do not limit its scope of protection. Any change made on the basis of the technical solution in accordance with the technical idea proposed by the present invention falls within the scope of protection of the present invention.