CN103761533A - Classification method based on kernel discriminative linear representation

Info

Publication number: CN103761533A
Application number: CN201410026937.6A
Authority: CN
Inventors: 刘茜
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2014-01-21
Filing date: 2014-01-21
Publication date: 2014-04-30

Abstract

The invention discloses a classification method based on kernel discriminative linear representation, which uses a training sample set to identify the class to which a test sample belongs. First, all samples in the training sample set and the test sample are preprocessed to obtain a new training sample set and a new test sample; the new training samples of each class form a new sub-training sample set. Next, the reconstruction error of the new test sample within each new sub-training sample set is computed. Finally, the test sample is assigned to the class whose new sub-training sample set gives the smallest reconstruction error. The preprocessing applies kernel principal component analysis to reduce the dimension of the samples to be processed and then normalizes all of the reduced samples. Compared with the prior art, the classification method based on kernel discriminative linear representation of the present invention greatly improves recognition accuracy.

Description

Classification method based on kernel discriminative linear representation

Technical field

The present invention relates to a classification method based on kernel discriminative linear representation and belongs to the technical field of pattern recognition.

Background art

Pattern classification is the process of handling and analyzing the various forms of information that characterize things or phenomena in order to describe, recognize, classify and interpret them. It is an important component of information science and artificial intelligence. Pattern classification can be applied in many fields, such as character recognition, speech recognition, fingerprint recognition, data mining, remote sensing image recognition and medical diagnosis.

There are many pattern classification methods. For example, Chinese patent application 201310060437.X, dated 26 February 2013 and entitled "Pattern classification method based on discriminative linear representation", divides the training sample set into sub-training sample sets according to sample class, computes the linear representation coefficients of the test sample within each sub-training sample set, and then computes the reconstruction error of the test sample within each sub-training sample set. The test sample is assigned to the class whose sub-training sample set gives the smallest reconstruction error. Reducing the number of training samples in this way lowers the computational difficulty, but the recognition rate is not high.

Summary of the invention

The object of the present invention is to provide a classification method based on kernel discriminative linear representation that greatly improves recognition accuracy.

To achieve this object, the present invention adopts the following technical solution:

A classification method based on kernel discriminative linear representation uses a training sample set to identify the class to which a test sample belongs. First, all samples in the training sample set and the test sample are preprocessed, yielding a new training sample set and a new test sample; the new training samples of each class form a new sub-training sample set. Next, the reconstruction error of the new test sample within each new sub-training sample set is computed. Finally, the test sample is assigned to the class whose new sub-training sample set gives the smallest reconstruction error. The preprocessing applies kernel principal component analysis (KPCA) to reduce the dimension of the samples to be processed and then normalizes all of the reduced samples.

Preferably, the normalization operation uses L₂-norm normalization.

A face recognition method: the face images in a face database are preprocessed; training samples and test samples are selected from the preprocessed face images; the training and test samples are digitized and converted to column vectors; and the resulting training and test sample data are used, with the classification method based on kernel discriminative linear representation described above, to identify the class to which each test sample belongs.

A handwritten digit recognition method: the digit images in a handwritten digit database are preprocessed; training samples and test samples are selected from the preprocessed digit images; the training and test samples are digitized and converted to column vectors; and the resulting training and test sample data are used, with the classification method based on kernel discriminative linear representation described above, to identify the class to which each test sample belongs.

With the above solution, the classification method based on kernel discriminative linear representation of the present invention differs from the prior art in that all samples in the training sample set and the test sample are first kernelized and only then classified, which further improves recognition accuracy.

Embodiment

The technical solution of the present invention is described in further detail below.

A training sample set $X$ containing $c$ classes is used to identify the class to which a test sample $y$ belongs, by the following steps.

Let $X = [X_1, X_2, \ldots, X_c]$, where $X_i = [x_{i1}, x_{i2}, \ldots, x_{iN_i}]$ denotes the training sample set of class $i$ and contains $N_i$ samples; $x_{ij} \in R^d$ ($R^d$ denotes the set of $d$-dimensional real vectors) denotes the $j$-th training sample of class $i$ $(i = 1, 2, \ldots, c;\ j = 1, 2, \ldots, N_i)$; $y \in R^d$; $c$ is a natural number greater than 1; and $N_i$ is a natural number.

Step 1: Apply kernel mapping, dimensionality reduction and normalization to all samples in the training sample set $X$ and to the test sample $y$, obtaining a new training sample set and a new test sample. The kernel mapping $\phi: R^d \to F$ maps all samples in $X$ and the test sample $y$ from the $d$-dimensional linear space into a high-dimensional nonlinear space $F$ (the dimension of $F$ is much larger than $d$): training sample $x_{ij}$ is mapped to $\phi(x_{ij})$ and test sample $y$ is mapped to $\phi(y)$. Let

$\phi(X) = [\phi(X_1), \phi(X_2), \ldots, \phi(X_c)]$.

Kernel principal component analysis (KPCA) then reduces the dimension of the samples in the high-dimensional kernel space to $N-1$, where $N$ is the total number of training samples, and the reduced samples are L₂-norm normalized. KPCA is described in B. Schölkopf, A. Smola and K.-R. Müller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem", Neural Computation, vol. 10, no. 5, pp. 1299-1319, 1998. L₂-norm normalization is defined as the ratio of a sample to its 2-norm. Let $\phi_{kpca}(x_{ij})$ denote $\phi(x_{ij})$ after dimensionality reduction and normalization, and let

$\phi_{kpca}(X_i) = [\phi_{kpca}(x_{i1}), \phi_{kpca}(x_{i2}), \ldots, \phi_{kpca}(x_{iN_i})]$,

$\phi_{kpca}(X) = [\phi_{kpca}(X_1), \phi_{kpca}(X_2), \ldots, \phi_{kpca}(X_c)]$; $\phi_{kpca}(y)$ denotes $\phi(y)$ after dimensionality reduction and normalization.
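Step 1 can be sketched in NumPy as follows. This is a minimal illustration rather than the patent's implementation: the function names `rbf_kernel` and `kpca_l2`, the kernel width parameter `gamma` and the eigenvalue tolerance `tol` are all assumptions (the text names neither a kernel width nor a numerical guard), and samples are stored as columns to match the notation above.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian RBF kernel matrix between the columns of A (d x n) and B (d x m)."""
    sq_dist = (
        (A * A).sum(axis=0)[:, None]
        + (B * B).sum(axis=0)[None, :]
        - 2.0 * (A.T @ B)
    )
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def kpca_l2(X, Y, gamma=1.0, tol=1e-10):
    """Map training columns X (d x N) and test columns Y (d x M) through KPCA,
    keeping at most N - 1 components, then L2-normalize every column."""
    N, M = X.shape[1], Y.shape[1]
    K = rbf_kernel(X, X, gamma)    # N x N training kernel matrix
    Kt = rbf_kernel(Y, X, gamma)   # M x N test-versus-training kernel matrix
    ones_n = np.full((N, N), 1.0 / N)
    ones_m = np.full((M, N), 1.0 / N)

    # Center both kernel matrices, i.e. center the mapped samples in F.
    Kc = K - ones_n @ K - K @ ones_n + ones_n @ K @ ones_n
    Ktc = Kt - ones_m @ K - Kt @ ones_n + ones_m @ K @ ones_n

    # Leading N - 1 eigenpairs of the centered training kernel matrix.
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][: N - 1]
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > tol * vals[0]    # assumption: drop numerically zero eigenvalues
    alphas = vecs[:, keep] / np.sqrt(vals[keep])

    def l2_normalize(Z):
        norms = np.linalg.norm(Z, axis=0, keepdims=True)
        return Z / np.maximum(norms, 1e-12)

    Z_train = l2_normalize((Kc @ alphas).T)   # one column per training sample
    Z_test = l2_normalize((Ktc @ alphas).T)   # one column per test sample
    return Z_train, Z_test
```

Calling `kpca_l2(X, Y)` returns the counterparts of φ_kpca(X) and φ_kpca(y) with unit-length columns. Projecting the test samples through the same centered kernel and eigenvectors as the training samples keeps both in one coordinate system, which the later reconstruction step relies on.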

Step 2: The new training samples $\phi_{kpca}(x_{ij})$ of each class obtained in Step 1 form the new sub-training sample set $\phi_{kpca}(X_i)$ $(i = 1, 2, \ldots, c;\ j = 1, 2, \ldots, N_i)$.
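Step 2 amounts to grouping the transformed training columns by class label. A small sketch, assuming the labels are supplied as a sequence aligned with the columns (the helper name `split_by_class` is hypothetical):

```python
import numpy as np


def split_by_class(Z, labels):
    """Group the columns of Z (features x N) into one sub-matrix per class label."""
    labels = np.asarray(labels)
    if labels.shape[0] != Z.shape[1]:
        raise ValueError("need exactly one label per column of Z")
    # np.unique returns the sorted distinct labels; tolist() yields plain Python keys.
    return {c: Z[:, labels == c] for c in np.unique(labels).tolist()}
```

For example, `split_by_class(Z_train, labels)[k]` plays the role of the class-k sub-training sample set.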

Step 3: Compute the linear representation coefficients of the new test sample within each new sub-training sample set and the corresponding reconstruction error. For the new sub-training sample set $\phi_{kpca}(X_i)$ of class $i$, the expression

[formula not reproduced in the source text]

determines the linear representation coefficient $\beta_i$ of the new test sample $\phi_{kpca}(y)$ within the class-$i$ new sub-training sample set $\phi_{kpca}(X_i)$,

together with the corresponding reconstruction error $r_i(\phi_{kpca}(y)) = \|\phi_{kpca}(y) - \phi_{kpca}(X_i)\beta_i\|_2$ $(i = 1, 2, \ldots, c)$.
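The reconstruction error of Step 3 can be sketched as below. The expression that defines the coefficient vector is not legible in this text, so the sketch assumes an ordinary least-squares fit; that choice, and the function name `reconstruction_error`, are assumptions rather than the patent's stated formula. Only the residual norm follows the error expression given above.

```python
import numpy as np


def reconstruction_error(Zi, zy):
    """Return (r_i, beta_i) for one class.

    Zi : features x N_i matrix, the class-i sub-training sample set.
    zy : features vector, the transformed test sample.
    """
    Zi = np.asarray(Zi, dtype=float)
    zy = np.asarray(zy, dtype=float)
    # Assumption: beta_i minimizes ||zy - Zi @ beta||_2 (plain least squares).
    beta, *_ = np.linalg.lstsq(Zi, zy, rcond=None)
    # r_i = ||phi_kpca(y) - phi_kpca(X_i) beta_i||_2, as in the text.
    r = float(np.linalg.norm(zy - Zi @ beta))
    return r, beta
```

`np.linalg.lstsq` is used instead of forming an explicit matrix inverse because it still returns a solution when the class sub-matrix is rank deficient.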

Step 4: Compare the reconstruction errors of the new test sample $\phi_{kpca}(y)$ within each new sub-training sample set $\phi_{kpca}(X_i)$. If the reconstruction error of $\phi_{kpca}(y)$ within the class-$k$ new sub-training sample set $\phi_{kpca}(X_k)$ is the smallest, i.e. $r_k(\phi_{kpca}(y))$ is the minimum of $r_1(\phi_{kpca}(y)), r_2(\phi_{kpca}(y)), \ldots, r_c(\phi_{kpca}(y))$, the test sample $y$ is assigned to class $k$, where $k$ is a natural number less than or equal to $c$.
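The decision rule of Step 4 is an arg-min over the per-class reconstruction errors. The sketch below is self-contained and therefore repeats a least-squares fit for the coefficients, which is an assumption, as is the function name `classify`:

```python
import numpy as np


def classify(subsets, zy):
    """Assign zy to the class whose sub-training set reconstructs it best.

    subsets : dict mapping class label -> (features x N_i) matrix.
    zy      : transformed test sample, a vector of length `features`.
    """
    zy = np.asarray(zy, dtype=float)
    errors = {}
    for label, Zi in subsets.items():
        Zi = np.asarray(Zi, dtype=float)
        beta, *_ = np.linalg.lstsq(Zi, zy, rcond=None)   # assumed least-squares fit
        errors[label] = float(np.linalg.norm(zy - Zi @ beta))
    best = min(errors, key=errors.get)   # class k with the smallest r_k
    return best, errors
```

Returning the full error dictionary alongside the winning label makes it easy to inspect how close the runner-up classes were.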

The classification method based on kernel discriminative linear representation of the present invention is validated below on the Yale B face database (The Yale Face Database B) and the MNIST handwritten digit database. The Yale B face database is described in A. S. Georghiades, P. N. Belhumeur and D. J. Kriegman, "From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643-660, 2001. The MNIST handwritten digit database is described in Y. Mizukami, K. Tadamura, J. Warrell, P. Li and S. Prince, "CUDA Implementation of Deformable Pattern Recognition and Its Application to MNIST Handwritten Digit Database", Pattern Recognition, pp. 2001-2004, 2010.

The Yale B face database contains 5760 images of 10 people: 576 grayscale images of size 640 × 480 per person, covering 9 poses and 64 illumination conditions. The experiment uses all original images of the 10 people. Every image is aligned (so that the two eyes are level), scaled and cropped, so that each image sample retains only a face region of size 32 × 32. For each person, 70 of the 576 images are randomly selected as training samples and 20 as test samples. The training and test samples are digitized and converted to column vectors (each image yields a 1024-dimensional column vector). With the resulting training and test sample data, and with the Gaussian radial basis function (Gaussian RBF) as the kernel mapping function, the classification method based on kernel discriminative linear representation described above is used to identify the class to which each test sample belongs.

The MNIST handwritten digit database consists of 10000 images of the 10 handwritten Arabic numerals; each numeral has 863 to 1127 black-and-white images covering various forms of handwriting. The original handwritten digit images are 20 × 20 in size and are converted to 28 × 28 by anti-aliasing. For each digit, 100 images are randomly selected as training samples and 30 as test samples. The training and test samples are digitized and converted to column vectors (each image yields a 1024-dimensional column vector). With the resulting training and test sample data, and with the Gaussian RBF as the kernel mapping function, the classification method based on kernel discriminative linear representation described above is used to identify the class to which each test sample belongs.
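The sampling and column-vectorization used in both experiments can be sketched as follows. The helper names `to_columns` and `random_split` are hypothetical, and row-major flattening is an assumption, since the text does not say in which order pixels are stacked.

```python
import numpy as np


def to_columns(images):
    """Flatten each h x w image into one column of a (h*w) x n matrix."""
    return np.stack(
        [np.asarray(img, dtype=float).reshape(-1) for img in images], axis=1
    )


def random_split(n_items, n_train, n_test, rng):
    """Draw disjoint random index sets for training and testing."""
    if n_train + n_test > n_items:
        raise ValueError("not enough items for the requested split")
    perm = rng.permutation(n_items)
    return perm[:n_train], perm[n_train:n_train + n_test]
```

For one Yale B subject this would be called as `random_split(576, 70, 20, rng)` with `rng = np.random.default_rng(seed)`, and `to_columns` applied to the selected 32 × 32 crops gives a 1024 × 70 training block.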

The experiment records the average recognition rates of the pattern classification method based on discriminative linear representation and of the classification method based on kernel discriminative linear representation of the present invention; see Table 1. The average recognition rate is defined as the ratio of the number of correct identifications to the total number of identifications. As Table 1 shows, the average recognition rate of the classification method based on kernel discriminative linear representation is clearly higher than that of the pattern classification method based on discriminative linear representation.
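The average recognition rate defined above is a simple ratio. A short sketch, with the function name `recognition_rate` chosen here for illustration:

```python
def recognition_rate(predicted, actual):
    """Ratio of correct identifications to total identifications."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)
```

For instance, three correct labels out of four give `recognition_rate([1, 2, 3, 3], [1, 2, 3, 4]) == 0.75`.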

Table 1. Average recognition rates of the pattern classification method based on discriminative linear representation and of the classification method based on kernel discriminative linear representation

The above embodiments only illustrate the technical idea of the present invention and do not limit its scope of protection. Any change made on the basis of the technical solution in accordance with the technical idea proposed by the present invention falls within the scope of protection of the present invention.

Claims

1. A classification method based on kernel discriminative linear representation, which uses a training sample set to identify the class to which a test sample belongs, wherein all samples in the training sample set and the test sample are first preprocessed to obtain a new training sample set and a new test sample, the new training samples of each class form a new sub-training sample set, the reconstruction error of the new test sample within each new sub-training sample set is then computed, and the test sample is finally assigned to the class corresponding to the new sub-training sample set with the smallest reconstruction error, characterized in that: the preprocessing applies kernel principal component analysis to reduce the dimension of the samples to be processed and then normalizes all of the reduced samples.

2. The classification method based on kernel discriminative linear representation according to claim 1, characterized in that: the normalization operation uses L₂-norm normalization.

3. A face recognition method, characterized in that: the face images in a face database are preprocessed; training samples and test samples are selected from the preprocessed face images; the training samples and test samples are digitized and converted to column vectors; and the training and test sample data so obtained are used, with the classification method based on kernel discriminative linear representation according to claim 1 or 2, to identify the class to which each test sample belongs.

4. A handwritten digit recognition method, characterized in that: the digit images in a handwritten digit database are preprocessed; training samples and test samples are selected from the preprocessed digit images; the training samples and test samples are digitized and converted to column vectors; and the training and test sample data so obtained are used, with the classification method based on kernel discriminative linear representation according to claim 1 or 2, to identify the class to which each test sample belongs.