CN103761533A - Classification method based on kernel discriminant linear representation - Google Patents

Classification method based on kernel discriminant linear representation

Info

Publication number
CN103761533A
Authority
CN
China
Prior art keywords
training sample
test sample
new
training
sample set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410026937.6A
Other languages
Chinese (zh)
Inventor
刘茜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-01-21
Filing date
2014-01-21
Publication date
2014-04-30
Application filed by Nanjing University of Information Science and Technology
Priority to CN201410026937.6A
Publication of CN103761533A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a classification method based on kernel discriminant linear representation, which uses a training sample set to identify the class to which a test sample belongs. First, all samples in the training sample set and the test sample are preprocessed to obtain a new training sample set and a new test sample, and the new training samples of each sample class form a new sub-training sample set. Next, the reconstruction error of the new test sample within each new sub-training sample set is computed. Finally, the test sample is assigned to the class corresponding to the new sub-training sample set with the smallest reconstruction error. The preprocessing applies kernel principal component analysis to reduce the dimensionality of the samples to be processed and then normalizes all of the dimension-reduced samples. Compared with the prior art, the classification method based on kernel discriminant linear representation greatly improves recognition accuracy.

Description

Classification method based on kernel discriminant linear representation
Technical field
The present invention relates to a classification method based on kernel discriminant linear representation and belongs to the technical field of pattern recognition.
Background
Pattern classification is the process of processing and analyzing the various forms of information that characterize things or phenomena in order to describe, recognize, classify, and explain them. It is an important part of information science and artificial intelligence, and it is applied in many fields, including character recognition, speech recognition, fingerprint recognition, data mining, remote sensing image recognition, and medical diagnosis.
There are many pattern classification methods. For example, Chinese patent application 201310060437.X of February 26, 2013, "Pattern classification method based on discriminant linear representation", divides the training sample set into sub-training sample sets by sample class, computes the linear representation coefficients of the test sample within each sub-training sample set, then computes the reconstruction error of the test sample within each sub-training sample set, and assigns the test sample to the class whose sub-training sample set gives the smallest reconstruction error. Reducing the number of training samples used in each computation lowers the computational difficulty, but the recognition rate is not high.
Summary of the invention
The object of the present invention is to provide a classification method based on kernel discriminant linear representation that greatly improves recognition accuracy.
To achieve this object, the present invention adopts the following technical solution:
A classification method based on kernel discriminant linear representation uses a training sample set to identify the class to which a test sample belongs. First, all samples in the training sample set and the test sample are preprocessed to obtain a new training sample set and a new test sample, and the new training samples of each sample class form a new sub-training sample set. Next, the reconstruction error of the new test sample within each new sub-training sample set is computed. Finally, the test sample is assigned to the class corresponding to the new sub-training sample set with the smallest reconstruction error. The preprocessing applies kernel principal component analysis (KPCA) to reduce the dimensionality of the samples to be processed and then normalizes all of the dimension-reduced samples.
Preferably, the normalization uses L2-norm normalization.
A face recognition method: the face images in a face database are preprocessed, training samples and test samples are selected from the preprocessed face images, the training and test samples are digitized and converted to column vectors, and the resulting training and test data are classified with the classification method based on kernel discriminant linear representation described above to identify the class of each test sample.
A handwritten digit recognition method: the digit images in a handwritten digit database are preprocessed, training samples and test samples are selected from the preprocessed digit images, the training and test samples are digitized and converted to column vectors, and the resulting training and test data are classified with the classification method based on kernel discriminant linear representation described above to identify the class of each test sample.
With the above solution, and in contrast to the prior art, the classification method based on kernel discriminant linear representation of the present invention kernelizes all samples in the training sample set and the test sample before performing pattern classification, which further improves recognition accuracy.
Embodiments
The technical solution of the present invention is described in further detail below.
A training sample set X containing c classes is used to identify the class to which a test sample y belongs, in the following steps.
Let $X = [X_1, X_2, \ldots, X_c]$, where
$X_i = [x_{i1}, x_{i2}, \ldots, x_{iN_i}]$
denotes the training sample set of class i, $X_i$ contains $N_i$ samples, and $x_{ij} \in R^d$ ($R^d$ denotes the set of d-dimensional real vectors) denotes the j-th training sample of class i ($i = 1, 2, \ldots, c$; $j = 1, 2, \ldots, N_i$),
[equation image BDA0000459658850000021 in the original publication]
$y \in R^d$, c is a natural number greater than 1, and each $N_i$ is a natural number.
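The grouping of X into class-wise blocks X_i can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the training samples are stored as the columns of a NumPy array and that a separate label vector (a hypothetical `labels` argument) gives the class of each column.

```python
import numpy as np

def split_by_class(X, labels):
    """Return {class label: d x N_i matrix holding that class's training columns}.

    X is d x N with one training sample per column; labels has length N.
    Both the storage layout and the label vector are assumptions.
    """
    labels = np.asarray(labels)
    return {c: X[:, labels == c] for c in np.unique(labels)}

# Example: six 2-dimensional samples in three classes.
X = np.arange(12.0).reshape(2, 6)
sub_sets = split_by_class(X, [0, 0, 1, 1, 1, 2])
```

Each value in `sub_sets` plays the role of one X_i, and the column counts add up to the total number of training samples.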
Step 1: apply kernel mapping, dimensionality reduction, and normalization to all samples in the training sample set X and to the test sample y to obtain a new training sample set and a new test sample. The kernel map $\phi: R^d \to F$ maps all samples in X and the test sample y from the d-dimensional linear space into a high-dimensional nonlinear space F (the dimension of F is much larger than d); training sample $x_{ij}$ is mapped to $\phi(x_{ij})$ and test sample y is mapped to $\phi(y)$. Let
$\phi(X_i) = [\phi(x_{i1}), \phi(x_{i2}), \ldots, \phi(x_{iN_i})]$,
$\phi(X) = [\phi(X_1), \phi(X_2), \ldots, \phi(X_c)]$.
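In practice the map φ is not evaluated explicitly; only the inner products k(a, b) = ⟨φ(a), φ(b)⟩ are needed. The experiments later in this description use a Gaussian radial basis function as the kernel, so a kernel matrix might be computed as in the sketch below. The parameterization exp(-||a-b||² / (2σ²)) and the default width `sigma=1.0` are assumptions; the description does not state them.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel between the columns of A (d x m) and B (d x n).

    Returns an m x n matrix with entries exp(-||a - b||^2 / (2 * sigma^2)).
    The kernel width sigma is an assumed, user-chosen parameter.
    """
    sq_dists = (
        np.sum(A**2, axis=0)[:, None]
        + np.sum(B**2, axis=0)[None, :]
        - 2.0 * A.T @ B
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma**2))
```

Calling `rbf_kernel(X, X)` on the training matrix gives the N x N kernel matrix, and `rbf_kernel(X, y[:, None])` gives the kernel values between the training samples and a test sample.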
Kernel principal component analysis (KPCA) reduces the dimension of the samples in the high-dimensional kernel space to N-1, where N is the total number of training samples, and the dimension-reduced samples are then L2-norm normalized. KPCA is described in B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem" (Neural Computation, 1998, vol. 10, no. 5, pp. 1299-1319). L2-norm normalization divides a sample by its 2-norm. Let $\phi_{kpca}(x_{ij})$ denote $\phi(x_{ij})$ after dimensionality reduction and normalization, $\phi_{kpca}(X_i) = [\phi_{kpca}(x_{i1}), \phi_{kpca}(x_{i2}), \ldots, \phi_{kpca}(x_{iN_i})]$, $\phi_{kpca}(X) = [\phi_{kpca}(X_1), \phi_{kpca}(X_2), \ldots, \phi_{kpca}(X_c)]$, and let $\phi_{kpca}(y)$ denote $\phi(y)$ after dimensionality reduction and normalization.
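The dimensionality reduction and normalization of step 1 can be sketched with the standard KPCA recipe: centre the kernel matrix, take its leading eigenvectors, and project. The code below is an illustrative sketch under assumptions, not the patent's implementation. The function names, the tuple used as the fitted model, and the `1e-12` floors are all hypothetical choices.

```python
import numpy as np

def kpca_fit(K):
    """Fit KPCA from the N x N training kernel matrix K, keeping N-1 components."""
    N = K.shape[0]
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one      # centre the data in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)           # eigenvalues in ascending order
    keep = np.argsort(eigvals)[::-1][: N - 1]       # indices of the N-1 largest
    # Scale eigenvectors so each principal axis has unit length in feature space.
    alphas = eigvecs[:, keep] / np.sqrt(np.maximum(eigvals[keep], 1e-12))
    return alphas, K.mean(axis=1), K.mean()

def kpca_transform(model, K_new):
    """Project samples, given K_new[i, m] = k(training sample i, new sample m)."""
    alphas, train_row_mean, train_mean = model
    Kc = (K_new - train_row_mean[:, None]
          - K_new.mean(axis=0, keepdims=True) + train_mean)
    return alphas.T @ Kc                            # (N-1) x M matrix of projections

def l2_normalize(Z, eps=1e-12):
    """Divide each column of Z by its 2-norm."""
    return Z / np.maximum(np.linalg.norm(Z, axis=0, keepdims=True), eps)
```

Applying `l2_normalize(kpca_transform(model, K))` to the training kernel matrix yields the columns φ_kpca(x_ij), and applying the same two calls to the kernel values between the training samples and y yields φ_kpca(y).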
Step 2: the new training samples $\phi_{kpca}(x_{ij})$ of each sample class obtained in step 1 form the new sub-training sample set $\phi_{kpca}(X_i)$ ($i = 1, 2, \ldots, c$; $j = 1, 2, \ldots, N_i$).
Step 3: compute the linear representation coefficients of the new test sample within each new sub-training sample set and the corresponding reconstruction error. For the new sub-training sample set $\phi_{kpca}(X_i)$ of class i, the expression
[equation image BDA0000459658850000033 in the original publication]
determines the linear representation coefficients $\beta_i$ of the new test sample $\phi_{kpca}(y)$ within the class-i new sub-training sample set $\phi_{kpca}(X_i)$, together with the corresponding reconstruction error $r_i(\phi_{kpca}(y)) = \|\phi_{kpca}(y) - \phi_{kpca}(X_i)\beta_i\|_2$ ($i = 1, 2, \ldots, c$).
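The expression that determines β_i is not reproduced in this text, so the sketch below assumes ordinary least squares, that is, β_i minimizes ||φ_kpca(y) - φ_kpca(X_i)β||_2. That choice is an assumption made for illustration; only the residual formula r_i comes from the description. The names `A_i` and `z` are hypothetical stand-ins for φ_kpca(X_i) and φ_kpca(y).

```python
import numpy as np

def reconstruction_error(A_i, z):
    """Return (beta_i, r_i) for one class.

    A_i: matrix whose columns are the preprocessed training samples of class i.
    z:   preprocessed test sample.
    beta_i is the least-squares solution (assumed), r_i = ||z - A_i @ beta_i||_2.
    """
    beta, *_ = np.linalg.lstsq(A_i, z, rcond=None)
    return beta, float(np.linalg.norm(z - A_i @ beta))

# Example: the class spans the first two axes, so only the third component of z remains.
A_i = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
z = np.array([2.0, 3.0, 4.0])
beta_i, r_i = reconstruction_error(A_i, z)
```

Here the residual equals the part of `z` that lies outside the span of the class's training samples.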
Step 4: compare the reconstruction errors of the new test sample $\phi_{kpca}(y)$ within the new sub-training sample sets $\phi_{kpca}(X_i)$. If the reconstruction error within the class-k new sub-training sample set $\phi_{kpca}(X_k)$ is the smallest, that is, if $r_k(\phi_{kpca}(y))$ is the minimum of $r_1(\phi_{kpca}(y)), r_2(\phi_{kpca}(y)), \ldots, r_c(\phi_{kpca}(y))$, the test sample y is assigned to class k, where k is a natural number less than or equal to c.
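The decision rule of step 4 is an arg-min over the c reconstruction errors. A minimal sketch, assuming the errors are already available as a list ordered by class (the function name and the tie-breaking behaviour are illustrative choices, not taken from the description):

```python
def assign_class(errors):
    """Return the 1-based class index k whose reconstruction error r_k is smallest.

    errors[i - 1] holds r_i for class i. Ties go to the lowest class index,
    which is an arbitrary choice made for this sketch.
    """
    if not errors:
        raise ValueError("at least one class is required")
    return min(range(1, len(errors) + 1), key=lambda i: errors[i - 1])

k = assign_class([0.8, 0.2, 0.5])
```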
The classification method based on kernel discriminant linear representation of the present invention is validated below on the Yale Face Database B and the MNIST handwritten digit database. The Yale Face Database B is described in A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose" (IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, vol. 23, no. 6, pp. 643-660). The MNIST handwritten digit database is described in Y. Mizukami, K. Tadamura, J. Warrell, P. Li, and S. Prince, "CUDA Implementation of Deformable Pattern Recognition and Its Application to MNIST Handwritten Digit Database" (Pattern Recognition, 2010, pp. 2001-2004).
The Yale Face Database B contains 5760 images of 10 people: 576 grayscale images of size 640 × 480 per person, covering 9 poses and 64 illumination conditions. The experiment uses all original images of the 10 people. Each selected image is aligned (so that the two eyes are level), scaled, and cropped, and only a 32 × 32 face region is kept. From each person's 576 images, 70 are randomly selected as training samples and 20 as test samples. The training and test samples are digitized and converted to column vectors (each image yields a 1024-dimensional column vector). With the Gaussian radial basis function (Gaussian RBF) as the kernel mapping function, the classification method based on kernel discriminant linear representation described above is used to identify the class of each test sample.
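Converting images to column vectors can be sketched as follows. The description does not say in which order the pixels are read, so the column-major order used here is an assumption, as is the function name.

```python
import numpy as np

def images_to_columns(images):
    """Stack equally sized grayscale images as the columns of a d x N matrix.

    Each image is flattened in column-major order (an assumed convention),
    so a 32 x 32 image becomes one 1024-dimensional column.
    """
    return np.stack(
        [np.asarray(img, dtype=float).reshape(-1, order="F") for img in images],
        axis=1,
    )

# Example: two 32 x 32 images give a 1024 x 2 data matrix.
X = images_to_columns([np.zeros((32, 32)), np.ones((32, 32))])
```

Any fixed pixel order works for classification as long as training and test images use the same one.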
The MNIST handwritten digit database consists of 10000 images of the ten handwritten Arabic digits, with 863 to 1127 black-and-white images per digit covering a variety of handwriting styles. The original handwritten digit images are 20 × 20 and are converted to 28 × 28 by anti-aliasing. In the experiment, 100 images of each digit are randomly selected as training samples and 30 as test samples. The training and test samples are digitized and converted to column vectors (each image yields a 1024-dimensional column vector). With the Gaussian RBF as the kernel mapping function, the classification method based on kernel discriminant linear representation described above is used to identify the class of each test sample.
The experiment records the average recognition rates of the pattern classification method based on discriminant linear representation and of the classification method based on kernel discriminant linear representation of the present invention; see Table 1. The average recognition rate is defined as the ratio of the number of correct identifications to the total number of identifications. Table 1 shows that the average recognition rate of the classification method based on kernel discriminant linear representation is clearly higher than that of the pattern classification method based on discriminant linear representation.
Table 1: Average recognition rates of the classification methods based on discriminant linear representation and on kernel discriminant linear representation
[Table 1 appears as image BDA0000459658850000041 in the original publication.]
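The average recognition rate defined above, correct identifications divided by total identifications, can be computed as in this small sketch. The function name and the example labels are hypothetical and do not reproduce any value from Table 1.

```python
def recognition_rate(predicted, actual):
    """Fraction of test samples whose predicted class equals the true class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one test sample is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

rate = recognition_rate([1, 2, 3, 3], [1, 2, 3, 4])  # three of four correct
```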
The above embodiments only illustrate the technical idea of the present invention and do not limit its scope of protection. Any change made on the basis of the technical solution in accordance with the technical idea proposed by the present invention falls within the scope of protection of the present invention.

Claims (4)

1. A classification method based on kernel discriminant linear representation, which uses a training sample set to identify the class to which a test sample belongs, wherein all samples in the training sample set and the test sample are first preprocessed to obtain a new training sample set and a new test sample, the new training samples of each sample class form a new sub-training sample set, the reconstruction error of the new test sample within each new sub-training sample set is then computed, and the test sample is finally assigned to the class corresponding to the new sub-training sample set with the smallest reconstruction error, characterized in that: the preprocessing applies kernel principal component analysis to reduce the dimensionality of the samples to be processed and then normalizes all of the dimension-reduced samples.
2. The classification method based on kernel discriminant linear representation according to claim 1, characterized in that: the normalization uses L2-norm normalization.
3. A face recognition method, characterized in that: the face images in a face database are preprocessed, training samples and test samples are selected from the preprocessed face images, the training samples and test samples are digitized and converted to column vectors, and the resulting training sample and test sample data are classified with the classification method based on kernel discriminant linear representation according to claim 1 or 2 to identify the class to which each test sample belongs.
4. A handwritten digit recognition method, characterized in that: the digit images in a handwritten digit database are preprocessed, training samples and test samples are selected from the preprocessed digit images, the training samples and test samples are digitized and converted to column vectors, and the resulting training sample and test sample data are classified with the classification method based on kernel discriminant linear representation according to claim 1 or 2 to identify the class to which each test sample belongs.
CN201410026937.6A 2014-01-21 2014-01-21 Classification method based on kernel discriminant linear representation Pending CN103761533A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410026937.6A CN103761533A (en) 2014-01-21 2014-01-21 Classification method based on kernel discriminant linear representation

Publications (1)

Publication Number Publication Date
CN103761533A 2014-04-30

Family

ID=50528768

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410026937.6A Pending CN103761533A (en) 2014-01-21 2014-01-21 Classification method based on kernel discriminant linear representation

Country Status (1)

Country Link
CN (1) CN103761533A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109656737A (en) * 2018-10-31 2019-04-19 阿里巴巴集团控股有限公司 The statistical method and device of exception information

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050238200A1 (en) * 2004-04-27 2005-10-27 Rakesh Gupta Simultaneous localization and mapping using multiple view feature descriptors
CN102073880A (en) * 2011-01-13 2011-05-25 西安电子科技大学 Integration method for face recognition by using sparse representation
CN102122355A (en) * 2011-03-15 2011-07-13 西安电子科技大学 SAR (synthetic aperture radar) target identification method based on nuclear sparse representation
CN102930301A (en) * 2012-10-16 2013-02-13 西安电子科技大学 Image classification method based on characteristic weight learning and nuclear sparse representation
CN102938070A (en) * 2012-09-11 2013-02-20 广西工学院 Behavior recognition method based on action subspace and weight behavior recognition model
CN102982349A (en) * 2012-11-09 2013-03-20 深圳市捷顺科技实业股份有限公司 Image recognition method and device
CN103106409A (en) * 2013-01-29 2013-05-15 北京交通大学 Composite character extraction method aiming at head shoulder detection
CN103226714A (en) * 2013-05-09 2013-07-31 山东大学 Sparse coding method reinforced based on larger coding coefficient

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张楠: "低秩鉴别分析与回归分类方法研究", 《中国博士学位论文全文数据库 信息科技辑(月刊)》 *

Similar Documents

Publication Publication Date Title
Yang et al. Retracted: Face recognition attendance system based on real-time video processing
Obaidullah et al. PHDIndic_11: page-level handwritten document image dataset of 11 official Indic scripts for script identification
Kaur et al. Plant species identification based on plant leaf using computer vision and machine learning techniques
Khan et al. Painting-91: a large scale database for computational painting categorization
CN102938065B (en) Face feature extraction method and face identification method based on large-scale image data
CN103839042B (en) Face identification method and face identification system
CN105956560A (en) Vehicle model identification method based on pooling multi-scale depth convolution characteristics
Lumini et al. Ensemble of texture descriptors and classifiers for face recognition
CN101075291A (en) Efficient promoting exercising method for discriminating human face
CN102968626A (en) Human face image matching method
Li et al. Dating ancient paintings of Mogao Grottoes using deeply learnt visual codes
CN111325275B (en) Robust image classification method and device based on low-rank two-dimensional local discriminant map embedding
Rao et al. Exploring deep learning techniques for Kannada handwritten character recognition: a boon for digitization
Maken et al. A study on various techniques involved in gender prediction system: a comprehensive review
Angadi et al. Face recognition through symbolic modeling of face graphs and texture
Reta et al. Amharic handwritten character recognition using combined features and support vector machine
Halder et al. Content independent writer identification on Bangla Script: a document level approach
Singh et al. Writer identification using texture features: A comparative study
Ouyang et al. Robust automatic facial expression detection method based on sparse representation plus LBP map
Chahi et al. WriterINet: a multi-path deep CNN for offline text-independent writer identification
Sadeghzadeh et al. Triplet loss-based convolutional neural network for static sign language recognition
Li et al. Sparse-based neural response for image classification
CN102609715A (en) Object type identification method combining plurality of interest point testers
Boudraa et al. Contribution to historical manuscript dating: A hybrid approach employing hand-crafted features with vision transformers
Rasel et al. An efficient framework for hand gesture recognition based on histogram of oriented gradients and support vector machine

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140430
