CN103761533A - Classification method based on nuclear identification linear representation - Google Patents

Info

Publication number
CN103761533A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410026937.6A
Other languages
Chinese (zh)
Inventor
刘茜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN201410026937.6A priority Critical patent/CN103761533A/en
Publication of CN103761533A publication Critical patent/CN103761533A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a classification method based on kernel discriminative linear representation, which uses a training sample set to identify the category of a test sample. First, all samples in the training sample set and the test sample are preprocessed to obtain a new training sample set and a new test sample. The new training samples of each sample category form a new sub-training sample set. Then the reconstruction error of the new test sample in each new sub-training sample set is calculated. Finally, the test sample is assigned to the category whose new sub-training sample set gives the smallest reconstruction error. The preprocessing consists of applying kernel principal component analysis (KPCA) to reduce the dimension of the samples to be processed, and then normalizing all samples after dimension reduction. Compared with the prior art, the classification method based on kernel discriminative linear representation greatly improves recognition accuracy.

Description

Classification method based on kernel discriminative linear representation
Technical Field
The present invention relates to a classification method based on kernel discriminative linear representation and belongs to the technical field of pattern recognition.
Background
Pattern classification is the process of processing and analyzing the various forms of information that characterize things or phenomena in order to describe, recognize, classify and explain them. It is an important part of information science and artificial intelligence. Pattern classification can be applied in many fields, such as character recognition, speech recognition, fingerprint recognition, data mining, remote sensing image recognition and medical diagnosis.
There are many pattern classification methods. For example, Chinese patent application 201310060437.X, "Pattern classification method based on discriminative linear representation", disclosed on February 26, 2013, divides the training sample set into sub-training sample sets by sample class, calculates the linear representation coefficients of the test sample in each sub-training sample set, then calculates the reconstruction error of the test sample in each sub-training sample set, and assigns the test sample to the class whose sub-training sample set gives the smallest reconstruction error. By reducing the number of training samples involved, that method lowers the computational difficulty, but its recognition rate is not high.
Summary of the Invention
The object of the present invention is to provide a classification method based on kernel discriminative linear representation that greatly improves recognition accuracy.
To achieve this object, the technical solution adopted by the present invention is as follows:
A classification method based on kernel discriminative linear representation uses a training sample set to identify the class of a test sample. First, all samples in the training sample set and the test sample are each preprocessed to obtain a new training sample set and a new test sample, and the new training samples of each sample class form a new sub-training sample set. Then the reconstruction error of the new test sample in each new sub-training sample set is calculated. Finally, the test sample is assigned to the class corresponding to the new sub-training sample set with the smallest reconstruction error. The preprocessing consists of reducing the dimension of the samples to be processed with kernel principal component analysis (KPCA) and then normalizing all samples after dimension reduction.
Preferably, the normalization uses the L2-norm normalization method.
A face recognition method: the face images in a face database are preprocessed, training samples and test samples are chosen from the preprocessed face images, the training samples and test samples are digitized and converted to column vectors, and the resulting training and test sample data are used with the classification method based on kernel discriminative linear representation described above to identify the class of each test sample.
A handwritten digit recognition method: the digit images in a handwritten digit database are preprocessed, training samples and test samples are chosen from the preprocessed digit images, the training samples and test samples are digitized and converted to column vectors, and the resulting training and test sample data are used with the classification method based on kernel discriminative linear representation described above to identify the class of each test sample.
With the above scheme, the classification method based on kernel discriminative linear representation of the present invention, compared with the prior art, first applies kernel processing to all samples in the training sample set and to the test sample and only then performs pattern classification, which further improves recognition accuracy.
Detailed Description
The technical solution of the present invention is described in further detail below.
A training sample set $X$ containing $c$ classes is used to identify the class of a test sample $y$, through the following steps.
Let $X = [X_1, X_2, \ldots, X_c]$, where $X_i = [x_{i1}, x_{i2}, \ldots, x_{iN_i}]$ denotes the training sample set of class $i$, $X_i$ contains $N_i$ samples, and $x_{ij} \in R^d$ ($R^d$ denotes the set of $d$-dimensional real vectors) denotes the $j$-th training sample of class $i$ ($i = 1, 2, \ldots, c$; $j = 1, 2, \ldots, N_i$),
Figure BDA0000459658850000021
$y \in R^d$, $c$ is a natural number greater than 1, and $N_i$ is a natural number.
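The notation above can be illustrated with a short sketch. It assumes the training samples are stored as the columns of a NumPy array together with a label list; the function name `split_by_class` and the toy values are hypothetical and do not come from the patent.

```python
import numpy as np

def split_by_class(X, labels):
    """Split a d x N training matrix X into per-class sub-matrices X_i.

    X holds one training sample per column; labels[j] is the class of column j.
    Returns a dict mapping each class label to its d x N_i sub-matrix.
    """
    labels = np.asarray(labels)
    return {c: X[:, labels == c] for c in np.unique(labels)}

# Toy data (hypothetical): d = 3 features, N = 5 samples, c = 2 classes.
X = np.arange(15, dtype=float).reshape(3, 5)
labels = [0, 0, 1, 1, 1]
subsets = split_by_class(X, labels)
```

Here `subsets[0]` plays the role of $X_1$ with $N_1 = 2$ columns, and `subsets[1]` plays the role of $X_2$ with $N_2 = 3$ columns.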
Step 1: apply kernel mapping, dimension reduction and normalization to all samples in the training sample set $X$ and to the test sample $y$, obtaining a new training sample set and a new test sample. The kernel mapping $\phi: R^d \to F$ maps all samples in the training sample set $X$ and the test sample $y$ from the $d$-dimensional linear space to the high-dimensional nonlinear space $F$ (the dimension of $F$ is much larger than $d$); training sample $x_{ij}$ is mapped to $\phi(x_{ij})$ and test sample $y$ is mapped to $\phi(y)$. Let
$\phi(X_i) = [\phi(x_{i1}), \phi(x_{i2}), \ldots, \phi(x_{iN_i})]$,
$\phi(X) = [\phi(X_1), \phi(X_2), \ldots, \phi(X_c)]$.
Kernel principal component analysis (KPCA) is used to reduce the dimension of the samples in the high-dimensional kernel space to $N-1$, where $N$ is the total number of training samples, and the samples after dimension reduction are L2-norm normalized. The KPCA method is described in B. Schölkopf, A. Smola and K. Müller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem" (Neural Computation, 1998, vol. 10, no. 5, pp. 1299-1319). L2-norm normalization is defined as the ratio of a sample to its 2-norm. Let $\phi_{kpca}(x_{ij})$ denote $\phi(x_{ij})$ after dimension reduction and normalization, $\phi_{kpca}(X_i) = [\phi_{kpca}(x_{i1}), \phi_{kpca}(x_{i2}), \ldots, \phi_{kpca}(x_{iN_i})]$, $\phi_{kpca}(X) = [\phi_{kpca}(X_1), \phi_{kpca}(X_2), \ldots, \phi_{kpca}(X_c)]$, and let $\phi_{kpca}(y)$ denote $\phi(y)$ after dimension reduction and normalization.
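Step 1 could be sketched roughly as follows. This is only an illustration under stated assumptions: a Gaussian RBF kernel with a hypothetical width parameter `gamma`, standard kernel centering, and a small eigenvalue threshold to drop numerically zero components. The function names are made up for the example and none of these details are specified by the patent text.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian RBF kernel between the columns of A (d x n) and B (d x m)."""
    sq = (A * A).sum(0)[:, None] + (B * B).sum(0)[None, :] - 2.0 * A.T @ B
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kpca_fit(X, gamma):
    """Fit KPCA on the training columns of X (d x N), keeping at most N-1 components."""
    N = X.shape[1]
    K = rbf_kernel(X, X, gamma)
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one      # center the kernel matrix
    vals, vecs = np.linalg.eigh(Kc)                 # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][: N - 1]         # top N-1 components
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > 1e-10                             # assumed numerical threshold
    alphas = vecs[:, keep] / np.sqrt(vals[keep])    # unit-norm directions in F
    return {"X": X, "K": K, "alphas": alphas, "gamma": gamma}

def kpca_transform(model, Y):
    """Project the columns of Y (d x M) and L2-normalize each projected sample."""
    K = model["K"]
    k = rbf_kernel(model["X"], Y, model["gamma"])   # N x M
    kc = (k - K.mean(axis=1, keepdims=True)
            - k.mean(axis=0, keepdims=True) + K.mean())
    Z = model["alphas"].T @ kc                      # reduced samples, one per column
    norms = np.linalg.norm(Z, axis=0, keepdims=True)
    return Z / np.maximum(norms, 1e-12)             # sample divided by its 2-norm

# Toy usage with random data (hypothetical sizes).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(4, 6))
y_toy = rng.normal(size=(4, 2))
model = kpca_fit(X_toy, gamma=0.5)
Z_train = kpca_transform(model, X_toy)
Z_test = kpca_transform(model, y_toy)
```

The columns of `Z_train` and `Z_test` stand in for $\phi_{kpca}(x_{ij})$ and $\phi_{kpca}(y)$; each has unit 2-norm and at most $N-1$ entries.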
Step 2: the new training samples $\phi_{kpca}(x_{ij})$ of each sample class obtained from Step 1 form the new sub-training sample set $\phi_{kpca}(X_i)$ ($i = 1, 2, \ldots, c$; $j = 1, 2, \ldots, N_i$).
Step 3: calculate the linear representation coefficients of the new test sample in each new sub-training sample set and the corresponding reconstruction error. For the new sub-training sample set $\phi_{kpca}(X_i)$ of class $i$, use the expression
Figure BDA0000459658850000033
to determine the linear representation coefficient $\beta_i$ of the new test sample $\phi_{kpca}(y)$ in the new sub-training sample set $\phi_{kpca}(X_i)$ of class $i$, and the corresponding reconstruction error $r_i(\phi_{kpca}(y)) = \|\phi_{kpca}(y) - \phi_{kpca}(X_i)\beta_i\|_2$ ($i = 1, 2, \ldots, c$).
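The reconstruction error of Step 3 can be sketched as below. The expression for the coefficients is an image in the source and is not reproduced here, so this example assumes an ordinary least-squares fit, which may differ from the patent's own formula. The function name `reconstruction_error` and the toy values are hypothetical.

```python
import numpy as np

def reconstruction_error(Xi, y):
    """Return (beta_i, r_i) for a sub-training set Xi (m x N_i) and a test vector y (m,).

    Assumption: beta_i is the least-squares solution of Xi @ beta = y;
    r_i is the 2-norm of the residual y - Xi @ beta_i.
    """
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    r = float(np.linalg.norm(y - Xi @ beta))
    return beta, r

# Toy example: the two columns span the first two coordinates only,
# so the third coordinate of y cannot be reconstructed.
Xi = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 0.0]])
y = np.array([1.0, 2.0, 3.0])
beta_i, r_i = reconstruction_error(Xi, y)
```

In this toy case the fit recovers the first two coordinates exactly and the residual is the unexplained third coordinate.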
Step 4: compare the reconstruction errors of the new test sample $\phi_{kpca}(y)$ in each new sub-training sample set $\phi_{kpca}(X_i)$. When the reconstruction error of $\phi_{kpca}(y)$ in the new sub-training sample set $\phi_{kpca}(X_k)$ of class $k$ is the smallest, that is, when $r_k(\phi_{kpca}(y))$ is the minimum of $r_1(\phi_{kpca}(y)), r_2(\phi_{kpca}(y)), \ldots, r_c(\phi_{kpca}(y))$, the test sample $y$ is assigned to class $k$, where $k$ is a natural number less than or equal to $c$.
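The decision rule of Step 4 amounts to an arg-min over the per-class reconstruction errors. The sketch below again assumes least-squares coefficients (an assumption, not the patent's stated expression) and uses a hypothetical `classify` helper with made-up toy data.

```python
import numpy as np

def classify(subsets, y):
    """Assign y to the class whose sub-training set reconstructs it best.

    subsets maps a class label to its matrix of (already preprocessed)
    training samples, one per column. Least-squares coefficients are assumed.
    Returns the winning label and the dict of per-class errors.
    """
    errors = {}
    for label, Xi in subsets.items():
        beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
        errors[label] = float(np.linalg.norm(y - Xi @ beta))
    k = min(errors, key=errors.get)   # class with the smallest reconstruction error
    return k, errors

# Toy example with two one-sample classes.
toy_subsets = {"a": np.array([[1.0], [0.0]]),
               "b": np.array([[0.0], [1.0]])}
y_new = np.array([0.9, 0.1])
k, errs = classify(toy_subsets, y_new)
```

The test vector lies almost along the single sample of class "a", so that class gives the smaller residual and is selected.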
The Yale B face database (The Yale Face Database B) and the MNIST handwritten digit database are used below to verify the classification method based on kernel discriminative linear representation of the present invention. The Yale B face database is described in A. S. Georghiades, P. N. Belhumeur and D. J. Kriegman, "From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose" (IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, vol. 23, no. 6, pp. 643-660). The MNIST handwritten digit database is described in Y. Mizukami, K. Tadamura, J. Warrell, P. Li and S. Prince, "CUDA Implementation of Deformable Pattern Recognition and Its Application to MNIST Handwritten Digit Database" (Pattern Recognition, 2010, pp. 2001-2004).
The Yale B face database contains 5760 images of 10 people in total: 576 grayscale images of size 640 × 480 per person, covering 9 poses and 64 illumination conditions. The experiment uses all original images of the 10 people. The selected images are aligned (so that the two eyes are level), scaled and cropped, and each image sample retains only a 32 × 32 face region. From the 576 images of each person, 70 images are randomly selected as training samples and 20 images as test samples. The training samples and test samples are digitized and converted to column vectors (each image yields a 1024-dimensional column vector). With the resulting training and test sample data, the Gaussian radial basis function (Gaussian RBF) is selected as the kernel mapping function and the above classification method based on kernel discriminative linear representation is used to identify the class of each test sample.
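The sampling and column-vectorization described for the face experiment might look like the following sketch. Random pixel values stand in for the cropped 32 × 32 face images, and the helper names `vectorize` and `random_split` are hypothetical; only the counts (576 images, 70 training, 20 test, 1024 dimensions) come from the text.

```python
import numpy as np

def vectorize(images):
    """Turn n images of size H x W into an (H*W) x n matrix, one column per image."""
    images = np.asarray(images, dtype=float)
    return images.reshape(images.shape[0], -1).T

def random_split(n_images, n_train, n_test, rng):
    """Pick disjoint random index sets for training and testing."""
    idx = rng.permutation(n_images)
    return idx[:n_train], idx[n_train:n_train + n_test]

rng = np.random.default_rng(0)
# Placeholder pixels standing in for one person's 576 cropped face images.
faces = rng.random((576, 32, 32))
train_idx, test_idx = random_split(576, 70, 20, rng)
X_train = vectorize(faces[train_idx])   # 1024 x 70
X_test = vectorize(faces[test_idx])     # 1024 x 20
```

Repeating this per person and concatenating the columns would give the full training matrix and the per-class label list.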
The MNIST handwritten digit database consists of 10000 images of the 10 handwritten Arabic numerals. Each digit has 863 to 1127 black-and-white images, covering many forms of handwritten Arabic numerals. The original handwritten digit images are of size 20 × 20 and are then converted to size 28 × 28 by anti-aliasing. From the images of each handwritten digit, 100 images are randomly selected as training samples and 30 images as test samples. The training samples and test samples are digitized and converted to column vectors (each image yields a 1024-dimensional column vector). With the resulting training and test sample data, the Gaussian RBF is selected as the kernel mapping function and the above classification method based on kernel discriminative linear representation is used to identify the class of each test sample.
The experiment measures the average recognition rate of the pattern classification method based on discriminative linear representation and of the classification method based on kernel discriminative linear representation of the present invention; see Table 1. The average recognition rate is defined as the ratio of the number of correct recognitions to the total number of recognitions. As Table 1 shows, the average recognition rate of the classification method based on kernel discriminative linear representation is clearly higher than that of the pattern classification method based on discriminative linear representation.
Table 1. Average recognition rates of the classification methods based on discriminative linear representation and on kernel discriminative linear representation
Figure BDA0000459658850000041
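The average recognition rate defined above is simply the fraction of correct decisions. A minimal sketch, with a hypothetical function name and made-up labels rather than the patent's experimental data:

```python
def average_recognition_rate(predicted, actual):
    """Ratio of correct recognitions to the total number of recognitions."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one recognition is required")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Made-up labels: three of four test samples are recognized correctly.
rate = average_recognition_rate([1, 2, 3, 4], [1, 2, 0, 4])
```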
The above embodiment only illustrates the technical idea of the present invention and does not limit its scope of protection. Any change made on the basis of the technical solution in accordance with the technical idea proposed by the present invention falls within the scope of protection of the present invention.

Claims (4)

1. A classification method based on kernel discriminative linear representation, which uses a training sample set to identify the class of a test sample, wherein all samples in the training sample set and the test sample are first each preprocessed to obtain a new training sample set and a new test sample, the new training samples of each sample class form a new sub-training sample set, the reconstruction error of the new test sample in each new sub-training sample set is then calculated, and the test sample is finally assigned to the class corresponding to the new sub-training sample set with the smallest reconstruction error, characterized in that: the preprocessing consists of reducing the dimension of the samples to be processed with kernel principal component analysis and then normalizing all samples after dimension reduction.
2. The classification method based on kernel discriminative linear representation according to claim 1, characterized in that: the normalization uses the L2-norm normalization method.
3. A face recognition method, characterized in that: the face images in a face database are preprocessed, training samples and test samples are chosen from the preprocessed face images, the training samples and test samples are digitized and converted to column vectors, and the resulting training and test sample data are used with the classification method based on kernel discriminative linear representation according to claim 1 or 2 to identify the class of each test sample.
4. A handwritten digit recognition method, characterized in that: the digit images in a handwritten digit database are preprocessed, training samples and test samples are chosen from the preprocessed digit images, the training samples and test samples are digitized and converted to column vectors, and the resulting training and test sample data are used with the classification method based on kernel discriminative linear representation according to claim 1 or 2 to identify the class of each test sample.
CN201410026937.6A 2014-01-21 2014-01-21 Classification method based on nuclear identification linear representation Pending CN103761533A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410026937.6A CN103761533A (en) 2014-01-21 2014-01-21 Classification method based on nuclear identification linear representation

Publications (1)

Publication Number Publication Date
CN103761533A true CN103761533A (en) 2014-04-30

Family

ID=50528768

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410026937.6A Pending CN103761533A (en) 2014-01-21 2014-01-21 Classification method based on nuclear identification linear representation

Country Status (1)

Country Link
CN (1) CN103761533A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109656737A (en) * 2018-10-31 2019-04-19 阿里巴巴集团控股有限公司 The statistical method and device of exception information

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050238200A1 (en) * 2004-04-27 2005-10-27 Rakesh Gupta Simultaneous localization and mapping using multiple view feature descriptors
CN102073880A (en) * 2011-01-13 2011-05-25 西安电子科技大学 Integration method for face recognition by using sparse representation
CN102122355A (en) * 2011-03-15 2011-07-13 西安电子科技大学 SAR (synthetic aperture radar) target identification method based on nuclear sparse representation
CN102930301A (en) * 2012-10-16 2013-02-13 西安电子科技大学 Image classification method based on characteristic weight learning and nuclear sparse representation
CN102938070A (en) * 2012-09-11 2013-02-20 广西工学院 Behavior recognition method based on action subspace and weight behavior recognition model
CN102982349A (en) * 2012-11-09 2013-03-20 深圳市捷顺科技实业股份有限公司 Image recognition method and device
CN103106409A (en) * 2013-01-29 2013-05-15 北京交通大学 Composite character extraction method aiming at head shoulder detection
CN103226714A (en) * 2013-05-09 2013-07-31 山东大学 Sparse coding method reinforced based on larger coding coefficient

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张楠: "低秩鉴别分析与回归分类方法研究", 《中国博士学位论文全文数据库 信息科技辑(月刊)》 *

Similar Documents

Publication Publication Date Title
Obaidullah et al. PHDIndic_11: page-level handwritten document image dataset of 11 official Indic scripts for script identification
Akbari et al. Wavelet-based gender detection on off-line handwritten documents using probabilistic finite state automata
CN107169485B (en) Mathematical formula identification method and device
Sharma et al. Indian sign language recognition using neural networks and KNN classifiers
Chitlangia et al. Handwriting analysis based on histogram of oriented gradient for predicting personality traits using SVM
CN108509833B (en) Face recognition method, device and equipment based on structured analysis dictionary
CN103839042B (en) Face identification method and face identification system
CN105117708A (en) Facial expression recognition method and apparatus
Angona et al. Automated Bangla sign language translation system for alphabets by means of MobileNet
Li et al. Dating ancient paintings of Mogao Grottoes using deeply learnt visual codes
CN103366175B (en) Based on the scene image classification method that potential Di Li Cray distributes
Halder et al. Content independent writer identification on Bangla Script: a document level approach
Maring et al. Recognition of cheising iyek/eeyek-manipuri digits using support vector machines
Ahmed et al. Recognition of Urdu Handwritten Alphabet Using Convolutional Neural Network (CNN).
Antony et al. Haar features based handwritten character recognition system for Tulu script
Inunganbi et al. Recognition of handwritten Meitei Mayek script based on texture feature
Pratiwi et al. Personality type assessment system by using enneagram-graphology techniques on digital handwriting
Rasel et al. An efficient framework for hand gesture recognition based on histogram of oriented gradients and support vector machine
Kumar et al. RWIL: robust writer identification for Indic language
Anil et al. Malayalam character recognition using singular value decomposition
Tang et al. Hierarchical kernel-based rotation and scale invariant similarity
Halder et al. Individuality of isolated Bangla numerals
Kishan et al. Handwritten character recognition using CNN
CN103761533A (en) Classification method based on nuclear identification linear representation
Ma et al. Facial expression recognition based on characteristics of block LGBP and sparse representation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20140430