CN103761533A - Classification method based on nuclear identification linear representation - Google Patents
- Publication number
- CN103761533A (application CN201410026937.6A)
- Authority
- CN
- China
- Prior art keywords
- training sample
- test sample
- new
- book
- sample book
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a classification method based on kernel discriminative linear representation, which uses a training sample set to identify the class of a test sample. First, all samples in the training sample set and the test sample are preprocessed to obtain a new training sample set and a new test sample. The new training samples of each class form a new sub-training sample set. Then the reconstruction error of the new test sample in each new sub-training sample set is calculated. Finally, the test sample is assigned to the class whose new sub-training sample set gives the smallest reconstruction error. The preprocessing applies kernel principal component analysis (KPCA) to reduce the dimension of the samples and then normalizes all samples after dimension reduction. Compared with the prior art, the classification method based on kernel discriminative linear representation greatly improves recognition accuracy.
Description
Technical field
The present invention relates to a classification method based on kernel discriminative linear representation, and belongs to the technical field of pattern recognition.
Background technology
Pattern classification is the process of processing and analyzing the various forms of information that characterize things or phenomena in order to describe, recognize, classify and interpret them. It is an important part of information science and artificial intelligence, and is applied in many fields such as character recognition, speech recognition, fingerprint recognition, data mining, remote sensing image recognition and medical diagnosis.
There are many pattern classification methods. For example, Chinese patent application 201310060437.X, "Pattern classification method based on discriminative linear representation", disclosed on February 26, 2013, divides the training sample set into sub-training sample sets by sample class, calculates the linear representation coefficients of the test sample in each sub-training sample set, and then calculates the reconstruction error of the test sample in each sub-training sample set. The test sample is assigned to the class whose sub-training sample set gives the smallest reconstruction error. Reducing the number of training samples in this way lowers the computational difficulty, but the recognition rate is not high.
Summary of the invention
The object of the present invention is to provide a classification method based on kernel discriminative linear representation that greatly improves recognition accuracy.
To achieve this object, the technical solution adopted by the present invention is as follows:
A classification method based on kernel discriminative linear representation uses a training sample set to identify the class of a test sample. First, all samples in the training sample set and the test sample are each preprocessed to obtain a new training sample set and a new test sample, and the new training samples of each sample class form a new sub-training sample set. Then the reconstruction error of the new test sample in each new sub-training sample set is calculated. Finally, the test sample is assigned to the class corresponding to the new sub-training sample set with the smallest reconstruction error. The preprocessing applies kernel principal component analysis (KPCA) to reduce the dimension of the samples to be processed, and then normalizes all samples after dimension reduction.
Preferably, the normalization uses $L_2$-norm normalization.
A face recognition method: the face images in a face database are preprocessed; training samples and test samples are chosen from the preprocessed face images; the training samples and test samples are digitized and converted to column vectors; and, using the training sample and test sample data obtained in this way, the class of each test sample is identified with the classification method based on kernel discriminative linear representation described above.
A handwritten digit recognition method: the digit images in a handwritten digit database are preprocessed; training samples and test samples are chosen from the preprocessed digit images; the training samples and test samples are digitized and converted to column vectors; and, using the training sample and test sample data obtained in this way, the class of each test sample is identified with the classification method based on kernel discriminative linear representation described above.
With the above solution, the classification method based on kernel discriminative linear representation of the present invention, compared with the prior art, first applies kernel processing to all samples in the training sample set and to the test sample and only then performs pattern classification, which further improves recognition accuracy.
Embodiment
The technical solution of the present invention is described in further detail below.
A training sample set $X$ containing $c$ classes is used to identify the class of a test sample $y$, through the following steps.
Let $X = [X_1, X_2, \ldots, X_c]$, where $X_i = [x_{i1}, x_{i2}, \ldots, x_{iN_i}]$ denotes the training sample set of class $i$. $X_i$ contains $N_i$ samples, and $x_{ij} \in \mathbb{R}^d$ ($\mathbb{R}^d$ denotes the set of $d$-dimensional real vectors) denotes the $j$-th training sample of class $i$ ($i = 1, 2, \ldots, c$; $j = 1, 2, \ldots, N_i$). The test sample is $y \in \mathbb{R}^d$, $c$ is a natural number greater than 1, and $N_i$ is a natural number.
Step 1. Apply kernel mapping, dimension reduction and normalization to all samples in the training sample set $X$ and to the test sample $y$ to obtain a new training sample set and a new test sample. The kernel mapping $\phi: \mathbb{R}^d \to F$ maps all samples in $X$ and the test sample $y$ from the $d$-dimensional linear space to a high-dimensional nonlinear space $F$ (the dimension of $F$ is much larger than $d$): the training sample $x_{ij}$ is mapped to $\phi(x_{ij})$ and the test sample $y$ is mapped to $\phi(y)$. Let $\phi(X) = [\phi(X_1), \phi(X_2), \ldots, \phi(X_c)]$.
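In practice the mapping $\phi$ is usually not computed explicitly; only inner products in $F$ are needed, and these are given by a kernel function. The following is a minimal sketch of a Gaussian RBF kernel matrix, the kernel type named in the experiments. The function name `rbf_kernel`, the column-wise sample layout and the bandwidth `sigma` are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix between the columns of A (d x n) and B (d x m)."""
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_a = np.sum(A * A, axis=0)[:, None]
    sq_b = np.sum(B * B, axis=0)[None, :]
    sq_dist = np.maximum(sq_a + sq_b - 2.0 * A.T @ B, 0.0)
    # k(a, b) = exp(-||a - b||^2 / (2 sigma^2)); sigma is an assumed bandwidth
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

# Three 2-dimensional samples stored as columns (toy data)
X = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])
K = rbf_kernel(X, X, sigma=1.0)
```

Entry `K[i, j]` plays the role of the inner product between $\phi(x_i)$ and $\phi(x_j)$, so the diagonal is 1 and the matrix is symmetric.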
Kernel principal component analysis (KPCA) reduces the dimension of the samples in the high-dimensional kernel space to $N-1$, where $N$ is the total number of training samples, and the samples after dimension reduction are $L_2$-norm normalized. KPCA is described in B. Schölkopf, A. Smola and K. Müller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem" (Neural Computation, 1998, vol. 10, no. 5, pp. 1299-1319). $L_2$-norm normalization is defined as the ratio of a sample to its 2-norm. Let $\phi_{kpca}(x_{ij})$ denote $\phi(x_{ij})$ after dimension reduction and normalization, let $\phi_{kpca}(X) = [\phi_{kpca}(X_1), \phi_{kpca}(X_2), \ldots, \phi_{kpca}(X_c)]$, and let $\phi_{kpca}(y)$ denote $\phi(y)$ after dimension reduction and normalization.
Step 2. The new training samples $\phi_{kpca}(x_{ij})$ of each sample class obtained in Step 1 form the new sub-training sample set $\phi_{kpca}(X_i)$ ($i = 1, 2, \ldots, c$; $j = 1, 2, \ldots, N_i$).
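Grouping the preprocessed training samples by class can be sketched as below. The function name `split_by_class`, the column-wise layout and the `labels` list are assumptions made for illustration.

```python
import numpy as np

def split_by_class(Z, labels):
    """Group the columns of Z (features x samples) into one matrix per class label."""
    labels = np.asarray(labels)
    # .item() turns NumPy scalars into plain Python keys
    return {c.item(): Z[:, labels == c] for c in np.unique(labels)}

# Four toy samples (columns) belonging to two classes
Z = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, 7.0, 8.0]])
subsets = split_by_class(Z, labels=[0, 1, 0, 1])
```

Each dictionary value then corresponds to one new sub-training sample set $\phi_{kpca}(X_i)$.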
Step 3. Calculate the linear representation coefficients of the new test sample in each new sub-training sample set and the corresponding reconstruction error. For the new sub-training sample set $\phi_{kpca}(X_i)$ of class $i$, an expression (the formula image is missing from this text) is used to determine the linear representation coefficient $\beta_i$ of the new test sample $\phi_{kpca}(y)$ in $\phi_{kpca}(X_i)$, together with the corresponding reconstruction error $r_i(\phi_{kpca}(y)) = \|\phi_{kpca}(y) - \phi_{kpca}(X_i)\beta_i\|_2$ ($i = 1, 2, \ldots, c$).
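The expression for the coefficients is not reproduced in this text, so the sketch below assumes an ordinary least-squares fit, which is one common way to obtain linear representation coefficients; the patent's actual expression may differ (for example, it could include a regularization term). The function name and toy data are hypothetical.

```python
import numpy as np

def reconstruction_error(Z_i, z):
    """Coefficients and reconstruction error of z in the span of the columns of Z_i.

    Assumption: beta is the least-squares solution of Z_i @ beta ~ z.
    """
    beta, *_ = np.linalg.lstsq(Z_i, z, rcond=None)
    error = float(np.linalg.norm(z - Z_i @ beta))   # r_i = ||z - Z_i beta||_2
    return beta, error

# Toy class subset spanning the first two coordinate axes of R^3
Z_i = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.0, 0.0]])
z = np.array([1.0, 2.0, 3.0])
beta, err = reconstruction_error(Z_i, z)
```

In this toy case the part of `z` lying outside the span of `Z_i` is its third coordinate, so the error equals 3.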
Step 4. Compare the reconstruction errors of the new test sample $\phi_{kpca}(y)$ in the new sub-training sample sets $\phi_{kpca}(X_i)$. When the reconstruction error in the new sub-training sample set $\phi_{kpca}(X_k)$ of class $k$ is the smallest, i.e. when $r_k(\phi_{kpca}(y))$ is the minimum of $r_1(\phi_{kpca}(y)), r_2(\phi_{kpca}(y)), \ldots, r_c(\phi_{kpca}(y))$, the test sample $y$ is assigned to class $k$, where $k$ is a natural number less than or equal to $c$.
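The decision rule of Step 4 can be sketched as follows. As before, the least-squares fit for the coefficients is an assumption, and the function name, class labels and toy vectors are illustrative only.

```python
import numpy as np

def classify(subsets, z):
    """Return the label whose class subset reconstructs z with the smallest error."""
    errors = {}
    for label, Z_i in subsets.items():
        # Assumed least-squares coefficients for class `label`
        beta, *_ = np.linalg.lstsq(Z_i, z, rcond=None)
        errors[label] = float(np.linalg.norm(z - Z_i @ beta))
    best = min(errors, key=errors.get)   # arg min over the c reconstruction errors
    return best, errors

# Two toy classes in R^2, each represented by a single normalized sample
subsets = {
    "a": np.array([[1.0], [0.0]]),
    "b": np.array([[0.0], [1.0]]),
}
label, errors = classify(subsets, np.array([0.9, 0.1]))
```

The test vector lies close to the direction of class "a", so that class gives the smaller reconstruction error and is selected.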
The Yale B face database (The Yale Face Database B) and the MNIST handwritten digit database are used below to verify the classification method based on kernel discriminative linear representation of the present invention. The Yale B face database is described in A. S. Georghiades, P. N. Belhumeur and D. J. Kriegman, "From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose" (IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, vol. 23, no. 6, pp. 643-660). The MNIST handwritten digit database is described in Y. Mizukami, K. Tadamura, J. Warrell, P. Li and S. Prince, "CUDA Implementation of Deformable Pattern Recognition and Its Application to MNIST Handwritten Digit Database" (Pattern Recognition, 2010, pp. 2001-2004).
The Yale B face database contains 5760 images of 10 people in total. Each person has 576 grayscale images of size 640 × 480, covering 9 poses and 64 illumination conditions. The experiment uses all original images of the 10 people. The selected images are aligned (so that the two eyes are level), scaled and cropped, and each image sample retains only the 32 × 32 face region. From each person's 576 images, 70 images are randomly selected as training samples and 20 images as test samples. The training samples and test samples are digitized and converted to column vectors (each image yields a 1024-dimensional column vector). With the resulting training and test data, a Gaussian radial basis function (Gaussian RBF) is selected as the kernel mapping function, and the classification method based on kernel discriminative linear representation described above is used to identify the class of each test sample.
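The random selection and column vectorization described above can be sketched as follows. The array of random pixel values stands in for one person's cropped 32 × 32 images and is placeholder data, not the Yale B database; the function names and the seed are likewise assumptions.

```python
import numpy as np

def split_indices(n_images, n_train, n_test, rng):
    """Randomly pick disjoint training and test indices from one class."""
    perm = rng.permutation(n_images)
    return perm[:n_train], perm[n_train:n_train + n_test]

def images_to_columns(images):
    """Stack images of shape (n, h, w) into an (h*w) x n matrix of column vectors."""
    images = np.asarray(images, dtype=float)
    return images.reshape(images.shape[0], -1).T

rng = np.random.default_rng(0)
person_images = rng.random((576, 32, 32))        # placeholder data, not real faces
train_idx, test_idx = split_indices(576, 70, 20, rng)
X_person = images_to_columns(person_images[train_idx])   # 1024 x 70
Y_person = images_to_columns(person_images[test_idx])    # 1024 x 20
```

Repeating this for each of the 10 people and concatenating the resulting matrices column-wise would give the full training and test matrices.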
The MNIST handwritten digit database consists of 10000 images of the 10 handwritten Arabic numerals. Each numeral has 863 to 1127 black-and-white images, covering various forms of handwriting. The original handwritten digit images are 20 × 20 in size and are converted to 28 × 28 by an anti-aliasing technique. From the images of each handwritten digit, 100 images are randomly selected as training samples and 30 images as test samples. The training samples and test samples are digitized and converted to column vectors (each image yields a 1024-dimensional column vector). With the resulting training and test data, a Gaussian RBF is selected as the kernel mapping function, and the classification method based on kernel discriminative linear representation described above is used to identify the class of each test sample.
The experiment records the average recognition rates of the pattern classification method based on discriminative linear representation and of the classification method based on kernel discriminative linear representation of the present invention; see Table 1. The average recognition rate is defined as the ratio of the number of correct recognitions to the total number of recognitions. As Table 1 shows, the average recognition rate of the classification method based on kernel discriminative linear representation is clearly higher than that of the pattern classification method based on discriminative linear representation.
Table 1. Average recognition rates of the pattern classification methods based on discriminative linear representation and on kernel discriminative linear representation
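The average recognition rate defined above is a simple ratio. A minimal sketch, with a hypothetical function name and made-up labels rather than the patent's results:

```python
def recognition_rate(predicted, actual):
    """Ratio of correct recognitions to the total number of recognitions."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Illustrative labels only: three of four predictions are correct
rate = recognition_rate([1, 2, 3, 3], [1, 2, 3, 4])
```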
The above embodiment only illustrates the technical idea of the present invention and does not limit its scope of protection. Any change made on the basis of the technical solution in accordance with the technical idea proposed by the present invention falls within the scope of protection of the present invention.
Claims (4)
1. A classification method based on kernel discriminative linear representation, which uses a training sample set to identify the class of a test sample, wherein first all samples in the training sample set and the test sample are each preprocessed to obtain a new training sample set and a new test sample, the new training samples of each sample class forming a new sub-training sample set; then the reconstruction error of the new test sample in each new sub-training sample set is calculated; and finally the test sample is assigned to the class corresponding to the new sub-training sample set with the smallest reconstruction error; characterized in that the preprocessing applies kernel principal component analysis to reduce the dimension of the samples to be processed, and then normalizes all samples after dimension reduction.
2. The classification method based on kernel discriminative linear representation as claimed in claim 1, characterized in that the normalization uses $L_2$-norm normalization.
3. A face recognition method, characterized in that the face images in a face database are preprocessed; training samples and test samples are chosen from the preprocessed face images; the training samples and test samples are digitized and converted to column vectors; and, using the training sample and test sample data obtained in this way, the class of each test sample is identified with the classification method based on kernel discriminative linear representation as claimed in claim 1 or 2.
4. A handwritten digit recognition method, characterized in that the digit images in a handwritten digit database are preprocessed; training samples and test samples are chosen from the preprocessed digit images; the training samples and test samples are digitized and converted to column vectors; and, using the training sample and test sample data obtained in this way, the class of each test sample is identified with the classification method based on kernel discriminative linear representation as claimed in claim 1 or 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410026937.6A CN103761533A (en) | 2014-01-21 | 2014-01-21 | Classification method based on nuclear identification linear representation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410026937.6A CN103761533A (en) | 2014-01-21 | 2014-01-21 | Classification method based on nuclear identification linear representation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103761533A true CN103761533A (en) | 2014-04-30 |
Family
ID=50528768
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410026937.6A Pending CN103761533A (en) | 2014-01-21 | 2014-01-21 | Classification method based on nuclear identification linear representation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103761533A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109656737A (en) * | 2018-10-31 | 2019-04-19 | 阿里巴巴集团控股有限公司 | The statistical method and device of exception information |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050238200A1 (en) * | 2004-04-27 | 2005-10-27 | Rakesh Gupta | Simultaneous localization and mapping using multiple view feature descriptors |
CN102073880A (en) * | 2011-01-13 | 2011-05-25 | 西安电子科技大学 | Integration method for face recognition by using sparse representation |
CN102122355A (en) * | 2011-03-15 | 2011-07-13 | 西安电子科技大学 | SAR (synthetic aperture radar) target identification method based on nuclear sparse representation |
CN102930301A (en) * | 2012-10-16 | 2013-02-13 | 西安电子科技大学 | Image classification method based on characteristic weight learning and nuclear sparse representation |
CN102938070A (en) * | 2012-09-11 | 2013-02-20 | 广西工学院 | Behavior recognition method based on action subspace and weight behavior recognition model |
CN102982349A (en) * | 2012-11-09 | 2013-03-20 | 深圳市捷顺科技实业股份有限公司 | Image recognition method and device |
CN103106409A (en) * | 2013-01-29 | 2013-05-15 | 北京交通大学 | Composite character extraction method aiming at head shoulder detection |
CN103226714A (en) * | 2013-05-09 | 2013-07-31 | 山东大学 | Sparse coding method reinforced based on larger coding coefficient |
-
2014
- 2014-01-21 CN CN201410026937.6A patent/CN103761533A/en active Pending
Non-Patent Citations (1)
Title |
---|
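张楠 (Zhang Nan): "低秩鉴别分析与回归分类方法研究" [Research on low-rank discriminant analysis and regression classification methods], 《中国博士学位论文全文数据库 信息科技辑(月刊)》 [China Doctoral Dissertations Full-text Database, Information Science and Technology (monthly)] * |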
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Obaidullah et al. | PHDIndic_11: page-level handwritten document image dataset of 11 official Indic scripts for script identification | |
Akbari et al. | Wavelet-based gender detection on off-line handwritten documents using probabilistic finite state automata | |
CN107169485B (en) | Mathematical formula identification method and device | |
Sharma et al. | Indian sign language recognition using neural networks and KNN classifiers | |
Chitlangia et al. | Handwriting analysis based on histogram of oriented gradient for predicting personality traits using SVM | |
CN108509833B (en) | Face recognition method, device and equipment based on structured analysis dictionary | |
CN103839042B (en) | Face identification method and face identification system | |
CN105117708A (en) | Facial expression recognition method and apparatus | |
Angona et al. | Automated Bangla sign language translation system for alphabets by means of MobileNet | |
Li et al. | Dating ancient paintings of Mogao Grottoes using deeply learnt visual codes | |
CN103366175B (en) | Based on the scene image classification method that potential Di Li Cray distributes | |
Halder et al. | Content independent writer identification on Bangla Script: a document level approach | |
Maring et al. | Recognition of cheising iyek/eeyek-manipuri digits using support vector machines | |
Ahmed et al. | Recognition of Urdu Handwritten Alphabet Using Convolutional Neural Network (CNN). | |
Antony et al. | Haar features based handwritten character recognition system for Tulu script | |
Inunganbi et al. | Recognition of handwritten Meitei Mayek script based on texture feature | |
Pratiwi et al. | Personality type assessment system by using enneagram-graphology techniques on digital handwriting | |
Rasel et al. | An efficient framework for hand gesture recognition based on histogram of oriented gradients and support vector machine | |
Kumar et al. | RWIL: robust writer identification for Indic language | |
Anil et al. | Malayalam character recognition using singular value decomposition | |
Tang et al. | Hierarchical kernel-based rotation and scale invariant similarity | |
Halder et al. | Individuality of isolated Bangla numerals | |
Kishan et al. | Handwritten character recognition using CNN | |
CN103761533A (en) | Classification method based on nuclear identification linear representation | |
Ma et al. | Facial expression recognition based on characteristics of block LGBP and sparse representation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20140430 |