Summary of the invention
To address the facts that conventional canonical correlation algorithms such as traditional CCA, KCCA and LPCCA are not robust to outliers and suffer from the high-dimensional small-sample-size problem, so that the extracted image features are degraded by noise and by small sample counts and the recognition rate is reduced, the present invention proposes an image-recognition method based on generalized-mean canonical correlation analysis. The method performs image recognition with a robust canonical correlation analysis algorithm based on the generalized mean (CCA based on generalized mean, GMCCA). In the GMCCA algorithm, the concept of the correlated error between samples in the projection space is first proposed, to better describe the degree of similarity between projected samples. Second, based on the generalized mean, the objective function is rebuilt from the correlated errors, replacing the original minimum mean-square-error objective based on the L2 norm and yielding a new model. Finally, the new model is solved by a linear iterative method. Experiments on three real data sets, the Multiple Feature handwritten-digit database (Multiple feature database, MFD), the ORL face data set and the COIL-20 object database, show that the new algorithm not only has better robustness but also avoids the problem that high-dimensional small samples make the sample covariance matrices singular. The steps of the image-recognition method based on generalized-mean canonical correlation analysis can be described as follows:
(1) Collect image samples;
(2) Input a pair of sample sets of size N, $X=(x_1,x_2,\ldots,x_N)$ and $Y=(y_1,y_2,\ldots,y_N)$, the generalized-mean parameter p, the inner-iteration counts $T_1$ and $T_2$, the outer-iteration count T, and the dimension d of the features after dimensionality reduction;
(3) First compute the mean vectors of the sample sets $X=(x_1,x_2,\ldots,x_N)$ and $Y=(y_1,y_2,\ldots,y_N)$:
$$\bar{x}=\frac{1}{N}\sum_{i=1}^{N}x_i,\qquad \bar{y}=\frac{1}{N}\sum_{i=1}^{N}y_i$$
and use $\bar{x}$ and $\bar{y}$ to center X and Y:
$$\tilde{x}_i=x_i-\bar{x},\qquad \tilde{y}_i=y_i-\bar{y},\qquad i=1,2,\ldots,N.$$
For uniformity of notation, the centered sets $\tilde{X}$ and $\tilde{Y}$ are still denoted $X=(x_1,x_2,\ldots,x_N)$ and $Y=(y_1,y_2,\ldots,y_N)$;
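Step (3) amounts to subtracting the per-view mean vector from every sample. A minimal numpy sketch (the function name `center_views` and the column-per-sample layout are illustrative conventions, not the patent's notation):

```python
import numpy as np

def center_views(X, Y):
    """Center two paired sample sets column-wise (each column is one sample).

    X: (p, N) view-1 samples; Y: (q, N) view-2 samples.
    Returns the centered sets and the two mean vectors.
    """
    mx = X.mean(axis=1, keepdims=True)   # mean vector of view 1
    my = Y.mean(axis=1, keepdims=True)   # mean vector of view 2
    return X - mx, Y - my, mx, my
```

After this step every row of the centered matrices sums to zero, which is what the covariance expressions in step (4) assume.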
(4) Traditional canonical correlation analysis (Canonical correlation analysis, CCA) seeks projection vectors $w_x$ and $w_y$ for the two sample sets X and Y such that the projected features of the two sets have maximum correlation. Its criterion function is as follows:
$$\max_{w_x,w_y}\ \rho=\frac{w_x^{T}S_{xy}w_y}{\sqrt{w_x^{T}S_{xx}w_x}\sqrt{w_y^{T}S_{yy}w_y}}$$
where $S_{xx}$ and $S_{yy}$ are the within-set covariance matrices and $S_{xy}=S_{yx}^{T}$ is the between-set covariance matrix. The above criterion can be converted into the following two generalized eigenvalue problems:
$$S_{xy}S_{yy}^{-1}S_{yx}w_x=\lambda^{2}S_{xx}w_x,\qquad S_{yx}S_{xx}^{-1}S_{xy}w_y=\lambda^{2}S_{yy}w_y$$
and $w_x$ and $w_y$ satisfy the following pairing relations:
$$S_{xy}w_y=\lambda S_{xx}w_x,\qquad S_{yx}w_x=\lambda S_{yy}w_y.$$
Finally, the eigenvectors corresponding to the largest d eigenvalues are combined into the two projection sets $W_x$ and $W_y$. It can be seen that solving CCA requires inverting $S_{xx}$ and $S_{yy}$; however, high-dimensional small samples easily make $S_{xx}$ and $S_{yy}$ singular, degrading the performance of CCA;
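For illustration, the CCA criterion above can be solved numerically through the stated eigenvalue problems. The following is a minimal numpy sketch; the small ridge term `reg` is an added assumption to guard against exactly the singular $S_{xx}$, $S_{yy}$ the text warns about, and is not part of traditional CCA:

```python
import numpy as np

def cca(X, Y, d, reg=1e-8):
    """Classical CCA sketch. X:(p,N), Y:(q,N), both assumed centered.

    Solves S_xx^{-1} S_xy S_yy^{-1} S_yx w_x = lambda^2 w_x for the top-d
    directions, then recovers w_y from the pairing relation
    S_yx w_x = lambda S_yy w_y.
    """
    N = X.shape[1]
    Sxx = X @ X.T / N + reg * np.eye(X.shape[0])
    Syy = Y @ Y.T / N + reg * np.eye(Y.shape[0])
    Sxy = X @ Y.T / N
    # Sxx^{-1} Sxy Syy^{-1} Syx, without forming explicit inverses
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)[:d]   # largest d eigenvalues first
    Wx = vecs[:, order].real
    Wy = np.linalg.solve(Syy, Sxy.T @ Wx)
    Wy /= np.linalg.norm(Wy, axis=0)     # normalize each projection vector
    return Wx, Wy
```

On two views sharing a strong latent signal, the first projected pair should be highly correlated.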
(5) Assume p ≠ 0. For a scalar data set $\{a_i>0,\ i=1,2,\ldots,N\}$, the generalized mean $M_G$ is defined as follows:
$$M_G(\{a_i\})=\left(\frac{1}{N}\sum_{i=1}^{N}a_i^{p}\right)^{1/p}.$$
Further analysis shows that the generalized mean $M_G$ can be expressed as a nonnegative linear combination of the data set $\{a_i\}$, as follows:
$$M_G=\sum_{i=1}^{N}b_i a_i,\qquad b_i=\frac{1}{N}\left(\frac{a_i}{M_G}\right)^{p-1}.$$
Here $b_i$ can be regarded as the weight of $a_i$, i.e. the contribution of $a_i$ to $M_G$. When p < 1, the larger $a_i$ is, the smaller $b_i$ is; that is, when p < 1 the generalized mean $M_G$ is influenced more by the smaller values in $\{a_i\}$, and the smaller p is, the stronger this influence. This property of the generalized mean plays the main role in GMCCA's suppression of outliers.
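The down-weighting property described above is easy to check numerically. A short sketch (function names are illustrative) computing $M_G$ and the weights $b_i$:

```python
import numpy as np

def generalized_mean(a, p):
    """M_G = ((1/N) * sum(a_i^p))^(1/p) for positive a_i and p != 0."""
    a = np.asarray(a, dtype=float)
    return np.mean(a ** p) ** (1.0 / p)

def gm_weights(a, p):
    """Weights b_i with M_G = sum_i b_i * a_i, b_i = (a_i/M_G)^(p-1) / N.

    For p < 1, larger a_i receive smaller b_i -- the property GMCCA
    exploits to suppress outliers.
    """
    a = np.asarray(a, dtype=float)
    m = generalized_mean(a, p)
    return (a / m) ** (p - 1) / a.size
```

For example, with p = 0.5 an outlying value such as 100 among small values receives a much smaller weight than the small values, so it contributes less to $M_G$ than it would to the arithmetic mean.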
The correlated error $e(W_x,W_y)$ between samples in the projection space is defined as follows:
$$e_i=\left\|W_x^{T}x_i-W_y^{T}y_i\right\|^{2},\qquad i=1,2,\ldots,N.$$
Combining the above generalized mean and correlated error, the objective optimization function of the robust canonical correlation analysis based on the generalized mean (CCA based on generalized mean, GMCCA) is constructed as follows:
$$\min_{W_x,W_y}\ \left(\frac{1}{N}\sum_{i=1}^{N}e_i^{p}\right)^{1/p}.$$
Solving the above objective function yields $W_x$ and $W_y$; taking the first d columns of $W_x$ and $W_y$ gives the two GMCCA projection sets. The essence of GMCCA's robustness can be seen from the above formula: when p < 1, the weight $\alpha_i$ of a sample decreases as its correlated error increases; therefore sample points with large correlated error in the projection space, i.e. outliers, are given small weights, which suppresses the adverse effect of outliers on the criterion function and enhances the robustness of the algorithm;
(6) Use the $W_x$ and $W_y$ obtained in step (5) to perform feature extraction and dimensionality reduction on the original samples:
$$X'=W_x^{T}X,\qquad Y'=W_y^{T}Y$$
and use $X'$ and $Y'$ for the subsequent pattern-recognition task;
(7) Complete the image-recognition task with a nearest-neighbor classifier.
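Steps (6) and (7) can be sketched as follows, under the serial-fusion convention used in the experiments below (project both views, concatenate, then classify with 1-NN under Euclidean distance); all function names are illustrative:

```python
import numpy as np

def extract_features(Wx, Wy, X, Y):
    """Step (6): project both views and concatenate them serially."""
    return np.vstack([Wx.T @ X, Wy.T @ Y])   # shape (2d, N)

def nearest_neighbor_predict(train_feats, train_labels, test_feats):
    """Step (7): 1-nearest-neighbor classifier under Euclidean distance.

    train_feats: (f, N); test_feats: (f, M); returns M predicted labels.
    """
    # squared distance between every test column and every training column
    d2 = ((test_feats[:, :, None] - train_feats[:, None, :]) ** 2).sum(axis=0)
    return train_labels[np.argmin(d2, axis=1)]
```

A usage example: two well-separated clusters in feature space are classified by proximity to their nearest training sample.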
The above objective function is solved by a linear iterative method, which proceeds as follows:
Let the current iteration counters be $t_1=t_2=t=0$; denote by $W_x^{(t)}$ and $W_y^{(t)}$ the $W_x$ and $W_y$ obtained at the t-th iteration, and initialize $W_x^{(0)}$ and $W_y^{(0)}$.
First fix $W_y^{(t)}$; then $W_x^{(t+1)}$ is obtained from the corresponding minimization problem, i.e. $W_x^{(t+1)}$ is the set of orthonormal eigenvectors corresponding to the largest d eigenvalues of the associated weighted covariance matrix. At this point $W_x^{(t+1)}$ is used to update $W_x$. Next fix $W_x^{(t+1)}$; similarly, $W_y^{(t+1)}$ is obtained from the corresponding minimization problem.
This yields the linear iterative algorithm for solving $W_x$ and $W_y$. As can be seen from this linear iterative method, GMCCA differs from traditional CCA: the two projection sets of GMCCA are solved separately, and $W_x$ and $W_y$ no longer satisfy the pairing relations of CCA. $W_x$ and $W_y$ are, respectively, the orthonormal eigenvector sets corresponding to the largest d eigenvalues of the weighted covariance matrices of the sample sets X and Y. Moreover, the entire solution procedure involves no inversion of the covariance matrices of X and Y. Therefore GMCCA avoids the problem in traditional CCA that high-dimensional small samples make the sample covariance matrices singular.
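The patent's exact update formulas are not reproduced in this text, so the following is only one plausible schematic reading of the alternating scheme, under two stated assumptions: the per-sample correlated error is $e_i=\|W_x^{T}x_i-W_y^{T}y_i\|^2$, and each update takes the top-d orthonormal eigenvectors of the error-weighted covariance of the corresponding view, as the surrounding prose describes. It is a sketch, not the patented algorithm:

```python
import numpy as np

def gmcca_sketch(X, Y, d, p=0.1, T=20):
    """Schematic alternating GMCCA-style solver (assumptions noted above).

    X:(px,N), Y:(py,N), both assumed centered. No covariance inversion
    occurs anywhere, so singular covariances cause no failure.
    """
    rng = np.random.default_rng(0)
    # random orthonormal initializations W_x^(0), W_y^(0)
    Wx = np.linalg.qr(rng.normal(size=(X.shape[0], d)))[0]
    Wy = np.linalg.qr(rng.normal(size=(Y.shape[0], d)))[0]
    for _ in range(T):
        E = Wx.T @ X - Wy.T @ Y
        e = (E ** 2).sum(axis=0) + 1e-12       # correlated errors e_i
        alpha = e ** (p - 1)                   # p < 1: large e_i -> small weight
        alpha /= alpha.sum()
        Cx = (X * alpha) @ X.T                 # alpha-weighted covariances
        Cy = (Y * alpha) @ Y.T
        Wx = np.linalg.eigh(Cx)[1][:, -d:]     # top-d orthonormal eigenvectors
        Wy = np.linalg.eigh(Cy)[1][:, -d:]
    return Wx, Wy
```

Because each update is an eigendecomposition of a symmetric matrix, the returned projection sets are orthonormal by construction, matching the "orthonormal eigenvector set" property stated above.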
The GMCCA algorithm used in the present invention has the following advantages:
(1) The influence of outliers on the objective optimization function is suppressed through the generalized mean.
(2) The rotational invariance of the Euclidean distance between samples is retained.
(3) GMCCA avoids the problem that the high-dimensional small-sample-size situation makes the sample covariance matrices singular.
Moreover, the image-recognition method based on generalized-mean canonical correlation analysis provided by the present invention is strongly robust during image feature extraction and copes well with image noise and small image-sample counts, and therefore achieves a higher recognition rate.
Specific embodiment
To clarify the objectives, technical solutions and advantages of the present invention, the present invention is described in further detail below in conjunction with specific embodiments and the accompanying drawings.
Referring to Fig. 1, the specific implementation process of the present invention includes the following steps:
(1) Collect image samples;
(2) Input a pair of sample sets of size N, $X=(x_1,x_2,\ldots,x_N)$ and $Y=(y_1,y_2,\ldots,y_N)$, the generalized-mean parameter p, the inner-iteration counts $T_1$ and $T_2$, the outer-iteration count T, and the dimension d of the features after dimensionality reduction;
(3) First center the sample sets $X=(x_1,x_2,\ldots,x_N)$ and $Y=(y_1,y_2,\ldots,y_N)$: compute their mean vectors
$$\bar{x}=\frac{1}{N}\sum_{i=1}^{N}x_i,\qquad \bar{y}=\frac{1}{N}\sum_{i=1}^{N}y_i$$
and use $\bar{x}$ and $\bar{y}$ to center X and Y:
$$\tilde{x}_i=x_i-\bar{x},\qquad \tilde{y}_i=y_i-\bar{y},\qquad i=1,2,\ldots,N.$$
For uniformity of notation, the centered sets are still denoted $X=(x_1,x_2,\ldots,x_N)$ and $Y=(y_1,y_2,\ldots,y_N)$;
(4) The objective function of traditional CCA is as follows:
$$\max_{w_x,w_y}\ \rho=\frac{w_x^{T}S_{xy}w_y}{\sqrt{w_x^{T}S_{xx}w_x}\sqrt{w_y^{T}S_{yy}w_y}}$$
which can be converted into the following two generalized eigenvalue problems:
$$S_{xy}S_{yy}^{-1}S_{yx}w_x=\lambda^{2}S_{xx}w_x,\qquad S_{yx}S_{xx}^{-1}S_{xy}w_y=\lambda^{2}S_{yy}w_y.$$
Finally, the eigenvectors corresponding to the largest d eigenvalues are combined into the two projection sets $W_x$ and $W_y$;
(5) Assume p ≠ 0. For a scalar data set $\{a_i>0,\ i=1,2,\ldots,N\}$, the generalized mean $M_G$ is defined as follows:
$$M_G(\{a_i\})=\left(\frac{1}{N}\sum_{i=1}^{N}a_i^{p}\right)^{1/p}.$$
The correlated error $e(W_x,W_y)$ between samples in the projection space is defined as
$$e_i=\left\|W_x^{T}x_i-W_y^{T}y_i\right\|^{2},\qquad i=1,2,\ldots,N.$$
Combining the above generalized mean and correlated error, the objective optimization function of the robust canonical correlation analysis based on the generalized mean (CCA based on generalized mean, GMCCA) is constructed as follows:
$$\min_{W_x,W_y}\ \left(\frac{1}{N}\sum_{i=1}^{N}e_i^{p}\right)^{1/p}.$$
The above objective optimization function is solved by a linear iterative method, which proceeds as follows:
Let the current iteration counters be $t_1=t_2=t=0$; denote by $W_x^{(t)}$ and $W_y^{(t)}$ the $W_x$ and $W_y$ obtained at the t-th iteration, and initialize $W_x^{(0)}$ and $W_y^{(0)}$.
First fix $W_y^{(t)}$; then $W_x^{(t+1)}$ is obtained from the corresponding minimization problem, i.e. $W_x^{(t+1)}$ is the set of orthonormal eigenvectors corresponding to the largest d eigenvalues of the associated weighted covariance matrix, and is used to update $W_x$. Next fix $W_x^{(t+1)}$; similarly, $W_y^{(t+1)}$ is obtained from the corresponding minimization problem.
This yields the linear iterative algorithm for solving $W_x$ and $W_y$. Taking the first d columns of $W_x$ and $W_y$ gives the two GMCCA projection sets;
(6) Use the $W_x$ and $W_y$ obtained in step (5) to perform feature extraction and dimensionality reduction on the original samples, and use the projected features for the subsequent pattern-recognition task;
(7) Complete the image-recognition task with a nearest-neighbor classifier.
The effect of the present invention can be further illustrated by the following experiments on real databases.
1. Experiment description
To verify the validity of GMCCA, this section conducts experiments on three real data sets: the Multiple Feature handwritten-digit database (Multiple feature database, MFD), the ORL face data set and the COIL-20 object database, and compares GMCCA with PCA, CCA, robust CCA (Robust CCA, ROCCA), complete CCA (Complete CCA, C3A) and kernel-induced CCA (CCA based on kernel-induced measure, KI-CCA). ROCCA replaces the sample covariance matrices with constructed approximation matrices, eliminating the high-dimensional small-sample-size problem; its recognition validity has been verified experimentally. C3A overcomes the problem that CCA may lose information and extracts more complete canonical correlation information. KI-CCA replaces the Euclidean distance metric of traditional CCA with a kernel-induced distance metric, improving the robustness of the algorithm while also handling nonlinear problems.
In all experiments here, the p of GMCCA is set to 0.1, and $T_1$, $T_2$ and T are set to 10, 10 and 20, respectively. For PCA, the two feature sets are first concatenated end to end into a new high-dimensional feature vector before feature extraction; for CCA, ROCCA, C3A, KI-CCA and GMCCA, the extracted features are fused serially, i.e. the two dimension-reduced feature sets are concatenated end to end before recognition analysis. The classifier is the nearest-neighbor classifier.
2. Experimental results
Experiment 1: multiple-feature handwritten digits
This experiment tests the performance of GMCCA on the Multiple Feature handwritten-digit data set (MFD). The data set is part of the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/Multiple+Features) and is of significant value in handwritten digit recognition. The database contains six feature sets for the ten digits 0-9, with 200 samples per class and 2000 samples in total, and is widely used in pattern-recognition and machine-learning research. Six features are extracted from the binarized handwritten digit images, including Fourier coefficients, profile correlations, Karhunen-Loeve expansion features, pixel averages, Zernike moments and morphological features; the corresponding feature names and dimensions are: (fou, 76), (fac, 216), (kar, 64), (pix, 240), (zer, 47) and (mor, 6). On this data set, any two of the feature sets can be taken as input, giving 15 combinations in total. For each feature combination, 100 samples are randomly selected from each class for training and the remaining 100 samples are used for testing.
Table 1 shows the average recognition results of 10 random experiments for the six algorithms on the different feature combinations; the best recognition rate for each combination is shown in bold, and likewise below. From the results in the table it can be seen that the average recognition rate of the GMCCA algorithm is better than the other algorithms on most combinations and is clearly higher than that of CCA; in addition, its average recognition rate over the 15 combinations is also higher than the other algorithms. These results verify the validity of GMCCA. On the fou-pix, kar-pix, mor-pix and mor-zer combinations, the recognition rate of GMCCA is lower than some other algorithms, although still higher than CCA, which also shows that GMCCA still has shortcomings on some feature combinations.
Table 1: recognition results of the six algorithms on the different feature combinations in the MFD experiment
Experiment 2: ORL face database
To further verify the validity of GMCCA, this experiment uses the ORL database, in which face pose varies considerably (http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html). The database consists of a series of face images taken at the Olivetti laboratory in Cambridge, Britain, between April 1992 and April 1994, covering 40 subjects of different ages, sexes and races. Each subject has 10 images, for a total of 400 grayscale images of size 92 × 112 with black backgrounds. Facial expression and details vary, e.g. smiling or not smiling, eyes open or closed, with or without glasses; the face pose also varies, with depth rotation and in-plane rotation of up to 20 degrees; and the face scale varies by up to 10%. This library is currently one of the most popular standard databases and contains a large number of comparison results. Fig. 2 shows six images of one person in the ORL database.
In the experiment, 4, 5, 6, 7 or 8 images are randomly selected from each person's 10 images for training, and the rest are used for testing; three feature sets are extracted from each image. The original image features are denoted O; the features extracted from the original image with local binary patterns (Local binary pattern, LBP) are denoted L; and the features extracted from the original image with the histogram of oriented gradients (Histogram of Oriented Gradient, HOG) are denoted H. LBP and HOG features and their combinations have proved to be effective in face recognition. To avoid singularity problems, the above three feature sets are reduced to 100 dimensions with PCA.
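The PCA reduction used here (and again, to 50 dimensions, in Experiment 3) can be sketched with a plain SVD; the function name `pca_reduce` and the column-per-sample layout are illustrative conventions:

```python
import numpy as np

def pca_reduce(X, d):
    """Reduce a column-sample matrix X (features x samples) to d dimensions
    by projecting onto the top-d principal directions, as done here to bring
    each feature set down before CCA and avoid singular covariance matrices.
    """
    Xc = X - X.mean(axis=1, keepdims=True)          # center the samples
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :d].T @ Xc                          # (d, N) reduced features
```

A known property that makes for a simple check: the rows of the reduced matrix are uncorrelated, so its scatter matrix is diagonal.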
Table 2 shows the average recognition results of 10 random experiments for the six algorithms on the three feature combinations; "n" denotes the number of training samples per class, likewise below. From Table 2 it can be seen that in the vast majority of cases the recognition performance of GMCCA on the three combinations is superior to the other five algorithms, and comparing the average recognition rates over all combinations, GMCCA is also superior to the other five algorithms. Table 2 also shows that the recognition performance of GMCCA improves considerably over traditional CCA, especially when the number of training samples is small, e.g. four training samples per class. These results show that the features extracted by GMCCA are more robust, demonstrating the validity of the method. The results in Table 2 also show that in four cases the recognition rate of GMCCA is slightly below the other algorithms, but very close to the optimum.
Table 2: recognition results of the six algorithms on the ORL face database
Next, the first four images of each person in the ORL database are chosen for training, with the remaining images used for testing; the experimental results are shown in Fig. 3, Fig. 4 and Fig. 5. Because the other five algorithms in Table 2 are clearly better than PCA, Fig. 3, Fig. 4 and Fig. 5 show only the recognition results of the five canonical correlation algorithms on the three feature combinations as the dimension varies. From Fig. 3, Fig. 4 and Fig. 5 it can be seen that GMCCA is better than the other four algorithms; in particular, when the dimension is small, the recognition rate of GMCCA is clearly higher than the other algorithms. From the viewpoint of algorithm stability, GMCCA is also better than the other four algorithms. The experimental results again effectively demonstrate the robustness of GMCCA.
Experiment 3: COIL-20 object database
This experiment uses the internationally widely used COIL-20 object database, an image database of 20 objects from Columbia University (http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php). For this database, each object was rotated horizontally from 0° to 360° and one image was sampled every 5°, so each object has 72 images, for 1440 images in total. The database has been successfully applied in fields such as pattern recognition and machine learning, e.g. data visualization and pose estimation. The 20 objects in the COIL-20 database are shown in Fig. 6.
In the experiment, 10, 20, 30, 40 and 50 images are randomly selected from each object's 72 images for training, and the remaining images are used for testing. Ten random experiments are carried out independently and the average recognition rate is computed. Three feature sets are extracted from each image: the original image features are denoted O; the features extracted from the original image with LBP are denoted L; and the features extracted from the original image with HOG are denoted H. PCA is then applied to reduce the above three feature sets to 50 dimensions.
Table 3 shows the average recognition results of 10 random experiments for the six algorithms on the three feature combinations. From the experimental results of Table 3 it can be seen that GMCCA is clearly better than traditional CCA, and in most cases GMCCA is slightly better than ROCCA. In Table 3, the recognition rates of CCA and C3A are comparable, indicating that after PCA feature reduction CCA can extract complete feature information from this data set; the recognition rate of GMCCA is better than both CCA and C3A, indicating that GMCCA not only extracts complete feature information but also extracts more robust features. In two cases in Table 3 the recognition rate of GMCCA is slightly lower than the other algorithms, but the difference is very small; moreover, in terms of overall average recognition rate, GMCCA is better than the other five algorithms. These experiments show the validity and robustness of GMCCA.
Table 3: recognition results of the six algorithms on the COIL-20 object database
Next, the first 25 images of each object in the COIL-20 database are chosen for training, with the remaining images used for testing. Fig. 7, Fig. 8 and Fig. 9 show the recognition results of the five algorithms on the three feature combinations as the dimension varies. From the results of the three figures it can be seen that GMCCA is clearly better than the other four algorithms; compared with traditional CCA, the recognition rate is improved, further confirming the conclusion that GMCCA's recognition rate is higher than the other algorithms when the dimension is small. Moreover, as the dimension increases, the recognition rate of GMCCA stabilizes more than that of the other four algorithms, which shows that the features extracted by GMCCA are more robust. It is worth noting that the dimension-versus-recognition-rate curves of CCA and C3A coincide, confirming the conclusion from Table 3 that the recognition rates of CCA and C3A are comparable and indicating that CCA can extract complete feature information from this data set. This also reflects, from another angle, that while CCA can extract complete information, GMCCA can additionally suppress the influence of outliers and extract more robust features. These experimental results further demonstrate the validity and robustness of GMCCA.