CN104376312A - Face recognition method based on bag-of-words compressed sensing feature extraction - Google Patents

Face recognition method based on bag-of-words compressed sensing feature extraction

Info

Publication number
CN104376312A
Authority
CN
China
Prior art keywords
image, feature, face recognition, scale, features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410739127.5A
Other languages
Chinese (zh)
Other versions
CN104376312B (en)
Inventor
周凯
元昌安
郑彦
宋文展
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi University
Original Assignee
Guangxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi University
Priority to CN201410739127.5A
Publication of CN104376312A
Application granted
Publication of CN104376312B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face recognition method based on bag-of-words compressed sensing feature extraction, used in a face recognition system. The method comprises the following steps: extraction of the scale-invariant feature transform features of an image, feature coding, fusion of features of different scales, and classification. Compared with the original bag-of-words model, the method is simple, practical and effective: after the key features of an image are extracted by the scale-invariant feature transform, no clustering center or learned dictionary is needed; a random matrix is used instead. Encoding the key features through the random matrix saves a large amount of time and avoids the heavy loss of spatial information suffered by the original bag-of-words model. The method withstands changes such as illumination, occlusion and expression of the user's face, achieves a high recognition rate at a high running speed, and, in face recognition on the challenging AR face database, greatly improves the recognition rate while running in real time.

Description

Face recognition method based on bag-of-words compressed sensing feature extraction
Technical Field
The invention relates to machine vision and image processing technology, in particular to a face recognition method.
Background
In existing face recognition systems, lighting brightness, face pose, glasses and the like are challenging problems, and feature extraction is a key step of image preprocessing. Existing face recognition methods are various; in methods based on bag-of-words features, for example, the bag-of-words model discards the spatial information of the features and leaves them unordered, so the recognition rate of the algorithm is low, and the K-means clustering in the bag-of-words model takes a long time, so the whole algorithm runs slowly.
Disclosure of Invention
The invention aims to provide a face recognition method based on bag-of-words compressed sensing feature extraction that improves the performance of the algorithm and runs faster.
In order to achieve this purpose, the technical scheme of the method is as follows:
a face recognition method based on bag-of-words compressed sensing feature extraction, used in a face recognition system, characterized in that the recognition comprises the following steps:
step one, extracting image features using the scale-invariant feature transform
(1) Let the image be $I(x, y)$; convolving the image with a Gaussian kernel function gives the scale space at different scales. The formula is:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y) \tag{1}$$

where $(x, y)$ denotes the pixel position, $L(x, y, \sigma)$ denotes the scale space, and $\sigma$ denotes the scale-space factor;

(2) after the scale space of the image is obtained, a difference-of-Gaussian pyramid method is adopted, i.e., extreme points are searched in the space $D(x, y, \sigma)$ obtained by convolving the image with the difference-of-Gaussian function; the formula of $D(x, y, \sigma)$ is:

$$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma) \tag{2}$$

where $k$ is the factor separating two adjacent scale spaces;

(3) key points are determined from the extreme points, and each key point is assigned a direction to achieve rotation invariance of the image; for each point $(x, y)$ of the smoothed image $L$, the gradient magnitude and direction are computed:

$$m(x, y) = \sqrt{\big(L(x+1, y) - L(x-1, y)\big)^2 + \big(L(x, y+1) - L(x, y-1)\big)^2} \tag{3}$$

$$\theta(x, y) = \arctan\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)} \tag{4}$$
(4) a neighborhood around the key point is selected, and a histogram is formed from the gradients of all the points in this region, centered on the key point, with each gradient Gaussian-weighted. The neighborhood is divided into four sub-regions, and eight directions are taken in each sub-region, thereby obtaining the scale-invariant feature transform of the image;
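To make step one concrete, here is a minimal Python sketch of equations (1)-(4): a Gaussian scale space and difference-of-Gaussian pyramid built with NumPy/SciPy, plus per-pixel gradient magnitude and direction. The parameter values (sigma = 1.6, k = sqrt(2), five scales) are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, sigma=1.6, k=2 ** 0.5, n_scales=5):
    """Eqs. (1)-(2): Gaussian scale space L and difference-of-Gaussian D."""
    L = [gaussian_filter(image.astype(float), sigma * k ** i)
         for i in range(n_scales)]
    D = [L[i + 1] - L[i] for i in range(n_scales - 1)]  # D = L(k*sigma) - L(sigma)
    return L, D

def gradient_mag_ori(L):
    """Eqs. (3)-(4): gradient magnitude and direction of a smoothed image.
    np.roll wraps at the borders, a simplification for this sketch; arctan2
    is the quadrant-aware form of the arctan in eq. (4)."""
    dx = np.roll(L, -1, axis=1) - np.roll(L, 1, axis=1)  # L(x+1,y) - L(x-1,y)
    dy = np.roll(L, -1, axis=0) - np.roll(L, 1, axis=0)  # L(x,y+1) - L(x,y-1)
    return np.sqrt(dx ** 2 + dy ** 2), np.arctan2(dy, dx)

img = np.random.rand(83, 60)   # stand-in for a normalized face image
L, D = dog_pyramid(img)
m, theta = gradient_mag_ori(L[0])
```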
step two, feature coding
An image is segmented into $M$ blocks. After blocking, the local feature of each block is obtained by the scale-invariant feature transform; then, using the idea of compressed sensing, the system randomly generates a random dictionary $B$, and the feature codes are obtained through sparse representation.
If the system generates a random dictionary $B$ for an image divided into $M$ blocks, and $y_i$ ($i = 1, \dots, M$) is the scale-invariant feature transform feature of the $i$-th block, the feature code of each local block of the image is obtained by formula (5), as follows:

$$\hat{x}_i = \arg\min_{x}\; \|y_i - Bx\|_2^2 + \lambda \|x\|_1 \tag{5}$$

where $\lambda$ is a constant and $\hat{x}_i$ is the desired feature code;
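A minimal sketch of the feature coding of formula (5), assuming an l1-regularized least-squares solver (scikit-learn's Lasso) as the sparse-representation step; the dimensions and the constant lambda (alpha below) are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, n, M = 128, 256, 16            # SIFT dim, dictionary atoms, blocks per image
B = rng.standard_normal((d, n))   # random dictionary: no clustering, no learning
B /= np.linalg.norm(B, axis=0)    # unit-norm atoms
Y = rng.standard_normal((d, M))   # stand-in SIFT features, one column per block

# Lasso minimizes (1/(2m))||y - Bx||^2 + alpha*||x||_1, so alpha plays the
# role of lambda in eq. (5) up to a constant scaling factor.
coder = Lasso(alpha=0.01, max_iter=5000)
C = np.column_stack([coder.fit(B, Y[:, i]).coef_ for i in range(M)])
print(C.shape)                    # (n, M): one sparse code per block
```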
Step three, fusing features of different scales in the image
Using formula (5), the feature coding matrix of an image is obtained as $C = [\hat{x}_1, \hat{x}_2, \dots, \hat{x}_M]$, where $\hat{x}_i$ is the coefficient corresponding to the $i$-th block. The max-pooling method for fusing the coefficients is defined as:

$$z_j = \max\big\{|c_{j1}|, |c_{j2}|, \dots, |c_{jM}|\big\} \tag{6}$$

where $z_j$ is the $j$-th element of the pooling vector $z$, and $c_{ji}$ denotes the element in row $j$, column $i$ of the coefficient coding matrix $C$;
finally, a space pyramid matching algorithm is used, namely, a pair of images are segmented intoAnddifferent blocks can carry out feature coding on the subregions with different spatial positions and scales, if a spatial pyramid matching algorithm is used for obtaining the maximum pool of the scales asThen, the feature vectors of different scales and regions are connected in series to obtain the feature vector of the image;
step four, classifying
After the feature vector of each image has been obtained by the above feature extraction, classification is carried out with a kernel sparse representation method. The kernel function is the histogram intersection kernel, whose expression is:

$$\kappa(p, q) = \sum_{i=1}^{n} \min(p_i, q_i) \tag{7}$$

where $p$ and $q$ are two $n$-dimensional feature vectors, and $p_i$, $q_i$ are the elements of these feature vectors;
If the training set obtained after image feature extraction is $X = [x_1, x_2, \dots, x_N]$ and $y$ is a test sample, then, taking a single test sample $y$ as an example, $y$ can be represented over the matrix $X$ in the kernel-induced space as:

$$\hat{\alpha} = \arg\min_{\alpha}\; \|\Phi(y) - \Phi(X)\alpha\|_2^2 + \lambda \|\alpha\|_1 \tag{8}$$

where $\Phi$ is the mapping induced by the kernel function $\kappa$, and $\alpha$ is the sparse coefficient in the high-dimensional feature projection space. Expanding the above formula gives:

$$\hat{\alpha} = \arg\min_{\alpha}\; \Big( \kappa(y, y) - 2\,K(y, X)\,\alpha + \alpha^{\top} K(X, X)\,\alpha + \lambda \|\alpha\|_1 \Big) \tag{9}$$

where $K(y, X) = [\kappa(y, x_1), \dots, \kappa(y, x_N)]$ collects the histogram intersection kernel values between the test sample $y$ and the training samples, and $K(X, X)$ is the kernel matrix of the training set. Solving formula (9) yields the coefficient $\hat{\alpha}$; finally, the class with the minimum residual is found:

$$\operatorname{identity}(y) = \arg\min_{c}\; \big\| \Phi(y) - \Phi(X)\,\delta_c(\hat{\alpha}) \big\|_2 \tag{10}$$

where $\delta_c(\hat{\alpha})$ denotes the sparse representation coefficients corresponding to class $c$.
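A minimal sketch of step four under stated assumptions: the histogram intersection kernel of formula (7), a plain ISTA iteration as one possible solver for formula (9), and the minimum-residual rule of formula (10). The solver choice, lambda, iteration count, and data sizes are illustrative assumptions, not the patent's settings.

```python
import numpy as np

def hik(P, Q):
    """Eq. (7): histogram intersection kernel matrix between columns of P and Q."""
    return np.array([[np.minimum(p, q).sum() for q in Q.T] for p in P.T])

def kernel_sparse_code(K_xx, k_yx, lam=0.01, n_iter=500):
    """ISTA on eq. (9): min_a  -2*k_yx.a + a.K_xx.a + lam*||a||_1."""
    alpha = np.zeros(K_xx.shape[0])
    step = 1.0 / (2 * np.linalg.norm(K_xx, 2))   # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = 2 * (K_xx @ alpha - k_yx)
        z = alpha - step * grad
        alpha = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0)  # soft threshold
    return alpha

def classify(K_xx, k_yx, k_yy, alpha, labels):
    """Eq. (10): pick the class whose coefficients give the smallest residual."""
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        a_c = np.where(labels == c, alpha, 0.0)  # delta_c(alpha)
        residuals.append(k_yy - 2 * k_yx @ a_c + a_c @ K_xx @ a_c)
    return classes[int(np.argmin(residuals))]

rng = np.random.default_rng(0)
X = rng.random((50, 20))              # 20 training features (nonnegative, like pooled codes)
y = rng.random(50)                    # one test feature
labels = np.repeat(np.arange(4), 5)   # 4 classes, 5 training samples each
K_xx, k_yx = hik(X, X), hik(y[:, None], X).ravel()
alpha = kernel_sparse_code(K_xx, k_yx)
print(classify(K_xx, k_yx, y.sum(), alpha, labels))  # kappa(y, y) = sum(y) for HIK
```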
The invention has the characteristics and advantages that:
1. Compared with the original bag-of-words model, the method is simple, practical and more effective. After the scale-invariant feature transform extracts the image keypoint features, the method no longer finds a clustering center or learns a dictionary; instead, the keypoint features are encoded by a random matrix. This saves a large amount of time and, unlike the original bag-of-words method, does not lose a large amount of spatial information, so the recognition rate is greatly improved;
2. The method draws on the idea of compressed sensing: solving the sparse coefficient matrix of the compressed sensing scheme reconstructs the original image well, and the advantages of the spatial pyramid model and max pooling make the algorithm more stable;
3. The method overcomes well the influence of illumination, occlusion and expression changes of the face on face recognition, achieving both a high recognition rate and a high running speed. In face recognition on the challenging AR database, compared with the classical bag-of-words model, the method overcomes the influence of these factors, greatly improves the face recognition rate, and runs in real time.
Drawings
FIG. 1 shows 7 frontal face images with varying illumination, expression and camouflage.
Detailed Description
The invention will be further explained with reference to the drawings.
The present invention is explained in detail through a specific example. In a face recognition system, the simulation is performed in MATLAB; the experimental platform is an i5 processor with 2.4 GHz main frequency and 2 GB of memory. The protection scope of the invention is not limited to the following implementation example.
Fig. 1 shows 7 frontal face images with varying illumination, expression and camouflage. The first image is a normal image; the second has a changed facial expression; the third has changed illumination; the fourth wears glasses; the fifth wears glasses under changed illumination; the sixth wears a scarf; and the seventh wears a scarf under changed illumination. This example is run on the AR database, a public and very challenging face database. The AR database contains 2600 frontal face images with different illumination, expression and camouflage changes, covering 100 people with 26 images each. The database is divided into two parts. Images 1-7 of the first part show expression and illumination changes and are used here as the training set (700 images); then images 8-10 (wearing glasses) and images 11-13 (wearing a scarf) of the first part and of the second part are taken as test sets (300 images each). To reduce computation, each face image is normalized to 83×60 pixels.
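A hypothetical preprocessing sketch for the split just described, assuming OpenCV for loading and resizing; the directory layout, file-name pattern, and the width/height orientation of the 83×60 target are assumptions for illustration only.

```python
import glob
import cv2  # opencv-python

def load_normalized(paths, size=(60, 83)):
    """Grayscale-load each image and resize to 83x60 pixels.
    cv2.resize takes (width, height); treating the faces as 83 high by
    60 wide is an assumption."""
    return [cv2.resize(cv2.imread(p, cv2.IMREAD_GRAYSCALE), size)
            for p in sorted(paths)]

def person_images(part, indices):
    """Collect paths under a hypothetical layout AR/<part>/person*/imgNN.png."""
    paths = []
    for i in indices:
        paths += glob.glob(f"AR/{part}/person*/img{i:02d}.png")
    return paths

train = load_normalized(person_images("part1", range(1, 8)))        # images 1-7
test_glasses = load_normalized(person_images("part1", range(8, 11)))   # images 8-10
test_scarf = load_normalized(person_images("part1", range(11, 14)))    # images 11-13
```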
First, in Matlab, all images are segmented into blocks at the different pyramid scales, and the features of each block are then extracted by the scale-invariant feature transform, giving the block features of the training set and of the test set, where each image is composed of blocks 1 to N.
Then, for the training set and the test set, the feature code of each block of every image is obtained by formula (5). All blocks of an image are fused by formula (6) using the spatial pyramid matching and max-pooling methods, finally giving the feature vector of each image in the training set and the test set.
The kernel is then used to project into a high-dimensional feature space: according to the kernel function of formula (7), the kernel matrices between the training set and itself and between the test set and the training set are computed.
Formula (9) is then used to compute, for each test sample, the sparse coefficient matrix with respect to the training samples.
Finally according to the sparse coefficient matrixBy finding the minimum residual errorTo distinguish the classification:
(11)
in the formulaIs shown asClass-corresponding sparse representation coefficients.
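Tying the steps together, here is a compact, self-contained sketch of the embodiment's feature pipeline on synthetic data (random block features stand in for real SIFT descriptors); all sizes and constants are illustrative assumptions. The resulting nonnegative pooled vectors are what would feed the histogram intersection kernel and the classifier sketched under step four.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
d, n, M = 128, 256, 16                 # SIFT dim, dictionary atoms, blocks per image
B = rng.standard_normal((d, n))
B /= np.linalg.norm(B, axis=0)         # shared random dictionary
coder = Lasso(alpha=0.01, max_iter=2000)

def image_vector(blocks):
    """Steps two and three for one image at a single pyramid scale:
    sparse-code each block over B (eq. 5), then max-pool (eq. 6)."""
    C = np.column_stack([coder.fit(B, blocks[:, i]).coef_ for i in range(M)])
    return np.abs(C).max(axis=1)

train = np.column_stack([image_vector(rng.random((d, M))) for _ in range(12)])
test = image_vector(rng.random((d, M)))
print(train.shape, test.shape)         # (n, 12) and (n,)
# train/test are nonnegative after pooling, as the histogram intersection
# kernel of step four expects.
```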
The experimental results are shown in Table 1, from which it can be seen that the recognition rate of the method of the present invention is clearly superior to that of the existing method. On the first part of the AR face library, the recognition rates for wearing glasses and wearing a scarf both exceed 97%. Although the recognition rate drops on the second part, because the experimental training set uses only the first 7 images of the first part, the method is still more than 7% higher than the existing algorithm. The Time column of the table gives the average processing time per image; the method of the present invention also takes less time than the existing method.
TABLE 1 Comparison of the recognition rates of the two algorithms on the AR database

Method                          Glasses 1  Scarf 1  Glasses 2  Scarf 2  Time (s)
Bag-of-words (existing method)  81.35      80.34    73.37      62.03    0.1800
Method of the invention         98.32      97.33    80.96      87.02    0.1001
Therefore, the method can be widely applied in real life, and the experiments show that it has good robustness.

Claims (1)

1. A face recognition method based on bag-of-words compressed sensing feature extraction, used in a face recognition system, characterized in that the recognition steps are as follows:
step one, extracting image features using the scale-invariant feature transform;
(1) let the image be $I(x, y)$; convolving the image with a Gaussian kernel function gives the scale space at different scales; the formula is:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y) \tag{1}$$

where $(x, y)$ denotes the pixel position, $L(x, y, \sigma)$ denotes the scale space, and $\sigma$ denotes the scale-space factor;

(2) after the scale space of the image is obtained, a difference-of-Gaussian pyramid method is adopted, i.e., extreme points are searched in the space $D(x, y, \sigma)$ obtained by convolving the image with the difference-of-Gaussian function; the formula of $D(x, y, \sigma)$ is:

$$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma) \tag{2}$$

where $k$ is the factor separating two adjacent scale spaces;

(3) key points are determined from the extreme points, and each key point is assigned a direction to achieve rotation invariance of the image; for each point $(x, y)$ of the smoothed image $L$, the gradient magnitude and direction are computed:

$$m(x, y) = \sqrt{\big(L(x+1, y) - L(x-1, y)\big)^2 + \big(L(x, y+1) - L(x, y-1)\big)^2} \tag{3}$$

$$\theta(x, y) = \arctan\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)} \tag{4}$$
(4) selecting a neighborhood around the key point, and forming a histogram from the gradients of all the points in the neighborhood, centered on the key point;
each gradient in the neighborhood is Gaussian-weighted;
the neighborhood is divided into four sub-regions, and eight directions are taken in each sub-region;
thereby obtaining the scale-invariant feature transform of the image;
step two, feature coding
an image is segmented into $M$ blocks; after blocking, the local feature of each block is obtained by the scale-invariant feature transform; then, using the idea of compressed sensing, the system randomly generates a random dictionary $B$, and the feature codes are obtained through sparse representation;
if the system generates a random dictionary $B$ for an image divided into $M$ blocks, and $y_i$ ($i = 1, \dots, M$) is the scale-invariant feature transform feature of the $i$-th block, the feature code of each local block of the image is obtained by formula (5), as follows:

$$\hat{x}_i = \arg\min_{x}\; \|y_i - Bx\|_2^2 + \lambda \|x\|_1 \tag{5}$$

where $\lambda$ is a constant and $\hat{x}_i$ is the desired feature code;
Step three, fusing features of different scales in the image
using formula (5), the feature coding matrix of an image is obtained as $C = [\hat{x}_1, \hat{x}_2, \dots, \hat{x}_M]$, where $\hat{x}_i$ is the coefficient corresponding to the $i$-th block; the max-pooling method for fusing the coefficients is defined as:

$$z_j = \max\big\{|c_{j1}|, |c_{j2}|, \dots, |c_{jM}|\big\} \tag{6}$$

where $z_j$ is the $j$-th element of the pooling vector $z$, and $c_{ji}$ denotes the element in row $j$, column $i$ of the coefficient coding matrix $C$;
finally, a spatial pyramid matching algorithm is used, i.e., each image is segmented into blocks at several different scales, so that sub-regions at different spatial positions and scales can be feature-coded; the max pool of each scale is obtained with the spatial pyramid matching algorithm, and the feature vectors of the different scales and regions are concatenated to obtain the feature vector of the image;
step four, classifying
after the feature vector of each image has been obtained by the above feature extraction, classification is carried out with a kernel sparse representation method; the kernel function is the histogram intersection kernel, whose expression is:

$$\kappa(p, q) = \sum_{i=1}^{n} \min(p_i, q_i) \tag{7}$$

where $p$ and $q$ are two $n$-dimensional feature vectors, and $p_i$, $q_i$ are the elements of these feature vectors;
if the training set obtained after image feature extraction is $X = [x_1, x_2, \dots, x_N]$ and $y$ is a test sample, then, taking a single test sample $y$ as an example, $y$ can be represented over the matrix $X$ in the kernel-induced space as:

$$\hat{\alpha} = \arg\min_{\alpha}\; \|\Phi(y) - \Phi(X)\alpha\|_2^2 + \lambda \|\alpha\|_1 \tag{8}$$

where $\Phi$ is the mapping induced by the kernel function $\kappa$, and $\alpha$ is the sparse coefficient in the high-dimensional feature projection space; expanding the above formula gives:

$$\hat{\alpha} = \arg\min_{\alpha}\; \Big( \kappa(y, y) - 2\,K(y, X)\,\alpha + \alpha^{\top} K(X, X)\,\alpha + \lambda \|\alpha\|_1 \Big) \tag{9}$$

where $K(y, X) = [\kappa(y, x_1), \dots, \kappa(y, x_N)]$ collects the histogram intersection kernel values between the test sample $y$ and the training samples, and $K(X, X)$ is the kernel matrix of the training set; solving formula (9) yields the coefficient $\hat{\alpha}$, and finally the class with the minimum residual is found:

$$\operatorname{identity}(y) = \arg\min_{c}\; \big\| \Phi(y) - \Phi(X)\,\delta_c(\hat{\alpha}) \big\|_2 \tag{10}$$

where $\delta_c(\hat{\alpha})$ denotes the sparse representation coefficients corresponding to class $c$.
CN201410739127.5A 2014-12-08 2014-12-08 Face identification method based on bag of words compressed sensing feature extraction Active CN104376312B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410739127.5A CN104376312B (en) 2014-12-08 2014-12-08 Face identification method based on bag of words compressed sensing feature extraction


Publications (2)

Publication Number Publication Date
CN104376312A true CN104376312A (en) 2015-02-25
CN104376312B CN104376312B (en) 2019-03-01

Family

ID=52555210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410739127.5A Active CN104376312B (en) 2014-12-08 2014-12-08 Face identification method based on bag of words compressed sensing feature extraction

Country Status (1)

Country Link
CN (1) CN104376312B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102637251A (en) * 2012-03-20 2012-08-15 华中科技大学 Face recognition method based on reference features
CN103310208A (en) * 2013-07-10 2013-09-18 西安电子科技大学 Identifiability face pose recognition method based on local geometrical visual phrase description
CN103745200A (en) * 2014-01-02 2014-04-23 哈尔滨工程大学 Facial image identification method based on word bag model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Filipe Magalhaes et al.: "Compressive Sensing Based Face Detection without Explicit Image Reconstruction Using Support Vector Machines", International Conference on Image Analysis & Recognition *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488491A (en) * 2015-12-23 2016-04-13 西安电子科技大学 Human body sleep posture detection method based on pyramid matching histogram intersection kernel
CN106056135A (en) * 2016-05-20 2016-10-26 北京九艺同兴科技有限公司 Human body motion classification method based on compression perception
CN108229330A (en) * 2017-12-07 2018-06-29 深圳市商汤科技有限公司 Face fusion recognition methods and device, electronic equipment and storage medium
CN108960201A (en) * 2018-08-01 2018-12-07 西南石油大学 A kind of expression recognition method extracted based on face key point and sparse expression is classified
CN109800719A (en) * 2019-01-23 2019-05-24 南京大学 Low resolution face identification method based on sub-unit and compression dictionary rarefaction representation
CN109800719B (en) * 2019-01-23 2020-08-18 南京大学 Low-resolution face recognition method based on sparse representation of partial component and compression dictionary

Also Published As

Publication number Publication date
CN104376312B (en) 2019-03-01

Similar Documents

Publication Publication Date Title
CN110097051B (en) Image classification method, apparatus and computer readable storage medium
Cherian et al. Riemannian dictionary learning and sparse coding for positive definite matrices
Paisitkriangkrai et al. Pedestrian detection with spatially pooled features and structured ensemble learning
Wang et al. Bag of contour fragments for robust shape classification
CN110659589B (en) Pedestrian re-identification method, system and device based on attitude and attention mechanism
Anami et al. A comparative study of suitability of certain features in classification of bharatanatyam mudra images using artificial neural network
CN104376312B (en) Face identification method based on bag of words compressed sensing feature extraction
CN105956560A (en) Vehicle model identification method based on pooling multi-scale depth convolution characteristics
Wang et al. Review of image low-level feature extraction methods for content-based image retrieval
CN104966081B (en) Spine image-recognizing method
Zeng et al. Curvature bag of words model for shape recognition
CN106096658B (en) Aerial Images classification method based on unsupervised deep space feature coding
Zhao et al. Bisecting k-means clustering based face recognition using block-based bag of words model
CN110334715A (en) A kind of SAR target identification method paying attention to network based on residual error
Li et al. Place recognition based on deep feature and adaptive weighting of similarity matrix
CN109034213B (en) Hyperspectral image classification method and system based on correlation entropy principle
CN104504368A (en) Image scene recognition method and image scene recognition system
Singh et al. Leaf identification using feature extraction and neural network
CN111199558A (en) Image matching method based on deep learning
CN111325275A (en) Robust image classification method and device based on low-rank two-dimensional local discriminant map embedding
Giraddi et al. Flower classification using deep learning models
CN110826534B (en) Face key point detection method and system based on local principal component analysis
CN112784722A (en) Behavior identification method based on YOLOv3 and bag-of-words model
Huang et al. Human emotion recognition based on face and facial expression detection using deep belief network under complicated backgrounds
CN105069402A (en) Improved RSC algorithm for face identification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant