CN108052867B - Single-sample face recognition method based on bag-of-words model - Google Patents

Single-sample face recognition method based on bag-of-words model Download PDF

Info

Publication number
CN108052867B
CN108052867B CN201711155556.8A
Authority
CN
China
Prior art keywords
face
sub
features
face image
bag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711155556.8A
Other languages
Chinese (zh)
Other versions
CN108052867A (en)
Inventor
刘凡
许峰
Current Assignee
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN201711155556.8A priority Critical patent/CN108052867B/en
Publication of CN108052867A publication Critical patent/CN108052867A/en
Application granted granted Critical
Publication of CN108052867B publication Critical patent/CN108052867B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]

Abstract

The invention discloses a single-sample face recognition method based on a bag-of-words model, which uses the bag-of-words model to extract middle-layer semantic features and thereby narrow the semantic gap under the single-sample condition. The method first divides a face into a number of sub-blocks, extracts the SIFT features of every sub-block, and clusters these SIFT features to construct a visual word dictionary. A multi-stage k-nearest-neighbor collaborative representation coding method based on the visual word dictionary then projects the local features of each sub-block into a semantic space. To describe spatial information and reduce the feature dimension, the coded features are pooled with a spatial pyramid model, producing a visual-word histogram that describes the face. Finally, the pooled features are concatenated and classified with an SVM classifier based on a linear kernel. The single-sample face recognition method is robust to expression, illumination change, occlusion and the like, and achieves high recognition accuracy.

Description

Single-sample face recognition method based on bag-of-words model
Technical Field
The invention relates to a single-sample face recognition method, in particular to a single-sample face recognition method based on a bag-of-words model in which each object to be recognized has only one training image, and belongs to the technical field of face recognition.
Background
After some fifty years of development, automatic face recognition has made great progress, and face recognition under controlled conditions now performs satisfactorily. Under uncontrolled conditions, however, accuracy drops sharply under the influence of illumination, expression, pose, noise, occlusion and other factors, and cannot meet application requirements. The most direct remedy for these problems is to add training samples, but in many practical applications such as identity-card recognition, passport recognition, judicial verification and access control, only one training sample per person can be obtained. Face recognition in this setting is called the single sample per person (SSPP) problem, and it further aggravates the difficulty of face recognition under uncontrolled conditions.
The difficulty of single-sample face recognition lies in distinguishing the essential differences between faces from the changes caused by illumination, expression and occlusion; in other words, there is a semantic gap between face features and identity. The excellent performance of bag-of-words models in image classification tasks has attracted extensive research interest in recent years, and such models have been introduced into face recognition research. For example, Li et al. (Z.S. Li, J.I. Imai, and M. Kaneko, "Robust face recognition using block-based bag of words," Proceedings of the 26th International Conference on Pattern Recognition, pp. 1285-1288) first explicitly applied a bag-of-words model to face recognition, proposing a robust face recognition algorithm based on a block-wise bag-of-words model; Xie et al. (S.F. Xie, S.G. Shan, X.L. Chen, X. Meng and W. Gao, "Learned local Gabor patterns for face representation and recognition," Signal Processing, vol. 89, no. 3, pp. 2333-) proposed learned local Gabor patterns for face representation and recognition. Recently, Cui et al. (Z. Cui, W. Li, D. Xu, S.G. Shan, and X.L. Chen, "Fusing robust face region descriptors via multiple metric learning for face recognition in the wild," Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition, pp. 3554-3561, 2013) proposed a face recognition algorithm based on spatial face region descriptors (SFRD), which also describes image tiles with grayscale features but encodes each tile against an offline-trained dictionary with a non-negative coding method, then splits the image into several sub-regions and fuses the pooled features within each sub-region with a metric learning algorithm.
The bag-of-words model can be regarded as extracting middle-layer semantic features, which to some extent narrows the semantic gap between low-level features and high-level semantics. It can therefore, in principle, improve face recognition performance under the single-sample condition; however, the sparse coding or non-negative sparse coding commonly used in current bag-of-words models is computationally too expensive.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a single-sample face recognition method based on the bag-of-words model that solves the single-sample face recognition problem with a coding method that is more robust and more computationally efficient.
The invention adopts the following technical scheme for solving the technical problems:
a single sample face recognition method based on a bag of words model comprises the following steps:
step 1, dividing each training face image into a series of sub-blocks, extracting the SIFT local features of each sub-block, and obtaining the SIFT local feature set X ∈ R^(D×N) of all training face images, where D is the dimension of the SIFT local features and N is the total number of SIFT local features of all training face images;
step 2, randomly selecting a subset Xs of the set X consisting of the SIFT local features of all training face images, and performing K-means clustering on Xs to obtain the visual word dictionary V = [v_1, v_2, …, v_K] ∈ R^(D×K);
step 3, for any face image, dividing it into sub-blocks in the same way as the training face images in step 1 and extracting the SIFT local features of the sub-blocks to form the set X_r = {x_1, x_2, …, x_M} ∈ R^(D×M), where M is the number of sub-blocks of one face image; based on the visual word dictionary V obtained in step 2, performing multi-stage k-nearest-neighbor collaborative representation coding on each SIFT local feature x_m in X_r to obtain the coding vector c_m, m = 1, 2, …, M;
step 4, dividing the face image of step 3 into 2^l × 2^l sub-blocks at different scales with a spatial pyramid model, l = 0, 1, …, L, where L is a positive integer; letting the H-th sub-block of the l-th layer contain M_H coding vectors, and applying a maximum pooling operation to them to obtain the pooled feature; applying the maximum pooling operation to the coding vectors contained in every sub-block of the spatial pyramid model at every scale, and concatenating the pooled features of all sub-blocks to obtain the face feature representation of the face image;
and step 5, performing the operations of steps 3 and 4 on all training face images and all test face images to obtain their face feature representations, constructing an SVM classifier based on a linear kernel function from the face feature representations of the training face images, and recognizing all test face images with the constructed SVM classifier.
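The dictionary construction of steps 1 and 2 above can be sketched as follows. This is a minimal stand-in: the patent does not fix a K-means implementation, so plain Lloyd's algorithm with random initialization is assumed, and random vectors stand in for real SIFT descriptors.

```python
import numpy as np

def build_dictionary(X, K, iters=20, seed=0):
    """Cluster local descriptors (columns of X, shape D x N) into K visual
    words with plain Lloyd's k-means; returns V of shape D x K, one visual
    word per column. Illustrative only; any K-means variant would do."""
    rng = np.random.default_rng(seed)
    D, N = X.shape
    V = X[:, rng.choice(N, size=K, replace=False)]       # init words from data
    for _ in range(iters):
        # assignment step: nearest word by squared Euclidean distance
        d2 = ((X[:, None, :] - V[:, :, None]) ** 2).sum(axis=0)  # K x N
        labels = d2.argmin(axis=0)
        for k in range(K):                               # update step
            members = X[:, labels == k]
            if members.size:
                V[:, k] = members.mean(axis=1)
    return V

# toy stand-in for SIFT descriptors: D = 8, N = 200 (real SIFT has D = 128)
rng = np.random.default_rng(1)
X = np.hstack([rng.normal(0, 1, (8, 100)), rng.normal(5, 1, (8, 100))])
V = build_dictionary(X, K=4)
print(V.shape)  # (8, 4)
```

In the method itself, X would hold the SIFT features of all training sub-blocks and K would be the dictionary size chosen in step 2.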
As a preferred scheme of the present invention, the process in step 3 of performing multi-stage k-nearest-neighbor collaborative representation coding on each SIFT local feature x_m in the set X_r, based on the visual word dictionary V obtained in step 2, to obtain the coding vector c_m is as follows:
(1) finding the k nearest neighbors V_k = [v_1, v_2, …, v_k] ∈ R^(D×k) of x_m in the visual word dictionary V, where D is the dimension of the SIFT local features and k is the number of neighbors;
(2) using the k nearest neighbors V_k = [v_1, v_2, …, v_k] ∈ R^(D×k) to collaboratively represent the SIFT local feature x_m, and computing the collaborative representation coefficient vector c_m* of the k neighbors according to the following formula:

c_m* = argmin_c ||x_m − V_k c||_2^2 + λ||c||_2^2

wherein λ is a regularization coefficient and ||·||_2 is the 2-norm;
(3) converting the collaborative representation coefficient vector c_m* of the k neighbors into the K × 1 representation coefficient vector c̃_m^(k) ∈ R^K, in which the entries corresponding to the k neighbors hold the collaborative representation coefficients and all other entries are 0;
(4) reducing the k neighbors to k − 1 neighbors and computing the representation coefficient vector c̃_m^(k−1) according to steps (1) to (3); repeating until k = 1, at which point the entry of c̃_m^(1) corresponding to the single nearest neighbor is 1 and all other entries are 0;
(5) summing the representation coefficient vectors computed in the k stages to obtain the multi-stage k-nearest-neighbor collaborative representation coding vector c_m:

c_m = Σ_{j=1}^{k} c̃_m^(j)
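The coding procedure in steps (1) to (5) above can be sketched in numpy as follows. The closed-form ridge solution of the collaborative representation problem is used for each stage, and the parameter values (k = 5, λ = 0.01) are illustrative only, not prescribed by the invention.

```python
import numpy as np

def multistage_knn_crc(x, V, k=5, lam=0.01):
    """Multi-stage k-nearest-neighbour collaborative representation coding
    of one local feature x (length D) against dictionary V (D x K):
    for each stage j = k, k-1, ..., 2, solve the ridge-regularised
    collaborative representation over the j nearest words, scatter the
    coefficients into a length-K vector; the stage-1 vector is 1 at the
    nearest word; the final code is the sum over all stages."""
    K = V.shape[1]
    order = np.argsort(((V - x[:, None]) ** 2).sum(axis=0))  # words by distance
    c = np.zeros(K)
    for j in range(k, 0, -1):
        idx = order[:j]                        # the j nearest visual words
        stage = np.zeros(K)
        if j == 1:
            stage[idx[0]] = 1.0                # stage 1: nearest neighbour coefficient is 1
        else:
            Vj = V[:, idx]                     # D x j
            # closed-form solution of min_c ||x - Vj c||^2 + lam ||c||^2
            coef = np.linalg.solve(Vj.T @ Vj + lam * np.eye(j), Vj.T @ x)
            stage[idx] = coef
        c += stage
    return c

rng = np.random.default_rng(0)
V = rng.normal(size=(8, 16))                   # toy dictionary: D = 8, K = 16
x = rng.normal(size=8)                         # one toy local feature
c = multistage_knn_crc(x, V, k=5)
print(c.shape)  # (16,)
```

Because every stage only ever weights the k nearest words, the resulting code has at most k non-zero entries, which is what keeps this scheme cheaper than solving a sparse-coding problem over the whole dictionary.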
As a preferred embodiment of the present invention, the maximum pooling operation in step 4 is computed as

B_lH = max(c_1, c_2, …, c_{M_H})

where the maximum is taken element-wise, B_lH is the pooled feature, c_h is the h-th coding vector in the H-th sub-block of the l-th layer, and M_H is the number of coding vectors contained in the H-th sub-block of the l-th layer.
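A minimal numeric illustration of this element-wise maximum pooling over the coding vectors of one sub-block:

```python
import numpy as np

# M_H = 3 coding vectors of length K = 3 belonging to one pyramid sub-block;
# the pooled feature keeps, per visual word, the strongest response.
codes = np.array([[0.2, 0.0, 0.7],
                  [0.5, 0.1, 0.0],
                  [0.0, 0.9, 0.3]])
B_lH = codes.max(axis=0)   # element-wise maximum over the M_H vectors
print(B_lH)                # [0.5 0.9 0.7]
```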
As a preferred scheme of the present invention, in step 4 the pooled features of all sub-blocks are concatenated to obtain the face feature representation of the face image:

B_i = [B_i1; B_i2; …; B_iS]

where B_i is the face feature representation of the i-th training face image and S = Σ_{l=0}^{L} 2^l × 2^l is the total number of sub-blocks obtained by dividing the face image into 2^l × 2^l sub-blocks at the different scales.
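A small worked example of the resulting dimensionality; the values of L and K below are illustrative, not prescribed by the invention:

```python
# Each pyramid level l contributes 2^l * 2^l sub-blocks, so for levels
# l = 0..L the concatenated representation B_i has S pooled histograms
# of K entries each (one entry per visual word).
L, K = 2, 800
S = sum((2 ** l) * (2 ** l) for l in range(L + 1))
print(S, S * K)   # 21 sub-blocks -> a 16800-dimensional face feature
```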
As a preferred scheme of the present invention, the optimal classification function of the SVM classifier in step 5 is

f(B_j) = sgn( Σ_{i=1}^{n} α_i y_i k(B_i, B_j) + b* )

where f(B_j) is the optimal classification function, B_j is the face feature representation of the j-th test face image, B_i is the face feature representation of the i-th training face image, n is the number of training face images, y_i is the class label of the i-th training face image, α_i is a Lagrange coefficient, k(·,·) is the kernel function, and b* is the threshold of the SVM classifier.
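The decision rule can be evaluated directly from the formula above. In this sketch the coefficients α_i, labels y_i and threshold b* are toy values standing in for a trained SVM, and the linear kernel k(u, v) = u·v used by the method is assumed:

```python
import numpy as np

def svm_decision(Bj, B_train, y, alpha, b):
    """Evaluate f(B_j) = sgn( sum_i alpha_i * y_i * k(B_i, B_j) + b* )
    with the linear kernel k(u, v) = u . v. The alpha, y and b values
    would normally come from SVM training; here they are toy values."""
    scores = B_train @ Bj                       # k(B_i, B_j) for every training sample
    return np.sign(np.sum(alpha * y * scores) + b)

B_train = np.array([[1.0, 0.0],                 # two toy training face features
                    [0.0, 1.0]])
y = np.array([1.0, -1.0])                       # their class labels
alpha = np.array([0.5, 0.5])                    # toy Lagrange coefficients
b = 0.0                                         # toy threshold
print(svm_decision(np.array([1.0, 0.2]), B_train, y, alpha, b))  # 1.0
```

Multi-class recognition is then handled the usual way an SVM handles it (for example one-vs-rest over the identities), which the patent leaves to the standard classifier.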
In a preferred embodiment of the present invention, K in the K-means clustering of step 2 is a positive integer greater than 1.
Compared with the prior art, the invention adopting the technical scheme has the following technical effects:
1. The invention introduces bag-of-words features with middle-layer semantics, which weakens the semantic-gap problem under the single-sample condition to a certain extent; the method is robust to expression, illumination change, occlusion and the like, and thus achieves high recognition accuracy.
2. The invention designs a more efficient multi-stage k-nearest-neighbor collaborative representation coding method, which is simple to implement and computationally efficient.
Drawings
FIG. 1 is a flow chart of a single-sample face recognition method based on a bag-of-words model according to the present invention.
FIG. 2 is a working principle diagram of a multi-stage k-nearest neighbor collaborative representation encoding method in the single-sample face recognition method based on the bag-of-words model.
FIG. 3 is a schematic diagram of a spatial pyramid model in the single-sample face recognition method based on the bag-of-words model.
FIG. 4 is a diagram of the experimental results of the single-sample face recognition method based on the bag-of-words model in the LFW face library.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
The difficulty of the single-sample face recognition problem lies in the semantic gap between face features and identity. Face features with semantic meaning can therefore weaken this gap, and the bag-of-words model, which extracts middle-layer semantic features, is in principle suited to the single-sample problem. Based on this idea, the invention provides a single-sample face recognition method based on a bag-of-words model.
As shown in fig. 1, the single-sample face recognition method based on the bag-of-words model of the present invention includes the following steps:
1. Divide each training face image into a series of sub-blocks, extract SIFT (scale-invariant feature transform) features from each sub-block, and obtain the local feature set X ∈ R^(D×N) of all training face images, where D is the dimension of the SIFT local features and N is the total number of local features over all training face image sub-blocks;
2. Randomly select a subset Xs of the set X formed by the local features of all sub-blocks of all training face images, and perform K-means clustering on Xs to obtain the visual word dictionary V = [v_1, v_2, …, v_K] ∈ R^(D×K).
3. For any face image, divide it into sub-blocks in the same way as the training face images in step 1 and extract their SIFT features to form the set X_r = {x_1, x_2, …, x_M} ∈ R^(D×M), where M is the number of sub-blocks of one face image. Based on the visual word dictionary V obtained in step 2, perform multi-stage k-nearest-neighbor collaborative representation coding on each local feature x_m in X_r to obtain its code c_m. The process, illustrated in fig. 2, is as follows:
(1) find the k nearest neighbors V_k = [v_1, v_2, …, v_k] ∈ R^(D×k) of x_m in the visual word dictionary V = [v_1, v_2, …, v_K] ∈ R^(D×K);
(2) use the k nearest neighbors V_k = [v_1, v_2, …, v_k] ∈ R^(D×k) to collaboratively represent the local feature x_m, computing the collaborative representation coefficient vector according to

c_m* = argmin_c ||x_m − V_k c||_2^2 + λ||c||_2^2

where λ is a regularization coefficient;
(3) convert the k-neighbor collaborative representation coefficient vector c_m* computed in (2) into the K × 1 vector c̃_m^(k), in which the entries corresponding to the k neighbors hold the collaborative representation coefficients and all other entries are 0;
(4) reduce the k neighbors to k − 1 neighbors and compute the representation coefficient vector c̃_m^(k−1) according to steps (1) to (3); repeat until k = 1, at which point the coefficient of the nearest neighbor is 1 and all other coefficients are 0;
(5) sum the representation coefficient vectors computed in the k stages to form the final multi-stage k-nearest-neighbor collaborative representation code:

c_m = Σ_{j=1}^{k} c̃_m^(j)
4. Pool the M coding vectors produced in step 3: using the spatial pyramid model shown in fig. 3, divide the face image into 2^l × 2^l sub-blocks at different scales, l = 0, 1, …, L. Suppose the H-th sub-block of the l-th layer contains M_H coding vectors; apply the maximum pooling operation to them to obtain the pooled feature

B_lH = max(c_1, c_2, …, c_{M_H})

where the maximum is taken element-wise.
Pool all sub-blocks of the pyramid model at the different scales in the same way, and concatenate the pooled features of all sub-blocks as the final face feature representation.
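The pyramid pooling of this step can be sketched as follows. The assignment of coding vectors to pyramid cells through normalized grid positions is an assumed bookkeeping detail, and all sizes are illustrative:

```python
import numpy as np

def spatial_pyramid_pool(codes, grid, L=1):
    """Max-pool coding vectors over a spatial pyramid: `codes` is an M x K
    array of coding vectors, `grid` holds each vector's (row, col) position
    in [0, 1) on the face, and level l splits the face into 2^l x 2^l cells.
    Returns the concatenation of all per-cell pooled features."""
    M, K = codes.shape
    pooled = []
    for l in range(L + 1):
        cells = 2 ** l
        # which pyramid cell each coding vector falls into at this level
        cell_idx = np.minimum((grid * cells).astype(int), cells - 1)
        for r in range(cells):
            for c in range(cells):
                mask = (cell_idx[:, 0] == r) & (cell_idx[:, 1] == c)
                cell = codes[mask]
                pooled.append(cell.max(axis=0) if cell.size else np.zeros(K))
    return np.concatenate(pooled)

rng = np.random.default_rng(2)
codes = rng.random((30, 4))     # 30 sub-block codes over a K = 4 word dictionary
grid = rng.random((30, 2))      # their normalized positions on the face
B = spatial_pyramid_pool(codes, grid, L=1)
print(B.shape)                  # (20,): (1 + 4) cells x 4 words
```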
5. After the face feature representations of all training samples and test samples have been obtained as in step 4, construct an SVM classifier based on a linear kernel function from the training samples and classify the test samples. The optimal classification function of the SVM classifier is

f(B_j) = sgn( Σ_{i=1}^{n} α_i y_i k(B_i, B_j) + b* )

where B_j is the face feature representation of the j-th test face image, B_i is the face feature representation of the i-th training face image, n is the number of training face images, y_i is the class label of the i-th training face image, α_i is a Lagrange coefficient, k(·,·) is the kernel function, and b* is the threshold of the SVM classifier.
The single-sample face recognition method based on the bag-of-words model extracts middle-layer semantic features and effectively narrows the semantic gap under the single-sample condition; compared with sparse representation and non-negative sparse representation, the multi-stage k-nearest-neighbor collaborative representation coding method is more computationally efficient and is robust to illumination, expression, occlusion and change over time. As shown in fig. 4, on the LFW database the method improves accuracy by nearly 10% over conventional single-sample face recognition methods, and achieves better results with higher computational efficiency than the non-negative sparse coding method NSC_BoF.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the protection scope of the present invention.

Claims (5)

1. A single sample face recognition method based on a bag of words model is characterized by comprising the following steps:
step 1, dividing each training face image into a series of sub-blocks, extracting the SIFT local features of each sub-block, and obtaining the SIFT local feature set X ∈ R^(D×N) of all training face images, where D is the dimension of the SIFT local features and N is the total number of SIFT local features of all training face images;
step 2, randomly selecting a subset Xs of the set X consisting of the SIFT local features of all training face images, and performing K-means clustering on Xs to obtain the visual word dictionary V = [v_1, v_2, …, v_K] ∈ R^(D×K);
Step 3, for any face image, dividing the subblocks according to the dividing mode of the face image trained in the step 1, and extracting the SIFT local features of the subblocks to form a set Xr={x1,x2,…,xM}∈RD×MWherein M is the number of all sub-blocks of one face image; based on the visual word dictionary V obtained in the step 2, SIFT local feature set X of the face image is subjected torCentral SIFT local feature xmPerforming multi-stage k-neighbor co-representation coding to obtain a coding vector cm,m=1,2,…,M;
wherein, based on the visual word dictionary V obtained in step 2, the multi-stage k-nearest-neighbor collaborative representation coding of each SIFT local feature x_m in the set X_r to obtain the coding vector c_m proceeds as follows:
(1) finding the k nearest neighbors V_k = [v_1, v_2, …, v_k] ∈ R^(D×k) of x_m in the visual word dictionary V, where D is the dimension of the SIFT local features and k is the number of neighbors;
(2) using the k nearest neighbors V_k = [v_1, v_2, …, v_k] ∈ R^(D×k) to collaboratively represent the SIFT local feature x_m, and computing the collaborative representation coefficient vector c_m* of the k neighbors according to the following formula:

c_m* = argmin_c ||x_m − V_k c||_2^2 + λ||c||_2^2

wherein λ is a regularization coefficient and ||·||_2 is the 2-norm;
(3) converting the collaborative representation coefficient vector c_m* of the k neighbors into the K × 1 representation coefficient vector c̃_m^(k) ∈ R^K, in which the entries corresponding to the k neighbors hold the collaborative representation coefficients and all other entries are 0;
(4) reducing the k neighbors to k − 1 neighbors and computing the representation coefficient vector c̃_m^(k−1) according to steps (1) to (3); repeating until k = 1, at which point the entry of c̃_m^(1) corresponding to the single nearest neighbor is 1 and all other entries are 0;
(5) summing the representation coefficient vectors computed in the k stages to obtain the multi-stage k-nearest-neighbor collaborative representation coding vector c_m:

c_m = Σ_{j=1}^{k} c̃_m^(j);
step 4, dividing the face image of step 3 into 2^l × 2^l sub-blocks at different scales with a spatial pyramid model, l = 0, 1, …, L, where L is a positive integer; letting the H-th sub-block of the l-th layer contain M_H coding vectors, and applying a maximum pooling operation to them to obtain the pooled feature; applying the maximum pooling operation to the coding vectors contained in every sub-block of the spatial pyramid model at every scale, and concatenating the pooled features of all sub-blocks to obtain the face feature representation of the face image;
and step 5, performing the operations of steps 3 and 4 on all training face images and all test face images to obtain their face feature representations, constructing an SVM classifier based on a linear kernel function from the face feature representations of the training face images, and recognizing all test face images with the constructed SVM classifier.
2. The bag-of-words-model-based single-sample face recognition method according to claim 1, wherein the maximum pooling operation of step 4 is

B_lH = max(c_1, c_2, …, c_{M_H})

where the maximum is taken element-wise, B_lH is the pooled feature, c_h is the h-th coding vector in the H-th sub-block of the l-th layer, and M_H is the number of coding vectors contained in the H-th sub-block of the l-th layer.
3. The bag-of-words-model-based single-sample face recognition method according to claim 1, wherein in step 4 the pooled features of all sub-blocks are concatenated to obtain the face feature representation of the face image:

B_i = [B_i1; B_i2; …; B_iS]

where B_i is the face feature representation of the i-th training face image and S = Σ_{l=0}^{L} 2^l × 2^l is the total number of sub-blocks obtained by dividing the face image into 2^l × 2^l sub-blocks at the different scales.
4. The bag-of-words-model-based single-sample face recognition method according to claim 1, wherein the optimal classification function of the SVM classifier in step 5 is

f(B_j) = sgn( Σ_{i=1}^{n} α_i y_i k(B_i, B_j) + b* )

where f(B_j) is the optimal classification function, B_j is the face feature representation of the j-th test face image, B_i is the face feature representation of the i-th training face image, n is the number of training face images, y_i is the class label of the i-th training face image, α_i is a Lagrange coefficient, k(·,·) is the kernel function, and b* is the threshold of the SVM classifier.
5. The bag-of-words-model-based single-sample face recognition method according to claim 1, wherein K in the K-means clustering of step 2 is a positive integer greater than 1.
CN201711155556.8A 2017-11-20 2017-11-20 Single-sample face recognition method based on bag-of-words model Active CN108052867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711155556.8A CN108052867B (en) 2017-11-20 2017-11-20 Single-sample face recognition method based on bag-of-words model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711155556.8A CN108052867B (en) 2017-11-20 2017-11-20 Single-sample face recognition method based on bag-of-words model

Publications (2)

Publication Number Publication Date
CN108052867A CN108052867A (en) 2018-05-18
CN108052867B true CN108052867B (en) 2021-11-23

Family

ID=62119957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711155556.8A Active CN108052867B (en) 2017-11-20 2017-11-20 Single-sample face recognition method based on bag-of-words model

Country Status (1)

Country Link
CN (1) CN108052867B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766810B (en) * 2018-12-31 2023-02-28 陕西师范大学 Face recognition classification method based on collaborative representation, pooling and fusion
CN109902657B (en) * 2019-03-12 2022-07-08 哈尔滨理工大学 Face recognition method based on block collaborative representation
CN111723612A (en) * 2019-03-20 2020-09-29 北京市商汤科技开发有限公司 Face recognition and face recognition network training method and device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104616291A (en) * 2015-01-15 2015-05-13 东华大学 Sparse coding-based fabric appearance flatness evaluation method
CN105184298A (en) * 2015-08-27 2015-12-23 重庆大学 Image classification method through fast and locality-constrained low-rank coding process
CN105808757A (en) * 2016-03-15 2016-07-27 浙江大学 Chinese herbal medicine plant picture retrieval method based on multi-feature fusion BOW model
CN106446774A (en) * 2016-08-24 2017-02-22 施志刚 Face recognition method based on secondary nearest neighbor sparse reconstruction


Also Published As

Publication number Publication date
CN108052867A (en) 2018-05-18

Similar Documents

Publication Publication Date Title
CN106127196B (en) Facial expression classification and identification method based on dynamic texture features
CN105956560B (en) A kind of model recognizing method based on the multiple dimensioned depth convolution feature of pondization
Beikmohammadi et al. Leaf classification for plant recognition with deep transfer learning
Liu et al. Facial expression recognition with PCA and LBP features extracting from active facial patches
Nanda et al. A neuromorphic person re-identification framework for video surveillance
Prasad et al. An efficient classification of flower images with convolutional neural networks
Xu et al. Facial expression recognition based on Gabor Wavelet transform and Histogram of Oriented Gradients
CN108052867B (en) Single-sample face recognition method based on bag-of-words model
O’Connor et al. Facial recognition using modified local binary pattern and random forest
Sisodia et al. ISVM for face recognition
Chen et al. Face age classification based on a deep hybrid model
Dagher et al. Improving the SVM gender classification accuracy using clustering and incremental learning
Basavaiah et al. Human activity detection and action recognition in videos using convolutional neural networks
Hsu et al. Image Classification Using Naive Bayes Classifier With Pairwise Local Observations.
Feng et al. Histogram contextualization
CN112784722A (en) Behavior identification method based on YOLOv3 and bag-of-words model
Chen et al. Cholesky decomposition-based metric learning for video-based human action recognition
CN113887509B (en) Rapid multi-modal video face recognition method based on image set
Cheng et al. An android application for plant identification
CN105893967A (en) Human body behavior detection method and system based on time sequence preserving space-time characteristics
CN112818779B (en) Human behavior recognition method based on feature optimization and multiple feature fusion
Zanwar et al. A comprehensive survey on soft computing based optical character recognition techniques
Ardakany et al. An extended local binary pattern for gender classification
Viswanathan et al. Recognition of hand gestures of English alphabets using HOG method
CN109614877B (en) Method for identifying attribute of pedestrian with shielding in low-resolution monitoring scene

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant