CN108805216A - Face image processing method based on deep feature fusion - Google Patents
- Publication number
- CN108805216A CN108805216A CN201810630864.XA CN201810630864A CN108805216A CN 108805216 A CN108805216 A CN 108805216A CN 201810630864 A CN201810630864 A CN 201810630864A CN 108805216 A CN108805216 A CN 108805216A
- Authority
- CN
- China
- Prior art keywords
- feature
- image
- training
- sift
- extraction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Abstract
The embodiment of the present invention discloses a face image processing method based on deep feature fusion, which can improve generalization ability on limited data sets. The method includes: expanding a facial expression image data set using data augmentation, while extracting the SIFT features of each image in the facial expression image data set; and using the bag-of-words (BoW) representation of the SIFT features as the extracted shallow features, concatenating them with the deep features of the corresponding image into a single feature vector, and training an SVM classifier.
Description
Technical field
The present invention relates to the field of image recognition, and in particular to a face image processing method based on deep feature fusion.
Background
Research on facial expression recognition dates back to the 1970s; early work concentrated mainly on psychology and biology.
The traditional classification-based facial expression recognition pipeline comprises several steps: face detection, feature extraction, and pattern classification. The face detection module detects and locates the face; the expression feature extraction module extracts descriptive information characterizing the expression from the face sub-image; the pattern classification module analyzes the output of the previous module and assigns the expression to the corresponding category according to a classification criterion. Among these, facial feature extraction is the most important part of an expression recognition system, and the quality of recognition depends primarily on the quality of the features.
Researchers generally spend a great deal of time and effort on extracting and selecting better features. Such hand-designed features not only consume a large amount of researchers' time but also depend strongly on the training data; they are often unreliable and unstable, and are easily disturbed, so their generalization in the field of expression recognition leaves room for improvement.
Invention content
The embodiment of the present invention provides a face image processing method based on deep feature fusion that can improve generalization ability on limited data sets.
The embodiment of the present invention adopts the following technical scheme.
A face image processing method based on deep feature fusion, including:
expanding a facial expression image data set using data augmentation, while extracting the SIFT features of each image in the facial expression image data set;
using the bag-of-words (BoW) representation of the SIFT features as the extracted shallow features, concatenating them with the deep features of the corresponding image into a single feature vector, and training an SVM classifier.
Optionally, the facial expression image data set is the CK+ data set, and expanding the facial expression image data set using data augmentation includes:
obtaining the CK+ data set containing multiple face images, dividing it into training, validation, and test sets in an 8:1:1 ratio, and ensuring that no subject identity appears in more than one set;
preprocessing all images in the data set: performing face detection on each face picture with the Haar-feature-based AdaBoost face detector, cropping the face region, and removing background influence;
performing spatial normalization on the images using the OpenCV vision library, adjusting the line between the two eyes to be horizontal and aligning the faces to the same position;
performing histogram equalization on all images to enhance contrast and weaken brightness differences caused by illumination; finally normalizing all images to 70*70 pixels;
expanding the data set by applying geometric transformations to the raw images: for each 70*70 training image, cropping the 64*64 regions at the upper-left, upper-right, lower-left, lower-right, and center, and horizontally mirroring each cropped image, thereby expanding the training set 10 times.
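The five-crop-plus-mirror augmentation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation; the image shapes follow the 70*70 to 64*64 scheme stated in the text:

```python
import numpy as np

def augment(img, crop=64):
    """Five 64x64 crops (four corners + center) of a 70x70 image,
    each paired with its horizontal mirror -> 10 images total."""
    h, w = img.shape[:2]
    c = (h - crop) // 2
    offsets = [(0, 0), (0, w - crop), (h - crop, 0), (h - crop, w - crop), (c, c)]
    out = []
    for y, x in offsets:
        patch = img[y:y + crop, x:x + crop]
        out.append(patch)
        out.append(patch[:, ::-1])  # horizontal mirror
    return out

crops = augment(np.zeros((70, 70), dtype=np.uint8))
print(len(crops))  # 10 images per training image, i.e. a 10x expansion
```

Each source image thus yields ten 64*64 training images, matching the 10x expansion factor claimed for the training set.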
Optionally, extracting the SIFT features of each image in the facial expression image data set includes:
extracting SIFT features from all expanded training set images, each SIFT keypoint descriptor being a 4*4*8 = 128-dimensional vector;
after feature extraction, converting each sample into an n*128 feature matrix, where n is the number of extracted feature points; 20 SIFT descriptors are extracted per image, so the shallow feature of each training sample is represented by a 20*128 matrix;
building a dictionary from all feature points of the training set: clustering all SIFT features with the k-means algorithm to obtain K cluster centers, which constitute the dictionary, with K = 500, i.e., 500 visual words.
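Building the visual-word dictionary by clustering the pooled SIFT descriptors can be sketched with scikit-learn's KMeans. Synthetic 128-dimensional vectors stand in for real SIFT descriptors here, and a small K is used for speed; the patent specifies K = 500:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for the pooled training-set descriptors: n points, 128-D like SIFT.
descriptors = rng.random((400, 128)).astype(np.float32)

K = 8  # illustrative; the patent uses K = 500 visual words
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(descriptors)
dictionary = kmeans.cluster_centers_  # shape (K, 128): the visual words
print(dictionary.shape)
```

The resulting `dictionary` rows are the cluster centers, i.e. the visual words against which each image's descriptors are later quantized.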
Optionally, using the bag-of-words (BoW) representation of the SIFT features as the extracted shallow features includes:
representing the SIFT features of each image as a single feature vector with the BoW model: computing, by minimum distance, which visual word in the dictionary each SIFT feature belongs to, counting the number of feature points falling into each word, and obtaining a statistical histogram for each image; this histogram is a 500-dimensional feature vector, which is the image's BoW representation with respect to the dictionary.
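The minimum-distance assignment and histogram count described above amount to the following NumPy sketch, under the same toy assumptions as the k-means example (a real pipeline would use the 500-word dictionary and real SIFT descriptors):

```python
import numpy as np

def bow_histogram(descriptors, dictionary):
    """Assign each descriptor to its nearest visual word and count occurrences."""
    # Squared Euclidean distance from every descriptor to every dictionary word.
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                        # nearest word per descriptor
    hist = np.bincount(words, minlength=len(dictionary))
    return hist                                      # K-dim BoW vector for the image

rng = np.random.default_rng(1)
dictionary = rng.random((8, 128))      # toy dictionary (patent: 500 words)
descriptors = rng.random((20, 128))    # 20 SIFT descriptors per image, as specified
hist = bow_histogram(descriptors, dictionary)
print(hist.sum())  # 20: every descriptor falls into exactly one word
```

With the patent's parameters the histogram has 500 bins and its entries sum to 20, the number of descriptors per image.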
Optionally, the method further includes:
fine-tuning a pre-trained AlexNet model on the CK+ training set and extracting the features of the model's fully connected layer as the learned deep features.
Optionally, fine-tuning the pre-trained AlexNet model on the CK+ training set and extracting the features of the fully connected layer as the learned deep features includes:
using an AlexNet convolutional neural network model pre-trained on ImageNet (AlexNet-CNN) and fine-tuning its parameters on the training set; the number of nodes in the last fully connected layer of AlexNet is changed to 500, the initial learning rate is 0.001, and iteration stops when the validation set recognition rate no longer improves, yielding a CNN model for extracting deep features.
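The stopping rule "stop iterating when the validation recognition rate no longer improves" is ordinary early stopping. A framework-independent sketch follows; the patience value is an assumption, since the patent does not state one:

```python
def early_stop_epoch(val_accuracies, patience=1):
    """Return the epoch at which training would stop: the first epoch after
    `patience` consecutive epochs without improvement over the best accuracy."""
    best, since_best = float("-inf"), 0
    for epoch, acc in enumerate(val_accuracies):
        if acc > best:
            best, since_best = acc, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return len(val_accuracies) - 1

# Validation accuracy per fine-tuning epoch (toy numbers):
print(early_stop_epoch([0.60, 0.72, 0.80, 0.79, 0.78]))  # stops at epoch 3
```

In a real fine-tuning loop the `val_accuracies` list would be filled epoch by epoch, and the model weights from the best epoch would be kept.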
Optionally, the method further includes:
using the obtained CNN model to extract the features of the last fully connected layer of the training set images as deep features; each image yields a 500-dimensional feature vector.
Optionally, the method further includes:
concatenating the extracted 500-dimensional shallow features with the extracted 500-dimensional deep features to obtain a fused feature, and normalizing all feature vectors to the range [-1, 1].
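Concatenating the 500-dimensional shallow and deep vectors and scaling into [-1, 1] can be sketched as below. Per-dimension min-max scaling is one common reading of "normalize to [-1, 1]"; the patent does not specify the exact scheme:

```python
import numpy as np

def fuse(shallow, deep):
    """Concatenate shallow (BoW) and deep (CNN fc) features -> fused vectors."""
    return np.concatenate([shallow, deep], axis=1)

def scale_to_unit_interval(X):
    """Min-max scale each dimension of X into [-1, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero
    return 2.0 * (X - lo) / span - 1.0

rng = np.random.default_rng(2)
shallow = rng.random((6, 500))   # BoW histograms for 6 samples
deep = rng.random((6, 500))      # CNN fc-layer features for 6 samples
fused = scale_to_unit_interval(fuse(shallow, deep))
print(fused.shape)  # (6, 1000)
```

Each sample thus ends up as a single 1000-dimensional fused vector whose components all lie within [-1, 1].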
Optionally, training with the SVM classifier includes:
training a support vector machine on the fused features of the training sample data set: selecting the optimal parameters c and g via grid.py cross-validation, and then running svmtrain with the obtained optimal c and g, a linear classifier, and the RBF kernel on the feature vectors of the entire training data set, obtaining the final target detection model.
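The grid.py step from the LIBSVM toolchain cross-validates over the penalty c and kernel parameter g; the equivalent search in scikit-learn looks roughly like this (toy data stands in for the fused feature vectors, and the parameter grids are illustrative):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.random((60, 10))             # stand-in for fused feature vectors
y = rng.integers(0, 2, size=60)      # stand-in for expression labels

# C and gamma correspond to LIBSVM's -c (penalty) and -g (RBF coefficient).
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    cv=3,
)
grid.fit(X, y)
model = grid.best_estimator_         # SVM retrained with the selected C and gamma
print(sorted(grid.best_params_))     # ['C', 'gamma']
```

As with LIBSVM, the best (c, g) pair found by cross-validation is then used to train the final model on the whole training set.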
Optionally, the method further includes:
at test time, likewise extracting the BoW-represented SIFT features of the test set images as shallow features and the features of the last fully connected layer of the CNN model as deep features, concatenating the two feature vectors into a fused feature vector and normalizing it, feeding this fused feature vector to the SVM, and taking the output class label as the recognition result.
With the above technical scheme, the face image processing method based on deep feature fusion expands the facial expression image data set using data augmentation while extracting the SIFT features of each image in the data set; it uses the BoW representation of the SIFT features as the extracted shallow features, concatenates them with the deep features of the corresponding image into a single feature vector, and trains an SVM classifier, thereby improving generalization ability on limited data sets.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the disclosure.
Description of the drawings
The drawings herein are incorporated into and form part of this specification; they show embodiments consistent with the present invention and, together with the specification, explain its principles.
Fig. 1 is a flow chart of the face image processing method based on deep feature fusion according to an embodiment of the present invention.
Specific implementation
Example embodiments are described in detail here and illustrated in the accompanying drawings. In the following description, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The embodiments described below do not represent all embodiments consistent with the present invention; rather, they are merely examples of devices and methods consistent with some aspects of the invention as detailed in the appended claims.
The embodiment of the present invention addresses two limitations: features obtained by traditional extraction methods are not robust to complex variations such as illumination, expression, and pose; and when trained on limited data sets, ordinary convolutional neural networks lack feature extraction capacity, cannot effectively represent facial expression features, and suffer reduced generalization ability. A feature representation based on deep feature fusion is therefore proposed: the SIFT descriptors of a face image and the fully-connected-layer feature vector of a convolutional neural network are extracted separately, the two features are concatenated into a new representation, and an SVM classifier performs the recognition. The SIFT features assist and strengthen the CNN features, so the fused feature generalizes well on limited data sets.
Embodiment 1
As shown in Fig. 1, the embodiment of the present invention provides a face image processing method based on deep feature fusion, including:
11. expanding a facial expression image data set using data augmentation, while extracting the SIFT (scale-invariant feature transform) features of each image in the facial expression image data set;
12. using the bag-of-words (BoW) representation of the SIFT features as the extracted shallow features, concatenating them with the deep features of the corresponding image into a single feature vector, and training an SVM (support vector machine) classifier.
Optionally, the facial expression image data set is the CK+ data set, and expanding the facial expression image data set using data augmentation includes:
obtaining the CK+ data set containing multiple face images, dividing it into training, validation, and test sets in an 8:1:1 ratio, and ensuring that no subject identity appears in more than one set;
preprocessing all images in the data set: performing face detection on each face picture with the Haar-feature-based AdaBoost face detector, cropping the face region, and removing background influence;
performing spatial normalization on the images using OpenCV (Open Source Computer Vision Library), adjusting the line between the two eyes to be horizontal and aligning the faces to the same position;
performing histogram equalization on all images to enhance contrast and weaken brightness differences caused by illumination; finally normalizing all images to 70*70 pixels;
expanding the data set by applying geometric transformations to the raw images: for each 70*70 training image, cropping the 64*64 regions at the upper-left, upper-right, lower-left, lower-right, and center, and horizontally mirroring each cropped image, thereby expanding the training set 10 times.
Optionally, extracting the SIFT features of each image in the facial expression image data set includes:
extracting SIFT features from all expanded training set images, each SIFT keypoint descriptor being a 4*4*8 = 128-dimensional vector;
after feature extraction, converting each sample into an n*128 feature matrix, where n is the number of extracted feature points; 20 SIFT descriptors are extracted per image, so the shallow feature of each training sample is represented by a 20*128 matrix;
building a dictionary from all feature points of the training set: clustering all SIFT features with the k-means algorithm to obtain K cluster centers, which constitute the dictionary, with K = 500, i.e., 500 visual words.
Optionally, using the bag-of-words (BoW) representation of the SIFT features as the extracted shallow features includes:
representing the SIFT features of each image as a single feature vector with the BoW model: computing, by minimum distance, which visual word in the dictionary each SIFT feature belongs to, counting the number of feature points falling into each word, and obtaining a statistical histogram for each image; this histogram is a 500-dimensional feature vector, which is the image's BoW representation with respect to the dictionary.
Optionally, the method further includes:
fine-tuning a pre-trained AlexNet model on the CK+ training set and extracting the features of the model's fully connected layer as the learned deep features.
Optionally, fine-tuning the pre-trained AlexNet model on the CK+ training set and extracting the features of the fully connected layer as the learned deep features includes:
using an AlexNet convolutional neural network model pre-trained on ImageNet (AlexNet-CNN) and fine-tuning its parameters on the training set; the number of nodes in the last fully connected layer of AlexNet is changed to 500, the initial learning rate is 0.001, and iteration stops when the validation set recognition rate no longer improves, yielding a CNN model for extracting deep features.
Optionally, the method further includes:
using the obtained CNN model to extract the features of the last fully connected layer of the training set images as deep features; each image yields a 500-dimensional feature vector.
Optionally, the method further includes:
concatenating the extracted 500-dimensional shallow features with the extracted 500-dimensional deep features to obtain a fused feature, and normalizing all feature vectors to the range [-1, 1].
Optionally, training with the SVM classifier includes:
training a support vector machine on the fused features of the training sample data set: selecting the optimal parameters c (penalty coefficient) and g (kernel parameter) via grid.py cross-validation, and then running svmtrain with the obtained optimal c and g, a linear classifier, and the RBF (radial basis function) kernel on the feature vectors of the entire training data set, obtaining the final target detection model.
Optionally, the method further includes:
at test time, likewise extracting the BoW-represented SIFT features of the test set images as shallow features and the features of the last fully connected layer of the CNN model as deep features, concatenating the two feature vectors into a fused feature vector and normalizing it, feeding this fused feature vector to the SVM, and taking the output class label as the recognition result.
The face image processing method based on deep feature fusion of the embodiment of the present invention expands the facial expression image data set using data augmentation while extracting the SIFT features of each image in the data set; it uses the BoW representation of the SIFT features as the extracted shallow features, concatenates them with the deep features of the corresponding image into a single feature vector, and trains an SVM classifier, thereby improving generalization ability on limited data sets.
Embodiment 2
This embodiment describes the face image processing method based on deep feature fusion in detail. The data set used is the CK+ standard data set, containing 510 facial expression images covering seven basic emotions: anger, disgust, fear, neutral, happiness, sadness, and surprise. The CK+ training set is expanded with data augmentation; a pre-trained AlexNet model is fine-tuned on the CK+ training set, and the features of its fully connected layer are extracted as the learned deep features; at the same time the SIFT features of each image are extracted, and their BoW representation is used as the shallow features; the deep and shallow features of the corresponding image are concatenated into a single feature vector, on which an SVM classifier is trained. Finally each test set image is recognized by the trained SVM model, and the generalization ability and robustness of the feature are verified on several mainstream data sets.
The method includes the following steps:
201. Obtain the CK+ data set of 510 face images; divide it into training, validation, and test sets in an 8:1:1 ratio, ensuring that no subject identity appears in more than one set.
202. Preprocess all images in the data set. First perform face detection on each face picture with the Haar-feature-based AdaBoost face detector, crop the face region, and remove background influence; perform spatial normalization with the OpenCV vision library, adjusting the line between the two eyes to be horizontal and aligning the faces to the same position; perform histogram equalization on all images to enhance contrast and weaken brightness differences caused by illumination; finally normalize all images to 70*70 pixels.
203. Expand the data set by applying geometric transformations to the raw images: for each 70*70 training image, crop the 64*64 regions at the upper-left, upper-right, lower-left, lower-right, and center, and horizontally mirror each cropped image, expanding the training set 10 times.
204. Extract SIFT features from all expanded training set images; each SIFT keypoint descriptor is a 4*4*8 = 128-dimensional vector. After feature extraction, each sample is converted into an n*128 feature matrix, where n is the number of extracted feature points. Twenty SIFT descriptors are extracted per image, so the shallow feature of each training sample is represented by a 20*128 matrix.
205. Build a dictionary from all feature points of the training set: cluster all SIFT features with the k-means algorithm to obtain K cluster centers, which constitute the dictionary, with K = 500, i.e., 500 visual words.
206. Represent the SIFT features of each image as a single feature vector with the BoW model: compute, by minimum distance, which visual word in the dictionary each SIFT feature belongs to, count the number of feature points falling into each word, and obtain a statistical histogram for each image; this histogram is a 500-dimensional feature vector, which is the image's BoW representation with respect to the dictionary.
207. Use an AlexNet convolutional neural network model pre-trained on ImageNet (AlexNet-CNN) and fine-tune its parameters on the training set; change the number of nodes in the last fully connected layer of AlexNet to 500, set the initial learning rate to 0.001, and stop iterating when the validation set recognition rate no longer improves, obtaining a CNN model for extracting deep features.
208. Use the CNN model obtained in step 207 to extract the features of the last fully connected layer of the training set images as deep features; each image yields a 500-dimensional feature vector.
209. Concatenate the 500-dimensional shallow features extracted in step 206 with the 500-dimensional deep features extracted in step 208 to obtain a fused feature, and normalize all feature vectors to the range [-1, 1].
210. Train a support vector machine on the fused features of the training sample data set: select the optimal parameters c and g via grid.py cross-validation, and then run svmtrain with the obtained optimal c and g, a linear classifier, and the RBF kernel on the feature vectors of the entire training data set, obtaining the final target detection model.
211. In the test phase, likewise extract the BoW-represented SIFT features of the test set images as shallow features and the features of the last fully connected layer of the CNN model as deep features, concatenate the two feature vectors into a fused feature vector and normalize it, feed this fused feature vector to the SVM, and take the output class label as the recognition result.
The embodiment of the present invention fine-tunes a pre-trained AlexNet model on the CK+ training set and extracts the features of its fully connected layer as the learned deep features; at the same time it extracts the SIFT features of each image and uses their BoW representation as the shallow features; the shallow and deep features are concatenated into a new feature vector and normalized, serving as the new image representation on which the SVM classifier is trained. Compared with manual features, a CNN can learn richer high-level representations of an image but generally requires sufficient data; although manual features such as SIFT do not match the recognition performance of a CNN, they do not need large amounts of training data to produce useful features. The method thereby overcomes the insufficient expressive power of CNN deep features when facial expression data is scarce, using traditional features to assist the deep features and improve recognition performance on small data.
The fused feature proposed by the embodiment outperforms both the shallow SIFT-BoW feature and the deep feature extracted by the CNN; on the CK+ data set, fusing in the shallow feature improves the recognition rate of the deep feature by about 2%. A cross-data-set experiment further verifies the generalization ability of the feature: an SVM classifier trained on fused features extracted from the CK+ data set and tested on the JAFFE data set reaches a recognition rate of 47.8%, a clear improvement over classical schemes, and the overall result is good.
The face image processing method based on deep feature fusion of the embodiment of the present invention expands the facial expression image data set using data augmentation while extracting the SIFT features of each image in the data set; it uses the BoW representation of the SIFT features as the extracted shallow features, concatenates them with the deep features of the corresponding image into a single feature vector, and trains an SVM classifier, thereby improving generalization ability on limited data sets.
The embodiments of the present invention have been described above. The description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the illustrated embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application, or improvements over technology in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Those skilled in the art will readily arrive at other embodiments of the disclosure after considering the specification and practicing what is disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein.
Claims (10)
1. a kind of face image processing process based on depth Fusion Features, which is characterized in that including:
Facial expression image data set is expanded using data enhancement methods, while extracting the facial expression image data
Concentrate the Scale invariant features transform SIFT feature of each image;
Using the SIFT feature that bag of words bag of words Bow is indicated as the shallow-layer feature of extraction, correspondence image is obtained
The series connection of depth feature is a feature vector, and is trained using support vector machines grader.
2. The method according to claim 1, characterized in that the facial expression image data set is the CK+ data set, and expanding the facial expression image data set using data augmentation comprises:
obtaining the CK+ data set containing multiple face images, partitioning it into training, validation, and test sets at an 8:1:1 ratio while ensuring that subject identities do not overlap across the sets;
preprocessing all images in the data set: performing face detection on each face picture with the Adaboost face detector based on Haar features, and cropping out the face region to remove background interference;
performing spatial normalization on the images with the open-source computer vision library OpenCV, rotating each face so that the line connecting the two eyes is horizontal and aligning the faces to the same position;
applying histogram equalization to all images to enhance contrast and weaken the effect of brightness differences caused by illumination; finally normalizing all images to 70*70 pixels;
expanding the data set by geometric transformation of the raw image data: from every 70*70 training image, cropping the five 64*64 regions at the top-left, top-right, bottom-left, bottom-right, and center, and horizontally mirroring each cropped image, thereby expanding the training set 10-fold.
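The ten-crop expansion described in the claim above can be sketched in NumPy (an illustrative sketch, not the patent's code; the `ten_crop` helper and the zero-valued test image are assumptions for demonstration):

```python
import numpy as np

def ten_crop(img):
    """Expand one 70x70 training image into ten 64x64 crops: the five
    fixed regions (four corners + center) plus their horizontal mirrors."""
    h, w = img.shape[:2]
    assert (h, w) == (70, 70)
    c = 64
    offsets = [(0, 0), (0, w - c), (h - c, 0), (h - c, w - c),
               ((h - c) // 2, (w - c) // 2)]   # UL, UR, LL, LR, center
    crops = [img[y:y + c, x:x + c] for y, x in offsets]
    crops += [np.fliplr(p) for p in crops]     # add horizontal mirrors
    return crops

crops = ten_crop(np.zeros((70, 70), dtype=np.uint8))
```

Each source image thus yields 10 training samples, which is where the 10-fold expansion of the training set comes from.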
3. The method according to claim 1, characterized in that extracting the SIFT features of each image in the facial expression image data set comprises:
extracting SIFT features from all expanded training set images, where each SIFT keypoint descriptor is a vector of 4*4*8=128 dimensions;
after feature extraction, converting each sample into an n*128-dimensional feature matrix, where n is the number of extracted feature points; it is specified that 20 SIFT descriptors are extracted per image, so the shallow feature of each training sample is represented by a 20*128-dimensional vector;
building a dictionary from all feature points extracted from the training set: clustering all SIFT features with the k-means clustering algorithm to obtain K cluster centers, which constitute the dictionary; K=500 is specified, i.e. 500 visual words.
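The dictionary-building step can be sketched with scikit-learn's k-means (a sketch under assumptions: random stand-in descriptors and a small K in place of the claimed K=500):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.random((400, 128))  # stand-in for all 128-D SIFT descriptors
K = 8                                 # stand-in for the claimed K = 500
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(descriptors)
dictionary = kmeans.cluster_centers_  # shape (K, 128): the K visual words
```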
4. The method according to claim 1, characterized in that using the SIFT features represented with the BoW model as the extracted shallow features comprises:
representing the SIFT features of each image as a single feature vector with the BoW model: determining by the minimum-distance method which visual word in the dictionary each SIFT feature belongs to, and counting the number of feature points falling on each word to obtain the statistical histogram of every image; this histogram can be represented by a 500-dimensional feature vector, which is the image's BoW representation over the dictionary.
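The minimum-distance assignment and histogram counting above can be sketched as follows (illustrative; a small random dictionary stands in for the 500-word one):

```python
import numpy as np

def bow_histogram(features, dictionary):
    """Assign each descriptor to its nearest visual word (minimum Euclidean
    distance) and count how many descriptors fall on each word."""
    # pairwise squared distances, shape (n_descriptors, n_words)
    d = ((features[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)
    words = d.argmin(axis=1)             # nearest word index per descriptor
    return np.bincount(words, minlength=len(dictionary))

rng = np.random.default_rng(1)
dictionary = rng.random((8, 128))        # stand-in for the 500-word dictionary
feats = rng.random((20, 128))            # the 20 SIFT descriptors of one image
hist = bow_histogram(feats, dictionary)  # the image's BoW vector
```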
5. The method according to claim 1, characterized by further comprising:
fine-tuning a pre-trained AlexNet model on the CK+ training set and extracting the features of the model's fully connected layer as the learned deep features.
6. The method according to claim 5, characterized in that fine-tuning the pre-trained AlexNet model on the CK+ training set and extracting the features of the model's fully connected layer as the learned deep features comprises:
fine-tuning on the training set the parameters of an AlexNet convolutional neural network (AlexNet-CNN) pre-trained on ImageNet, with the number of nodes in the last fully connected layer of AlexNet modified to 500 and an initial learning rate of 0.001; when the recognition rate on the validation set no longer improves, stopping iteration to obtain the CNN model used for deep feature extraction.
7. The method according to claim 6, characterized by further comprising:
extracting the features of the last fully connected layer of the obtained CNN model for the training set images as the deep features, each single image yielding a 500-dimensional feature vector.
8. The method according to claim 4, characterized by further comprising:
concatenating the extracted 500-dimensional shallow features with the extracted 500-dimensional deep features to obtain a fused feature, and normalizing all feature vectors into the range [-1, 1].
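The fusion and normalization step can be sketched as follows (a sketch: per-vector min-max scaling is one plausible reading of "normalized into [-1, 1]"; libsvm-style per-dimension scaling over the training set would be an equally valid alternative):

```python
import numpy as np

def fuse_and_scale(shallow, deep):
    """Concatenate shallow (BoW) and deep (CNN) features row-wise and
    min-max scale every fused vector into [-1, 1]."""
    fused = np.concatenate([shallow, deep], axis=1)  # (N, 1000) for 500 + 500
    lo = fused.min(axis=1, keepdims=True)
    hi = fused.max(axis=1, keepdims=True)
    return 2 * (fused - lo) / (hi - lo) - 1

rng = np.random.default_rng(2)
fused = fuse_and_scale(rng.random((4, 500)), rng.random((4, 500)))
```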
9. The method according to claim 1, characterized in that training with the SVM classifier comprises:
training a support vector machine SVM on the fused features of the training sample data set: selecting the optimal parameters c and g by cross-validation with grid.py, then running svmtrain with the obtained optimal parameters c and g to train on the feature vectors of the entire training data set with a linear classifier and the radial basis function (RBF) kernel, obtaining the final target detection model.
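The grid.py + svmtrain workflow of libsvm has a direct scikit-learn analogue, sketched here on toy data (the feature values and labels below are stand-ins, not expression data):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.random((60, 10))                 # stand-in fused feature vectors
y = (X[:, 0] > 0.5).astype(int)          # toy class labels

# cross-validated search for the optimal C and gamma ("c and g")
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1]}, cv=3)
grid.fit(X, y)

# retrain an RBF-kernel SVM on the whole training set with the best params
model = SVC(kernel="rbf", **grid.best_params_).fit(X, y)
```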
10. The method according to any one of claims 1 to 9, characterized by further comprising:
at test time, likewise using the BoW-represented SIFT features as the shallow features and extracting the features of the last fully connected layer of the CNN model from the test data set images as the deep features; concatenating the two feature vectors into a fused feature vector and normalizing it; taking this fused feature vector as the input of the SVM, whose output class label is the recognition result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810630864.XA CN108805216A (en) | 2018-06-19 | 2018-06-19 | Face image processing process based on depth Fusion Features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810630864.XA CN108805216A (en) | 2018-06-19 | 2018-06-19 | Face image processing process based on depth Fusion Features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108805216A true CN108805216A (en) | 2018-11-13 |
Family
ID=64083492
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810630864.XA Pending CN108805216A (en) | 2018-06-19 | 2018-06-19 | Face image processing process based on depth Fusion Features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108805216A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109978067A (en) * | 2019-04-02 | 2019-07-05 | 北京市天元网络技术股份有限公司 | A kind of trade-mark searching method and device based on convolutional neural networks and Scale invariant features transform |
CN110008876A (en) * | 2019-03-26 | 2019-07-12 | 电子科技大学 | A kind of face verification method based on data enhancing and Fusion Features |
CN110210329A (en) * | 2019-05-13 | 2019-09-06 | 高新兴科技集团股份有限公司 | A kind of method for detecting human face, device and equipment |
CN110516988A (en) * | 2019-07-08 | 2019-11-29 | 国网浙江省电力有限公司金华供电公司 | One kind being fitted Power Material method based on neural network |
CN111259913A (en) * | 2020-01-14 | 2020-06-09 | 哈尔滨工业大学 | Cell spectral image classification method based on bag-of-word model and textural features |
CN111401390A (en) * | 2019-01-02 | 2020-07-10 | 中国移动通信有限公司研究院 | Classifier method and device, electronic device and storage medium |
CN111860039A (en) * | 2019-04-26 | 2020-10-30 | 四川大学 | Cross-connection CNN + SVR-based street space quality quantification method |
CN112668482A (en) * | 2020-12-29 | 2021-04-16 | 中国平安人寿保险股份有限公司 | Face recognition training method and device, computer equipment and storage medium |
CN112883880A (en) * | 2021-02-25 | 2021-06-01 | 电子科技大学 | Pedestrian attribute identification method based on human body structure multi-scale segmentation, storage medium and terminal |
CN113792574A (en) * | 2021-07-14 | 2021-12-14 | 哈尔滨工程大学 | Cross-data-set expression recognition method based on metric learning and teacher student model |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106156793A (en) * | 2016-06-27 | 2016-11-23 | 西北工业大学 | Extract in conjunction with further feature and the classification method of medical image of shallow-layer feature extraction |
CN107729835A (en) * | 2017-10-10 | 2018-02-23 | 浙江大学 | A kind of expression recognition method based on face key point region traditional characteristic and face global depth Fusion Features |
- 2018-06-19: Application CN201810630864.XA filed in China (CN) as CN108805216A, status: Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106156793A (en) * | 2016-06-27 | 2016-11-23 | 西北工业大学 | Extract in conjunction with further feature and the classification method of medical image of shallow-layer feature extraction |
CN107729835A (en) * | 2017-10-10 | 2018-02-23 | 浙江大学 | A kind of expression recognition method based on face key point region traditional characteristic and face global depth Fusion Features |
Non-Patent Citations (4)
Title |
---|
MIKAEL JORDA ET AL.: "Emotion classification on face images", 《HTTP://CS229.STANFORD.EDU/PROJ2015/》 * |
TEE CONNIE ET AL.: "Facial Expression Recognition Using a Hybrid CNN–SIFT Aggregator", 《MULTI-DISCIPLINARY TRENDS IN ARTIFICIAL INTELLIGENCE》 * |
SUN Xiao et al.: "Facial Expression Recognition Based on ROI-KNN Convolutional Neural Networks", Acta Automatica Sinica * |
ZHANG Xiaoming: "Research on Facial Expression Recognition Based on SIFT Features", China Masters' Theses Full-text Database, Information Science and Technology * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111401390B (en) * | 2019-01-02 | 2023-04-07 | 中国移动通信有限公司研究院 | Classifier method and device, electronic device and storage medium |
CN111401390A (en) * | 2019-01-02 | 2020-07-10 | 中国移动通信有限公司研究院 | Classifier method and device, electronic device and storage medium |
CN110008876A (en) * | 2019-03-26 | 2019-07-12 | 电子科技大学 | A kind of face verification method based on data enhancing and Fusion Features |
CN109978067A (en) * | 2019-04-02 | 2019-07-05 | 北京市天元网络技术股份有限公司 | A kind of trade-mark searching method and device based on convolutional neural networks and Scale invariant features transform |
CN111860039B (en) * | 2019-04-26 | 2022-08-02 | 四川大学 | Cross-connection CNN + SVR-based street space quality quantification method |
CN111860039A (en) * | 2019-04-26 | 2020-10-30 | 四川大学 | Cross-connection CNN + SVR-based street space quality quantification method |
CN110210329A (en) * | 2019-05-13 | 2019-09-06 | 高新兴科技集团股份有限公司 | A kind of method for detecting human face, device and equipment |
CN110516988A (en) * | 2019-07-08 | 2019-11-29 | 国网浙江省电力有限公司金华供电公司 | One kind being fitted Power Material method based on neural network |
CN111259913A (en) * | 2020-01-14 | 2020-06-09 | 哈尔滨工业大学 | Cell spectral image classification method based on bag-of-word model and textural features |
CN112668482A (en) * | 2020-12-29 | 2021-04-16 | 中国平安人寿保险股份有限公司 | Face recognition training method and device, computer equipment and storage medium |
CN112668482B (en) * | 2020-12-29 | 2023-11-21 | 中国平安人寿保险股份有限公司 | Face recognition training method, device, computer equipment and storage medium |
CN112883880A (en) * | 2021-02-25 | 2021-06-01 | 电子科技大学 | Pedestrian attribute identification method based on human body structure multi-scale segmentation, storage medium and terminal |
CN112883880B (en) * | 2021-02-25 | 2022-08-19 | 电子科技大学 | Pedestrian attribute identification method based on human body structure multi-scale segmentation, storage medium and terminal |
CN113792574A (en) * | 2021-07-14 | 2021-12-14 | 哈尔滨工程大学 | Cross-data-set expression recognition method based on metric learning and teacher student model |
CN113792574B (en) * | 2021-07-14 | 2023-12-19 | 哈尔滨工程大学 | Cross-dataset expression recognition method based on metric learning and teacher student model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108805216A (en) | Face image processing process based on depth Fusion Features | |
CN109359538B (en) | Training method of convolutional neural network, gesture recognition method, device and equipment | |
CN106096557B (en) | A kind of semi-supervised learning facial expression recognizing method based on fuzzy training sample | |
CN103942550B (en) | A kind of scene text recognition methods based on sparse coding feature | |
Jin et al. | Face detection using template matching and skin-color information | |
CN103984948B (en) | A kind of soft double-deck age estimation method based on facial image fusion feature | |
KR20200000824A (en) | Method for recognizing facial expression based on deep-learning model using center-dispersion loss function | |
CN109145871B (en) | Psychological behavior recognition method, device and storage medium | |
CN109190561B (en) | Face recognition method and system in video playing | |
CN108776774A (en) | A kind of human facial expression recognition method based on complexity categorization of perception algorithm | |
Shirbhate et al. | Sign language recognition using machine learning algorithm | |
CN108182409A (en) | Biopsy method, device, equipment and storage medium | |
CN110188708A (en) | A kind of facial expression recognizing method based on convolutional neural networks | |
CN104504383B (en) | A kind of method for detecting human face based on the colour of skin and Adaboost algorithm | |
Zhao et al. | Semantic parts based top-down pyramid for action recognition | |
CN105373777A (en) | Face recognition method and device | |
JP2021193610A (en) | Information processing method, information processing device, electronic apparatus and storage medium | |
CN106960181A (en) | A kind of pedestrian's attribute recognition approach based on RGBD data | |
CN111694959A (en) | Network public opinion multi-mode emotion recognition method and system based on facial expressions and text information | |
CN107992807A (en) | A kind of face identification method and device based on CNN models | |
Paul et al. | Extraction of facial feature points using cumulative histogram | |
Prabhu et al. | Facial Expression Recognition Using Enhanced Convolution Neural Network with Attention Mechanism. | |
Booysens et al. | Ear biometrics using deep learning: A survey | |
CN109508660A (en) | A kind of AU detection method based on video | |
CN111738177B (en) | Student classroom behavior identification method based on attitude information extraction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 2018-11-13 |