GPU-accelerated facial expression recognition and interaction method
Technical field
The present invention relates to the fields of computer digital multimedia, augmented reality, and motion-sensing interaction, and in particular to a facial expression detection and interaction method based on an ordinary camera, more specifically to a GPU-accelerated facial expression recognition and interaction method.
Background art
The face is one of the most expressive channels in nonverbal communication, and facial expressions provide clues to mood, intention, and personality. The ability to recognize facial expressions accurately and efficiently can support many applications, with enormous room for imagination.
Over the past few decades, many computer systems have been built to understand human expressions or to interact through them. Most systems recognize a few prototype expressions (happiness, surprise, anger, sadness, fear, disgust). In daily life, these prototype expressions rarely occur; much is communicated through one or two small changes in facial features, such as pressed lips indicating anger or downturned lip corners indicating sadness. Changes in individual features, especially in the eyebrow and eyelid regions, also serve as a kind of paralanguage, such as raising the eyebrows in delight. To capture these subtle human emotions and paralanguage, automatic detection of slight changes in facial expression is very important.
The Facial Action Coding System (FACS) is the most popular coding system based on anatomical principles. By observing transient subtle changes in facial appearance, it encodes them as the motion of different muscles. Using FACS, researchers can decompose any anatomically possible facial expression into Action Units (AUs). The six basic expressions of happiness, sadness, surprise, fear, anger, and disgust can then be represented as combinations of AUs; for example, happiness is the combination of AU6 and AU12.
Facial expression recognition is mainly divided into recognition of prototype expressions and recognition of facial Action Units. Shan used LBP to classify prototype expressions. Zhao proposed a dynamic texture recognition algorithm based on LBP and used it to classify prototype expressions. Cohn used Gabor filtering to recognize smiles. Sebe used a PBVD model with user-specified facial feature points, obtained expression features from model deformation, and classified those features with an SVM to obtain prototype expressions.
Many algorithms recognize facial Action Units. Lucey used the AAM active appearance model to match 63 facial feature points and classified AUs with SVM, nearest-neighbor (NN), and LDA classifiers. The LPQ-TOP method proposed by Jiang improves on LBP-TOP and LPQ, achieving classification of 9 AUs with an accuracy of 84.6%. Bartlett and Littlewort used Gabor filtering to recognize prototype expressions and AUs. Littlewort also used Gabor filtering for feature extraction from facial images and SVMs for AU classification.
In facial expression recognition research, many researchers focus on recognition accuracy and pay little attention to efficiency. It is therefore very necessary to develop a system capable of real-time expression interaction: on the one hand, this lets users evaluate the interactive system better; on the other hand, more interactive effects can be built on top of it.
The real-time expression interaction system is a software tool for real-time recognition of facial expression AUs. The system can process real-time video from a camera, as well as video files or single pictures. By recognizing 16 facial AUs, the system can respond to a single AU, recognize the six prototype expressions (happiness, sadness, surprise, anger, disgust, and fear) from AU combinations, or support customized interactive actions. All outputs of the system are shown in a graphical interface or written directly to a file. Through socket-based inter-process communication, the real-time expression interaction system can provide AU recognition to other applications, facilitating AU-based application development.
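The socket-based inter-process communication mentioned above could look like the following minimal sketch. The JSON-lines message format, the field names, and the loopback demo are our own assumptions for illustration, not details given in the original description.

```python
import json
import socket

def encode_au_message(au_states):
    """Serialize one frame's AU detections as a JSON line (assumed format)."""
    return (json.dumps({"aus": au_states}) + "\n").encode("utf-8")

def decode_au_message(data):
    """Parse a JSON-line AU message back into a dict of AU states."""
    return json.loads(data.decode("utf-8"))["aus"]

if __name__ == "__main__":
    # Loopback demo: the recognizer side writes, a client application reads.
    server, client = socket.socketpair()
    server.sendall(encode_au_message({"AU6": True, "AU12": True}))
    received = decode_au_message(client.recv(4096))
    print(received)  # prints {'AU6': True, 'AU12': True}
    server.close()
    client.close()
```

A real client would connect over TCP instead of a socket pair; the encoding round-trip is the part that matters here.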
Summary of the invention
The object of the present invention is to provide a GPU-accelerated facial expression recognition and interaction method that solves problems of the prior art such as low facial recognition efficiency. Based on GPU-accelerated Gabor filtering for facial expression recognition, the method lets a user interact with a computer through changes of expression in front of an ordinary camera, and has broad application prospects in the digital home, gaming, and medicine.
The above object of the present invention is achieved through the following technical solution:
A GPU-accelerated facial expression recognition and interaction method comprises the following steps:
Step (1): obtain dynamic facial expressions through an ordinary camera or video:
An ordinary camera is connected to a computer and placed directly in front of the player's face at a distance of 50-60 centimeters; images containing the frontal face are obtained through the camera;
Step (2): use a recognition method based on Haar features and an AdaBoost cascade classifier to detect faces, and extract the facial image closest to the camera, i.e., the one occupying the largest part of the picture:
Step (3): from the extracted facial image, use a recognition method based on Haar features and AdaBoost to identify the position coordinates of the pupils, nose, and other landmarks.
Step (4): divide the face into several critical regions:
Based on analysis of facial micro-expressions, we divide the face into several regions. Centered on the pupil position, the region extending 15 pixels left, 15 pixels right, 35 pixels up, and 15 pixels down is the eyebrow expression region, used to detect slight changes of the eyebrow; there are also regions covering the eye, cheek, and lip micro-expressions respectively;
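The eyebrow region described above can be sketched as a simple crop around the detected pupil. The offsets come directly from the text; clamping at the image border is our own addition.

```python
import numpy as np

def eyebrow_roi(image, pupil_x, pupil_y):
    """Crop the eyebrow expression region around a pupil: 15 px left,
    15 px right, 35 px up, 15 px down, as described in step (4).
    Clamping to the image border is an assumption of this sketch."""
    h, w = image.shape[:2]
    x0, x1 = max(0, pupil_x - 15), min(w, pupil_x + 15)
    y0, y1 = max(0, pupil_y - 35), min(h, pupil_y + 15)
    return image[y0:y1, x0:x1]
```

The other regions (eyes, cheeks, lips) would be analogous crops around their respective landmarks.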
Step (5): apply GPU-accelerated Gabor filtering to the whole face:
Gabor filtering requires a large number of convolutions of the image with the real and imaginary parts of the Gabor kernels; the larger the kernel and the convolved image, the longer it takes. We choose Gabor kernels of 21*21 pixels and a convolved image of 150*150 pixels. To address the long running time of convolution, an FFT-based method is used that converts spatial-domain convolution into frequency-domain multiplication: each convolution then needs only 1 FFT, 1 element-wise multiplication, and 1 inverse FFT, with time complexity O(n log n), which is much faster. GPU parallel processing is used to accelerate the FFT, and the FFT of each Gabor kernel is kept in video memory to reduce computation time. After Gabor filtering, each pixel has 40 amplitudes as features.
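The FFT-based convolution can be sketched on the CPU with NumPy (standing in for the GPU FFT). The Gabor parameter values below are illustrative assumptions; precomputing the kernel's FFT once mirrors the video-memory caching described in the text.

```python
import numpy as np

def gabor_kernel(size=21, sigma=4.0, theta=0.0, lam=8.0):
    """Complex 21*21 Gabor kernel; sigma, theta, and lambda here are
    illustrative values, not parameters given in the text."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.exp(2j * np.pi * xr / lam)

def fft_convolve(image, kernel_fft, shape):
    """One forward FFT, one element-wise multiply, one inverse FFT."""
    return np.fft.ifft2(np.fft.fft2(image, shape) * kernel_fft)

# 150*150 image, 21*21 kernel -> 170*170 full linear convolution.
image = np.random.rand(150, 150)
shape = (150 + 21 - 1, 150 + 21 - 1)
kern = gabor_kernel()
kern_fft = np.fft.fft2(kern, shape)  # precomputed once, like the cached copy in video memory
response = fft_convolve(image, kern_fft, shape)
amplitude = np.abs(response)         # the amplitude is the per-pixel feature
```

On a GPU, `np.fft.fft2` would be replaced by a CUDA FFT call; the structure of one FFT, one multiply, and one inverse FFT per filter is unchanged.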
Step (6): perform feature extraction on the critical regions near the facial feature points:
For the eyebrow expression region ROI and the other ROIs obtained in step (4), the pixels in each ROI are arranged from left to right and top to bottom, and the 40 amplitudes of each pixel are substituted in to obtain the feature vector of that ROI;
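The feature assembly above reduces to a flatten; storing the 40 amplitudes as the last axis of an `(H, W, 40)` array is our own layout assumption.

```python
import numpy as np

def roi_feature(amplitudes, y0, y1, x0, x1):
    """Flatten an ROI of the per-pixel Gabor amplitudes into one feature
    vector, scanning left-to-right, top-to-bottom. `amplitudes` is
    assumed to be an (H, W, 40) array holding the 40 filter-response
    magnitudes of each pixel."""
    return amplitudes[y0:y1, x0:x1, :].reshape(-1)
```

Row-major flattening already visits pixels left-to-right, top-to-bottom, so no explicit reordering is needed.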
Step (7): if in training mode, label the extracted features according to whether each micro-expression occurs, and generate recognition models by incremental SVM training. Facial expression databases such as CK, CK+, and MMI contain expression pictures with micro-expression information manually labeled by their creators. The corresponding images are run through steps (1) to (6) to obtain the corresponding features; according to the creators' manual labels, the micro-expression region features are grouped into classification sets by concrete micro-expression and trained with a soft-margin SVM with penalty parameter 10 to obtain a micro-expression recognition model. 16 expressions are recognized in total, so 16 micro-expression recognition models are generated;
Step (8): if in recognition mode, feed the extracted features into the corresponding recognition models to obtain the specific micro-expression information: step (7) generates 16 micro-expression recognition models, and by feeding the features generated in step (6) into the corresponding C-SVM, whether each micro-expression occurs can be obtained accurately.
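As a self-contained stand-in for the C-SVM of steps (7) and (8), the following sketch trains a soft-margin linear SVM by primal subgradient descent, with penalty parameter C=10 as in the text. One such binary model would be trained per micro-expression, with labels +1/-1 for whether it occurs. The optimizer and learning-rate values are our own choices, not the original implementation.

```python
import numpy as np

def train_linear_svm(X, y, C=10.0, lr=0.01, epochs=200):
    """Minimal soft-margin linear SVM: subgradient descent on the
    primal objective 0.5*||w||^2 + C * sum(hinge loss).
    Labels y must be +1 (micro-expression present) or -1 (absent)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in np.random.permutation(n):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:  # inside the margin: hinge term is active
                w -= lr * (w - C * y[i] * X[i])
                b += lr * C * y[i]
            else:           # outside the margin: only regularization
                w -= lr * w
    return w, b

def predict(w, b, X):
    """+1 = micro-expression present, -1 = absent."""
    return np.sign(X @ w + b)
```

A production system would more likely use a library SVM with a kernel; this sketch only shows the decision structure and the role of the penalty parameter.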
In step (2), detecting faces with the Haar-feature AdaBoost cascade classifier and extracting the facial image closest to the camera, i.e., occupying the largest part of the picture, comprises the steps of:
(2.1) On the image obtained in step (1), perform face detection with OpenCV's built-in Haar-feature AdaBoost cascade classifier, with scale parameter 1.1 and minNeighbors parameter 3;
(2.2) Sort all detected faces by face size from large to small, compute the median face size, delete faces more than 30% larger or more than 30% smaller than the median, and from the remaining faces select the largest one and record its coordinates;
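The selection rule in (2.2) can be written in a few lines. Taking the bounding-box width as the face "size" is our assumption; the original text does not pin the measure down.

```python
import statistics

def select_main_face(faces):
    """Pick the main face from (x, y, w, h) detections as in step (2.2):
    drop faces whose size deviates from the median by more than 30%,
    then keep the largest of the rest. 'Size' is taken here to be the
    bounding-box width, an assumption of this sketch."""
    if not faces:
        return None
    widths = [f[2] for f in faces]
    med = statistics.median(widths)
    kept = [f for f in faces if 0.7 * med <= f[2] <= 1.3 * med]
    return max(kept, key=lambda f: f[2], default=None)
```

The median filter discards spurious detections that are far too large or too small before the largest (closest) face is chosen.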
(2.3) Image preprocessing: in expression recognition, source pictures differ greatly in size, illumination, and position, while the ideal input is a pure expression region, so the following preprocessing steps are needed:
(2.3.1) Illumination normalization, i.e., histogram equalization;
(2.3.2) Geometric normalization, i.e., conversion to 150*150 resolution;
(2.4) Cover the image with an elliptical mask centered on the face, with major-axis length 47% of the image height and minor-axis length 41.6% of the image width, to mark the pure expression region and effectively exclude the noise outside the face.
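Steps (2.3.1) and (2.4) can be sketched as follows. The histogram-equalization lookup table is a standard construction; reading the stated axis lengths as full (not semi-) axes centered on the 150*150 crop is our interpretation of the text.

```python
import numpy as np

def equalize_hist(gray):
    """Histogram equalization of an 8-bit grayscale image (step 2.3.1)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    lut = (cdf - cdf.min()) * 255 / max(cdf.max() - cdf.min(), 1)
    return lut.astype(np.uint8)[gray]

def ellipse_mask(h, w):
    """Elliptical face mask (step 2.4): major axis 47% of the image
    height, minor axis 41.6% of the image width. The stated lengths
    are read as full axes, so the semi-axes are half of those values;
    this is an interpretation, not spelled out in the text."""
    a = 0.47 * h / 2.0   # vertical semi-axis
    b = 0.416 * w / 2.0  # horizontal semi-axis
    y, x = np.mgrid[0:h, 0:w]
    return ((y - h / 2.0) / a) ** 2 + ((x - w / 2.0) / b) ** 2 <= 1.0
```

Pixels outside the mask would be zeroed before Gabor filtering so that background noise does not contribute features.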
The beneficial effects of the present invention are: the invention solves the problem of low facial recognition efficiency; socket connections let users interact with other programs; and users need no interactive devices such as a mouse or keyboard, interacting with the computer using only their own expressions. The invention has broad prospects in the digital home, gaming, and medical fields, and has high usability and practical significance.
Brief description of the drawings
The accompanying drawings described herein provide a further understanding of the present invention and form a part of this application; the illustrative embodiments of the present invention and their descriptions explain the present invention and do not unduly limit it.
Fig. 1 is the process flow chart of the present invention;
Fig. 2 is the software interface of the present invention.
Embodiment
The details and embodiments of the present invention are further illustrated below with reference to the accompanying drawings.
As shown in Fig. 1 and Fig. 2, the GPU-accelerated facial expression recognition and interaction method of the present invention comprises the following steps:
Step 1. Obtain dynamic facial expressions through an ordinary camera or video: the camera is connected to a computer and placed directly in front of the player's face at a distance of 50-60 centimeters; images containing the frontal face are obtained through the camera.
Step 2. Use a recognition method based on Haar features and an AdaBoost cascade classifier to detect faces, and extract the facial image closest to the camera (occupying the largest part of the picture):
2.1 On the image obtained in step 1, perform face detection with OpenCV's built-in Haar-feature AdaBoost cascade classifier, with scale parameter 1.1 and minNeighbors parameter 3;
2.2 Sort all detected faces by face size from large to small, compute the median face size, delete faces that are too large (more than 30% above the median) or too small (more than 30% below the median), and from the remaining faces select the largest one and record its coordinates;
2.3 Image preprocessing: in expression recognition, source pictures differ greatly in size, illumination, and position, while our ideal input is a pure expression region, so the following preprocessing is needed:
2.3.1 Illumination normalization (histogram equalization);
2.3.2 Geometric normalization (conversion to 150*150 resolution);
2.4 An elliptical mask centered on the face is placed over the image (major-axis length 47% of the image height, minor-axis length 41.6% of the image width) to mark the pure expression region and effectively exclude the noise outside the face.
Step 3. From the extracted facial image, use a recognition method based on Haar features and AdaBoost to identify the position coordinates of the pupils, nose, and other landmarks.
Step 4. Divide the face into several critical regions:
Based on analysis of facial micro-expressions, the face is divided into several regions: centered on the pupil position, the region extending 15 pixels left, 15 pixels right, 35 pixels up, and 15 pixels down is used to detect slight changes of the eyebrow; there are also regions covering the eye, cheek, and lip micro-expressions respectively.
Step 5. Apply GPU-accelerated Gabor filtering to the whole face:
Gabor filtering requires a large number of convolutions of the image with the real and imaginary parts of the Gabor kernels; the larger the kernel and the convolved image, the longer it takes. We choose Gabor kernels of 21*21 pixels and a convolved image of 150*150 pixels. To address the long running time of convolution, an FFT-based method is used that converts spatial-domain convolution into frequency-domain multiplication: each convolution then needs only 1 FFT, 1 element-wise multiplication, and 1 inverse FFT, with time complexity O(n log n), which is much faster. GPU parallel processing is used to accelerate the FFT, and the FFT of each Gabor kernel is kept in video memory to reduce computation time. After Gabor filtering, each pixel has 40 amplitudes as features.
Step 6. Perform feature extraction on the critical regions near the facial feature points:
For the eyebrow expression region ROI and the other ROIs obtained in step 4, the pixels in each ROI are arranged from left to right and top to bottom, and the 40 amplitudes of each pixel are substituted in to obtain the feature vector of that ROI.
Step 7. If in training mode, label the extracted features according to whether each micro-expression occurs, and generate recognition models by incremental SVM training. Facial expression databases such as CK, CK+, and MMI contain expression pictures with micro-expression information manually labeled by their creators. The corresponding images are run through steps 1 to 6 to obtain the corresponding features; according to the creators' manual labels, the micro-expression region features are grouped into classification sets by concrete micro-expression and trained with a soft-margin SVM with penalty parameter 10 to obtain a micro-expression recognition model. 16 expressions are recognized in total, so 16 micro-expression recognition models are generated.
Step 8. If in recognition mode, feed the extracted features into the corresponding recognition models to obtain the specific micro-expression information: step 7 generates 16 micro-expression recognition models, and by feeding the features generated in step 6 into the corresponding C-SVM, whether each micro-expression occurs can be obtained accurately.
The foregoing describes only preferred embodiments of the present invention and does not limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made to the present invention shall be included within its scope of protection.