CN111626113A - Facial expression recognition method and device based on facial action unit - Google Patents

Facial expression recognition method and device based on facial action unit Download PDF

Info

Publication number
CN111626113A
Authority
CN
China
Prior art keywords
facial
action unit
face
features
facial expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010312602.6A
Other languages
Chinese (zh)
Inventor
姚辉
芦燕云
陈晓华
孙广宇
李欣
才智
章莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Zhongyunwei Technology Co ltd
Beijing Xicheng District Center School For Mental Retardation
Original Assignee
Chengdu Zhongyunwei Technology Co ltd
Beijing Xicheng District Center School For Mental Retardation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Zhongyunwei Technology Co ltd, Beijing Xicheng District Center School For Mental Retardation filed Critical Chengdu Zhongyunwei Technology Co ltd
Priority to CN202010312602.6A priority Critical patent/CN111626113A/en
Publication of CN111626113A publication Critical patent/CN111626113A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • G06V40/176Dynamic expression
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/169Holistic features and representations, i.e. based on the facial image taken as a whole
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses a facial expression recognition method and device based on facial action units, wherein the method comprises the following steps: preprocessing collected picture data containing a face region by using a face key point detection technology, and subdividing the face region into a plurality of facial action unit trigger areas; recognizing the facial action units by using a preset three-dimensional convolutional neural network recognition model to obtain the features of the facial action units; performing feature fusion on the obtained global facial features and the features of the facial action units to obtain facial expression fusion features; and inputting the facial expression fusion features into a preset Gaussian process classifier, recognizing the facial expression by using the Gaussian process classifier, and outputting a facial expression recognition result. By adopting the method, expression recognition can be carried out on the basis of facial action units, which is particularly conducive to recognizing micro-expressions and improves the performance and robustness of facial expression recognition.

Description

Facial expression recognition method and device based on facial action unit
Technical Field
The embodiment of the invention relates to the field of computer vision, and in particular to a facial expression recognition method and device based on facial action units. In addition, a related electronic device and storage medium are also disclosed.
Background
In recent years, with the rapid development of the economy and society, people's demand for intelligent living has grown steadily. Artificial intelligence technology continues to advance, and a high-quality human-computer interaction experience is a prerequisite for improving the quality of intelligent life; emotion analysis of the user during human-computer interaction is therefore a key link. Since facial expressions play a very important role in human emotional expression, abundant emotional information can be obtained by analyzing them. In addition, facial expressions can be captured with nothing more than a camera, which makes acquisition convenient and fast. Therefore, facial expression analysis has broad application prospects in current intelligent products.
However, existing facial expression recognition algorithms perform well only on basic, macroscopic expressions; micro-expressions are often not obvious on the face, because their facial features change only slightly compared with macroscopic expressions and the differences between expressions are not significant. Therefore, traditional machine learning methods alone cannot effectively solve the problem of micro-expression recognition. The analysis of micro-expressions is not the exclusive preserve of the artificial intelligence field: anatomy has produced a large body of research results on this subject, among them the facial action unit. Facial action units capture the correlation between facial muscle movement and facial expression, can provide a large amount of prior knowledge for micro-expression analysis, and, when applied to the artificial intelligence field, can effectively assist facial expression recognition. Accordingly, applying facial action units to facial expression recognition research has received wide attention, and facial expression recognition based on facial action units has become an important research subject in the field of computer vision. How to accurately recognize facial expression information based on facial action units is therefore a problem that urgently needs to be solved.
Disclosure of Invention
Therefore, the embodiment of the invention provides a facial expression recognition method based on facial action units, so as to solve the problem that machine learning methods in the prior art cannot effectively recognize micro-expressions.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
according to the facial expression recognition method based on the facial action unit, provided by the embodiment of the invention, the method comprises the following steps: collecting picture data containing a face area; preprocessing picture data containing a face region by using a face key point detection technology, and subdividing the face region into a plurality of face action unit trigger regions; identifying the facial action unit by using a preset three-dimensional convolutional neural network identification model to obtain the characteristics of the facial action unit; performing feature fusion on the obtained global facial features and the features of the facial action units to obtain facial expression fusion features; and inputting the facial expression fusion features into a preset Gaussian process classifier, identifying the facial expressions of the human face by using the Gaussian process classifier, and outputting facial expression identification results.
Further, the preprocessing of the picture data containing the face region by using the face key point detection technology and the subdivision of the face region into a plurality of facial action unit trigger areas specifically include: detecting a preset number of key points on the face region according to the facial anatomical structure and a face key point detection algorithm; cropping the picture data according to the preset number of key points to obtain a target face region, and scaling the target face region to a preset pixel size to normalize the face image; and subdividing the target face region into three facial action unit trigger areas, namely an eye action unit trigger area, a T-zone action unit trigger area and a lip action unit trigger area.
Further, the identifying the facial action unit by using the preset three-dimensional convolutional neural network identification model to obtain the characteristics of the facial action unit specifically includes: selecting a facial action unit having an association relation with the expression, and constructing a corresponding relation between the facial action unit and the three facial action unit trigger areas; extracting local features of the face action unit in a corresponding face action unit trigger area by using the three-dimensional convolutional neural network identification model to obtain local features of the face action unit trigger area; inputting the local features of the trigger area of the facial action unit into a softmax layer for identifying the facial action unit, and obtaining the features of the facial action unit.
Further, the performing feature fusion on the obtained global facial features and the features of the facial action units to obtain facial expression fusion features includes: inputting the cropped target face region into a preset three-dimensional convolutional neural network recognition model to extract features of the whole face, obtaining the global facial features; and performing feature-level fusion on the features of the facial action units and the global facial features to obtain the facial expression fusion features.
Further, the Gaussian process classifier is constrained by the association relationship between the facial action units and the expressions.
Further, the recognizing of the facial expressions by using the Gaussian process classifier and the outputting of facial expression recognition results specifically include: inputting the facial expression fusion features into the Gaussian process classifier to obtain a preliminary facial expression recognition result; and further constraining the preliminary facial expression recognition result by using the features of the facial action units and the association relationship between the facial action units and the facial expressions, and outputting the facial expression recognition result.
Correspondingly, an embodiment of the present application further provides a facial expression recognition apparatus based on a facial action unit, including: the data acquisition unit is used for acquiring picture data containing a face area; the preprocessing unit is used for preprocessing the picture data containing the face area by using a face key point detection technology and subdividing the face area into a plurality of face action unit trigger areas; the face action unit identification unit is used for identifying the face action unit by utilizing a preset three-dimensional convolutional neural network identification model to obtain the characteristics of the face action unit; the feature fusion unit is used for carrying out feature fusion on the obtained global facial features and the features of the facial action unit to obtain facial expression fusion features; and the facial expression recognition unit is used for inputting the facial expression fusion characteristics into a preset Gaussian process classifier, recognizing the facial expression of the human face by using the Gaussian process classifier and outputting a facial expression recognition result.
Further, the preprocessing unit is specifically configured to: detect a preset number of key points on the face region according to the facial anatomical structure and a face key point detection algorithm; crop the picture data according to the preset number of key points to obtain a target face region, and scale the target face region to a preset pixel size to normalize the face image; and subdivide the target face region into three facial action unit trigger areas, wherein the three facial action unit trigger areas comprise an eye action unit trigger area, a T-zone action unit trigger area and a lip action unit trigger area.
Further, the facial action unit recognition unit is specifically configured to: selecting a facial action unit having an association relation with the expression, and constructing a corresponding relation between the facial action unit and the three facial action unit trigger areas; extracting local features of the face action unit in a corresponding face action unit trigger area by using the three-dimensional convolutional neural network identification model to obtain local features of the face action unit trigger area; inputting the local features of the trigger area of the facial action unit into a softmax layer for identifying the facial action unit, and obtaining the features of the facial action unit.
Further, the feature fusion unit is specifically configured to: input the cropped target face region into a preset three-dimensional convolutional neural network recognition model to extract features of the whole face, obtaining the global facial features; and perform feature-level fusion on the features of the facial action units and the global facial features to obtain the facial expression fusion features.
Further, the Gaussian process classifier is constrained by the association relationship between the facial action units and the expressions.
Further, the facial expression recognition unit is specifically configured to: input the facial expression fusion features into the Gaussian process classifier to obtain a preliminary facial expression recognition result; and further constrain the preliminary facial expression recognition result by using the features of the facial action units and the association relationship between the facial action units and the facial expressions, and output the facial expression recognition result.
Correspondingly, the present application also provides an electronic device, comprising: a processor and a memory; the memory is used for storing a program of a facial expression recognition method based on a facial action unit, and the electronic equipment is powered on and executes the program of the facial expression recognition method based on the facial action unit through the processor, so that any one of the facial expression recognition methods based on the facial action unit is executed.
Accordingly, the present application also provides a computer-readable storage medium containing one or more program instructions for executing the facial expression recognition method based on a facial action unit as described in any one of the above by a server.
The facial expression recognition method based on facial action units realizes an end-to-end algorithm from a user portrait picture to a facial expression recognition result and has strong robustness across different test subjects. Because expression recognition is performed on the basis of facial action units, microscopic facial action units are used to constrain the macroscopic expression, which is more conducive to recognizing subtle expressions than macroscopic expression recognition alone.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
Fig. 1 is a flowchart of a facial expression recognition method based on a facial action unit according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a facial expression recognition apparatus based on a facial action unit according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an electronic device according to an embodiment of the present invention;
fig. 4 is a schematic diagram of the three-dimensional convolutional neural network recognition model according to an embodiment of the present invention.
Detailed Description
The present invention is described below in terms of particular embodiments, and other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. It is to be understood that the described embodiments are merely exemplary of the invention and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
The following describes an embodiment of the facial expression recognition method based on the facial action unit according to the present invention in detail. As shown in fig. 1, which is a flowchart of a facial expression recognition method based on a facial action unit according to an embodiment of the present invention, a specific implementation process includes the following steps:
step S101: and collecting picture data containing the face area.
Step S102: and preprocessing the picture data containing the face region by using a face key point detection technology, and subdividing the face region into a plurality of face action unit trigger regions.
After the image data containing the face region is collected in step S101, the image data containing the face region may be preprocessed by using a face key point detection technique in this step.
In the embodiment of the present invention, the preprocessing of the picture data containing the face region by using the face key point detection technology and the subdivision of the face region into a plurality of facial action unit trigger areas may specifically include: detecting a preset number of key points on the face region according to the facial anatomical structure and a face key point detection algorithm; cropping the picture data according to the preset number of key points to obtain a target face region, and scaling the target face region to a preset pixel size to normalize the face image; and subdividing the target face region into three facial action unit trigger areas, namely an eye action unit trigger area, a T-zone action unit trigger area and a lip action unit trigger area. For example, in a specific implementation, 68 key points on the face can be detected and extracted based on the facial anatomy and topological structure and a face key point (feature point) detection algorithm; the face region is cropped from the original portrait picture according to the 68 key points and scaled to 250 × 250 pixels to normalize the face image; and the face region is subdivided into the three facial action unit trigger areas, namely the eye action unit trigger area, the T-zone action unit trigger area and the lip action unit trigger area. A minimal sketch of this step is given below.
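As an illustrative sketch of this preprocessing step (not the authoritative implementation), the following Python code assumes dlib's 68-point landmark predictor and OpenCV; the horizontal bands that delimit the three trigger areas are assumptions, since the exact region boundaries are not specified here.

import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def preprocess(image):
    # Detect 68 key points, crop and normalize the face to 250 x 250,
    # then split it into the eye / T-zone / lip trigger areas.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    face = detector(gray)[0]  # assumes at least one face is detected
    pts = predictor(gray, face)
    landmarks = np.array([(p.x, p.y) for p in pts.parts()])

    # Crop to the landmark bounding box and normalize to 250 x 250 pixels.
    x0, y0 = np.maximum(landmarks.min(axis=0), 0)
    x1, y1 = landmarks.max(axis=0)
    crop = cv2.resize(image[y0:y1, x0:x1], (250, 250))

    # Illustrative horizontal bands for the three trigger areas.
    eye_area = crop[0:100, :]    # brow and eye region
    t_area   = crop[80:170, :]   # nose / T-zone region
    lip_area = crop[150:250, :]  # mouth and chin region
    return crop, eye_area, t_area, lip_area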
Step S103: and identifying the facial action unit by using a preset three-dimensional convolutional neural network identification model to obtain the characteristics of the facial action unit.
After the face region is subdivided into a plurality of face action unit trigger regions in step S102, in this step, the face action units may be identified by using a preset three-dimensional convolutional neural network identification model, so as to further obtain features of the face action units.
As shown in fig. 4, in the embodiment of the present invention, the recognizing of the facial action units by using the preset three-dimensional convolutional neural network recognition model to obtain the features of the facial action units may include: selecting facial action units having an association relation with expressions, and constructing a correspondence between the facial action units and the three facial action unit trigger areas; extracting local features of each facial action unit in its corresponding trigger area by using the three-dimensional convolutional neural network recognition model to obtain the local features of the facial action unit trigger areas; and inputting the local features of the facial action unit trigger areas into a softmax layer to recognize the facial action units and obtain their features. For example, in a specific implementation, 68 feature points on the face are extracted according to the facial anatomy and topological structure; 13 facial Action Units (AUs) with a strong expression association are selected from the 41 expression-related facial action units (AU1-AU41) according to the association relationship between facial action units and expressions; the trigger areas of the 13 AUs are consolidated to divide the face into three action unit trigger areas, namely the eye action unit trigger area, the T-zone action unit trigger area and the lip action unit trigger area; and the correspondence between the 13 facial action units and the three trigger areas is established. Correspondingly, the local features of the 13 facial action units in their corresponding trigger areas are respectively extracted by 13 three-dimensional convolutional neural network recognition models (3D Convolutional Neural Networks, 3D CNN) to obtain the local features of the 13 action unit trigger areas; these 13 local features are then fed into a softmax layer to recognize the 13 facial action units, yielding their recognition results, that is, the features of the facial action units, which are not specifically limited herein. The three-dimensional convolutional neural network recognition model can thus be divided into 13 parallel 3D CNN networks.
Four networks receive the eye action unit trigger area as input and output the recognition results of AU1 (inner brow raiser), AU2 (outer brow raiser), AU4 (brow lowerer) and AU7 (lid tightener); two networks receive the T-zone action unit trigger area as input and output the recognition results of AU9 (nose wrinkler) and AU17 (chin raiser); and seven networks receive the lip action unit trigger area as input and output the recognition results of AU10 (upper lip raiser), AU12 (lip corner puller), AU15 (lip corner depressor), AU20 (lip stretcher), AU24 (lip pressor), AU25 (lips part) and AU26 (jaw drop).
In addition, it should be noted that, in AU-based facial expression recognition, it is important to select, from among the many AUs, those having a strong correlation with facial expressions; weakly correlated AUs should be excluded during selection to reduce their influence on expression recognition. According to the statistical association between AUs and the 6 basic facial expressions in FACS, the 13 AUs whose association with the 6 basic expressions exceeds 70% (that is, the 13 most strongly associated AUs) are selected. These 13 AUs are mainly distributed in the eyebrow, nose and lip areas. Among them, AU1 (inner brow raiser), AU2 (outer brow raiser), AU4 (brow lowerer) and AU7 (lid tightener) describe the muscle movements of the brow and eye region; AU9 (nose wrinkler) and AU17 (chin raiser) describe the muscle movements of the nose region; and AU10 (upper lip raiser), AU12 (lip corner puller), AU15 (lip corner depressor), AU20 (lip stretcher), AU24 (lip pressor), AU25 (lips part) and AU26 (jaw drop) describe the muscle movements of the lip region.
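For illustration, the correspondence between the 13 AUs and the three trigger areas described above can be encoded as a simple lookup table; the following Python snippet is one hypothetical representation of that grouping.

AU_TRIGGER_AREAS = {
    "eye":    ["AU1", "AU2", "AU4", "AU7"],    # brow/eye muscle movements
    "t_zone": ["AU9", "AU17"],                 # nose-region muscle movements
    "lip":    ["AU10", "AU12", "AU15", "AU20",
               "AU24", "AU25", "AU26"],        # lip-region muscle movements
}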
In computer vision research, two-dimensional convolution is commonly used, but when video data needs to be processed, dynamic information across multiple consecutive frames is required. Performing three-dimensional convolution in the convolution stage of a CNN allows features to be computed in both the spatial and the temporal dimension. Three-dimensional convolution extracts convolutional features from a cube formed by stacking consecutive video frames, using a three-dimensional convolution kernel. This structure connects each feature map in a convolutional layer to multiple consecutive frames in the previous layer, so that motion information in the video can be captured; the implementation details are not repeated here. A sketch of one such per-AU network follows.
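As a minimal sketch of one of the 13 parallel per-AU networks, the following PyTorch module applies three-dimensional convolutions over a stack of consecutive frames of a single trigger area and ends in a classification head fed to softmax; the layer sizes are assumptions, as the architecture is not specified here.

import torch.nn as nn

class AU3DCNN(nn.Module):
    # One of 13 parallel networks: extracts spatio-temporal features from
    # consecutive frames of one trigger area and recognizes whether its
    # facial action unit is activated.
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),  # space + time
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Linear(32, num_classes)  # logits for the softmax layer

    def forward(self, clip):  # clip: (batch, 3, frames, height, width)
        local_feat = self.features(clip).flatten(1)  # local AU feature vector
        return local_feat, self.head(local_feat)     # features + AU logits

Thirteen such modules, one per selected AU, would each be fed the frame stack of the trigger area it corresponds to.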
Step S104: and performing feature fusion on the obtained global facial features and the features of the facial action units to obtain facial expression fusion features.
After the features of the face action unit are obtained in step S103 described above, the obtained global features of the face and the features of the face action unit may be subjected to a feature fusion process in this step.
The feature fusion is performed on the obtained global facial features and the features of the facial action units to obtain the facial expression fusion features, and the specific implementation process may include: inputting the cropped target face region into a preset three-dimensional convolutional neural network recognition model to extract features of the whole face, obtaining the global facial features; and performing feature-level fusion on the features of the facial action units and the global facial features to obtain the facial expression fusion features. Specifically, the normalized face region crop can be fed into another preset three-dimensional convolutional neural network recognition model (3D CNN) to extract the features of the whole face, and feature-level fusion is then performed between the obtained 13 local action unit trigger area features and the whole-face features. In a specific implementation, the normalized face region can be input into an independent 3D CNN network that extracts the whole-face features; the extracted whole-face features are then fused at the feature layer with the local features of the preceding 13 AUs to obtain fusion features combining the whole face and the local action unit trigger areas, as sketched below.
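A minimal sketch of the fusion step follows; concatenation at the feature layer is an assumption, since the text says only that feature-level fusion is performed.

import torch

def fuse_features(global_feat, au_feats):
    # global_feat: (batch, D_global) whole-face features from the
    # independent 3D CNN; au_feats: list of 13 (batch, D_au) local
    # AU feature tensors from the per-AU networks.
    return torch.cat([global_feat] + au_feats, dim=1)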
Step S105: and inputting the facial expression fusion features into a preset Gaussian process classifier, identifying the facial expressions of the human face by using the Gaussian process classifier, and outputting facial expression identification results.
After the facial expression fusion features are obtained in step S104, the gaussian process classifier may be used to identify the facial expressions of the human face in this step, and a facial expression identification result is output.
Wherein the Gaussian process classifier is constrained by the association relationship between facial action units and expressions. Accordingly, the recognizing of the facial expression by using the Gaussian process classifier and the outputting of a facial expression recognition result may include: inputting the facial expression fusion features into the Gaussian process classifier to obtain a preliminary facial expression recognition result; and further constraining the preliminary facial expression recognition result by using the features of the facial action units and the association relationship between facial action units and expressions, and outputting the facial expression recognition result. Specifically, the fusion features obtained in step S104 can be fed into the Gaussian process classifier to obtain a preliminary facial expression recognition result, which is then further constrained by the facial action unit recognition results obtained in step S103 and the association relationship between facial action units and expressions, so as to obtain a more accurate expression recognition result. A sketch of this step follows.
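As an illustrative sketch of this classification step, the following Python code uses scikit-learn's GaussianProcessClassifier on toy data; the AU constraint is modeled here as a simple re-weighting of the classifier's posterior by a hypothetical AU-expression association matrix, since the constraint formula is not given in the text.

import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 448))  # toy fused features (e.g. 32 + 13 * 32)
y_train = np.arange(60) % 6           # labels for the 6 basic expressions

gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
gpc.fit(X_train, y_train)

def constrained_predict(x_fused, au_probs, au_expr_prior):
    # x_fused: (448,) fused feature vector; au_probs: (13,) AU activation
    # probabilities from step S103; au_expr_prior: hypothetical (13, 6)
    # AU-expression association matrix acting as the constraint.
    p_expr = gpc.predict_proba(x_fused.reshape(1, -1))[0]  # preliminary result
    au_support = au_probs @ au_expr_prior                  # (6,) AU evidence
    p_final = p_expr * au_support                          # apply constraint
    return p_final / p_final.sum()                         # renormalize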
In a specific embodiment of the invention, a face key point detection technology is applied to the picture data containing the face region, and the picture is then cropped according to the key points and the anatomical layout of the facial action units, so that the face is divided into three parts: the eyes, the nose and the mouth. Multi-task facial action unit recognition: the cropped eye, nose and mouth data are respectively input into 13 deep Convolutional Neural Networks (CNN) covering the different areas for AU recognition. AU-based facial expression recognition: the AU features extracted during AU recognition are fused with the features extracted from the whole facial image to obtain the facial expression features. On the basis of the design of a basic Gaussian process classifier for facial expression recognition, the AU co-occurrence relationship is added as a constraint and the prior knowledge of the AU-expression relationship is utilized; this improves facial expression recognition performance, realizes an end-to-end algorithm from a user portrait picture to a facial expression recognition result, and provides strong robustness across different test subjects.
By adopting the facial expression recognition method based on facial action units, an end-to-end algorithm from a user portrait picture to a facial expression recognition result is realized, with strong robustness across different test subjects. Expression recognition is performed on the basis of facial action units: compared with macroscopic expression recognition, using microscopic facial action units to constrain the macroscopic expression is more conducive to recognizing subtle expressions. At the same time, combining the whole-face features with the local features of the action units yields facial features with greater capacity to represent facial expressions, and exploiting the face topology and the relationship between facial action units and expressions reduces system consumption without affecting the recognition effect, improving facial expression recognition performance.
Corresponding to the facial expression recognition method based on the facial action unit, the invention also provides a facial expression recognition device based on the facial action unit. Since the embodiment of the device is similar to the embodiment of the method, the description is simple, and please refer to the description of the embodiment of the method, and the following description of the embodiment of the facial expression recognition device based on the facial action unit is only illustrative. Fig. 2 is a schematic diagram of a facial expression recognition apparatus based on a facial action unit according to an embodiment of the present invention.
The invention relates to a facial expression recognition device based on a facial action unit, which comprises the following parts:
the data acquisition unit 201 is configured to acquire image data including a face region.
The preprocessing unit 202 is configured to preprocess the image data containing the face region by using a face keypoint detection technology, and subdivide the face region into a plurality of face action unit trigger areas.
And the face action unit identification unit 203 is used for identifying the face action unit by using a preset three-dimensional convolutional neural network identification model to obtain the characteristics of the face action unit.
And the feature fusion unit 204 is configured to perform feature fusion on the obtained global facial features and the features of the facial action unit to obtain facial expression fusion features.
And the facial expression recognition unit 205 is configured to input the facial expression fusion features into a preset gaussian process classifier, recognize facial expressions of the human face by using the gaussian process classifier, and output a facial expression recognition result.
By adopting the facial expression recognition device based on facial action units in cooperation with the facial expression recognition method based on facial action units described above, an end-to-end algorithm from a user portrait picture to a facial expression recognition result is realized, with strong robustness across different test subjects. Expression recognition is performed on the basis of facial action units: compared with macroscopic expression recognition, using microscopic facial action units to constrain the macroscopic expression is more conducive to recognizing subtle expressions. At the same time, combining the whole-face features with the local features of the action units yields facial features with greater capacity to represent facial expressions, and exploiting the face topology and the relationship between facial action units and expressions reduces system consumption without affecting the recognition effect, improving facial expression recognition performance.
Corresponding to the facial expression recognition method based on the facial action unit, the invention also provides electronic equipment. Since the embodiment of the electronic device is similar to the above method embodiment, the description is relatively simple, and please refer to the description of the above method embodiment, and the electronic device described below is only schematic. Fig. 3 is a schematic view of an electronic device according to an embodiment of the present invention.
The electronic device specifically includes: a processor 301 and a memory 302, wherein the memory 302 is used for storing one or more program instructions of the facial expression recognition method based on the facial action unit; the device is powered on and executes the program through the processor 301 so as to perform any one of the facial expression recognition methods based on the facial action unit described above. The electronic device of the present invention may be a server.
Corresponding to the facial expression recognition method based on the facial action unit, the invention also provides a computer storage medium. Since the embodiment of the computer storage medium is similar to the above method embodiment, the description is simple, and please refer to the description of the above method embodiment, and the computer storage medium described below is only schematic.
The computer storage medium contains one or more program instructions for executing the facial action unit-based facial expression recognition method described above by a server.
In an embodiment of the invention, the processor or processor module may be an integrated circuit chip having signal processing capabilities. The processor may be a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present invention may be implemented directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The processor reads the information in the storage medium and completes the steps of the method in combination with its hardware.
The storage medium may be a memory, for example, which may be volatile memory or nonvolatile memory, or which may include both volatile and nonvolatile memory.
The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory.
The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
The storage media described in connection with the embodiments of the invention are intended to comprise, without limitation, these and any other suitable types of memory. Those skilled in the art will appreciate that the functionality described in the present invention may be implemented in a combination of hardware and software in one or more of the examples described above. When implemented in software, the corresponding functionality may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (10)

1. A facial expression recognition method based on a facial action unit is characterized by comprising the following steps:
collecting picture data containing a face area;
preprocessing picture data containing a face region by using a face key point detection technology, and subdividing the face region into a plurality of face action unit trigger regions;
identifying the facial action unit by using a preset three-dimensional convolutional neural network identification model to obtain the characteristics of the facial action unit;
performing feature fusion on the obtained global facial features and the features of the facial action units to obtain facial expression fusion features;
and inputting the facial expression fusion features into a preset Gaussian process classifier, identifying the facial expressions of the human face by using the Gaussian process classifier, and outputting facial expression identification results.
2. The method according to claim 1, wherein the preprocessing of the picture data containing the face region by using the face key point detection technology and the subdivision of the face region into a plurality of facial action unit trigger areas specifically comprise:
detecting a preset number of key points on the face region according to the facial anatomical structure and a face key point detection algorithm;
cropping the picture data according to the preset number of key points to obtain a target face region, and scaling the target face region to a preset pixel size to normalize the face image;
and subdividing the target face region into three facial action unit trigger areas, wherein the three facial action unit trigger areas comprise an eye action unit trigger area, a T-zone action unit trigger area and a lip action unit trigger area.
3. The method for recognizing facial expressions based on facial action units according to claim 2, wherein the recognizing facial action units by using a preset three-dimensional convolutional neural network recognition model to obtain the characteristics of the facial action units specifically comprises:
selecting a facial action unit having an association relation with the expression, and constructing a corresponding relation between the facial action unit and the three facial action unit trigger areas;
extracting local features of the face action unit in a corresponding face action unit trigger area by using the three-dimensional convolutional neural network identification model to obtain local features of the face action unit trigger area;
inputting the local features of the trigger area of the facial action unit into a softmax layer for identifying the facial action unit, and obtaining the features of the facial action unit.
4. The facial expression recognition method based on the facial action unit as claimed in claim 3, wherein the feature fusion of the obtained global facial features and the features of the facial action unit to obtain the facial expression fusion features comprises:
the target face area is cut into blocks and input into a preset three-dimensional convolutional neural network recognition model to extract features of a frontal face, and the global features of the face are obtained;
and performing feature level fusion on the features of the facial action unit and the global facial features to obtain facial expression fusion features.
5. A facial action unit based facial expression recognition method as claimed in claim 4, wherein the Gaussian process classifier is based on an associative relationship constraint between facial action units and expressions.
6. The facial expression recognition method based on the facial action unit according to claim 5, wherein the recognizing of the facial expressions by using the Gaussian process classifier and the outputting of facial expression recognition results specifically comprise:
inputting the facial expression fusion features into the Gaussian process classifier to obtain a preliminary facial expression recognition result;
and further constraining the preliminary facial expression recognition result by using the features of the facial action units and the association relationship between the facial action units and the facial expressions, and outputting the facial expression recognition result.
7. A facial expression recognition apparatus based on a facial action unit, comprising:
the data acquisition unit is used for acquiring picture data containing a face area;
the preprocessing unit is used for preprocessing the picture data containing the face area by using a face key point detection technology and subdividing the face area into a plurality of face action unit trigger areas;
the face action unit identification unit is used for identifying the face action unit by utilizing a preset three-dimensional convolutional neural network identification model to obtain the characteristics of the face action unit;
the feature fusion unit is used for carrying out feature fusion on the obtained global facial features and the features of the facial action unit to obtain facial expression fusion features;
and the facial expression recognition unit is used for inputting the facial expression fusion characteristics into a preset Gaussian process classifier, recognizing the facial expression of the human face by using the Gaussian process classifier and outputting a facial expression recognition result.
8. The facial expression recognition device based on the facial action unit according to claim 7, wherein the preprocessing unit is specifically configured to: detect a preset number of key points on the face region according to the facial anatomical structure and a face key point detection algorithm; crop the picture data according to the preset number of key points to obtain a target face region, and scale the target face region to a preset pixel size to normalize the face image; and subdivide the target face region into three facial action unit trigger areas, wherein the three facial action unit trigger areas comprise an eye action unit trigger area, a T-zone action unit trigger area and a lip action unit trigger area.
9. An electronic device, comprising:
a processor; and
a memory for storing a program of a facial expression recognition method based on a facial action unit, the electronic device being powered on and executing the program of the facial expression recognition method based on a facial action unit through the processor to execute the facial expression recognition method based on a facial action unit according to any one of claims 1 to 6.
10. A computer-readable storage medium having embodied therein one or more program instructions for execution by a server of the facial action unit-based facial expression recognition method of any one of claims 1-6.
CN202010312602.6A 2020-04-20 2020-04-20 Facial expression recognition method and device based on facial action unit Pending CN111626113A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010312602.6A CN111626113A (en) 2020-04-20 2020-04-20 Facial expression recognition method and device based on facial action unit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010312602.6A CN111626113A (en) 2020-04-20 2020-04-20 Facial expression recognition method and device based on facial action unit

Publications (1)

Publication Number Publication Date
CN111626113A true CN111626113A (en) 2020-09-04

Family

ID=72258969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010312602.6A Pending CN111626113A (en) 2020-04-20 2020-04-20 Facial expression recognition method and device based on facial action unit

Country Status (1)

Country Link
CN (1) CN111626113A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991358A (en) * 2020-09-30 2021-06-18 北京字节跳动网络技术有限公司 Method for generating style image, method, device, equipment and medium for training model
CN113011386A (en) * 2021-04-13 2021-06-22 重庆大学 Expression recognition method and system based on equally divided characteristic graphs
CN113486867A (en) * 2021-09-07 2021-10-08 北京世纪好未来教育科技有限公司 Face micro-expression recognition method and device, electronic equipment and storage medium
CN113792572A (en) * 2021-06-17 2021-12-14 重庆邮电大学 Facial expression recognition method based on local representation
CN113855019A (en) * 2021-08-25 2021-12-31 杭州回车电子科技有限公司 Expression recognition method and device based on EOG, EMG and piezoelectric signals
CN113920575A (en) * 2021-12-15 2022-01-11 深圳佑驾创新科技有限公司 Facial expression recognition method and device and storage medium
WO2024000233A1 (en) * 2022-06-29 2024-01-04 中国科学院深圳理工大学(筹) Facial expression recognition method and apparatus, and device and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729835A (en) * 2017-10-10 2018-02-23 浙江大学 A kind of expression recognition method based on face key point region traditional characteristic and face global depth Fusion Features
CN110263673A (en) * 2019-05-31 2019-09-20 合肥工业大学 Human facial expression recognition method, apparatus, computer equipment and storage medium
CN110738102A (en) * 2019-09-04 2020-01-31 暗物质(香港)智能科技有限公司 face recognition method and system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729835A (en) * 2017-10-10 2018-02-23 浙江大学 A kind of expression recognition method based on face key point region traditional characteristic and face global depth Fusion Features
CN110263673A (en) * 2019-05-31 2019-09-20 合肥工业大学 Human facial expression recognition method, apparatus, computer equipment and storage medium
CN110738102A (en) * 2019-09-04 2020-01-31 暗物质(香港)智能科技有限公司 face recognition method and system

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991358A (en) * 2020-09-30 2021-06-18 北京字节跳动网络技术有限公司 Method for generating style image, method, device, equipment and medium for training model
CN113011386A (en) * 2021-04-13 2021-06-22 重庆大学 Expression recognition method and system based on equally divided characteristic graphs
CN113792572A (en) * 2021-06-17 2021-12-14 重庆邮电大学 Facial expression recognition method based on local representation
CN113855019A (en) * 2021-08-25 2021-12-31 杭州回车电子科技有限公司 Expression recognition method and device based on EOG, EMG and piezoelectric signals
CN113855019B (en) * 2021-08-25 2023-12-29 杭州回车电子科技有限公司 Expression recognition method and device based on EOG (Ethernet over coax), EMG (electro-magnetic resonance imaging) and piezoelectric signals
CN113486867A (en) * 2021-09-07 2021-10-08 北京世纪好未来教育科技有限公司 Face micro-expression recognition method and device, electronic equipment and storage medium
CN113486867B (en) * 2021-09-07 2021-12-14 北京世纪好未来教育科技有限公司 Face micro-expression recognition method and device, electronic equipment and storage medium
CN113920575A (en) * 2021-12-15 2022-01-11 深圳佑驾创新科技有限公司 Facial expression recognition method and device and storage medium
WO2024000233A1 (en) * 2022-06-29 2024-01-04 中国科学院深圳理工大学(筹) Facial expression recognition method and apparatus, and device and readable storage medium

Similar Documents

Publication Publication Date Title
CN111626113A (en) Facial expression recognition method and device based on facial action unit
Wu et al. Recent advances in video-based human action recognition using deep learning: A review
CN109961005B (en) Dynamic gesture recognition method and system based on two-dimensional convolutional network
US20190095701A1 (en) Living-body detection method, device and storage medium
WO2019033525A1 (en) Au feature recognition method, device and storage medium
CN111199230B (en) Method, device, electronic equipment and computer readable storage medium for target detection
US11216652B1 (en) Expression recognition method under natural scene
CN109063626B (en) Dynamic face recognition method and device
CN108960076B (en) Ear recognition and tracking method based on convolutional neural network
Anand et al. An improved local binary patterns histograms techniques for face recognition for real time application
JP2010108494A (en) Method and system for determining characteristic of face within image
CN113255630B (en) Moving target recognition training method, moving target recognition method and device
Wu et al. Convolutional LSTM networks for video-based person re-identification
WO2023279799A1 (en) Object identification method and apparatus, and electronic system
CN112949689A (en) Image recognition method and device, electronic equipment and storage medium
CN113298018A (en) False face video detection method and device based on optical flow field and facial muscle movement
Saeed A framework for recognition of facial expression using HOG features
CN115862120B (en) Face action unit identification method and equipment capable of decoupling separable variation from encoder
Vural et al. Multi-view fast object detection by using extended haar filters in uncontrolled environments
Curran et al. The use of neural networks in real-time face detection
CN114387670A (en) Gait recognition method and device based on space-time feature fusion and storage medium
CN112766112B (en) Dynamic expression recognition method and system based on space-time multi-feature fusion
CN113887429A (en) Digital man video generation method and device and electronic equipment
Wang et al. Video-based emotion recognition using face frontalization and deep spatiotemporal feature
CN112200080A (en) Face recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination