CN117275060A - Facial expression recognition method and related equipment based on emotion grouping - Google Patents

Facial expression recognition method and related equipment based on emotion grouping

Info

Publication number
CN117275060A
CN117275060A
Authority
CN
China
Prior art keywords
emotion
facial expression
grouping
face
emotions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311151485.XA
Other languages
Chinese (zh)
Inventor
张浩洋 (Zhang Haoyang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Pixel Solutions Co ltd
Original Assignee
Guangzhou Pixel Solutions Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Pixel Solutions Co ltd filed Critical Guangzhou Pixel Solutions Co ltd
Priority to CN202311151485.XA priority Critical patent/CN117275060A/en
Publication of CN117275060A publication Critical patent/CN117275060A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a facial expression recognition method and related equipment based on emotion grouping, relating to the technical field of facial expression recognition. The method comprises the following steps: acquiring a portrait image and performing correction preprocessing on it; inputting the corrected and preprocessed portrait image into a feature detection model constructed from a neural network to obtain a group of feature maps, and simultaneously inputting the feature maps into three fully-connected networks respectively, each of which predicts one of three emotion sets (self-positive, self-negative, and external negative emotions), where each emotion set also comprises the subdivided emotions belonging to it; and outputting the probabilities of the subdivided emotions in the corresponding emotion set and identifying the facial expression according to those probabilities. The invention improves the feature extraction capability of the algorithm model for emotion categories with similar feature expression, thereby increasing the distance between emotion categories.

Description

Facial expression recognition method and related equipment based on emotion grouping
Technical Field
The invention relates to the technical field of facial expression recognition, in particular to a facial expression recognition method and related equipment based on emotion grouping.
Background
Facial expressions play a very important role in human communication, especially in daily life. They are an important non-verbal channel that conveys not only emotions but also deeper affective information, and in some respects they are more effective than verbal communication. With the rapid development of computer technology, research on facial expression recognition has become very active, and the technology is widely applied in fields such as intelligent customer service, emotion analysis, and virtual reality. Among research methods, deep-learning-based approaches have become mainstream; for example, convolutional neural network (CNN) and recurrent neural network (RNN) models are used to extract features and perform classification. In addition, some studies have attempted to fuse multimodal information (e.g., facial expressions, voice, and semantics) to improve the accuracy and robustness of recognition.
Research by scholars at home and abroad mainly identifies 7 facial expressions (no expression, happiness, surprise, sadness, anger, aversion, and fear). Classical facial expression recognition algorithms have been widely improved, for example with deep-learning-based, transfer-learning-based, and multimodal-fusion-based methods. These approaches have achieved good results across different scenarios and data sets, but they also have problems and limitations.
1. Small inter-class distance between expression categories: in facial expression classification research, a single still image may suggest several expressions at once, and pairs such as smiling and surprise or anger and fear look alike. Such expressions can share similar facial features, for example an open mouth or widened eyes, which makes them difficult to identify and distinguish accurately. This small inter-class distance makes accurate expression recognition difficult for the model in real scenes and can reduce the model's generalization ability.
2. Imbalance of data sets: because the number of samples differs between expressions, the training data set may suffer from class imbalance, so the model recognizes certain expressions poorly.
3. Noise and interference in data sets: face images may be affected by factors such as illumination, pose, and occlusion, resulting in poor image quality. These factors introduce noise and interference and reduce the accuracy and robustness of the model.
Disclosure of Invention
Aiming at the problem of the small distance between emotion categories in the prior art, the invention provides a facial expression recognition method and related equipment based on emotion grouping. Under the emotion-grouping training scheme, data labels are first classified into major categories and then subdivided into minor categories, so as to improve the feature extraction capability of the algorithm model for emotion categories with similar feature expression and thereby increase the distance between emotion categories.
In order to achieve the above purpose, the present invention provides the following technical solutions:
in a first aspect, the present invention provides a facial expression recognition method based on emotion grouping, which includes the steps of:
acquiring a portrait image, and performing correction preprocessing on the portrait image;
inputting the corrected and preprocessed portrait image into a feature detection model constructed from a neural network to obtain a group of feature maps, and simultaneously inputting the feature maps into three fully-connected networks respectively, wherein one fully-connected network predicts a first emotion set, another fully-connected network predicts a second emotion set, and the remaining fully-connected network predicts a third emotion set, the first emotion set representing self-positive emotions, the second emotion set representing self-negative emotions, and the third emotion set representing external negative emotions, and each emotion set further comprising the subdivided emotions belonging to that set;
outputting the probabilities of the subdivided emotions in the corresponding emotion set, and identifying the facial expression according to those probabilities.
In a second aspect, the present invention provides a facial expression recognition system based on emotion grouping, which comprises:
a data acquisition unit for acquiring a portrait image and performing correction preprocessing on the portrait image;
a data processing unit for performing the steps of:
inputting the corrected and preprocessed portrait image into a feature detection model constructed from a neural network to obtain a group of feature maps, and simultaneously inputting the feature maps into three fully-connected networks respectively, wherein one fully-connected network predicts a first emotion set, another fully-connected network predicts a second emotion set, and the remaining fully-connected network predicts a third emotion set, the first emotion set representing self-positive emotions, the second emotion set representing self-negative emotions, and the third emotion set representing external negative emotions, and each emotion set further comprising the subdivided emotions belonging to that set;
outputting the probabilities of the subdivided emotions in the corresponding emotion set, and identifying the facial expression according to those probabilities.
In a third aspect, the present invention also provides an electronic device, including a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
In a fourth aspect, the present invention also provides a computer-readable storage medium storing a program for execution by a processor to implement a method as described above.
In a fifth aspect, the present invention also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions may be read from a computer-readable storage medium by a processor of a computer device, and executed by the processor, to cause the computer device to perform the foregoing method.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a facial expression recognition method and related equipment based on emotion grouping, which are used for improving the feature extraction capacity of an algorithm model for emotion categories with similar feature expression. The invention strengthens the distance between emotion categories from the data layer and the model layer:
(1) The 7 types of expressions are divided into three groups, [no expression, happiness, surprise], [no expression, sadness, fear], and [no expression, aversion, anger], which avoids the misrecognition of similar expressions that arises when a traditional method applies multi-class classification directly and struggles to extract fine-grained features.
(2) Based on deep learning network recognition of digital images, probability prediction of facial expressions is realized: the probabilities that an image belongs to self-positive, self-negative, and external negative emotion are predicted at the major-category level, and the subdivided emotion categories are then predicted within the predicted major category.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the following description will briefly explain the drawings needed in the embodiments, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a first flowchart of a facial expression recognition method based on emotion grouping according to an embodiment of the present invention;
FIG. 2 is a second flowchart of a facial expression recognition method based on emotion grouping according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a facial expression recognition system based on emotion grouping according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Examples:
it should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Referring to fig. 1 and fig. 2, a facial expression recognition method based on emotion grouping provided by an embodiment of the present invention may include the following processing flows:
step 101: and acquiring a portrait image, and performing correction pretreatment on the portrait image.
In this step, the portrait image is preprocessed; the preprocessing comprises face detection, face key point detection, face alignment, cropping, and scaling, yielding an aligned portrait image.
In some embodiments, the correction preprocessing for the portrait image specifically includes:
performing face detection with a trained face detection model to obtain a face bounding box;
inputting the coordinates of the face bounding box into a pre-trained face key point detection model to detect the coordinates of the left and right eyes, the nose tip, and the left and right mouth corners in the face;
performing an affine transformation between these coordinates and the average-face coordinates to correct and align the face, as sketched below.
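As a non-authoritative illustration, the alignment step might look like the following sketch, which assumes five detected landmarks and a fixed average-face template; the template coordinates and the helper name align_face are illustrative rather than taken from the patent:

```python
# Hypothetical sketch of the affine correction: five detected landmarks
# (eyes, nose tip, mouth corners) are mapped onto an assumed average-face
# template with a similarity transform. The detection and key point models
# are assumed to exist elsewhere; only the alignment itself is shown.
import cv2
import numpy as np

# Illustrative average-face coordinates for a 256x256 aligned crop.
AVG_FACE = np.float32([
    [89.3, 102.4],   # left eye
    [166.7, 102.4],  # right eye
    [128.0, 135.2],  # nose tip
    [96.8, 178.9],   # left mouth corner
    [159.2, 178.9],  # right mouth corner
])

def align_face(image: np.ndarray, landmarks: np.ndarray,
               size: int = 256) -> np.ndarray:
    """Warp the face so its landmarks match the average-face template."""
    landmarks = np.float32(landmarks).reshape(5, 2)
    # A partial affine (rotation + uniform scale + translation) corrects
    # in-plane tilt without shearing the face.
    matrix, _ = cv2.estimateAffinePartial2D(landmarks, AVG_FACE)
    return cv2.warpAffine(image, matrix, (size, size))
```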
Step 102: the corrected and preprocessed portrait image is input into a feature detection model constructed from a neural network to obtain a group of feature maps, and the feature maps are simultaneously input into three fully-connected networks respectively; one fully-connected network predicts a first emotion set, another predicts a second emotion set, and the remaining one predicts a third emotion set, where the first emotion set represents self-positive emotions, the second emotion set represents self-negative emotions, and the third emotion set represents external negative emotions, and each emotion set further comprises the subdivided emotions belonging to that set.
In this step, the portrait image is input into the feature detection model to extract features. The feature detection model includes, but is not limited to, a convolutional neural network model such as VGGNet or ResNet, or a Transformer, and the classification fully-connected network at the tail of the model can be replaced with an MLP regressor. After feature extraction is completed, the facial features are input into the three classifiers (or MLP regressors) respectively, which output the emotion prediction results.
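A minimal sketch of this shared-backbone, three-head layout is given below, assuming a torchvision ResNet-18 backbone and three 3-way heads; the class name and head sizes are assumptions rather than the patent's exact configuration:

```python
# Sketch of the grouped-head architecture: one shared feature extractor
# feeding three independent fully-connected classifiers, one per emotion
# group. ResNet-18 is one of the interchangeable backbones the patent
# names; the 3 classes per group follow the grouping described above.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class GroupedExpressionNet(nn.Module):
    def __init__(self, classes_per_group: int = 3):
        super().__init__()
        backbone = resnet18(weights=None)
        feat_dim = backbone.fc.in_features      # 512 for ResNet-18
        backbone.fc = nn.Identity()             # drop the stock classifier
        self.backbone = backbone
        # One head per group: self-positive, self-negative, external negative.
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, classes_per_group) for _ in range(3)
        )

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        feats = self.backbone(x)                     # shared features
        return [head(feats) for head in self.heads]  # three logit vectors
```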
Network training uses the backpropagation algorithm; the network parameter gradients are computed with stochastic gradient descent; the classification error is computed as cross entropy plus a regularization term; and this process iterates until the average error no longer decreases.
The loss is $L = -\sum_{j} y_j \log(s_j) + \lambda_n \lVert \theta \rVert^2$, where L is the loss function, $y_j$ is the ground-truth label, $s_j$ is the j-th value of the softmax output vector S, indicating the probability that the sample belongs to the j-th class, $\theta$ is the parameter vector of the model, and $\lambda_n$ is the weight of the regularization term.
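A sketch of how this loss could be computed per training step is shown below; since the patent does not spell out how samples supervise the three heads, this version assumes each batch carries a single major-group index and the subdivided labels for that group only, with an explicit L2 term standing in for the regularization:

```python
# Hypothetical per-batch loss: cross entropy on the head that matches the
# batch's major emotion group, plus an L2 penalty on all model weights.
# How the patent supervises the other two heads is not specified, so they
# receive no gradient here.
import torch
import torch.nn.functional as F

def grouped_loss(logits_per_group, group_idx, sub_labels, model,
                 reg_weight: float = 1e-4) -> torch.Tensor:
    ce = F.cross_entropy(logits_per_group[group_idx], sub_labels)
    l2 = sum(p.pow(2).sum() for p in model.parameters())
    return ce + reg_weight * l2
```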
Illustratively, the training parameters and environment are as follows: the optimizer is SGD (stochastic gradient descent) with MultiStepLR for learning rate adjustment; the image size is 256×256 and the batch size is set to 1024; the training environment is Ubuntu with 4 × Ascend 910 (32 GB).
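This configuration might be set up as follows, reusing the GroupedExpressionNet sketch above; the learning rate, momentum, milestone epochs, and epoch count are not given in the patent and are assumed here:

```python
# Assumed optimizer/scheduler setup matching the stated configuration:
# SGD with a MultiStepLR schedule. The concrete hyperparameter values
# below are placeholders, not the patent's.
import torch

model = GroupedExpressionNet()  # from the sketch above
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60, 90], gamma=0.1)

for epoch in range(100):
    # ... forward/backward passes over 1024-sample batches go here ...
    scheduler.step()  # decay the learning rate at each milestone
```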
In certain embodiments, the first emotion set includes no expression, happiness, and/or surprise; the second emotion set includes no expression, sadness, and/or fear; the third emotion set includes no expression, aversion, and/or anger.
Step 103: output the probabilities of the subdivided emotions in the corresponding emotion set, and identify the facial expression according to those probabilities.
In this step, illustratively, the probability of a happy expression may be output for the first emotion set, or the probability of a fearful expression may be output for the second emotion set.
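One plausible reading of this readout, choosing the head whose most likely subdivided emotion is the most confident, is sketched below; the label ordering inside each group is an assumption consistent with the grouping described in this document:

```python
# Hypothetical two-stage readout: softmax each head, then report the
# subdivided emotion with the highest probability across the three heads.
import torch
import torch.nn.functional as F

GROUP_LABELS = [
    ["no expression", "happy", "surprised"],   # self-positive
    ["no expression", "sad", "fearful"],       # self-negative
    ["no expression", "disgusted", "angry"],   # external negative
]

@torch.no_grad()
def recognize(model, face: torch.Tensor) -> tuple[str, float]:
    probs = [F.softmax(logits, dim=1)[0] for logits in model(face.unsqueeze(0))]
    group, idx, conf = max(
        ((g, int(p.argmax()), float(p.max())) for g, p in enumerate(probs)),
        key=lambda t: t[2],
    )
    return GROUP_LABELS[group][idx], conf
```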
In summary, analysis of the prior art and of the schemes used in current leading papers shows that a depthwise separable convolutional neural network or another neural network framework is applied directly to the multi-class task: the model extracts features directly, and a classifier then completes the multi-class prediction.
The embodiment of the invention instead groups the expression categories by the similarity of facial expression images. The images are first divided into major categories (self-positive, self-negative, and external negative emotion), and the major categories are divided into the minor categories [no expression, happiness, surprise], [no expression, sadness, fear], and [no expression, aversion, anger]. The difference from the prior art is that the prior art attaches a multi-class classifier after the feature extraction network, whereas the embodiment of the invention performs a novel two-stage classification: the first stage classifies the major emotion categories, and the second stage classifies the subclasses within each major category.
Example 2
This embodiment constructs a basic deep neural network model for the 7-class facial expression task and trains it with the existing public data sets AffectNet (http://mohammadmahoor.com/AffectNet) and RAF-DB (Real-world Affective Faces Database). The data is divided into a training set (RAF-DB), a validation set (RAF-DB), and a test set (AffectNet); after iterative training, the model's accuracy reaches 78% on the validation set and 62% on the test set.
Analysis of the classification results shows that the happy category is easily confused with the surprise category, mainly because when the degree of happiness is very high the facial expression becomes exaggerated and can be highly similar to a surprised expression; the same applies to the sad category. The aversion and anger emotions are likewise confused with each other in the experimental results. To let the model learn finer-grained features, the 7 emotion classes are divided, according to the model's misrecognition rates, into the three groups [no expression, happiness, surprise], [no expression, sadness, fear], and [no expression, aversion, anger].
On the other hand, the three groups can be characterized as self-positive emotion, self-negative emotion, and external negative emotion, which is equivalent to turning the 6-expressions-plus-no-expression classification task into three 3-class classification tasks.
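For illustration, this regrouping of the 7 original labels into three overlapping 3-class tasks could be encoded as a simple mapping; the numeric indices are hypothetical:

```python
# One possible remapping of the 7 original labels into (group, index-in-group)
# pairs, following the three groups above. Indices are illustrative.
GROUPING = {
    "happy":      (0, 1),  # self-positive group
    "surprised":  (0, 2),
    "sad":        (1, 1),  # self-negative group
    "fearful":    (1, 2),
    "disgusted":  (2, 1),  # external negative group
    "angry":      (2, 2),
    # "no expression" sits at index 0 in every group; during training it
    # can be assigned to any of the three heads.
    "no expression": (0, 0),
}
```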
Example 3
Referring to fig. 3, based on the same inventive concept, an embodiment of the present invention further provides a facial expression recognition system based on emotion grouping, which comprises:
a data acquisition unit for acquiring a portrait image and performing correction preprocessing on the portrait image;
a data processing unit for performing the steps of:
inputting the corrected and preprocessed portrait image into a feature detection model constructed from a neural network to obtain a group of feature maps, and simultaneously inputting the feature maps into three fully-connected networks respectively, wherein one fully-connected network predicts a first emotion set, another fully-connected network predicts a second emotion set, and the remaining fully-connected network predicts a third emotion set, the first emotion set representing self-positive emotions, the second emotion set representing self-negative emotions, and the third emotion set representing external negative emotions, and each emotion set further comprising the subdivided emotions belonging to that set;
outputting the probabilities of the subdivided emotions in the corresponding emotion set, and identifying the facial expression according to those probabilities.
Because the system is a system corresponding to the facial expression recognition method based on emotion grouping according to the embodiment of the present invention, and the principle of solving the problem of the system is similar to that of the method, the implementation of the system can refer to the implementation process of the above method embodiment, and the repetition is omitted.
Example 4
Referring to fig. 4, based on the same inventive concept, an embodiment of the present invention further provides an electronic device, which includes a processor and a memory, where at least one instruction, at least one program, a code set, or an instruction set is stored in the memory, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor, so as to implement the emotion grouping-based facial expression recognition method as described above.
It is understood that the memory may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory includes a non-transitory computer-readable storage medium. The memory may be used to store instructions, programs, code sets, or instruction sets. The memory may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for at least one function, instructions for implementing the various method embodiments described above, and the like; the data storage area may store data created according to the use of the server, and so on.
The processor may include one or more processing cores. The processor uses various interfaces and lines to connect the various parts of the overall server, and performs the various functions of the server and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in memory and by invoking the data stored in memory. Optionally, the processor may be implemented in hardware in at least one of digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA) forms. The processor may integrate one or a combination of a central processing unit (CPU), a modem, and the like. The CPU mainly handles the operating system, application programs, and so on; the modem is used to handle wireless communication. It will be appreciated that the modem may also not be integrated into the processor and may be implemented by a single chip.
Because the electronic device is the electronic device corresponding to the facial expression recognition method based on emotion grouping according to the embodiment of the present invention, and the principle of solving the problem of the electronic device is similar to that of the method, the implementation of the electronic device can refer to the implementation process of the embodiment of the method, and the repetition is omitted.
Example 5
Based on the same inventive concept, the embodiments of the present invention also provide a computer-readable storage medium having at least one instruction, at least one program, a code set, or an instruction set stored therein, the at least one instruction, the at least one program, the code set, or the instruction set being loaded and executed by a processor to implement the emotion grouping-based facial expression recognition method as described above.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the above embodiments may be implemented by a program that instructs associated hardware, the program may be stored in a computer readable storage medium including Read-Only Memory (ROM), random access Memory (Random Access Memory, RAM), programmable Read-Only Memory (Programmable Read-Only Memory, PROM), erasable programmable Read-Only Memory (Erasable Programmable Read Only Memory, EPROM), one-time programmable Read-Only Memory (OTPROM), electrically erasable programmable Read-Only Memory (EEPROM), compact disc Read-Only Memory (Compact Disc Read-Only Memory, CD-ROM) or other optical disk Memory, magnetic disk Memory, tape Memory, or any other medium that can be used for carrying or storing data that is readable by a computer.
Because the storage medium is a storage medium corresponding to the facial expression recognition method based on emotion grouping according to the embodiment of the present invention, and the principle of solving the problem by the storage medium is similar to that of the method, the implementation of the storage medium can refer to the implementation process of the embodiment of the method, and the repetition is omitted.
Example 6
In some possible implementations, aspects of the methods of the embodiments of the present invention may also be implemented in the form of a program product comprising program code; when the program product runs on a computer device, the program code causes the computer device to carry out the steps of the facial expression recognition method based on emotion grouping according to the various exemplary embodiments of the present application described herein above. Executable computer program code or "code" for performing the various embodiments may be written in a high-level programming language such as C, C++, C#, Smalltalk, Java, JavaScript, Visual Basic, a structured query language (e.g., Transact-SQL), Perl, or in various other programming languages.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
The above embodiments are only for illustrating the technical concept and features of the present invention, and are intended to enable those skilled in the art to understand the content of the present invention and implement the same, and are not intended to limit the scope of the present invention. All equivalent changes or modifications made in accordance with the essence of the present invention are intended to be included within the scope of the present invention.

Claims (8)

1. The facial expression recognition method based on emotion grouping is characterized by comprising the following steps:
acquiring a portrait image, and performing correction preprocessing on the portrait image;
inputting the corrected and preprocessed portrait image into a feature detection model constructed from a neural network to obtain a group of feature maps, and simultaneously inputting the feature maps into three fully-connected networks respectively, wherein one fully-connected network predicts a first emotion set, another fully-connected network predicts a second emotion set, and the remaining fully-connected network predicts a third emotion set, the first emotion set representing self-positive emotions, the second emotion set representing self-negative emotions, and the third emotion set representing external negative emotions, and each emotion set further comprising the subdivided emotions belonging to that set;
outputting the probabilities of the subdivided emotions in the corresponding emotion set, and identifying the facial expression according to those probabilities.
2. The emotion grouping-based facial expression recognition method of claim 1, wherein performing the correction preprocessing on the portrait image specifically comprises:
performing face detection with a trained face detection model to obtain a face bounding box;
inputting the coordinates of the face bounding box into a pre-trained face key point detection model to detect the coordinates of the left and right eyes, the nose tip, and the left and right mouth corners in the face;
performing an affine transformation between these coordinates and the average-face coordinates to correct and align the face.
3. The emotion grouping-based facial expression recognition method of claim 1, wherein the first emotion set includes no expression, happiness, and/or surprise; the second emotion set includes no expression, sadness, and/or fear; and the third emotion set includes no expression, aversion, and/or anger.
4. A facial expression recognition system based on emotion grouping, characterized by comprising:
a data acquisition unit for acquiring a portrait image and performing correction preprocessing on the portrait image;
a data processing unit for performing the steps of:
inputting the corrected and preprocessed portrait image into a feature detection model constructed from a neural network to obtain a group of feature maps, and simultaneously inputting the feature maps into three fully-connected networks respectively, wherein one fully-connected network predicts a first emotion set, another fully-connected network predicts a second emotion set, and the remaining fully-connected network predicts a third emotion set, the first emotion set representing self-positive emotions, the second emotion set representing self-negative emotions, and the third emotion set representing external negative emotions, and each emotion set further comprising the subdivided emotions belonging to that set;
outputting the probabilities of the subdivided emotions in the corresponding emotion set, and identifying the facial expression according to those probabilities.
5. The emotion grouping-based facial expression recognition system of claim 4, wherein the correction preprocessing of the portrait image specifically includes:
performing face detection with a trained face detection model to obtain a face bounding box;
inputting the coordinates of the face bounding box into a pre-trained face key point detection model to detect the coordinates of the left and right eyes, the nose tip, and the left and right mouth corners in the face;
performing an affine transformation between these coordinates and the average-face coordinates to correct and align the face.
6. The emotion grouping-based facial expression recognition system of claim 4, wherein the first emotion set includes no expression, happiness, and/or surprise; the second emotion set includes no expression, sadness, and/or fear; and the third emotion set includes no expression, aversion, and/or anger.
7. An electronic device comprising a processor and a memory having stored therein at least one instruction, at least one program, code set, or instruction set, the at least one instruction, the at least one program, code set, or instruction set being loaded and executed by the processor to implement the emotion grouping-based facial expression recognition method of any of claims 1-3.
8. A computer readable storage medium having stored therein at least one instruction, at least one program, code set, or instruction set, the at least one instruction, the at least one program, the code set, or instruction set being loaded and executed by a processor to implement the emotion grouping-based facial expression recognition method of any of claims 1 to 3.
CN202311151485.XA 2023-09-07 2023-09-07 Facial expression recognition method and related equipment based on emotion grouping Pending CN117275060A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311151485.XA CN117275060A (en) 2023-09-07 2023-09-07 Facial expression recognition method and related equipment based on emotion grouping

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311151485.XA CN117275060A (en) 2023-09-07 2023-09-07 Facial expression recognition method and related equipment based on emotion grouping

Publications (1)

Publication Number Publication Date
CN117275060A true CN117275060A (en) 2023-12-22

Family

ID=89209695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311151485.XA Pending CN117275060A (en) 2023-09-07 2023-09-07 Facial expression recognition method and related equipment based on emotion grouping

Country Status (1)

Country Link
CN (1) CN117275060A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295568A (en) * 2016-08-11 2017-01-04 上海电力学院 The mankind's naturalness emotion identification method combined based on expression and behavior bimodal
CN107358169A (en) * 2017-06-21 2017-11-17 厦门中控智慧信息技术有限公司 A kind of facial expression recognizing method and expression recognition device
US20190042952A1 (en) * 2017-08-03 2019-02-07 Beijing University Of Technology Multi-task Semi-Supervised Online Sequential Extreme Learning Method for Emotion Judgment of User
CN115512424A (en) * 2022-10-19 2022-12-23 中山大学 Method and system for identifying pain expression of indoor person based on computer vision


Similar Documents

Publication Publication Date Title
CN110532900B (en) Facial expression recognition method based on U-Net and LS-CNN
CN107679526B (en) Human face micro-expression recognition method
CN109409222B (en) Multi-view facial expression recognition method based on mobile terminal
CN112784763B (en) Expression recognition method and system based on local and overall feature adaptive fusion
US20210271862A1 (en) Expression recognition method and related apparatus
Rajan et al. Novel deep learning model for facial expression recognition based on maximum boosted CNN and LSTM
CN108304823A (en) A kind of expression recognition method based on two-fold product CNN and long memory network in short-term
CN111199202B (en) Human body action recognition method and recognition device based on circulating attention network
Halkias et al. Classification of mysticete sounds using machine learning techniques
CN112732921B (en) False user comment detection method and system
CN110263174B (en) Topic category analysis method based on focus attention
CN110969073B (en) Facial expression recognition method based on feature fusion and BP neural network
Fardous et al. Handwritten isolated Bangla compound character recognition
CN112883931A (en) Real-time true and false motion judgment method based on long and short term memory network
Borgalli et al. Deep learning for facial emotion recognition using custom CNN architecture
Ojha et al. Image annotation using deep learning: A review
Younis et al. A new parallel bat algorithm for musical note recognition.
Kumar et al. A comparative study on deep learning techniques for bird species recognition
Soysal et al. An introduction to zero-shot learning: An essential review
Sooch et al. Emotion Classification and Facial Key point detection using AI
Sunil et al. Facial emotion recognition using a modified deep convolutional neural network based on the concatenation of xception and resnet50 v2
CN116363732A (en) Face emotion recognition method, device, equipment and storage medium
Luqin A survey of facial expression recognition based on convolutional neural network
CN117275060A (en) Facial expression recognition method and related equipment based on emotion grouping
Singh Classification of animal sound using convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination