CN115035566B - Expression recognition method, apparatus, computer device and computer readable storage medium - Google Patents


Info

Publication number
CN115035566B
Authority
CN
China
Prior art keywords
eye
feature
mouth
face image
image
Prior art date
Legal status
Active
Application number
CN202210492682.7A
Other languages
Chinese (zh)
Other versions
CN115035566A
Inventor
吴雅林
石宇
胡阿珍
张勤俭
闫林杨
尉明华
Current Assignee
Peking University Shenzhen Hospital
Original Assignee
Peking University Shenzhen Hospital
Priority date
Filing date
Publication date
Application filed by Peking University Shenzhen Hospital
Priority to CN202210492682.7A
Publication of CN115035566A
Application granted
Publication of CN115035566B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 - Facial expression recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G06V40/171 - Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The present application relates to the field of image recognition, and provides an expression recognition method, an expression recognition apparatus, a computer device and a computer readable storage medium. The method comprises the following steps: acquiring a plurality of face images representing facial expressions; identifying an eye region and a mouth region of each face image to obtain an eye region map and a mouth region map of the corresponding face image; extracting global features of each face image to obtain corresponding global feature vectors; respectively extracting partial features of the eye region map and the mouth region map corresponding to each face image to obtain corresponding eye feature vectors and mouth feature vectors; and performing expression recognition on the face image based on the global feature vector, the eye feature vector and the mouth feature vector to obtain an expression category corresponding to the face image. The method provides a higher recognition rate in real-time recognition and effectively reduces the computational load on the computer.

Description

Expression recognition method, apparatus, computer device and computer readable storage medium
Technical Field
The present invention relates to the field of image processing, and in particular, to an expression recognition method, apparatus, computer device, and computer readable storage medium.
Background
Expression recognition identifies, from the current face, the facial expressions that convey the user's different emotional states and current physiological and psychological reactions. Facial expressions are part of human body language and a way of conveying the individual's current state to the outside world.
Existing facial expression recognition methods are mainly based on geometric features. These methods encode the geometric positions and shapes of the user's facial features (eyebrows, eyes, nose, mouth and so on) to obtain features representing the user's facial expression, and recognize the expression from the encoded features. However, this approach suffers from a lower recognition rate under complex lighting and varying facial movements.
Disclosure of Invention
In view of the foregoing, it is necessary to provide an expression recognition method, apparatus, computer device and computer readable storage medium that can reduce the amount of computation and guarantee real-time recognition without a decrease in recognition rate.
An embodiment of the present application provides an expression recognition method, which comprises the following steps:
Acquiring a plurality of face images representing facial expressions;
identifying an eye area and a mouth area of each face image to obtain an eye area image and a mouth area image of the corresponding face image;
extracting global features of each face image to obtain corresponding global feature vectors;
respectively extracting partial features of the eye region diagram and the mouth region diagram corresponding to each face image to obtain corresponding eye feature vectors and mouth feature vectors;
and carrying out expression recognition on the face image based on the global feature vector, the eye feature vector and the mouth feature vector to obtain an expression category corresponding to the face image.
In one embodiment, the acquiring a plurality of facial images characterizing facial expressions includes:
acquiring a real-time facial expression image shot by a shooting assembly;
combining the real-time facial expression image and the historical facial expression image to form a facial expression image set; the facial expression image set comprises a plurality of initial facial images which represent facial expressions and contain image backgrounds;
and recognizing the human face in the initial human face image by a human face feature point method to obtain a plurality of human face images corresponding to the human face expression image set.
In one embodiment, the extracting the partial feature of the eye region map and the mouth region map corresponding to each face image to obtain corresponding eye feature vectors and mouth feature vectors includes:
converting the eye region image and the mouth region image corresponding to each face image into gray images to obtain corresponding eye region gray images and mouth region gray images;
extracting features of the eye region gray level map and the mouth region gray level map according to a preset sliding distance through a preset window to obtain a plurality of corresponding eye feature maps and a plurality of corresponding mouth feature maps;
performing convolution processing on each eye feature image and the corresponding mouth feature image through a mask to obtain a plurality of convolved eye mask feature images and a plurality of mouth mask feature images;
generating an eye feature vector corresponding to the face image based on the eye feature images and the eye mask feature images;
and generating a mouth characteristic vector corresponding to the face image based on the mouth characteristic diagrams and the mouth mask characteristic diagrams.
In one embodiment, the mask includes a Kirsch operator and a second derivative gaussian operator; the step of carrying out convolution processing on each eye feature map through a mask to obtain a plurality of convolved eye mask feature maps comprises the following steps:
Carrying out convolution processing on each eye feature map through the Kirsch operator to obtain corresponding edge features;
carrying out convolution processing on each eye feature map through the second derivative Gaussian operator to obtain a corresponding center feature;
and combining the edge feature and the central feature to form a convolved eye mask feature map.
In one embodiment, the generating an eye feature vector corresponding to the face image based on the plurality of eye feature maps and the plurality of eye mask feature maps includes:
calculating an average characteristic value of each eye characteristic graph according to the pixel value of each eye characteristic graph;
and calculating an eye feature vector of the face image based on the average feature value and each pixel value of the eye feature map and each pixel value of the eye mask feature map corresponding to the eye feature map.
In one embodiment, the eye feature vector comprises an eye ternary mode vector; the pixel values include a center pixel value and an edge pixel value; the calculating of the eye feature vector of the face image based on the average feature value and each pixel value of each eye feature map and each pixel value of the eye mask feature map corresponding to the eye feature map includes:
Based on the average characteristic value, the central pixel value and the edge pixel value of each eye characteristic image and the central pixel value and the edge pixel value of the eye mask characteristic image corresponding to the eye characteristic image, calculating to obtain an eye ternary mode vector of the face image through a local direction ternary mode formula;
[Local direction ternary mode formula, rendered as an image in the original publication]

wherein ELDTP_p is the eye ternary mode vector, μ is the average feature value, SI_c and SI_p are respectively the center pixel value and the edge pixel values of the eye feature map, ER_c and ER_p are respectively the center pixel value and the edge pixel values of the eye mask feature map, and σ is a conditional function.
In one embodiment, the eye ternary mode vector comprises an eye upper ternary mode vector and an eye lower ternary mode vector; the calculating of the eye ternary mode vector of the face image through the local direction ternary mode formula comprises the following steps:

when a first condition (rendered as an image in the original publication) holds, calculating the eye lower ternary mode vector of the face image through the local direction ternary mode formula;

when a second condition (rendered as an image in the original publication) holds, calculating the eye upper ternary mode vector of the face image through the local direction ternary mode formula.
An expression recognition apparatus, the apparatus comprising:
The facial image acquisition module is used for acquiring a plurality of facial images representing facial expressions;
the region image recognition module is used for recognizing the eye region and the mouth region of each face image to obtain an eye region image and a mouth region image of the corresponding face image;
the global feature extraction module is used for carrying out global feature extraction on each face image to obtain a corresponding global feature vector;
the partial feature extraction module is used for respectively carrying out partial feature extraction on the eye region graph and the mouth region graph corresponding to each face image to obtain corresponding eye feature vectors and mouth feature vectors;
and the expression recognition module is used for carrying out expression recognition on the face image based on the global feature vector, the eye feature vector and the mouth feature vector to obtain an expression category corresponding to the face image.
The embodiment of the application provides a computer device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of the expression recognition method provided by any embodiment of the application when executing the computer program.
Embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the expression recognition method provided by any embodiment of the present application.
According to the above expression recognition method, apparatus, computer device and computer readable storage medium, for the obtained face images representing facial expressions, since the regions that contribute most to expression recognition in a face image are the eye region and the mouth region, the eye region and the mouth region in each face image are recognized to obtain the eye region map and the mouth region map of the corresponding face image. Global feature extraction is then performed on each face image to obtain the global feature vector corresponding to each face image; partial feature extraction is performed on the eye region map and the mouth region map of each face image to obtain the eye feature vector and the mouth feature vector corresponding to each face image; and expression recognition is performed on the face image based on the global feature vector, the eye feature vector and the mouth feature vector to obtain the corresponding expression category. The expression category of the face image is recognized through both the global feature vector and the local feature vectors of the face image, i.e., through multi-feature fusion, which requires no large amount of sample data, is not affected by the environment, and avoids the low recognition rate caused by a single feature. Therefore, the method of this scheme provides a higher recognition rate in real-time recognition and effectively reduces the computational load on the computer.
Drawings
FIG. 1 is a flow chart of an expression recognition method in one embodiment;

FIG. 2 is a schematic diagram of the expression categories of a face image in an expression recognition method in one embodiment;

FIG. 3 is a schematic diagram of the face feature point method in an expression recognition method in one embodiment;

FIG. 4A is a schematic diagram of the Kirsch operator in an expression recognition method in one embodiment;

FIG. 4B is a schematic diagram of the second derivative Gaussian operator in an expression recognition method in one embodiment;

FIG. 4C is a schematic diagram of an eye feature map in an expression recognition method in one embodiment;

FIG. 4D is a schematic diagram of a generated eye mask feature map in an expression recognition method in one embodiment;

FIG. 5 is a block diagram of an expression recognition apparatus in one embodiment;

FIG. 6 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, an expression recognition method is provided. The method is described here as applied to a terminal; it is understood that the method may also be applied to a server, or to a system including a terminal and a server and implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:
Step 102, obtaining a plurality of facial images representing facial expressions.
Wherein the facial image is an image characterizing a facial expression. The face image may be an image that contains only the face region after the image background has been removed. As shown in fig. 2, the expression categories of the face image mainly include the 7 basic human expressions, i.e., calm, happy, sad, surprise, fear, anger and disgust.
In one embodiment, obtaining a plurality of facial images representing facial expressions includes: acquiring a real-time facial expression image shot by a camera assembly; combining the real-time facial expression image and historical facial expression images to form a facial expression image set, the facial expression image set comprising a plurality of initial face images that represent facial expressions and contain image backgrounds; and recognizing the face in each initial face image by a face feature point method to obtain a plurality of face images corresponding to the facial expression image set. The camera assembly is a component that performs photosensitive imaging of a target object according to optical principles, such as a camera.
In one embodiment, as shown in fig. 3, a 68-face feature point method may be used to identify face regions in the face image and reject background regions in the face image.
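The patent does not name an implementation for the 68-face-feature-point step; as a hedged illustration only, it could be realized with dlib's frontal face detector and 68-landmark shape predictor (the model file name and the crop logic below are assumptions, not part of the patent):

```python
# Illustrative sketch only: the patent calls for a 68-face-feature-point
# method but names no library; dlib and the predictor file are assumptions.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model file

def face_only(image_bgr):
    """Detect one face; return the face crop (background removed) and landmarks."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    rects = detector(gray)
    if not rects:
        return None, None
    shape = predictor(gray, rects[0])
    pts = np.array([(p.x, p.y) for p in shape.parts()], dtype=np.int32)
    x, y, w, h = cv2.boundingRect(pts)   # tight box around the 68 landmarks
    return image_bgr[y:y + h, x:x + w], pts
```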
Step 104, identifying an eye area and a mouth area of each face image, and obtaining an eye area diagram and a mouth area diagram of the corresponding face image.
The eye area comprises eyebrows, eyes and an area where nose bridges are located. The mouth region includes the area where the nostrils and mouth are located. Specifically, the terminal recognizes an eye region and a mouth region in each face image through an image recognition method, and obtains an eye region diagram and a mouth region diagram corresponding to each face image. The image recognition method may be a face feature point method, such as a 68-face feature point method.
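Continuing the sketch above and assuming the common 68-landmark index layout (the patent does not fix an index convention), the eye region (eyebrows, eyes, nose bridge) and the mouth region (nostrils, mouth) could be cropped roughly as follows:

```python
# Sketch under the usual 68-landmark layout (assumption): eyebrows 17-26,
# nose bridge 27-30, nostrils 31-35, eyes 36-47, mouth 48-67.
import cv2
import numpy as np

def crop_regions(image_bgr, pts, pad=5):
    """pts: (68, 2) int32 landmark array; returns (eye_map, mouth_map)."""
    def box(points):
        x, y, w, h = cv2.boundingRect(points)
        return image_bgr[max(y - pad, 0):y + h + pad,
                         max(x - pad, 0):x + w + pad]

    eye_pts = np.vstack([pts[17:31], pts[36:48]])    # eyebrows, nose bridge, eyes
    mouth_pts = np.vstack([pts[31:36], pts[48:68]])  # nostrils and mouth
    return box(eye_pts), box(mouth_pts)
```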
And 106, extracting global features of each face image to obtain corresponding global feature vectors.
Specifically, the terminal may perform global feature extraction on each face image by Principal Component Analysis (PCA) to obtain a corresponding feature vector, and then take the first 31 dimensions of that feature vector to form the global feature vector of the corresponding face image.
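A minimal sketch of this step, assuming scikit-learn's PCA (the patent names only PCA and the 31-dimension cut-off):

```python
# Global features: project flattened gray face images onto the first
# 31 principal components, as described above.
from sklearn.decomposition import PCA

def global_feature_vectors(flat_faces):
    """flat_faces: (n_samples, h*w) array of flattened gray face images."""
    pca = PCA(n_components=31)
    return pca.fit_transform(flat_faces)   # (n_samples, 31) global feature vectors
```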
And step 108, respectively carrying out partial feature extraction on the eye region diagram and the mouth region diagram corresponding to each face image to obtain corresponding eye feature vectors and mouth feature vectors.
Specifically, after obtaining an eye region diagram and a mouth region diagram of each face image, the terminal further performs partial feature extraction on the eye region diagram and the mouth region diagram to obtain an eye feature vector and a mouth feature vector corresponding to each face image.
Step 110, performing expression recognition on the face image based on the global feature vector, the eye feature vector and the mouth feature vector to obtain the expression category of the corresponding face image.
Specifically, the terminal performs multi-feature fusion based on the global feature vector, the eye feature vector and the mouth feature vector to obtain a fusion feature vector corresponding to the face image; and further inputting the fusion feature vector of each face image into a Support Vector Machine (SVM) classifier for carrying out expression recognition to obtain the expression category of the corresponding face image.
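A hedged sketch of the fusion and classification step; scikit-learn's SVC, the RBF kernel and the variable names are assumptions, as the patent specifies only fusion of the three vectors into an SVM classifier:

```python
# Multi-feature fusion by concatenation, then SVM classification into the
# 7 categories of fig. 2. Kernel choice and data names are assumptions.
import numpy as np
from sklearn.svm import SVC

EXPRESSIONS = ["calm", "happy", "sad", "surprise", "fear", "anger", "disgust"]

def fuse(global_vec, eye_vec, mouth_vec):
    """Fusion feature vector: simple concatenation of the three vectors."""
    return np.concatenate([global_vec, eye_vec, mouth_vec])

clf = SVC(kernel="rbf")
# clf.fit(np.stack([fuse(g, e, m) for g, e, m in train_features]), train_labels)
# category = EXPRESSIONS[clf.predict(fuse(g, e, m)[None, :])[0]]
```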
In the embodiment, based on a plurality of obtained face images representing facial expressions, since the region with the largest contribution to facial expression recognition in the face images comprises an eye region and a mouth region, the eye region and the mouth region in each face image are recognized, and thus an eye region map and a mouth region map corresponding to the face images are obtained; further, global feature extraction is carried out on each face image, and global feature vectors corresponding to each face image are obtained; then, partial feature extraction is respectively carried out on the eye region diagram and the mouth region diagram of each face image, so as to obtain eye feature vectors and mouth feature vectors corresponding to each face image; and further carrying out expression recognition on the facial image based on the global feature vector, the eye feature vector and the mouth feature vector to obtain a corresponding expression category. The facial expression type of the face image is identified through the global feature vector and the local feature vector of the face image, the facial expression is identified through multi-feature fusion, a large amount of sample data is not needed, the influence of the environment is avoided, and the low identification rate caused by single features is avoided. Therefore, the method of the scheme can provide higher recognition rate in real-time recognition, and can effectively reduce the calculated amount of the computer.
In one embodiment, performing partial feature extraction on the eye region map and the mouth region map corresponding to each face image, to obtain corresponding eye feature vectors and mouth feature vectors includes: converting the eye region image and the mouth region image corresponding to each face image into gray images to obtain corresponding eye region gray images and mouth region gray images; extracting features of the eye region gray level map and the mouth region gray level map according to a preset sliding distance through a preset window to obtain a plurality of corresponding eye feature maps and a plurality of corresponding mouth feature maps; carrying out convolution processing on each eye feature image and the corresponding mouth feature image through a mask to obtain a plurality of convolved eye mask feature images and a plurality of mouth mask feature images; generating an eye feature vector corresponding to the face image based on the plurality of eye feature images and the plurality of eye mask feature images; based on the plurality of mouth feature images and the plurality of mouth mask feature images, mouth feature vectors corresponding to the face images are generated.
The preset window may be a block of 3×3 pixels, and the preset sliding distance lies in the range [1, side length of the gray-scale map); it may be, for example, one pixel.
Specifically, after obtaining the eye region map and the mouth region map, the terminal further converts the obtained region map into a gray map to obtain an eye region gray map and a mouth region gray map corresponding to each face image. And then carrying out feature extraction on the gray level images according to a preset sliding distance by adopting a preset window to obtain a plurality of eye feature images and a plurality of mouth feature images corresponding to each face image, wherein the eye feature images and the mouth feature images of each face image are in one-to-one correspondence. Then, for each face image, the terminal convolves each eye feature image by using a mask to obtain a corresponding eye mask feature image, and convolves the mouth feature image corresponding to the eye feature image by using the same mask to obtain a corresponding mouth mask feature image. Finally, the terminal generates an eye feature vector corresponding to the face image based on the eye feature images and the eye mask feature images; based on the plurality of mouth feature images and the plurality of mouth mask feature images, mouth feature vectors corresponding to the face images are generated.
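A short sketch of the gray conversion and sliding-window extraction; the 3×3 window and one-pixel stride follow the description above, while OpenCV is an assumption:

```python
# Convert a region map to gray and slide a 3x3 window with a one-pixel
# stride, yielding one 3x3 feature map per window position.
import cv2

def feature_maps(region_bgr):
    gray = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape
    return [gray[y:y + 3, x:x + 3]
            for y in range(h - 2)
            for x in range(w - 2)]
```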
In one embodiment, the mask includes a Kirsch operator and a second derivative gaussian operator; carrying out convolution processing on each eye feature map through a mask, and obtaining a plurality of convolved eye mask feature maps comprises the following steps: carrying out convolution processing on each eye feature map through a Kirsch operator to obtain corresponding edge features; carrying out convolution processing on each eye feature map through a second derivative Gaussian operator to obtain a corresponding center feature; and combining the edge features and the center features to form a convolved eye mask feature map.
Wherein, the central feature refers to the pixel value in the middle position in the feature map. Edge features refer to other pixel values in the feature map than center features.
Specifically, for each eye feature map of each face image, the terminal convolves each mask feature map in the Kirsch operator with the eye feature map to obtain Kirsch convolution features corresponding to the mask feature maps; and replacing the features in the eye feature map according to the preset azimuth of the mask feature map corresponding to the Kirsch convolution features in the Kirsch operator to obtain the corresponding edge features after mask processing is carried out on the eye features. Likewise, for each eye feature map of each face image, the terminal convolves the eye feature map with a second derivative Gaussian operator to obtain Gaussian convolution features; and replacing the central feature in the eye feature map with the Gaussian convolution feature to obtain a corresponding central feature after mask processing of the eye feature. And finally, the updated eye feature map is used as a convolved eye mask feature map by the terminal.
For example, as shown in a schematic diagram of a Kirsch operator in fig. 4A, the Kirsch operator includes 8 3x3 mask feature maps of KM0 to KM7, where KM0 to KM7 are mask feature maps distributed in a counterclockwise direction and in a preset orientation; wherein the predetermined orientations include east, south, west, north, southeast, northeast, northwest, and southwest. As shown in fig. 4B, a schematic diagram of a second derivative gaussian operator, which includes a 3x3 mask feature map FGMc.
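The numeric mask values appear only as images (FIGS. 4A and 4B), so the sketch below substitutes the standard Kirsch compass masks, generated by rotating the border ring of an east-facing mask, and a generic second-derivative center mask; both the KM0 to KM7 ordering and the FGM_C values are stand-in assumptions:

```python
# Stand-in masks: standard Kirsch compass masks plus a generic
# center-response mask, since the patent's exact FIG. 4A/4B values
# are not reproduced in this publication.
import numpy as np

def rotate_ring(mask):
    """Shift the 8 border entries of a 3x3 mask one step around the ring."""
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    out = mask.copy()
    vals = [mask[i] for i in ring]
    for k, idx in enumerate(ring):
        out[idx] = vals[(k + 1) % 8]
    return out

KM = [np.array([[-3, -3, 5],
                [-3,  0, 5],
                [-3, -3, 5]])]           # east-facing Kirsch mask
for _ in range(7):                       # the remaining 7 orientations
    KM.append(rotate_ring(KM[-1]))

FGM_C = np.array([[-1, -1, -1],
                  [-1,  8, -1],
                  [-1, -1, -1]])         # generic second-derivative mask (assumed values)
```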
As shown in fig. 4C, the feature map SI is a 3×3 feature map, where the center feature is SI_c and the edge features are SI_i (i = 0, 1, …, 7).
As shown in fig. 4D, the Kirsch convolution feature ER_p is calculated by Equation 1, and the Gaussian convolution feature ER_c is calculated by Equation 2:

ER_p = Σ_{u,v} KM_p(u,v) · SI(u,v), p = 0, 1, …, 7    (1)

ER_c = Σ_{u,v} FGM_c(u,v) · SI(u,v)    (2)

where the sums run over the nine positions (u, v) of the 3×3 maps.
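Given Equations 1 and 2, each response is an element-wise multiply-and-sum of the 3×3 feature map with a mask; a short sketch, reusing KM and FGM_C from the previous sketch:

```python
# Equations 1 and 2 applied to one 3x3 feature map SI.
import numpy as np

def mask_responses(si):
    """si: one 3x3 sliding-window block; uses KM and FGM_C defined above."""
    si = si.astype(np.int64)                      # avoid uint8 overflow
    er_p = [int(np.sum(km * si)) for km in KM]    # Kirsch edge responses ER_0..ER_7
    er_c = int(np.sum(FGM_C * si))                # Gaussian center response ER_c
    return er_p, er_c
```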
in one embodiment, generating an eye feature vector corresponding to a face image based on the plurality of eye feature maps and the plurality of eye mask feature maps comprises: calculating an average characteristic value of the eye characteristic map according to the pixel value of each eye characteristic map; and calculating to obtain the eye feature vector of the face image based on the average feature value and each pixel value of the eye feature map and each pixel value of the eye mask feature map corresponding to the eye feature map.
In one embodiment, the eye feature vector comprises an eye ternary mode vector; the pixel values include a center pixel value and an edge pixel value; the calculating of the eye feature vector of the face image based on the average feature value and each pixel value of each eye feature map and each pixel value of the eye mask feature map corresponding to the eye feature map includes: based on the average feature value, the center pixel value and the edge pixel value of each eye feature map, and the center pixel value and the edge pixel value of the eye mask feature map corresponding to the eye feature map, calculating the eye ternary mode vector of the face image through a local direction ternary mode formula;
[Equation 3: the local direction ternary mode formula, rendered as an image in the original publication]

wherein ELDTP_p is the eye ternary mode vector, μ is the average feature value, SI_c and SI_p are respectively the center pixel value and the edge pixel values of the eye feature map, ER_c and ER_p are respectively the center pixel value and the edge pixel values of the eye mask feature map, and σ is a conditional function.
wherein the average feature value μ is calculated by Equation 4:

μ = (1/9) · Σ SI(u,v), summed over the nine pixel values of the eye feature map    (4)
in one embodiment, the ocular ternary pattern vector comprises an ocular ternary pattern vector and an ocular ternary pattern vector; the eye ternary mode vector of the face image is calculated by a local direction ternary mode formula and comprises the following steps:
when (when)
Figure BDA0003632122690000101
And calculating the eye lower ternary mode vector of the face image through a local direction ternary mode formula.
When (when)
Figure BDA0003632122690000102
And calculating to obtain the ternary mode vector on the eyes of the face image through a local direction ternary mode formula.
Wherein T is a preset threshold value, and the value of the preset threshold value is determined according to experience through repeated experiments. In this embodiment, T may be 5.
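Because the exact conditional expressions survive only as images in this publication, the sketch below substitutes a generic local-ternary-pattern style upper/lower split with the threshold T = 5 named above; it illustrates the upper/lower coding idea rather than the patent's precise formula:

```python
# Generic LTP-style upper/lower coding (NOT the patent's exact conditions,
# which are images in the original); T = 5 per this embodiment.
T = 5

def ternary(x, t=T):
    """3-valued sign with a dead zone of width t around zero."""
    return 1 if x >= t else (-1 if x <= -t else 0)

def upper_lower_codes(responses, reference):
    """responses: 8 neighbour values; reference: the value they are compared to."""
    codes = [ternary(v - reference) for v in responses]
    upper = sum((c == 1) << p for p, c in enumerate(codes))   # bits of the upper pattern
    lower = sum((c == -1) << p for p, c in enumerate(codes))  # bits of the lower pattern
    return upper, lower
```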
Similarly, performing convolution processing, through the mask, on the mouth feature map corresponding to each eye feature map to obtain a plurality of convolved mouth mask feature maps comprises the following steps: performing convolution processing on each mouth feature map through the Kirsch operator to obtain edge features corresponding to the mouth feature map; performing convolution processing on each mouth feature map through the second derivative Gaussian operator to obtain a center feature corresponding to the mouth feature map; and combining the edge features and the center feature corresponding to the mouth feature map to form a convolved mouth mask feature map.
In one embodiment, generating a mouth feature vector for a corresponding face image based on the plurality of mouth feature maps and the plurality of mouth mask feature maps comprises: calculating an average characteristic value of the mouth characteristic map according to the pixel value of each mouth characteristic map; and calculating to obtain the mouth feature vector of the face image based on the average feature value and each pixel value of the mouth feature map and each pixel value of the mouth mask feature map corresponding to the mouth feature map.
In one embodiment, the mouth feature vector comprises a mouth ternary mode vector; the pixel values include a center pixel value and an edge pixel value; the calculating of the mouth feature vector of the face image based on the average feature value and each pixel value of each mouth feature map and each pixel value of the mouth mask feature map corresponding to the mouth feature map includes:

based on the average feature value, the center pixel value and the edge pixel value of each mouth feature map, and the center pixel value and the edge pixel value of the mouth mask feature map corresponding to the mouth feature map, calculating the mouth ternary mode vector of the face image through the local direction ternary mode formula;

[Mouth local direction ternary mode formula, rendered as an image in the original publication; it has the same form as Equation 3, with the eye feature map and the eye mask feature map replaced by the mouth feature map and the mouth mask feature map]

wherein the symbols, likewise rendered as images in the original publication, denote respectively the mouth ternary mode vector, the average feature value of the mouth feature map, the center pixel value and the edge pixel values of the mouth feature map, the center pixel value and the edge pixel values of the mouth mask feature map, and the conditional function, whose functional expression is the same as that of σ in Equation 3.
In one embodiment, the mouth ternary mode vector includes a mouth upper ternary mode vector and a mouth lower ternary mode vector; the calculating of the mouth ternary mode vector of the face image through the local direction ternary mode formula comprises the following steps:

when a first condition (rendered as an image in the original publication) holds, calculating the mouth lower ternary mode vector of the face image through the local direction ternary mode formula;

when a second condition (rendered as an image in the original publication) holds, calculating the mouth upper ternary mode vector of the face image through the local direction ternary mode formula.
Further, after the global feature vector, the eye upper ternary mode vector, the eye lower ternary mode vector, the mouth upper ternary mode vector and the mouth lower ternary mode vector of each face image are obtained, normalization processing is performed on the obtained feature vectors, and feature fusion is then performed on the normalized feature vectors to generate a fusion feature vector.
In one embodiment, the normalization processing method may be: and calculating standard deviation values of the feature vectors, and dividing each feature vector by the corresponding standard deviation value to obtain the normalized feature vector.
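A minimal sketch of this normalization and the subsequent concatenation (the five vector names are illustrative placeholders for the vectors listed above, not patent symbols):

```python
import numpy as np

def normalize(vec):
    """Divide a feature vector by its own standard deviation."""
    std = np.std(vec)
    return vec / std if std > 0 else vec   # guard against a zero deviation

# Fusion by concatenation; the names below are placeholders.
# fused = np.concatenate([normalize(v) for v in
#                         (global_vec, eye_upper, eye_lower, mouth_upper, mouth_lower)])
```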
In this embodiment, an upper ternary mode vector and a lower ternary mode vector are respectively extracted from the eye feature map and the corresponding mouth feature map of the face image, so that the global feature vector, the eye upper ternary mode vector, the eye lower ternary mode vector, the mouth upper ternary mode vector and the mouth lower ternary mode vector of each face image are obtained. Characterizing the face image by this multidimensional feature vector yields richer information than a single feature extraction method and expresses the face image better. Further, the global feature vector is extracted by PCA; although PCA reduces the dimensionality of the fused local features, the robustness of PCA feature extraction is low under uneven illumination. The scheme therefore also extracts partial features with the local direction ternary mode (ELDTP), which improves robustness to non-monotonic illumination and the discriminative capability of the codes, effectively compensating for the low robustness of PCA under uneven illumination.
It should be understood that, although the steps in the flowcharts of figs. 1-4 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in figs. 1-4 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages need not be performed sequentially, and may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 5, there is provided an expression recognition apparatus 500 including: a face image acquisition module 502, a region image recognition module 504, a global feature extraction module 506, a partial feature extraction module 508, and an expression recognition module 510, wherein:
the facial image obtaining module 502 is configured to obtain a plurality of facial images representing facial expressions.
The region image recognition module 504 is configured to recognize an eye region and a mouth region of each face image, and obtain an eye region map and a mouth region map of the corresponding face image.
The global feature extraction module 506 is configured to perform global feature extraction on each face image, so as to obtain a corresponding global feature vector.
And the partial feature extraction module 508 is configured to perform partial feature extraction on the eye region map and the mouth region map corresponding to each face image, so as to obtain a corresponding eye feature vector and a corresponding mouth feature vector.
The expression recognition module 510 is configured to perform expression recognition on the face image based on the global feature vector, the eye feature vector and the mouth feature vector, and obtain an expression category of the corresponding face image.
In one embodiment, the facial image acquisition module is further used for acquiring real-time facial expression images shot by the camera shooting assembly; combining the real-time facial expression image and the historical facial expression image to form a facial expression image set; the facial expression image set comprises a plurality of initial facial images which represent facial expressions and contain image backgrounds; and recognizing the face in the initial face image by a face feature point method to obtain a plurality of face images corresponding to the face expression image set.
In one embodiment, the partial feature extraction module is further configured to convert an eye region map and a mouth region map corresponding to each face image into a gray map, so as to obtain a corresponding eye region gray map and a corresponding mouth region gray map; extracting features of the eye region gray level map and the mouth region gray level map according to a preset sliding distance through a preset window to obtain a plurality of corresponding eye feature maps and a plurality of corresponding mouth feature maps; carrying out convolution processing on each eye feature image and the corresponding mouth feature image through a mask to obtain a plurality of convolved eye mask feature images and a plurality of mouth mask feature images; generating an eye feature vector corresponding to the face image based on the plurality of eye feature images and the plurality of eye mask feature images; based on the plurality of mouth feature images and the plurality of mouth mask feature images, mouth feature vectors corresponding to the face images are generated.
In one embodiment, the partial feature extraction module is further configured to perform convolution processing on each eye feature map by using a Kirsch operator to obtain a corresponding edge feature; carrying out convolution processing on each eye feature map through a second derivative Gaussian operator to obtain a corresponding center feature; and combining the edge features and the center features to form a convolved eye mask feature map.
In one embodiment, the partial feature extraction module is further configured to calculate an average feature value of the eye feature map according to the pixel value of each eye feature map; and calculating to obtain the eye feature vector of the face image based on the average feature value and each pixel value of the eye feature map and each pixel value of the eye mask feature map corresponding to the eye feature map.
In one embodiment, the partial feature extraction module is further configured to calculate, according to a partial direction ternary mode formula, an eye ternary mode vector of the face image based on an average feature value, a center pixel value, and an edge pixel value of each eye feature map, and a center pixel value and an edge pixel value of an eye mask feature map corresponding to the eye feature map;
[Local direction ternary mode formula, rendered as an image in the original publication]

wherein ELDTP_p is the eye ternary mode vector, μ is the average feature value, SI_c and SI_p are respectively the center pixel value and the edge pixel values of the eye feature map, ER_c and ER_p are respectively the center pixel value and the edge pixel values of the eye mask feature map, and σ is a conditional function.
In one embodiment, the partial feature extraction module is further configured to:
when a first condition (rendered as an image in the original publication) holds, calculating the eye lower ternary mode vector of the face image through the local direction ternary mode formula, wherein T is a preset threshold;

when a second condition (rendered as an image in the original publication) holds, calculating the eye upper ternary mode vector of the face image through the local direction ternary mode formula.
In the embodiment, based on a plurality of obtained face images representing facial expressions, since the region with the largest contribution to facial expression recognition in the face images comprises an eye region and a mouth region, the eye region and the mouth region in each face image are recognized, and therefore an eye region map and a mouth region map of the corresponding face image are obtained; further, global feature extraction is carried out on each face image, and global feature vectors corresponding to each face image are obtained; then, partial feature extraction is respectively carried out on the eye region diagram and the mouth region diagram of each face image, so as to obtain eye feature vectors and mouth feature vectors corresponding to each face image; and further carrying out expression recognition on the facial image based on the global feature vector, the eye feature vector and the mouth feature vector to obtain a corresponding expression category. The facial expression type of the face image is identified through the global feature vector and the local feature vector of the face image, the facial expression is identified through multi-feature fusion, a large amount of sample data is not needed, the influence of the environment is avoided, and the low identification rate caused by single features is avoided. Therefore, the method of the scheme can provide higher recognition rate in real-time recognition, and can effectively reduce the calculated amount of the computer.
For specific limitations of the expression recognition apparatus, reference may be made to the above limitations of the expression recognition method, which are not repeated here. Each module of the above expression recognition apparatus may be implemented in whole or in part by software, hardware or a combination thereof. The above modules may be embedded, in hardware form, in or be independent of the processor of the computer device, or may be stored, in software form, in the memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal, and whose internal structure may be as shown in fig. 6. The computer device includes a processor, a memory, a communication interface, a display screen and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode can be realized through WIFI, an operator network, Near Field Communication (NFC) or other technologies. The computer program is executed by the processor to implement an expression recognition method. The display screen of the computer device may be a liquid crystal display or an electronic ink display; the input device of the computer device may be a touch layer covering the display screen, keys, a trackball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, mouse or the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 6 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory storing a computer program and a processor that when executing the computer program performs the steps of: acquiring a plurality of face images representing facial expressions; identifying an eye area and a mouth area of each face image to obtain an eye area image and a mouth area image of the corresponding face image; extracting global features of each face image to obtain corresponding global feature vectors; partial feature extraction is respectively carried out on the eye region diagram and the mouth region diagram corresponding to each face image, so as to obtain corresponding eye feature vectors and mouth feature vectors; and carrying out expression recognition on the face image based on the global feature vector, the eye feature vector and the mouth feature vector to obtain the expression category of the corresponding face image.
In one embodiment, the processor when executing the computer program further performs the steps of: the step of obtaining a plurality of facial images representing facial expressions comprises the following steps: acquiring a real-time facial expression image shot by a shooting assembly; combining the real-time facial expression image and the historical facial expression image to form a facial expression image set; the facial expression image set comprises a plurality of initial facial images which represent facial expressions and contain image backgrounds; and recognizing the face in the initial face image by a face feature point method to obtain a plurality of face images corresponding to the face expression image set.
In one embodiment, the processor when executing the computer program further performs the steps of: partial feature extraction is respectively carried out on the eye region diagram and the mouth region diagram corresponding to each face image, and the obtaining of the corresponding eye feature vector and mouth feature vector comprises the following steps: converting the eye region image and the mouth region image corresponding to each face image into gray images to obtain corresponding eye region gray images and mouth region gray images; extracting features of the eye region gray level map and the mouth region gray level map according to a preset sliding distance through a preset window to obtain a plurality of corresponding eye feature maps and a plurality of corresponding mouth feature maps; carrying out convolution processing on each eye feature image and the corresponding mouth feature image through a mask to obtain a plurality of convolved eye mask feature images and a plurality of mouth mask feature images; generating an eye feature vector corresponding to the face image based on the plurality of eye feature images and the plurality of eye mask feature images; based on the plurality of mouth feature images and the plurality of mouth mask feature images, mouth feature vectors corresponding to the face images are generated.
In one embodiment, the processor when executing the computer program further performs the steps of: the mask comprises a Kirsch operator and a second derivative Gaussian operator; carrying out convolution processing on each eye feature map through a mask, and obtaining a plurality of convolved eye mask feature maps comprises the following steps: carrying out convolution processing on each eye feature map through a Kirsch operator to obtain corresponding edge features; carrying out convolution processing on each eye feature map through a second derivative Gaussian operator to obtain a corresponding center feature; and combining the edge features and the center features to form a convolved eye mask feature map.
In one embodiment, the processor when executing the computer program further performs the steps of: generating an eye feature vector corresponding to the face image based on the plurality of eye feature maps and the plurality of eye mask feature maps comprises: calculating an average characteristic value of the eye characteristic map according to the pixel value of each eye characteristic map; and calculating to obtain the eye feature vector of the face image based on the average feature value and each pixel value of the eye feature map and each pixel value of the eye mask feature map corresponding to the eye feature map.
In one embodiment, the processor when executing the computer program further performs the steps of: the eye feature vector comprises an eye ternary pattern vector; the pixel values include a center pixel value and an edge pixel value; based on the average feature value and each pixel value of each eye feature map and each pixel value of the eye mask feature map corresponding to the eye feature map, the calculating an eye feature vector of the face image includes: based on the average characteristic value, the central pixel value and the edge pixel value of each eye characteristic image and the central pixel value and the edge pixel value of the eye mask characteristic image corresponding to the eye characteristic images, an eye ternary mode vector of the face image is obtained through calculation of a local direction ternary mode formula;
[Local direction ternary mode formula, rendered as an image in the original publication]

wherein ELDTP_p is the eye ternary mode vector, μ is the average feature value, SI_c and SI_p are respectively the center pixel value and the edge pixel values of the eye feature map, ER_c and ER_p are respectively the center pixel value and the edge pixel values of the eye mask feature map, and σ is a conditional function.
In one embodiment, the processor when executing the computer program further performs the steps of: the eye ternary pattern vector comprises an eye upper ternary pattern vector and an eye lower ternary pattern vector; the eye ternary mode vector of the face image is calculated by a local direction ternary mode formula and comprises the following steps:
when a first condition (rendered as an image in the original publication) holds, calculating the eye lower ternary mode vector of the face image through the local direction ternary mode formula, wherein T is a preset threshold;

when a second condition (rendered as an image in the original publication) holds, calculating the eye upper ternary mode vector of the face image through the local direction ternary mode formula.
In the embodiment, based on a plurality of obtained face images representing facial expressions, since the region with the largest contribution to facial expression recognition in the face images comprises an eye region and a mouth region, the eye region and the mouth region in each face image are recognized, and therefore an eye region map and a mouth region map of the corresponding face image are obtained; further, global feature extraction is carried out on each face image, and global feature vectors corresponding to each face image are obtained; then, partial feature extraction is respectively carried out on the eye region diagram and the mouth region diagram of each face image, so as to obtain eye feature vectors and mouth feature vectors corresponding to each face image; and further carrying out expression recognition on the facial image based on the global feature vector, the eye feature vector and the mouth feature vector to obtain a corresponding expression category. The facial expression type of the face image is identified through the global feature vector and the local feature vector of the face image, the facial expression is identified through multi-feature fusion, a large amount of sample data is not needed, the influence of the environment is avoided, and the low identification rate caused by single features is avoided. Therefore, the method of the scheme can provide higher recognition rate in real-time recognition, and can effectively reduce the calculated amount of the computer.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of: acquiring a plurality of face images representing facial expressions; identifying an eye area and a mouth area of each face image to obtain an eye area image and a mouth area image of the corresponding face image; extracting global features of each face image to obtain corresponding global feature vectors; partial feature extraction is respectively carried out on the eye region diagram and the mouth region diagram corresponding to each face image, so as to obtain corresponding eye feature vectors and mouth feature vectors; and carrying out expression recognition on the face image based on the global feature vector, the eye feature vector and the mouth feature vector to obtain the expression category of the corresponding face image.
In one embodiment, the computer program when executed by the processor further performs the steps of: the step of obtaining a plurality of facial images representing facial expressions comprises the following steps: acquiring a real-time facial expression image shot by a shooting assembly; combining the real-time facial expression image and the historical facial expression image to form a facial expression image set; the facial expression image set comprises a plurality of initial facial images which represent facial expressions and contain image backgrounds; and recognizing the face in the initial face image by a face feature point method to obtain a plurality of face images corresponding to the face expression image set.
In one embodiment, the computer program when executed by the processor further performs the steps of: partial feature extraction is respectively carried out on the eye region diagram and the mouth region diagram corresponding to each face image, and the obtaining of the corresponding eye feature vector and mouth feature vector comprises the following steps: converting the eye region image and the mouth region image corresponding to each face image into gray images to obtain corresponding eye region gray images and mouth region gray images; extracting features of the eye region gray level map and the mouth region gray level map according to a preset sliding distance through a preset window to obtain a plurality of corresponding eye feature maps and a plurality of corresponding mouth feature maps; carrying out convolution processing on each eye feature image and the corresponding mouth feature image through a mask to obtain a plurality of convolved eye mask feature images and a plurality of mouth mask feature images; generating an eye feature vector corresponding to the face image based on the plurality of eye feature images and the plurality of eye mask feature images; based on the plurality of mouth feature images and the plurality of mouth mask feature images, mouth feature vectors corresponding to the face images are generated.
In one embodiment, the computer program when executed by the processor further performs the steps of: the mask comprises a Kirsch operator and a second derivative Gaussian operator; carrying out convolution processing on each eye feature map through a mask, and obtaining a plurality of convolved eye mask feature maps comprises the following steps: carrying out convolution processing on each eye feature map through a Kirsch operator to obtain corresponding edge features; carrying out convolution processing on each eye feature map through a second derivative Gaussian operator to obtain a corresponding center feature; and combining the edge features and the center features to form a convolved eye mask feature map.
In one embodiment, the computer program, when executed by the processor, further performs the following steps. Generating the eye feature vector corresponding to the face image based on the plurality of eye feature maps and the plurality of eye mask feature maps comprises: calculating an average feature value of each eye feature map from the pixel values of that eye feature map; and calculating the eye feature vector of the face image based on the average feature value, the pixel values of each eye feature map and the pixel values of the eye mask feature map corresponding to that eye feature map.
In one embodiment, the computer program, when executed by the processor, further performs the following steps. The eye feature vector comprises an eye ternary pattern vector, and the pixel values include a center pixel value and edge pixel values. Calculating the eye feature vector of the face image based on the average feature value, the pixel values of each eye feature map and the pixel values of the corresponding eye mask feature map comprises: calculating the eye ternary pattern vector of the face image by a local directional ternary pattern formula, based on the average feature value, the center pixel value and edge pixel values of each eye feature map and the center pixel value and edge pixel values of the corresponding eye mask feature map:
[Local directional ternary pattern formula; in the original publication this equation appears only as image BDA0003632122690000191.]
where ELDTP_p is the eye ternary pattern vector, μ is the average feature value, SI_c and SI_p are respectively the center pixel value and edge pixel values of the eye feature map, ER_c and ER_p are respectively the center pixel value and edge pixel values of the eye mask feature map, and σ is a conditional function.
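Because the formula itself is published only as an image, the sketch below shows only the generic ingredients its variable list suggests: ternary thresholding of center-versus-edge differences on the feature map and on the mask feature map. The threshold form of σ and the way μ enters the final ELDTP_p code are assumptions based on standard local ternary patterns, not a reconstruction of the actual equation.

```python
import numpy as np

def sigma(x, t):
    # Assumed ternary conditional: map a difference to +1 / 0 / -1 around t.
    return np.where(x >= t, 1, np.where(x <= -t, -1, 0))

feat = np.random.rand(3, 3)   # stand-in 3x3 eye feature map window
mask = np.random.rand(3, 3)   # stand-in 3x3 eye mask feature map window

mu = feat.mean()                          # average feature value mu
si_c, er_c = feat[1, 1], mask[1, 1]       # center pixel values SI_c, ER_c
si_p = np.delete(feat.ravel(), 4)         # edge pixel values SI_p (8 neighbours)
er_p = np.delete(mask.ravel(), 4)         # edge pixel values ER_p

T = 0.1  # illustrative threshold
ternary_feat = sigma(si_p - si_c, T)      # ternary code from the feature map
ternary_mask = sigma(er_p - er_c, T)      # ternary code from the mask feature map
# How mu and the two codes combine into ELDTP_p is specified only in the
# image-form equation of the original; the codes above show the ingredients.
```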
In one embodiment, the computer program, when executed by the processor, further performs the following steps. The eye ternary pattern vector comprises an eye upper ternary pattern vector and an eye lower ternary pattern vector, and calculating the eye ternary pattern vector of the face image by the local directional ternary pattern formula comprises:
when the condition shown in image BDA0003632122690000192 of the original holds, the eye lower ternary pattern vector of the face image is calculated by the local directional ternary pattern formula, where T is a preset threshold; and
when the condition shown in image BDA0003632122690000193 of the original holds, the eye upper ternary pattern vector of the face image is calculated by the local directional ternary pattern formula.
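The split into upper and lower patterns parallels the standard decomposition of a local ternary pattern into two binary patterns (keep the +1 responses for the upper pattern and the −1 responses for the lower). Since the selecting conditions above survive only as images, the sketch below shows that conventional decomposition as an analogy, not the disclosure's exact rule.

```python
import numpy as np

ternary = np.array([1, 0, -1, 1, -1, 0, 0, 1])  # example ternary code, 8 neighbours

upper = (ternary == 1).astype(int)   # eye upper pattern: keeps the +1 responses
lower = (ternary == -1).astype(int)  # eye lower pattern: keeps the -1 responses
print(upper, lower)
```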
In this embodiment, a plurality of face images representing facial expressions are acquired. Because the regions that contribute most to facial expression recognition are the eye region and the mouth region, the eye region and the mouth region of each face image are identified to obtain the eye region map and the mouth region map of the corresponding face image. Global feature extraction is then performed on each face image to obtain the corresponding global feature vector, and local feature extraction is performed on the eye region map and the mouth region map of each face image to obtain the corresponding eye feature vector and mouth feature vector. Expression recognition is finally performed on the face image based on the global feature vector, the eye feature vector and the mouth feature vector to obtain the corresponding expression category. By identifying the expression category from both global and local feature vectors, this multi-feature fusion approach requires no large volume of sample data, is robust to environmental influence, and avoids the low recognition rate caused by relying on a single feature. The method therefore achieves a higher recognition rate in real-time recognition while effectively reducing the computational load.
Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored on a non-transitory computer readable storage medium; when executed, the program may perform the steps of the method embodiments described above. Any reference to memory, database or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory and the like. Volatile memory may include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as Static Random Access Memory (SRAM) and Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination is described; nevertheless, any combination of these technical features should be considered within the scope of this specification as long as it contains no contradiction.
The above examples represent only a few embodiments of the present application and, although described in detail, are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art may make various modifications and improvements without departing from the spirit of the present application, all of which fall within its scope of protection. Accordingly, the scope of protection of the present application is determined by the appended claims.

Claims (9)

1. An expression recognition method, the method comprising:
acquiring a plurality of face images representing facial expressions;
identifying the eye region and the mouth region of each face image to obtain an eye region map and a mouth region map of the corresponding face image;
performing global feature extraction on each face image to obtain a corresponding global feature vector;
performing local feature extraction on the eye region map and the mouth region map corresponding to each face image to obtain a corresponding eye feature vector and mouth feature vector;
performing expression recognition on the face image based on the global feature vector, the eye feature vector and the mouth feature vector to obtain an expression category corresponding to the face image;
wherein the eye feature vector comprises an eye ternary pattern vector, and the eye ternary pattern vector of the face image is calculated by a local directional ternary pattern formula:
[Local directional ternary pattern formula; in the original publication this equation appears only as image FDA0004053556450000011.]
wherein ELDTP_p is the eye ternary pattern vector, μ is the average feature value of the eye feature map, SI_c and SI_p are respectively the center pixel value and edge pixel values of the eye feature map, ER_c and ER_p are respectively the center pixel value and edge pixel values of the eye mask feature map, and σ is a conditional function; the eye feature map is extracted from the eye region map.
2. The method of claim 1, wherein acquiring a plurality of face images representing facial expressions comprises:
acquiring a real-time facial expression image captured by a camera assembly;
combining the real-time facial expression image with historical facial expression images to form a facial expression image set, the facial expression image set comprising a plurality of initial face images that represent facial expressions and contain image backgrounds; and
recognizing the face in each initial face image by a facial feature point method to obtain the plurality of face images corresponding to the facial expression image set.
3. The method according to claim 1, wherein performing local feature extraction on the eye region map and the mouth region map corresponding to each face image to obtain the corresponding eye feature vector and mouth feature vector comprises:
converting the eye region map and the mouth region map corresponding to each face image into grayscale images to obtain a corresponding eye region grayscale map and mouth region grayscale map;
extracting features from the eye region grayscale map and the mouth region grayscale map with a preset window moved by a preset sliding distance, to obtain a plurality of corresponding eye feature maps and a plurality of corresponding mouth feature maps;
convolving each eye feature map and the corresponding mouth feature map with a mask to obtain a plurality of convolved eye mask feature maps and a plurality of mouth mask feature maps;
generating the eye feature vector corresponding to the face image based on the eye feature maps and the eye mask feature maps; and
generating the mouth feature vector corresponding to the face image based on the mouth feature maps and the mouth mask feature maps.
4. The method according to claim 3, wherein the mask comprises a Kirsch operator and a second-derivative Gaussian operator, and convolving each eye feature map with the mask to obtain a plurality of convolved eye mask feature maps comprises:
convolving each eye feature map with the Kirsch operator to obtain a corresponding edge feature;
convolving each eye feature map with the second-derivative Gaussian operator to obtain a corresponding center feature; and
combining the edge feature and the center feature to form a convolved eye mask feature map.
5. The method of claim 3, wherein generating the eye feature vector corresponding to the face image based on the plurality of eye feature maps and the plurality of eye mask feature maps comprises:
calculating an average feature value of each eye feature map from the pixel values of that eye feature map; and
calculating the eye feature vector of the face image based on the average feature value, the pixel values of the eye feature map and the pixel values of the eye mask feature map corresponding to the eye feature map.
6. The method of claim 5, wherein the eye ternary pattern vector comprises an eye upper ternary pattern vector and an eye lower ternary pattern vector, and calculating the eye ternary pattern vector of the face image by the local directional ternary pattern formula comprises:
when the condition shown in image FDA0004053556450000031 of the original holds, calculating the eye lower ternary pattern vector of the face image by the local directional ternary pattern formula, where T is a preset threshold; and
when the condition shown in image FDA0004053556450000032 of the original holds, calculating the eye upper ternary pattern vector of the face image by the local directional ternary pattern formula.
7. An expression recognition apparatus, the apparatus comprising:
the face image acquisition module is used for acquiring a plurality of face images representing facial expressions;
the region image recognition module is used for identifying the eye region and the mouth region of each face image to obtain an eye region map and a mouth region map of the corresponding face image;
the global feature extraction module is used for performing global feature extraction on each face image to obtain a corresponding global feature vector;
the local feature extraction module is used for performing local feature extraction on the eye region map and the mouth region map corresponding to each face image to obtain a corresponding eye feature vector and mouth feature vector, wherein the eye feature vector comprises an eye ternary pattern vector and the eye ternary pattern vector of the face image is calculated by a local directional ternary pattern formula:
[Local directional ternary pattern formula; in the original publication this equation appears only as image FDA0004053556450000033.]
wherein ELDTP_p is the eye ternary pattern vector, μ is the average feature value of the eye feature map, SI_c and SI_p are respectively the center pixel value and edge pixel values of the eye feature map, ER_c and ER_p are respectively the center pixel value and edge pixel values of the eye mask feature map, and σ is a conditional function; the eye feature map is extracted from the eye region map; and
and the expression recognition module is used for carrying out expression recognition on the face image based on the global feature vector, the eye feature vector and the mouth feature vector to obtain an expression category corresponding to the face image.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202210492682.7A 2022-05-07 2022-05-07 Expression recognition method, apparatus, computer device and computer readable storage medium Active CN115035566B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210492682.7A CN115035566B (en) 2022-05-07 2022-05-07 Expression recognition method, apparatus, computer device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210492682.7A CN115035566B (en) 2022-05-07 2022-05-07 Expression recognition method, apparatus, computer device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN115035566A (en) 2022-09-09
CN115035566B (en) 2023-07-04

Family

ID=83118857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210492682.7A Active CN115035566B (en) 2022-05-07 2022-05-07 Expression recognition method, apparatus, computer device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115035566B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116452741B (en) * 2023-04-20 2024-03-01 北京百度网讯科技有限公司 Object reconstruction method, object reconstruction model training method, device and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112614213A (en) * 2020-12-14 2021-04-06 杭州网易云音乐科技有限公司 Facial expression determination method, expression parameter determination model, medium and device
CN112990016A (en) * 2021-03-16 2021-06-18 中国平安人寿保险股份有限公司 Expression feature extraction method and device, computer equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299653A (en) * 2018-08-06 2019-02-01 重庆邮电大学 Facial expression feature extraction method based on an improved complete local ternary pattern
CN112131978B (en) * 2020-09-09 2023-09-01 腾讯科技(深圳)有限公司 Video classification method and device, electronic equipment and storage medium
CN112446322B (en) * 2020-11-24 2024-01-23 杭州网易云音乐科技有限公司 Eyeball characteristic detection method, device, equipment and computer readable storage medium
CN112613414A (en) * 2020-12-25 2021-04-06 中用科技有限公司 Local occlusion face recognition method, system and equipment fused with eye recognition model


Also Published As

Publication number Publication date
CN115035566A (en) 2022-09-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant