CN109344744B - Face micro-expression action unit detection method based on deep convolutional neural network - Google Patents

Face micro-expression action unit detection method based on deep convolutional neural network

Info

Publication number
CN109344744B
Authority
CN
China
Prior art keywords
face
layer
neural network
action
action unit
Prior art date
Legal status
Active
Application number
CN201811076388.8A
Other languages
Chinese (zh)
Other versions
CN109344744A (en)
Inventor
樊亚春 (Fan Yachun)
税午阳 (Shui Wuyang)
邓擎琼 (Deng Qingqiong)
Current Assignee
Beijing Normal University
Original Assignee
Beijing Normal University
Priority date
Filing date
Publication date
Application filed by Beijing Normal University filed Critical Beijing Normal University
Priority to CN201811076388.8A priority Critical patent/CN109344744B/en
Publication of CN109344744A publication Critical patent/CN109344744A/en
Application granted granted Critical
Publication of CN109344744B publication Critical patent/CN109344744B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G06V40/172 - Classification, e.g. identification
    • G06V40/174 - Facial expression recognition
    • G06V40/176 - Dynamic expression

Abstract

The invention discloses a facial micro-expression action unit detection method based on a deep convolutional neural network, which comprises the following steps. Step 1: design a deep convolutional neural network structure. Step 1.1: mark the rectangular regions of the face and of the different action units within the face. Step 1.2: design and implement a deep convolutional neural network comprising convolution layers, shortcut layers and action unit detection layers, so as to learn the region information of the face and of its different expression action units and to obtain the forward-propagation parameters of the network. Step 1.3: take the sample data in the face sample data set as the neural network input data. Step 2: detect the facial expression action units according to the network parameters learned in step 1. Step 3: produce a visual output from the face action units detected in step 2. The detection method provided by the invention relies on a deep convolutional neural network to detect and identify the action units in a face image, which improves both detection accuracy and detection speed.

Description

Face micro-expression action unit detection method based on deep convolutional neural network
Technical Field
The invention relates to the technical field of face recognition and affective computing, and in particular to a facial micro-expression action unit detection method based on a deep convolutional neural network.
Background
A facial micro-expression is a natural exposure of a person's inner emotion. Compared with ordinary expressions, a micro-expression is not easy to perceive: its motion amplitude is small and its duration is short. An upturned mouth corner expresses inner pleasure; an inadvertent one-sided tilt of the mouth corner may hide inner contempt; the lower lip pushing the upper lip upward may hide slight discontent. At the same time, micro-expression recognition can also be applied to various service fields involving human-computer interaction technology, such as automatic driving, entertainment and shopping.
Automatic micro-expression recognition involves not only computer technology but also physiology and psychology. The American psychologist Paul Ekman generalized micro-expressions as combinations of different Action Units (AUs), so the detection of action units is the basis of micro-expression recognition. Facial action units are richer and subtler than the movements of other parts of the body and are therefore difficult to detect and recognize. Based on many years of accumulated work in psychology and physiology, Paul Ekman proposed the Facial Action Coding System (FACS), in which the different micro-expressions of a face can be decomposed into one or more different action units, so facial micro-expression recognition can be realized from the detection results of the different action units in the face. The detection accuracy of traditional geometric shape feature methods for action units is very low, because the action units are strongly influenced by typical geometric structures of the face such as the mouth, nose and eyebrows and are therefore difficult to detect.
Facial micro-expression action units describe the different movements of the forehead, eyebrows, eyes, nose, cheeks, mouth and jaw, producing effects with different geometric structures in different local regions. According to their location, the action units can be divided into an upper, a middle and a lower region: the upper region mainly covers the movements of the forehead, eyebrows and eyes, the middle region mainly covers the movements of the nose and cheeks, and the lower region mainly covers the movements of the mouth and chin.
In view of this, the invention provides a method for detecting a facial micro-expression action unit based on a deep convolutional neural network.
Disclosure of Invention
The invention provides a human face micro-expression action unit detection method based on a deep convolutional neural network.
In order to achieve the purpose, the invention adopts the following technical scheme:
a face micro-expression action unit detection method based on a deep convolutional neural network comprises the following steps:
step 1: designing a deep convolutional neural network structure, taking a face sample data set as input, taking an automatically marked micro expression action unit as output, training the network structure, and learning appropriate network parameters;
step 1.1: marking a face and rectangular areas of different action units in the face according to images in the sample data set;
step 1.2: designing and implementing a deep convolutional neural network, wherein the neural network comprises a convolutional layer, a shortcut layer and an action unit detection layer so as to learn the regional information of a face and different expression action units of the face and acquire a network forward propagation parameter;
step 1.3: taking sample data in a face sample data set as neural network input data, wherein the sample data comprises face image data and action unit mark xml data;
step 2: realizing facial expression action unit detection according to the network parameters learned in step 1: the image to be detected is taken as the input of the neural network of step 1.2, the convolution layers and detection layers are computed for the input image using the network parameters, and whether a face is present in the image is judged from the output of the detection layer; if no face is present, no action unit result is valid; if a face is present, the region positions identified for the action units are corrected according to the positional relationship between the face and the different action units, wherein the judgment threshold for the probability value of an action unit is set to 0.4 so that low-intensity action units are not missed;
step 3: performing visual output according to the face action units detected in step 2, and calculating and outputting the micro-expression expressed by the face;
step 3.1: judging which action units are contained in the input face according to the probability value of each action unit in the detection layer of step 2 and the threshold range; when the probability value exceeds the judgment threshold, the class name of the action unit is read from the detection layer, the absolute pixel position of the action unit on the image is calculated from the face position and the relative position of the action unit, the absolute position of the action unit is drawn on the image with a rectangular frame, and the name of the action unit is drawn at the same time;
step 3.2: outputting the micro-expression state of the current face according to the combination of action units appearing in the face;
step 3.3: outputting the micro-expression state of the face according to the recognition result of the face action units in the current image.
Further, in step 1.1, the face is marked by first computing the facial feature points and then, according to the definitions of the different action units and the corresponding facial muscle changes, defining the local rectangular region positions of the different action units as well as the rectangular region position of the face.
Further, step 1.1 comprises the steps of:
step 1.1.1: detecting the face and the positions of its feature points using the supervised descent method, and numbering each facial feature point: starting from the upper-left contour of the face, the facial contour points are numbered first, then the feature points on the eyebrows and eyes from left to right, then the feature points of the nose bridge and nostril wings, and finally all the feature points of the mouth;
step 1.1.2: defining the face region and the action unit regions based on the positions of the facial feature points, the action unit regions reflecting the movements of the forehead, eyebrows, eyes, nose, cheeks, mouth and jaw;
step 1.1.3: calculating the face region from the feature point positions and using it as the sample region for model learning.
Further, in step 3.2, the micro-expression states of the face include happy, depressed, surprised, afraid, angry, disgusted and neutral expressions.
Further, in step 1.2, each convolution layer performs a convolution operation on the feature images of the previous layer with a group of convolution parameter templates and outputs as many feature images as there are templates; the activation function of the convolution layers is a leaky linear rectification (leaky ReLU) function:

φ(x) = x, for x > 0;  φ(x) = αx, for x ≤ 0    (1)

in the formula (1), x is the input value of the activation function, φ(x) is the output value of the activation function, and α is the small leak coefficient applied to negative inputs;
for the shortcut layers, in order to weaken the influence of the gradient-vanishing problem during back-propagation, a shortcut layer is added after every few convolution layers; that is, the initial input is added to the output of the three convolution layers. The shortcut layer is formalized as equations (2) and (3):

F(x) = φ(w3 * φ(w2 * φ(w1 * x)))    (2)
y = F(x) + x    (3)

in the formula (2) and the formula (3), w3, w2 and w1 are the convolution template parameters of the 3rd, 2nd and 1st convolution layers, x is the input of the convolution layers, F(x) is the output of the three-layer convolution operation, and y is the output of the shortcut layer after the input has been superposed.
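As an illustrative sketch only (PyTorch is assumed as the framework; the channel count, 3x3 kernels and leak coefficient 0.1 are assumptions not given in the text), a shortcut block following equations (1) to (3) could be written as:

import torch
import torch.nn as nn

class ShortcutBlock(nn.Module):
    """Three convolution layers with leaky-ReLU activations (eq. 2)
    plus an identity shortcut added to the result (eq. 3)."""
    def __init__(self, channels: int, leak: float = 0.1):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.LeakyReLU(leak)      # leaky linear rectification, eq. (1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.act(self.conv1(x))        # phi(w1 * x)
        f = self.act(self.conv2(f))        # phi(w2 * ...)
        f = self.act(self.conv3(f))        # F(x) = phi(w3 * ...)
        return f + x                       # y = F(x) + x

# usage: y = ShortcutBlock(64)(torch.randn(1, 64, 32, 32))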
Compared with the prior art, the invention has the following advantages:
1. In the design of the deep neural network of the proposed facial micro-expression action unit detection method, convolution layers are used to learn low-level geometric features, shortcut layers are used to alleviate the problem of vanishing gradients, and several detection layers at different scales are designed to learn the classification and detection parameters of the different action units; the multi-scale detection layers improve detection accuracy and avoid missing valid action units, yielding a high-accuracy action unit detection method based on a deep neural network;
2. In the proposed method the face is aligned only in the network parameter learning stage, and no face alignment is needed in the recognition stage, which greatly improves detection efficiency.
Drawings
Fig. 1 is a schematic diagram of distribution of feature points of a human face in an embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention may be more clearly understood, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments; the embodiments of the present application and the features of the embodiments may be combined with each other provided that they do not conflict.
Example 1
A face micro-expression action unit detection method based on a deep convolutional neural network comprises the following steps:
step 1: designing a deep convolutional neural network structure, taking a face sample data set as input, taking an automatically marked micro expression action unit as output, training the network structure, and learning appropriate network parameters;
step 1.1: marking a face and rectangular areas of different action units in the face according to images in the sample data set;
step 1.2: designing and implementing a deep convolutional neural network, wherein the neural network comprises convolution layers, shortcut layers and action unit detection layers, and learning the region information of the face and of its different expression action units to obtain the network forward-propagation parameters; each convolution layer performs a convolution operation on the feature images of the previous layer with a group of convolution parameter templates and outputs as many feature images as there are templates; the activation function of the convolution layers is a leaky linear rectification (leaky ReLU) function, where x is the input value of the activation function, φ(x) is the output value of the activation function, and α is the small leak coefficient applied to negative inputs:

φ(x) = x, for x > 0;  φ(x) = αx, for x ≤ 0    (1)
For the shortcut layers, in order to weaken the influence of the gradient-vanishing problem during back-propagation, a shortcut layer is added after every few convolution layers; that is, the initial input is added to the output of the three convolution layers. The shortcut layer is formalized as equations (2) and (3):

F(x) = φ(w3 * φ(w2 * φ(w1 * x)))    (2)
y = F(x) + x    (3)

in the formula (2) and the formula (3), w3, w2 and w1 are the convolution template parameters of the 3rd, 2nd and 1st convolution layers, x is the input of the convolution layers, F(x) is the output of the three-layer convolution operation, and y is the output of the shortcut layer after the input has been superposed.
The detection layer outputs the action unit detection result of the method. Unlike most convolutional networks, the method does not use a fully connected layer for feature classification; the output of the last convolution layer serves as the input of the detection layer, and the activation function of the detection layer is the logistic function. The output at each position comprises seventy-five neurons in total, grouped according to the action units. The first neuron indicates whether a face is detected at the corresponding pixel position of the feature image: 1 if detected, 0 otherwise. The following four neurons give the absolute position information of the face on the image, namely the coordinates of the top-left vertex and the length and width of the rectangular area. The remaining seventy neurons are divided into fourteen groups that record the information of the fourteen action units; each action unit records its detected probability value and its positional relation to the face, the position information being the horizontal and vertical coordinate offsets from the top-left point of the face region expressed relative to the face width and height, together with the width and height ratios relative to the face region.
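As an illustrative decoding sketch (NumPy assumed; the field ordering below follows the layout described above, but the exact in-memory ordering of the seventy-five values is an assumption), the output at one feature-map position could be unpacked as follows:

import numpy as np

AU_NAMES = ["AU1", "AU2", "AU4", "AU5", "AU6", "AU7", "AU9",
            "AU12", "AU15", "AU17", "AU20", "AU23", "AU25", "AU26"]

def decode_detection_vector(v: np.ndarray):
    """Unpack one 75-value detection output:
    [face_flag, face_x, face_y, face_w, face_h] + 14 x [prob, dx, dy, rw, rh],
    where dx, dy are offsets from the face top-left point relative to the face
    width/height, and rw, rh are width/height ratios relative to the face box."""
    assert v.shape == (75,)
    face_present = v[0] >= 0.5          # first neuron: 1 if a face is detected, else 0
    face_box = v[1:5]                   # top-left x, y, width, height (absolute)
    aus = {}
    for i, name in enumerate(AU_NAMES):
        prob, dx, dy, rw, rh = v[5 + 5 * i: 10 + 5 * i]
        aus[name] = {"prob": float(prob),
                     "rel_box": (float(dx), float(dy), float(rw), float(rh))}
    return face_present, face_box, aus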
In the method, the number of convolution layers and shortcut layers can be increased as far as the available hardware allows, i.e. the network depth is not limited, while the detection layer is in principle a single output layer. To improve the detection accuracy of the action units, two detection layers can be used, with convolution layers and shortcut layers between them, forming a multi-scale detection setup. The network hierarchy scheme is set as follows: fifteen rounds of convolution layers plus shortcut layers are applied, followed by three convolution layers and one detection layer for the first output; then, starting from the shortcut layer after the most recent convolution round, four further rounds of three convolution layers plus one shortcut layer are applied, followed by another three convolution layers and the second detection layer; the sampling stride of the convolution layers and the filter sizes are set as required;
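One possible arrangement of this layer scheme is sketched below, purely for illustration (PyTorch assumed; the channel width, kernel sizes, strides and leak coefficient are placeholders, since the patent leaves the sampling interval and filter sizes to be set according to requirements):

import torch
import torch.nn as nn

def conv(cin, cout, k=3, s=1):
    # convolution followed by leaky linear rectification (eq. 1); sizes are placeholders
    return nn.Sequential(nn.Conv2d(cin, cout, k, stride=s, padding=k // 2),
                         nn.LeakyReLU(0.1))

class ShortcutBlock(nn.Module):
    # three convolutions whose output is superposed with the block input (eqs. 2-3)
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(conv(c, c), conv(c, c), conv(c, c))
    def forward(self, x):
        return self.body(x) + x

class AUDetector(nn.Module):
    """Illustrative two-scale detector: rounds of convolution + shortcut layers feed a
    first detection head, further rounds feed a second head; each head emits 75 values
    per feature-map position (face flag, face box, 14 x AU fields) through a logistic
    (sigmoid) activation."""
    def __init__(self, c=64, rounds1=15, rounds2=4):
        super().__init__()
        self.stem = conv(3, c)
        self.stage1 = nn.Sequential(*[ShortcutBlock(c) for _ in range(rounds1)])
        self.head1 = nn.Sequential(conv(c, c), conv(c, c), conv(c, c),
                                   nn.Conv2d(c, 75, 1), nn.Sigmoid())
        self.stage2 = nn.Sequential(*[ShortcutBlock(c) for _ in range(rounds2)])
        self.head2 = nn.Sequential(conv(c, c), conv(c, c), conv(c, c),
                                   nn.Conv2d(c, 75, 1), nn.Sigmoid())

    def forward(self, x):
        x = self.stage1(self.stem(x))
        out1 = self.head1(x)    # first-scale detection output
        x = self.stage2(x)
        out2 = self.head2(x)    # second-scale detection output
        return out1, out2

# usage: o1, o2 = AUDetector()(torch.randn(1, 3, 128, 128))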
step 1.3: taking the sample data in the face sample data set as the neural network input data, where each sample comprises the face image data and the action unit annotation xml data; the annotated action data are used to iteratively correct the result data predicted by the network, and the choice of loss function has a great influence on the convergence speed of the iteration;
step 2: realizing facial expression action unit detection according to the network parameters learned in step 1: the image to be detected is taken as the input of the neural network of step 1.2, the convolution layers and detection layers are computed for the input image using the network parameters, and whether a face is present in the image is judged from the output of the detection layer; if no face is present, no action unit result is valid; if a face is present, the region positions identified for the action units are corrected according to the positional relationship between the face and the different action units; in this embodiment the judgment threshold for the probability value of an action unit is set to 0.4, so that low-intensity action units are not missed;
step 3: performing visual output according to the face action units detected in step 2, and calculating and outputting the micro-expression expressed by the face;
step 3.1: judging which action units are contained in the input face according to the probability value of each action unit in the detection layer of step 2 and the threshold range; when the probability value exceeds the judgment threshold, the class name of the action unit is read from the detection layer, the absolute pixel position of the action unit on the image is calculated from the face position and the relative position of the action unit, the absolute position of the action unit is drawn on the image with a rectangular frame, and the name of the action unit is drawn at the same time (a sketch of this step follows step 3.3 below);
step 3.2: outputting the micro-expression state of the current face according to the combination of action units appearing in the face;
step 3.3: outputting the micro-expression state of the face according to the recognition result of the face action units in the current image.
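A minimal sketch of the decision and visualization of steps 2 and 3.1 (OpenCV and NumPy assumed; the 0.4 threshold and the face-relative coordinate layout follow the description above, while the helper names and drawing parameters are illustrative):

import cv2
import numpy as np

AU_THRESHOLD = 0.4   # probability judgment threshold from step 2

def draw_action_units(image, face_box, aus, threshold=AU_THRESHOLD):
    """Convert AU boxes from face-relative coordinates to absolute pixels and draw them.
    face_box = (fx, fy, fw, fh); each AU carries (prob, dx, dy, rw, rh) relative to the face."""
    fx, fy, fw, fh = face_box
    for name, info in aus.items():
        if info["prob"] < threshold:      # skip AUs below the judgment threshold
            continue
        dx, dy, rw, rh = info["rel_box"]
        x = int(fx + dx * fw)             # offsets are relative to face width/height
        y = int(fy + dy * fh)
        w = int(rw * fw)                  # box size is a ratio of the face box
        h = int(rh * fh)
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 1)
        cv2.putText(image, name, (x, max(y - 2, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 255, 0), 1)
    return image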
In step 1.1 of this embodiment, to achieve automatic labelling, the facial feature points are computed first and then, according to the definitions of the different action units and the corresponding facial muscle changes, the local rectangular regions of the different action units and the rectangular region of the face are defined. This is implemented as follows. First, the supervised descent method is used to detect the face and the positions of its feature points; 66 facial feature points are detected, whose distribution and numbering are shown in fig. 1. Taking the upper-left contour as the starting point, the facial contour points are numbered first, then the feature points on the eyebrows and eyes from left to right, then the feature points of the nose bridge and nostril wings, and finally all the feature points of the mouth. Second, the face region and the action unit regions are defined from the feature point positions: the face region and 14 action unit regions are computed for each sample image. The 14 action units are all taken from Ekman's action unit descriptions of the face, namely AU1, AU2, AU4, AU5, AU6, AU7, AU9, AU12, AU15, AU17, AU20, AU23, AU25 and AU26, reflecting the movements of the forehead, eyebrows, eyes, cheeks, nose, mouth and jaw. According to the local shape changes of the face involved in each action unit and the positions of the feature points, the shape regions of the facial action units are defined as follows:
the action unit described in AU1 is characterized by upward tilting of the middle eyebrow, and includes not only the main action of the eyebrow but also the variation of forehead wrinkles, and its local area is defined as follows: the X coordinate of the feature point No. 17 is taken as the X coordinate of the top left vertex of the local rectangular region, the Y coordinate of the feature point No. 19 is taken as the Y coordinate of the top left vertex, the X coordinate of the feature point No. 26 is taken as the X coordinate of the bottom right vertex of the rectangular region, and the Y coordinate of the feature point No. 41 is taken as the Y coordinate of the bottom right vertex of the rectangular region.
The action unit described in AU2 is mainly characterized by pulling the outer part of the eyebrow upwards, and has a local area similar to AU1, and the definition method of the two areas is the same.
The action unit described in AU4 is mainly the lowering of the eyebrows, and its local region is defined as follows: the X coordinate of feature point No. 17 is taken as the X coordinate of the top-left vertex of the rectangular region; the Y coordinate of feature point No. 19, offset by the difference between the Y coordinates of feature points No. 30 and No. 27, is taken as the Y coordinate of the top-left vertex; the X coordinate of feature point No. 26 is taken as the X coordinate of the bottom-right vertex; and the Y coordinate of feature point No. 41 is taken as the Y coordinate of the bottom-right vertex.
The action unit described in AU5 is mainly characterized by the widening of the eyelid fissure, so its local region mainly covers the area around the eyes and is defined as follows: the X coordinate of feature point No. 17 is taken as the X coordinate of the top-left vertex of the rectangular region; the Y coordinate of feature point No. 19, moved up by two pixels, as the Y coordinate of the top-left vertex; the X coordinate of feature point No. 26 as the X coordinate of the bottom-right vertex; and the Y coordinate of feature point No. 41, shifted by five pixels, as the Y coordinate of the bottom-right vertex.
The action unit described in AU6 mainly involves contraction of the muscles around the eyes, pulling the skin from the temples and cheeks towards the eyes, and its local region is defined as follows: the X coordinate of feature point No. 0 is taken as the X coordinate of the top-left vertex of the rectangular region, the Y coordinate of feature point No. 19 as the Y coordinate of the top-left vertex, the X coordinate of feature point No. 16 as the X coordinate of the bottom-right vertex, and the Y coordinate of feature point No. 33 as the Y coordinate of the bottom-right vertex.
The action unit described in AU7 mainly involves eyelid movement, changing the area covered by the upper and lower eyelids, and its local region is defined as follows: the X coordinate of feature point No. 17 is taken as the X coordinate of the top-left vertex of the rectangular region; the Y coordinate of feature point No. 38, moved up by five pixels, as the Y coordinate of the top-left vertex; the X coordinate of feature point No. 26 as the X coordinate of the bottom-right vertex; and the Y coordinate of feature point No. 40, moved down by five pixels, as the Y coordinate of the bottom-right vertex.
The action unit described in AU9 is mainly nose wrinkling: the skin along both sides of the nose is pulled up towards the nose root, forming folds on both sides of the nose and across the nose root and causing folds around the eyelids. Its local region is defined as follows: the X coordinate of feature point No. 36 is taken as the X coordinate of the top-left vertex of the rectangular region, the Y coordinate of feature point No. 22 as the Y coordinate of the top-left vertex, the X coordinate of feature point No. 45 as the X coordinate of the bottom-right vertex, and the Y coordinate of feature point No. 51 as the Y coordinate of the bottom-right vertex.
The action unit described in AU12 mainly consists of the mouth curving upwards, resulting in a deepening of the nasolabial folds and a lifting of the lower triangular region. Its local region is defined as follows: the X coordinate of feature point No. 2 is taken as the X coordinate of the top-left vertex of the rectangular region, the Y coordinate of feature point No. 28 as the Y coordinate of the top-left vertex, the X coordinate of feature point No. 14 as the X coordinate of the bottom-right vertex, and the Y coordinate of feature point No. 57 as the Y coordinate of the bottom-right vertex.
The action unit described in AU15 is dominated by the lips stretching and the mouth corners sinking, while the skin below the mouth changes under the pull of the mouth. Its local region is defined as follows: the X coordinate of feature point No. 4 is taken as the X coordinate of the top-left vertex of the rectangular region, the Y coordinate of feature point No. 3 as the Y coordinate of the top-left vertex, the X coordinate of feature point No. 12 as the X coordinate of the bottom-right vertex, and the Y coordinate of feature point No. 5 as the Y coordinate of the bottom-right vertex.
The action unit described in AU17 is mainly characterized by the lower lip pushing upwards, which causes the chin to wrinkle. Its local region is defined as follows: the X coordinate of feature point No. 4 is taken as the X coordinate of the top-left vertex of the rectangular region, the Y coordinate of feature point No. 3 as the Y coordinate of the top-left vertex, the X coordinate of feature point No. 12 as the X coordinate of the bottom-right vertex, and the Y coordinate of feature point No. 5 as the Y coordinate of the bottom-right vertex.
The action unit described in AU20 is mainly the lateral stretching of the lips: the mouth is elongated and flattened, which pulls the skin beyond the corners of the mouth. Its local region is defined as follows: the X coordinate of feature point No. 3 is taken as the X coordinate of the top-left vertex of the rectangular region, the Y coordinate of feature point No. 30 as the Y coordinate of the top-left vertex, the X coordinate of feature point No. 13 as the X coordinate of the bottom-right vertex, and the Y coordinate of feature point No. 10 as the Y coordinate of the bottom-right vertex.
The action unit described in AU23 is the tightening of the lips, and its local region is defined as follows: the X coordinate of feature point No. 48, shifted left by five pixels, is taken as the X coordinate of the top-left vertex of the rectangular region; the Y coordinate of feature point No. 33 as the Y coordinate of the top-left vertex; the X coordinate of feature point No. 54, shifted right by five pixels, as the X coordinate of the bottom-right vertex; and the Y coordinate of feature point No. 10 as the Y coordinate of the bottom-right vertex.
The action unit described in AU25 is mainly the parting of the lips with exposure of the teeth or gums, and its local region is defined as follows: the X coordinate of feature point No. 48 is taken as the X coordinate of the top-left vertex of the rectangular region, the Y coordinate of feature point No. 3 as the Y coordinate of the top-left vertex, the X coordinate of feature point No. 54 as the X coordinate of the bottom-right vertex, and the Y coordinate of feature point No. 5 as the Y coordinate of the bottom-right vertex.
The action unit described in AU26 is mainly jaw relaxation, leading to the lips parting and the teeth separating; its local region is defined in the same way as AU25. In order to learn the face region, which is computed from the feature point positions and used as the sample region for model learning, the face region is defined as follows: the X coordinate of feature point No. 0 is taken as the X coordinate of the top-left vertex of the rectangular region; the Y coordinate of feature point No. 19, moved up by the difference between the Y coordinates of feature points No. 33 and No. 27, is taken as the Y coordinate of the top-left vertex; the X coordinate of feature point No. 16 is taken as the X coordinate of the bottom-right vertex; and the Y coordinate of feature point No. 8 is taken as the Y coordinate of the bottom-right vertex. The local regions of the different action units are marked fully automatically, and the marking result of each image is stored in an xml file that contains not only the names and region coordinates of the action units in the image but also the region coordinates of the face. A computational sketch of these region definitions follows below.
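The region definitions above reduce to simple coordinate arithmetic over the numbered feature points. The sketch below (NumPy assumed; the function names are illustrative) computes the AU1 region and the face sample region from a 66x2 array of feature points numbered as in fig. 1; the other action unit regions follow the same pattern:

import numpy as np

def au1_region(pts: np.ndarray):
    """AU1 region: top-left = (x of point 17, y of point 19),
    bottom-right = (x of point 26, y of point 41)."""
    x1, y1 = pts[17, 0], pts[19, 1]
    x2, y2 = pts[26, 0], pts[41, 1]
    return x1, y1, x2, y2

def face_region(pts: np.ndarray):
    """Face sample region: the top-left y is point 19 moved up by the vertical
    distance between points 33 and 27; bottom-right = (x of point 16, y of point 8)."""
    x1 = pts[0, 0]
    y1 = pts[19, 1] - (pts[33, 1] - pts[27, 1])
    x2, y2 = pts[16, 0], pts[8, 1]
    return x1, y1, x2, y2

# pts = np.asarray(landmarks)  # shape (66, 2), as produced by the supervised descent method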
In step 3.2 of this example, the micro-expression states are divided into seven, respectively happy, depressed, surprised, afraid, angry, disgust and neutral expressions:
the evaluation criterion for happy expressions is that the action units on the face must include AU12, and in combination with AU6, the intensity is large.
The evaluation criterion for the depressed expressions is that the action unit on the human face must include AU15 and one of AU1 or AU4, and the inclusion is strong.
The evaluation criterion for the surprise expression is that the action units on the face must include AU26, and must include one of AU1, AU2 or AU5, and the inclusion is strong.
The evaluation criterion for fear of expression is that the action units on the face must include one of AU20 or AU26, and must include one of AU1 or AU2 or AU4 or AU5 or AU7, and the intensity is high when the action units are included.
The evaluation criterion of the anger expression is that the action units on the face must include AU23 and must include one of AU4, AU5 or AU7, and the intensity is high if the action units are included.
The evaluation criterion for aversive expression is that for fear expression, the action unit on the face must include AU9 and must include one of AU15 or AU16, and the intensity is large when the action unit is included.
In the case of not the above six expressions, the neutral expression is assumed.
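The seven evaluation criteria can be expressed as a simple rule table over the set of detected action unit names. The sketch below assumes the criteria are checked in the order listed above (the text does not specify a tie-breaking order) and omits the intensity handling:

def classify_micro_expression(detected: set) -> str:
    """Map a set of detected AU names (e.g. {"AU12", "AU6"}) to one of the
    seven micro-expression states according to the criteria above."""
    def any_of(*names):
        return any(n in detected for n in names)

    if "AU12" in detected:                                          # happy (AU6 raises intensity)
        return "happy"
    if "AU15" in detected and any_of("AU1", "AU4"):                 # depressed
        return "depressed"
    if "AU26" in detected and any_of("AU1", "AU2", "AU5"):          # surprised
        return "surprised"
    if any_of("AU20", "AU26") and any_of("AU1", "AU2", "AU4", "AU5", "AU7"):
        return "afraid"
    if "AU23" in detected and any_of("AU4", "AU5", "AU7"):          # angry
        return "angry"
    if "AU9" in detected and any_of("AU15", "AU16"):                # disgusted
        return "disgusted"
    return "neutral"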
The present invention is not limited to the above-described embodiments, which are described in the specification only to illustrate the principle of the invention; various changes and modifications may be made within the scope of the invention as claimed without departing from its spirit and scope. The scope of the invention is defined by the appended claims.

Claims (5)

1. The method for detecting the human face micro-expression action unit based on the deep convolutional neural network is characterized by comprising the following steps of:
step 1: designing a deep convolutional neural network structure, taking a face sample data set as input, taking an automatically marked micro expression action unit as output, training the network structure, and learning appropriate network parameters;
step 1.1: marking a face and rectangular areas of different action units in the face according to images in the sample data set;
step 1.2: designing and implementing a deep convolutional neural network, wherein the neural network comprises a convolutional layer, a shortcut layer and an action unit detection layer so as to learn the regional information of a face and different expression action units of the face and acquire a network forward propagation parameter;
step 1.3: taking sample data in a face sample data set as neural network input data, wherein the sample data comprises face image data and action unit mark xml data;
step 2: realizing facial expression action unit detection according to the network parameters learned in step 1: the image to be detected is taken as the input of the neural network of step 1.2, the convolution layers and detection layers are computed for the input image using the network parameters, and whether a face is present in the image is judged from the output of the detection layer; if no face is present, no action unit result is valid; if a face is present, the region positions identified for the action units are corrected according to the positional relationship between the face and the different action units;
step 3: performing visual output according to the face action units detected in step 2, and calculating and outputting the micro-expression expressed by the face;
step 3.1: judging which action units are contained in the input face according to the probability value of each action unit in the detection layer of step 2 and the threshold range; when the probability value exceeds the judgment threshold, the class name of the action unit is read from the detection layer, the absolute pixel position of the action unit on the image is calculated from the face position and the relative position of the action unit, the absolute position of the action unit is drawn on the image with a rectangular frame, and the name of the action unit is drawn at the same time;
step 3.2: outputting the micro-expression state of the current face according to the combination of action units appearing in the face;
step 3.3: outputting the micro-expression state of the face according to the recognition result of the face action units in the current image.
2. The method for detecting the facial micro-expression action unit based on the deep convolutional neural network as claimed in claim 1, wherein in step 1.1 the face is marked by first computing the facial feature points and then, according to the definitions of the different action units and the corresponding facial muscle changes, defining the local rectangular region positions of the different action units and the rectangular region position of the face.
3. The method for detecting the facial micro-expression action unit based on the deep convolutional neural network as claimed in claim 1, wherein the step 1.1 comprises the following steps:
step 1.1.1: detecting the face and the positions of its feature points using the supervised descent method, and numbering each feature point of the face;
step 1.1.2: according to the positions of the characteristic points of the human face, defining a human face and an action unit area based on the positions of the characteristic points, wherein the action unit area can reflect the actions of the forehead, the eyebrow, the eyes, the nose, the cheeks, the mouth and the jaw of the face;
step 1.1.3: and calculating a face region as a sample region by using the positions of the feature points for model learning.
4. The method for detecting the facial micro-expression action unit based on the deep convolutional neural network of claim 1, wherein in step 3.2 the micro-expression states of the face comprise happy, depressed, surprised, afraid, angry, disgusted and neutral expressions.
5. The method for detecting the facial micro-expression action unit based on the deep convolutional neural network as claimed in claim 1, wherein in step 1.2 each convolution layer performs a convolution operation on the feature images of the previous layer with a group of convolution parameter templates and outputs as many feature images as there are templates, and the activation function of the convolution layers is a leaky linear rectification function:

φ(x) = x, for x > 0;  φ(x) = αx, for x ≤ 0    (1)

in the formula (1), x is the input value of the activation function, φ(x) is the output value of the activation function, and α is the small leak coefficient applied to negative inputs;
for the shortcut layers, in order to weaken the influence of the gradient-vanishing problem during back-propagation, a shortcut layer is added after every few convolution layers, that is, the initial input is added to the output of the three convolution layers, and the shortcut layer is formalized as equations (2) and (3):

F(x) = φ(w3 * φ(w2 * φ(w1 * x)))    (2)
y = F(x) + x    (3)

in the formula (2) and the formula (3), w3, w2 and w1 are the convolution template parameters of the 3rd, 2nd and 1st convolution layers, x is the input of the convolution layers, F(x) is the output of the three-layer convolution operation, and y is the output of the shortcut layer after the input has been superposed.
CN201811076388.8A 2018-09-14 2018-09-14 Face micro-expression action unit detection method based on deep convolutional neural network Active CN109344744B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811076388.8A CN109344744B (en) 2018-09-14 2018-09-14 Face micro-expression action unit detection method based on deep convolutional neural network

Publications (2)

Publication Number Publication Date
CN109344744A CN109344744A (en) 2019-02-15
CN109344744B true CN109344744B (en) 2021-10-29

Family

ID=65305306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811076388.8A Active CN109344744B (en) 2018-09-14 2018-09-14 Face micro-expression action unit detection method based on deep convolutional neural network

Country Status (1)

Country Link
CN (1) CN109344744B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902656B (en) * 2019-03-12 2020-10-23 吉林大学 Method and system for identifying facial action unit
CN110147822B (en) * 2019-04-16 2021-04-02 北京师范大学 Emotion index calculation method based on face action unit detection
CN111081375B (en) * 2019-12-27 2023-04-18 北京深测科技有限公司 Early warning method and system for health monitoring
CN111209867A (en) * 2020-01-08 2020-05-29 上海商汤临港智能科技有限公司 Expression recognition method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10515393B2 (en) * 2016-06-30 2019-12-24 Paypal, Inc. Image data detection for micro-expression analysis and targeted data services

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654049A (en) * 2015-12-29 2016-06-08 中国科学院深圳先进技术研究院 Facial expression recognition method and device
CN107273876A (en) * 2017-07-18 2017-10-20 山东大学 A kind of micro- expression automatic identifying method of ' the grand micro- transformation models of to ' based on deep learning
CN107679526A (en) * 2017-11-14 2018-02-09 北京科技大学 A kind of micro- expression recognition method of face
CN108304826A (en) * 2018-03-01 2018-07-20 河海大学 Facial expression recognizing method based on convolutional neural networks

Also Published As

Publication number Publication date
CN109344744A (en) 2019-02-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant