CN112633153A - Facial expression motion unit identification method based on space-time graph convolutional network - Google Patents

Facial expression motion unit identification method based on space-time graph convolutional network

Info

Publication number
CN112633153A
CN112633153A
Authority
CN
China
Prior art keywords
time
space
graph
relation
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011528440.6A
Other languages
Chinese (zh)
Inventor
刘志磊
张庆阳
董威龙
陈浩阳
都景舜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202011528440.6A
Publication of CN112633153A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G06V40/176 Dynamic expression
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a facial expression motion unit identification method based on a space-time graph convolutional network. The invention applies a space-time graph convolutional network to identify facial motion units, models the spatio-temporal dependencies between AUs with an undirected spatio-temporal graph model, and learns AU depth representation features with the space-time graph convolutional network, thereby improving the accuracy of AU identification. The method can effectively alleviate problems such as poor robustness and low accuracy of AU detection models, and can be widely applied to expression analysis, affective computing and human-computer interaction.

Description

Facial expression motion unit identification method based on space-time graph convolutional network
Technical Field
The invention relates to the technical field of computer vision and affective computing, and in particular to human facial expression motion unit (AU) recognition based on a space-time graph convolutional network (ST-GCN).
Background
Facial expressions can reveal a person's mental activities, psychological states and outwardly communicated social behaviors. With the development of artificial intelligence, human-centered facial expression recognition has gradually attracted widespread attention from industry and academia. Expression analysis using the Facial Action Coding System is one of the common approaches to facial expression recognition.
The Facial Action Coding System (FACS) anatomically divides the human face into 44 facial motion units according to muscle movement, each representing the motion of the muscles in a different facial region. For example, AU7 indicates whether the eyelids are tightened and AU23 indicates whether the lips are tightened. Compared with the six basic expressions (anger, disgust, fear, happiness, sadness and surprise), the AU-based description of facial expression is more objective and finer-grained, and avoids the annotation ambiguity introduced by the subjective judgment of observers. Computer-based automatic AU detection can analyze facial expressions accurately and thereby help understand individual emotion, and has good application prospects in driver fatigue detection, patient pain estimation, psychological research and other fields.
With the development of deep-learning theory and techniques, AU detection has also made remarkable progress. However, AU detection still faces many challenges. AU data in practical application scenarios are highly complex, and factors such as head pose, occlusion and complex illumination significantly degrade the performance of AU recognition models. In addition, individual differences in ethnicity, skin color, age and gender introduce large intra-class variation, which also significantly affects recognition accuracy. Meanwhile, AU annotation can only be completed by trained experts at considerable time cost, so the datasets available for training are far from covering every population in complex scenes; data samples are small and overfitting occurs easily.
Existing AU detection models consider only the associations between different AUs at the same time point, and ignore the correlations between AUs across space and time.
Disclosure of Invention
To overcome the deficiencies of the prior art, the present invention aims to apply a space-time graph convolutional network (ST-GCN) to AU recognition.
The invention discloses a facial expression motion unit identification method based on a space-time graph convolutional network.
The method comprises the following specific steps:
First, feature extraction is performed on each AU local area by an autoencoder. For each frame image in the image frame sequence, the center position of each AU is obtained from the facial key point (landmark) information, and a region of size n x n around that center is taken as the local region of the corresponding AU. All local regions of interest (ROIs) where AUs are located are input into an autoencoder (AE) specific to each AU for encoding, thereby obtaining a d0-dimensional depth representation that fully contains the AU information.
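By way of a non-limiting illustration (not part of the original disclosure), the ROI extraction and per-AU encoding step may be sketched as follows; the region size n, the encoder architecture and the landmark-derived AU center are illustrative assumptions:

import torch
import torch.nn as nn

class AUAutoencoder(nn.Module):
    """Per-AU convolutional autoencoder: encodes an n x n ROI into a d0-dimensional vector."""
    def __init__(self, n: int = 32, d0: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # n -> n/2
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # n/2 -> n/4
            nn.Flatten(),
            nn.Linear(32 * (n // 4) ** 2, d0),
        )
        self.decoder = nn.Sequential(
            nn.Linear(d0, 32 * (n // 4) ** 2), nn.ReLU(),
            nn.Unflatten(1, (32, n // 4, n // 4)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, roi):
        z = self.encoder(roi)        # d0-dimensional depth representation of the AU ROI
        recon = self.decoder(z)      # reconstruction used by the pixel-level loss L_R
        return z, recon

def crop_au_roi(frame: torch.Tensor, au_center, n: int = 32) -> torch.Tensor:
    """Crop an n x n region around the AU center (cx, cy) computed from facial landmarks.
    frame: (3, H, W) image tensor; the center is assumed to lie at least n/2 pixels from the border."""
    cx, cy = au_center
    half = n // 2
    return frame[:, cy - half:cy + half, cx - half:cx + half]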
Next, an undirected spatio-temporal relationship graph of the AU sequence is constructed, thereby modeling the spatio-temporal relationships between AUs. Each node in the AU spatio-temporal relationship graph consists of the depth representation vector of one AU extracted in step (1), and the nodes in the AU relationship graph are connected according to how closely they are related.
Construction of the spatial relationship: a relationship matrix M representing the closeness of the association between AUs is constructed by counting the co-occurrence probabilities of the AUs in the training set. A threshold h is then set, and AUs whose association closeness is greater than h are connected, thereby modeling the spatial adjacency between AUs.
Construction of the temporal relationship: a time threshold τ is set, and nodes that belong to the same AU and whose time interval in the image frame sequence does not exceed τ image frames are connected, thereby modeling the temporal relations between AU nodes.
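One possible, non-limiting sketch of this graph construction is given below; the binary label format, the concrete thresholds h and τ, and the block layout of the adjacency matrix are illustrative assumptions rather than requirements of the method:

import numpy as np

def cooccurrence_matrix(labels: np.ndarray) -> np.ndarray:
    """labels: (num_samples, num_AUs) binary AU annotations from the training set.
    Returns M[i, j] = P(AU_j active | AU_i active), a measure of association closeness."""
    counts = labels.T @ labels                   # joint occurrence counts
    occ = np.clip(labels.sum(axis=0), 1, None)   # per-AU occurrence counts (avoid division by zero)
    return counts / occ[:, None]

def build_st_adjacency(M: np.ndarray, T: int, h: float = 0.4, tau: int = 2) -> np.ndarray:
    """Undirected spatio-temporal adjacency for T frames, each with C = M.shape[0] AU nodes.
    Spatial edges: AU pairs whose association closeness exceeds h (within one frame).
    Temporal edges: the same AU in frames at most tau apart."""
    C = M.shape[0]
    A = np.zeros((T * C, T * C))
    spatial = (np.maximum(M, M.T) > h).astype(float)   # symmetrized for an undirected graph
    for t in range(T):
        A[t * C:(t + 1) * C, t * C:(t + 1) * C] = spatial
        for dt in range(1, tau + 1):
            if t + dt < T:
                idx = np.arange(C)
                A[t * C + idx, (t + dt) * C + idx] = 1.0   # same AU, nearby frames
                A[(t + dt) * C + idx, t * C + idx] = 1.0
    np.fill_diagonal(A, 1.0)                               # self-connections
    return A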
Finally, AU recognition is performed based on the ST-GCN (space-time graph convolution model). Multiple graph convolution operations are applied by the ST-GCN to the AU sequence spatio-temporal relationship graph constructed in step (2), yielding a depth AU feature representation that contains both spatial and temporal information. Finally, the depth AU feature representation is classified by a fully connected neural network to obtain the AU identification result.
In step (1), specifically, the input video is split into frames and the AU local areas (ROIs) on each frame image are extracted. First, taking each AU key point on each frame image as the center, a region of size n x n is extracted as the local region where that AU is located. Then each extracted AU local area is fed into a separate autoencoder (AE) for encoding, yielding a feature vector that contains the information relevant to that specific AU. During the autoencoder training for each AU local region, the following two loss functions are used as constraints.
The first is the pixel-level reconstruction loss function L_R:

L_R = (1/(n×n)) · Σ_{l=1..n} Σ_{m=1..n} ‖I_GT(l, m) − I_R(l, m)‖²

where n is the size of each AU ROI, I_GT is the ground-truth AU ROI, and I_R is the reconstructed AU ROI image.
The second is the ROI-level multi-label AU detection loss function:

L_ROI_softmax = −(1/R) · Σ_{i=1..R} Σ_{j=1..C} [ Y_ROI(i, j)·log Ŷ_ROI(i, j) + (1 − Y_ROI(i, j))·log(1 − Ŷ_ROI(i, j)) ]

where C is the number of AU categories, R is the number of ROIs obtained in the previous step, Y_ROI ∈ {0, 1}^{R×C} is the ground-truth AU label matrix, and Ŷ_ROI(i, j) is the predicted probability that AU j is active in ROI i. Y_ROI(i, j) = 0 indicates that AU j is not active in AU ROI i, and Y_ROI(i, j) = 1 indicates that AU j is active in AU ROI i. This loss measures whether the current ROI contains a particular AU.
Finally, the two loss functions are combined using a trade-off parameter λ1, and the loss function ultimately used for AU depth representation extraction is obtained as:

L_ROI = L_ROI_softmax + λ1·L_R
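A minimal sketch of this combined loss is given below, assuming a mean-squared reconstruction term and a multi-label binary cross-entropy term as plausible instantiations of L_R and L_ROI_softmax (the text above defines both only by their roles):

import torch
import torch.nn.functional as F

def au_roi_loss(recon: torch.Tensor, roi_gt: torch.Tensor,
                logits: torch.Tensor, y_roi: torch.Tensor, lambda1: float = 1.0) -> torch.Tensor:
    """Combined loss L_ROI = L_ROI_softmax + lambda1 * L_R.

    recon, roi_gt : (R, 3, n, n) reconstructed and ground-truth AU ROIs
    logits, y_roi : (R, C) per-ROI AU prediction logits and binary labels (float tensors)
    """
    l_r = F.mse_loss(recon, roi_gt)                                  # pixel-level reconstruction term
    l_softmax = F.binary_cross_entropy_with_logits(logits, y_roi)    # ROI-level multi-label AU term
    return l_softmax + lambda1 * l_r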
In step (2), an undirected spatio-temporal relationship graph of the AU sequence is further constructed. The neighbor relations in the AU sequence spatio-temporal relationship graph fall into three categories: the relation of an AU node to itself, spatial neighbor relations and temporal neighbor relations. For a node v_ti, its neighbor set B(v_ti) is defined as follows:

B(v_ti) = { v_qj | d(v_tj, v_ti) ≥ K, |q − t| ≤ Γ }

where v_ti is the i-th node of frame t, B(v_ti) is the neighbor set of node v_ti, d(x, y) denotes the co-occurrence probability between two nodes in the same frame, |x − y| is the time interval between two nodes, and K and Γ are the thresholds on the co-occurrence probability and the time distance, respectively. K and Γ serve as hyper-parameters of the model, and suitable values are selected through training. Finally, a spatio-temporal adjacency matrix of the AUs is obtained; each node in the relationship graph represents an AU feature vector, and adjacent nodes represent AUs that are closely related to the current AU node in time or space. The spatio-temporal relationship graph of the AUs can be represented by this AU spatio-temporal adjacency matrix.
In step (3), specifically, graph convolution operations are performed on the AU spatio-temporal relationship graph constructed in step (2) using the ST-GCN model; that is, a graph convolution operation is carried out on each node of the spatio-temporal relationship graph. In this step, to facilitate the convolution operation, the neighbor set is divided into different subsets, and a single-frame mapping function l is defined first:

l: B(v_ti) → {0, 1, …, K−1}

The mapping function l maps the neighbor set B of v_ti into K different subsets, each corresponding to a label number. To handle the subsets in the time dimension at the same time, a mapping function l_ST is defined:

l_ST(v_ti, v_qj) = l(v_tj) + |q − t|

that is, the time interval distance is added on top of the original single-frame subset label; the two arguments of l_ST are the current node and the neighbor node to be mapped, respectively.
The formula for the graph convolution operation is as follows:

f_out(v_ti) = Σ_{v_qj ∈ B(v_ti)} (1/Z_ti(v_qj)) · f_in(v_qj) · w(l_ST(v_ti, v_qj))

where f_in(·) and f_out(·) are the input and output feature values of a node before and after the convolution, respectively; Z_ti(v_tj) = |{ v_tk | l_ti(v_tk) = l_ti(v_tj) }| is the cardinality of the subset containing v_tj, used here as a normalization term to balance the influence of different subset sizes on the result; and w(·) is the weight function owned by each neighbor subset, obtained through learning.
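A minimal sketch of such a subset-partitioned graph convolution is shown below; the per-subset adjacency slices, the degree-based normalization standing in for Z, and the tensor layout are illustrative assumptions, and in practice an off-the-shelf ST-GCN implementation could be used instead:

import torch
import torch.nn as nn

class PartitionedGraphConv(nn.Module):
    """Graph convolution over the AU spatio-temporal graph whose neighbor set is split into
    K subsets; each subset has its own learned weight w(.) and is normalized by a term that
    plays the role of Z (here folded into a row-normalized adjacency slice)."""
    def __init__(self, in_dim: int, out_dim: int, A_subsets: torch.Tensor):
        super().__init__()
        # A_subsets: (K, N, N), one adjacency slice per neighbor subset
        self.register_buffer("A", self._row_normalize(A_subsets))
        self.weights = nn.ModuleList(
            [nn.Linear(in_dim, out_dim, bias=False) for _ in range(A_subsets.shape[0])]
        )

    @staticmethod
    def _row_normalize(A: torch.Tensor) -> torch.Tensor:
        deg = A.sum(dim=-1, keepdim=True).clamp(min=1.0)   # subset cardinality per node
        return A / deg

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, in_dim) AU node features; returns (batch, N, out_dim)
        out = 0
        for k, w in enumerate(self.weights):
            out = out + torch.einsum("nm,bmc->bnc", self.A[k], w(x))
        return torch.relu(out)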
The above formula is then applied as a convolution over the adjacency matrix of the spatio-temporal relationship graph; after the graph convolution operation has been applied repeatedly, the depth feature vectors can fully represent the relations among all AUs as well as the features of the AUs themselves.
Finally, the depth feature vectors are classified using a fully connected neural network to obtain the AU identification result. The beneficial effects of the invention are that it adopts a space-time graph convolutional network to identify facial motion units, models the spatio-temporal dependencies between AUs with an undirected spatio-temporal graph model, and learns AU depth representation features with the space-time graph convolutional network, thereby improving the accuracy of AU identification. The method can effectively alleviate problems such as poor robustness and low accuracy of AU detection models, and can be widely applied to expression analysis, affective computing, human-computer interaction and other fields.
1. The main innovation of the method is that a spatio-temporal graph convolutional network model is applied to expression motion unit (AU) recognition, so that AU recognition can be carried out on an image frame sequence. Compared with single-frame AU recognition methods, the method takes the spatio-temporal relations between AUs into account and therefore has important research and application value;
2. The AU recognition algorithm, built with deep learning, a frontier technique of artificial intelligence, realizes AU detection by modeling the spatio-temporal relations of facial motion units, and provides an important theoretical basis and technical platform for facial expression recognition in the field of artificial intelligence.
3. The method can be applied simultaneously to research on affective computing, human-computer interaction and facial expression recognition.
Drawings
FIG. 1 is a diagram illustrating the steps of the present invention.
FIG. 2 is a diagram of the depth autoencoder (AE) model.
Fig. 3 is a diagram of the final effect of the present invention.
Detailed Description
The invention extracts features of the facial motion unit (AU) regions with an autoencoder, then constructs a spatio-temporal relationship graph of the AU sequence based on the spatio-temporal relations between AUs, and finally performs graph convolution on the AU spatio-temporal relationship graph with a space-time graph convolutional network model and uses a fully connected network for AU identification, so as to detect the occurrence and intensity of each AU.
The method comprises the following specific steps:
First, the input image frame sequence is split into frames and the AU local areas (ROIs) in each frame image are extracted. Depth features are then extracted from the key AU regions (ROIs) of the face using an autoencoder.
Then, taking the AU depth representation vectors extracted in the previous step as nodes, an undirected spatio-temporal relationship graph of the AU sequence is constructed. The nodes are connected in space and in time according to how closely they are related, thereby modeling the spatio-temporal relations between them. A relationship matrix M expressing the closeness of the association between AUs is constructed by counting the conditional probabilities of co-occurrence between AUs in the training set. A threshold h is set, and AU nodes whose association closeness is greater than h are connected. A time threshold τ is set, and nodes that belong to the same AU and are at most τ frames apart are connected.
Finally, AU recognition is performed based on the ST-GCN (space-time graph convolution model). Multiple graph convolution operations are applied by the ST-GCN to the undirected spatio-temporal relationship graph of the AU sequence to obtain a depth AU feature representation containing spatial and temporal information, and the depth AU feature representation is classified by a fully connected neural network to obtain the AU identification result.
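A compact, non-limiting sketch of how these three stages may be chained is given below; the layer count, the hidden dimension and the sigmoid output head are illustrative assumptions, and AUAutoencoder and PartitionedGraphConv refer to the earlier sketches:

import torch
import torch.nn as nn

class AURecognizer(nn.Module):
    """Per-frame AU depth features -> stacked graph convolutions -> fully connected AU classifier."""
    def __init__(self, d0: int, num_aus: int, A_subsets: torch.Tensor, hidden: int = 128):
        super().__init__()
        self.gcn1 = PartitionedGraphConv(d0, hidden, A_subsets)
        self.gcn2 = PartitionedGraphConv(hidden, hidden, A_subsets)
        self.classifier = nn.Linear(hidden, num_aus)

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        # node_feats: (batch, T * num_aus, d0) AU depth representations stacked over T frames
        h = self.gcn2(self.gcn1(node_feats))
        return torch.sigmoid(self.classifier(h))   # per-node AU activation scores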
The present invention will be described in further detail with reference to the accompanying drawings and specific examples.
The specific implementation steps of the invention are shown in FIG. 1 and mainly comprise the following three steps:
(1) AU local region depth representation feature extraction based on a convolutional autoencoder:
The input image frame sequence is split into frames and the AU local regions (ROIs) on each frame image are extracted. First, taking each AU key point on each frame image as the center, a region of size n x n is extracted as the local region where that AU is located. Then each extracted AU local area is fed into a separate autoencoder (AE) for encoding to obtain a feature vector containing the information relevant to that specific AU. During the autoencoder training for each AU local region, the following two loss functions are used as constraints.
The first is a pixel-level reconstruction loss function L_R:

L_R = (1/(n×n)) · Σ_{l=1..n} Σ_{m=1..n} ‖I_GT(l, m) − I_R(l, m)‖²

where n is the size of each AU ROI, I_GT is the ground-truth AU ROI, and I_R is the reconstructed AU ROI image.
The second is a multi-label AU detection loss function at the ROI level:

L_ROI_softmax = −(1/R) · Σ_{i=1..R} Σ_{j=1..C} [ Y_ROI(i, j)·log Ŷ_ROI(i, j) + (1 − Y_ROI(i, j))·log(1 − Ŷ_ROI(i, j)) ]

where C is the number of AU categories, R is the number of ROIs obtained in the previous step, Y_ROI ∈ {0, 1}^{R×C} is the ground-truth AU label matrix, and Ŷ_ROI(i, j) is the predicted probability that AU j is active in ROI i. Y_ROI(i, j) = 0 indicates that AU j is not active in AU ROI i, and Y_ROI(i, j) = 1 indicates that AU j is active in AU ROI i. This loss measures whether the current ROI contains a particular AU.
Finally, the two loss functions are combined using a trade-off parameter λ1, and the loss function ultimately used for AU depth representation extraction is obtained as:

L_ROI = L_ROI_softmax + λ1·L_R
(2) Construction of the AU spatio-temporal relationship graph model:
The spatio-temporal relations of the AUs are modeled with an undirected graph model, where each node of the AU spatio-temporal relationship graph consists of one AU depth representation vector from (1). The neighbor relations in the spatio-temporal relationship graph of the AU sequence fall into three categories: the relation of an AU node to itself, spatial neighbor relations and temporal neighbor relations. For a node v_ti, its neighbor set B(v_ti) is defined as follows:

B(v_ti) = { v_qj | d(v_tj, v_ti) ≥ K, |q − t| ≤ Γ }

where v_ti is the i-th node of frame t, B(v_ti) is the neighbor set of node v_ti, d(x, y) denotes the co-occurrence probability between two nodes in the same frame, |x − y| is the time interval between two nodes, and K and Γ are the thresholds on the co-occurrence probability and the time distance, respectively. K and Γ serve as hyper-parameters of the model, and suitable values are selected through training. Finally, a spatio-temporal adjacency matrix of the AUs is obtained; each node in the relationship graph represents an AU feature vector, and adjacent nodes represent AUs that are closely related to the current AU node in time or space. The spatio-temporal relationship graph of the AUs can be represented by this AU spatio-temporal adjacency matrix.
(3) AU identification based on space-time graph convolutional network
A graph convolution operation is performed on the AU spatio-temporal relationship graph constructed in the previous step through the space-time graph convolution model; that is, a graph convolution operation is carried out on each node of the spatio-temporal relationship graph. In this step, to facilitate the convolution operation, the neighbor set is divided into different subsets, and a single-frame mapping function l is defined first:

l: B(v_ti) → {0, 1, …, K−1}

The mapping function l maps the neighbor set B of v_ti into K different subsets, each corresponding to a label number. To handle the subsets in the time dimension at the same time, a mapping function l_ST is defined:

l_ST(v_ti, v_qj) = l(v_tj) + |q − t|

that is, the time interval distance is added on top of the original single-frame subset label; the two arguments of l_ST are the current node and the neighbor node to be mapped, respectively.
The formula for the graph convolution operation is as follows:

f_out(v_ti) = Σ_{v_qj ∈ B(v_ti)} (1/Z_ti(v_qj)) · f_in(v_qj) · w(l_ST(v_ti, v_qj))

where f_in(·) and f_out(·) are the input and output feature values of a node before and after the convolution, respectively; Z_ti(v_tj) = |{ v_tk | l_ti(v_tk) = l_ti(v_tj) }| is the cardinality of the subset containing v_tj, used here as a normalization term to balance the influence of different subset sizes on the result; and w(·) is the weight function owned by each neighbor subset, obtained through learning.
Convolution is then applied over the adjacency matrix of the spatio-temporal relationship graph using the above formula; after the graph convolution operation has been applied repeatedly, the depth feature vectors can fully represent the connections among AUs as well as the features of the AUs themselves.
Finally, the depth feature vectors are classified using a fully connected neural network to obtain the classification result of AU identification.
To summarize:
the invention provides a detection and extraction method of a facial expression unit based on space-time image convolution, which can be used for application of emotion recognition and the like by using AU (AU). The method comprises the steps of obtaining an AU center through a face key point, constructing a spatiotemporal relation graph of an AU sequence, and then identifying and extracting intensity values of the AU by using a spatiotemporal graph convolution model, so that the fast identification and detection of the AU can be realized, and the problems of emotion identification by using the AU and the like are solved. The method can be widely applied to the emotion recognition of people by machines in different scenes, so that different interactions can be made according to the emotion types of people, and the method has important value in popularization and application of emotion recognition interaction based on facial expressions.

Claims (5)

1. A facial expression motion unit identification method based on a space-time graph convolutional network, characterized in that feature extraction is carried out on the facial motion unit (AU) regions by a convolutional autoencoder, a spatio-temporal relationship graph of the AU sequence is then constructed according to the closeness of the spatio-temporal relations between AUs, and finally AU identification is performed based on an ST-GCN.
2. The facial expression motion unit identification method based on the space-time graph convolutional network as claimed in claim 1, characterized in that the specific steps are as follows:
1) feature extraction of AU local areas by an autoencoder:
acquiring the center position of each AU for each frame image in the image frame sequence based on the facial key point information, and dividing an n x n area according to the center position of the AU as the local area where the corresponding AU is located;
all local areas where AUs are located are input into an autoencoder specific to each AU for encoding, so that a d0-dimensional depth representation that fully contains the AU information is obtained;
2) constructing an undirected spatio-temporal relationship graph of AU sequences, thereby modeling the spatio-temporal relationship between AUs: each node in the AU space-time relationship graph is composed of depth expression vectors of one AU extracted in 1), and the nodes in the AU relationship graph are connected according to the degree of closeness of mutual connection;
constructing a spatial relationship: constructing a relationship matrix M to represent the association affinity degree between AUs by counting the co-occurrence probability of AUs in a training set, setting a threshold value h, and connecting AUs with the association affinity degree larger than h, thereby modeling the spatial adjacency relation of the AUs;
constructing a time relation: setting a time threshold tau, and connecting nodes which have time intervals not exceeding tau image frames and belong to the same AU in an image frame sequence, so as to model the relation of AU nodes in time;
3) AU identification is carried out based on the ST-GCN space-time graph convolution model: multiple graph convolution operations are performed with the ST-GCN on the AU sequence spatio-temporal relationship graph constructed in step 2) to obtain a depth AU feature representation containing spatial and temporal information, and the depth AU feature representation is classified by a fully connected neural network to obtain the AU identification result.
3. The method of recognizing facial expression motion units of a space-time graph convolutional network as claimed in claim 2, wherein step 1) specifically divides the input video to extract AU local regions on each frame image:
firstly, taking each AU key point on each frame image as a center, and extracting an n x n area as a local area where the AU is located;
then, each extracted AU local area is sent to a separate autoencoder for encoding so as to obtain a feature vector containing the information relevant to that specific AU, and in the autoencoder learning process of each AU local area, the following two loss functions are used as constraints:
the first is the pixel-level reconstruction loss function L_R:

L_R = (1/(n×n)) · Σ_{l=1..n} Σ_{m=1..n} ‖I_GT(l, m) − I_R(l, m)‖²

where n is the size of each AU ROI, I_GT is the ground-truth AU ROI, and I_R is the reconstructed AU ROI image;
the second is the ROI-level multi-label AU detection loss function:

L_ROI_softmax = −(1/R) · Σ_{i=1..R} Σ_{j=1..C} [ Y_ROI(i, j)·log Ŷ_ROI(i, j) + (1 − Y_ROI(i, j))·log(1 − Ŷ_ROI(i, j)) ]

wherein: C is the number of AU categories, R is the number of ROIs obtained in the previous step, Y_ROI ∈ {0, 1}^{R×C} is the ground-truth AU label matrix, and Ŷ_ROI(i, j) is the predicted probability that AU j is active in ROI i; Y_ROI(i, j) = 0 indicates that AU j is not active in AU ROI i, and Y_ROI(i, j) = 1 indicates that AU j is active in AU ROI i, which is used for measuring whether the current ROI contains a specific AU;
finally, the two loss functions are combined using a trade-off parameter λ1, and the loss function finally used for AU depth representation extraction is obtained as:

L_ROI = L_ROI_softmax + λ1·L_R
4. the method for recognizing facial expression motion units of a space-time graph convolutional network as claimed in claim 2, wherein the step 2) is specifically: the neighbor relations in the AU sequence spatio-temporal relation graph are divided into three categories: the neighbor relation between the AU node and the AU node, the spatial neighbor relation and the temporal neighbor relation;
for a certain node vtiIts neighbor set B (v)ti) The definition is as follows:
Figure FDA0002851518920000026
vtiis the ith node of the t frame, B (v)ti) Is v istiA neighbor set of nodes, d (x, y) refers to the co-occurrence probability between two nodes in the same frame, | x-y | is the spacing distance between two nodes in time, and K and Γ are the threshold values of the co-occurrence probability and the time distance respectively;
k, gamma is used as a hyper-parameter of the model, and a proper value is selected through training;
finally, a space-time relation adjacency matrix of the AU is obtained, each node in the relation graph represents an AU characteristic vector, adjacent nodes represent the AU which has close relation with the current AU node in time or space, and the space-time relation graph of the AU can be represented through the AU space-time relation adjacency matrix.
5. The method for recognizing facial expression motion units of a space-time graph convolutional network as claimed in claim 2, wherein step 3) is specifically: the ST-GCN model is used to perform graph convolution operations on the AU spatio-temporal relationship graph constructed in step 2), that is, a graph convolution operation is carried out on each node of the spatio-temporal relationship graph; the neighbor set is divided into different subsets, and a single-frame mapping function l is defined first:

l: B(v_ti) → {0, 1, …, K−1}

the mapping function l maps the neighbor set B of v_ti into K different subsets, each corresponding to a label number;
to handle the subsets in the time dimension at the same time, a mapping function l_ST is defined:

l_ST(v_ti, v_qj) = l(v_tj) + |q − t|

that is, the time interval distance is added on top of the original single-frame subset label; the two arguments of l_ST are the current node and the neighbor node to be mapped, respectively;
the formula for the graph convolution operation is as follows:

f_out(v_ti) = Σ_{v_qj ∈ B(v_ti)} (1/Z_ti(v_qj)) · f_in(v_qj) · w(l_ST(v_ti, v_qj))

where f_in(·) and f_out(·) are the input and output feature values of a node before and after the convolution, respectively; Z_ti(v_tj) = |{ v_tk | l_ti(v_tk) = l_ti(v_tj) }| is the cardinality of the subset containing v_tj, used here as a normalization term to balance the influence of different subset sizes on the result; and w(·) is the weight function owned by each neighbor subset, obtained through learning;
convolution is then applied over the adjacency matrix of the spatio-temporal relationship graph using the above formula, and after the graph convolution operation has been applied repeatedly, the depth feature vectors fully represent the relations among AUs and the features of the AUs;
finally, the depth feature vectors are classified using a fully connected neural network to obtain the classification result of AU identification.
CN202011528440.6A 2020-12-22 2020-12-22 Facial expression motion unit identification method based on space-time graph convolutional network Pending CN112633153A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011528440.6A CN112633153A (en) 2020-12-22 2020-12-22 Facial expression motion unit identification method based on space-time graph convolutional network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011528440.6A CN112633153A (en) 2020-12-22 2020-12-22 Facial expression motion unit identification method based on space-time graph convolutional network

Publications (1)

Publication Number Publication Date
CN112633153A true CN112633153A (en) 2021-04-09

Family

ID=75320901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011528440.6A Pending CN112633153A (en) 2020-12-22 2020-12-22 Facial expression motion unit identification method based on space-time graph convolutional network

Country Status (1)

Country Link
CN (1) CN112633153A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113496217A (en) * 2021-07-08 2021-10-12 河北工业大学 Method for identifying human face micro expression in video image sequence
CN114842542A (en) * 2022-05-31 2022-08-02 中国矿业大学 Facial action unit identification method and device based on self-adaptive attention and space-time correlation
CN116071809A (en) * 2023-03-22 2023-05-05 鹏城实验室 Face space-time representation generation method based on multi-class representation space-time interaction

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796110A (en) * 2019-11-05 2020-02-14 西安电子科技大学 Human behavior identification method and system based on graph convolution network
CN111652124A (en) * 2020-06-02 2020-09-11 电子科技大学 Construction method of human behavior recognition model based on graph convolution network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796110A (en) * 2019-11-05 2020-02-14 西安电子科技大学 Human behavior identification method and system based on graph convolution network
CN111652124A (en) * 2020-06-02 2020-09-11 电子科技大学 Construction method of human behavior recognition model based on graph convolution network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHILEI LIU et al.: "Relation modeling with graph convolutional networks for facial action unit detection", International Conference on Multimedia Modeling *
LI Conghui: "Research on facial expression analysis based on saliency features and graph convolution", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113496217A (en) * 2021-07-08 2021-10-12 河北工业大学 Method for identifying human face micro expression in video image sequence
CN113496217B (en) * 2021-07-08 2022-06-21 河北工业大学 Method for identifying human face micro expression in video image sequence
CN114842542A (en) * 2022-05-31 2022-08-02 中国矿业大学 Facial action unit identification method and device based on self-adaptive attention and space-time correlation
CN116071809A (en) * 2023-03-22 2023-05-05 鹏城实验室 Face space-time representation generation method based on multi-class representation space-time interaction

Similar Documents

Publication Publication Date Title
Kim et al. Deep generative-contrastive networks for facial expression recognition
Miao et al. Recognizing facial expressions using a shallow convolutional neural network
Li et al. Semantic relationships guided representation learning for facial action unit recognition
Mei et al. Unsupervised spatial–spectral feature learning by 3D convolutional autoencoder for hyperspectral classification
Tu et al. Edge-guided non-local fully convolutional network for salient object detection
Lo et al. MER-GCN: Micro-expression recognition based on relation modeling with graph convolutional networks
CN110532900B (en) Facial expression recognition method based on U-Net and LS-CNN
CN113496217B (en) Method for identifying human face micro expression in video image sequence
CN112633153A (en) Facial expression motion unit identification method based on space-time graph convolutional network
CN104992223B (en) Intensive population estimation method based on deep learning
Taheri et al. Structure-preserving sparse decomposition for facial expression analysis
CN108280397B (en) Human body image hair detection method based on deep convolutional neural network
Liu et al. Facial expression recognition using hybrid features of pixel and geometry
CN109902565B (en) Multi-feature fusion human behavior recognition method
CN105740915B (en) A kind of collaboration dividing method merging perception information
CN111028319B (en) Three-dimensional non-photorealistic expression generation method based on facial motion unit
Martinsson et al. Semantic segmentation of fashion images using feature pyramid networks
CN110827304B (en) Traditional Chinese medicine tongue image positioning method and system based on deep convolution network and level set method
Li et al. Deep representation of facial geometric and photometric attributes for automatic 3d facial expression recognition
CN115862120A (en) Separable variation self-encoder decoupled face action unit identification method and equipment
CN112990340B (en) Self-learning migration method based on feature sharing
CN112800979B (en) Dynamic expression recognition method and system based on characterization flow embedded network
Almowallad et al. Human emotion distribution learning from face images using CNN and LBC features
Jia et al. An action unit co-occurrence constraint 3DCNN based action unit recognition approach
Dembani et al. UNSUPERVISED FACIAL EXPRESSION DETECTION USING GENETIC ALGORITHM.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210409