CN112800903A - Dynamic expression recognition method and system based on space-time graph convolutional neural network - Google Patents

Dynamic expression recognition method and system based on space-time graph convolutional neural network

Info

Publication number
CN112800903A
Authority
CN
China
Prior art keywords
space
key points
time
expression
graph
Prior art date
Legal status
Granted
Application number
CN202110067161.2A
Other languages
Chinese (zh)
Other versions
CN112800903B (en)
Inventor
卢官明
缪远俊
卢峻禾
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202110067161.2A priority Critical patent/CN112800903B/en
Publication of CN112800903A publication Critical patent/CN112800903A/en
Application granted granted Critical
Publication of CN112800903B publication Critical patent/CN112800903B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • G06V40/176Dynamic expression
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a dynamic expression recognition method and system based on a spatio-temporal graph convolutional neural network. First, face key points are detected in each frame of a dynamic expression sequence to obtain normalized coordinates and numbers for the key points; a local texture feature vector is extracted at each key point and concatenated with the key point's normalized coordinates to form its local fusion feature vector. Next, key points within the same frame are connected to form spatial-domain edges, identically numbered key points in adjacent frames are connected to form temporal-domain edges, and the edges and key points together form a space-time topological graph. A spatio-temporal graph convolutional neural network is then constructed and trained on the generated space-time topological graphs. Finally, a space-time topological graph generated from a new expression sequence is taken as input and the trained network model performs expression recognition. By exploiting the position information of the face key points, the method can overcome the influence of illumination, skin color and posture changes, improving the accuracy and robustness of expression recognition.

Description

Dynamic expression recognition method and system based on space-time graph convolutional neural network
Technical Field
The invention relates to a dynamic expression recognition method and system based on a spatio-temporal graph convolutional neural network, and belongs to the field of image processing and pattern recognition.
Background
As computers play an ever larger role in people's daily life, human-computer interaction has become an inevitable trend in technological development. To enhance the human-computer interaction experience, computers need the ability to recognize human emotions. Research by the psychologist Mehrabian showed that in daily life the facial expression is an important carrier of emotion, conveying more information than language and voice combined. Expression recognition is therefore an essential link in human-computer interaction: by extracting expression information, a machine can judge a person's emotional state and thus respond to human emotional needs.
As facial expression recognition technology has matured, it has become a research hotspot in computer vision and pattern recognition. To extract the temporal and spatial information of a dynamic expression sequence, most mainstream methods use a convolutional neural network (CNN) to extract the spatial information of each frame and a long short-term memory network (LSTM) to extract the temporal information of the sequence, or directly apply a three-dimensional convolutional neural network (3D-CNN) that convolves the input sequence in the spatial and temporal dimensions simultaneously, so that the extracted features contain information both within and between frames. These methods typically take raw images as input and learn features for the expression recognition task through supervised training. However, a raw image carries much interference information irrelevant to expression recognition, such as age, gender and illumination, and the mapping from raw image to the low-dimensional feature vector finally used for classification amounts to a supervised dimension-reduction process for mining useful information; this process is often complex and requires training a large number of parameters. The face contour formed by face key points is a higher-level representation than the whole image, and the contour changes of different individuals under a given expression share the same characteristic pattern, so a model trained on face key points has a degree of robustness to changes in skin color, illumination and posture; in addition, the number of key points is far smaller than the number of pixels in the whole image, yielding a simpler model.
In recent years, graph convolutional networks (GCN) have proven able to process data with a graph structure, such as social networks, communication networks and molecular structures, and to map such data onto low-dimensional vectors, something a traditional convolutional neural network (CNN) cannot do. A spatio-temporal topological graph generated from face key points can therefore be processed with a graph convolutional network, which can learn higher-level features to classify dynamic expressions.
The Chinese patent application "A facial expression recognition method based on a graph convolutional neural network" (application No. 201910091261.1, publication No. CN 110008819 A) takes each pixel of a facial expression grayscale image as a graph node, constructs a topological graph according to certain rules, and inputs it into a graph convolutional neural network model to obtain the expression classification result. A topological graph built from every pixel of an image is too complex and hinders information fusion between distant nodes; moreover, the method applies only to still images and cannot classify dynamic expression sequences in video.
The Chinese patent application "A dynamic expression recognition method based on facial feature point data enhancement" (application No. 202010776415.3, publication No. CN 111931630 A) obtains the recognition result by feeding the initial frame and the peak frame of the dynamic expression sequence, together with a trajectory graph constructed from the face key points, into convolutional neural networks. Because the trajectory graph is designed manually from the face key points, the feature extraction process is cumbersome and complex, which hurts the real-time performance of the model.
Disclosure of Invention
The purpose of the invention is as follows: to address the inability of existing expression recognition methods to exploit face key point information effectively, the invention provides a dynamic expression recognition method and system based on a spatio-temporal graph convolutional neural network that makes full use of the position information of face key points and overcomes the influence of illumination, skin color and posture changes, thereby effectively improving the accuracy and robustness of dynamic expression recognition.
Technical scheme: to achieve the above purpose, the invention adopts the following technical scheme:
a dynamic expression recognition method based on a space-time diagram convolutional neural network comprises the following steps:
(1) preprocessing each expression sequence in the dynamic expression data set to obtain expression sequences with equal length;
(2) detecting key points of the face of each frame of image in the preprocessed expression sequence to obtain position coordinates and serial numbers of each key point, and normalizing the coordinates of the key points;
(3) extracting local texture feature vectors of each key point in the expression sequence, and splicing the local texture feature vectors with the normalized coordinates of the key points to obtain local fusion feature vectors of the key points;
(4) connecting key points within each frame of the expression sequence to form spatial-domain edges and connecting identically numbered key points in adjacent frames to form temporal-domain edges, which together form the edge set of a space-time topological graph; the space-time topological graph is constructed by taking the key point set of the expression sequence as its node set;
(5) constructing a spatio-temporal graph convolutional neural network comprising a plurality of sequentially connected spatio-temporal graph convolution blocks, a global average pooling layer, two fully connected layers and a classification layer; in addition to fusing the features of adjacent nodes in the spatial domain, the spatial graph convolution in each block first computes the similarity between nodes to obtain a similarity matrix, then multiplies this matrix with the input features to fuse the features of similar nodes in the spatial domain;
(6) training a space-time graph convolutional neural network by utilizing the constructed space-time topological graph and the corresponding expression categories to obtain a trained space-time graph convolutional neural network model;
(7) taking a space-time topological graph generated from a new expression sequence as input, performing recognition with the trained network model, and outputting the final classification result.
Further, the preprocessing in the step (1) comprises the following sub-steps:
(1.1) converting each expression sequence into a frame sequence of length S: sequences longer than S frames are truncated to their last S frames, and sequences shorter than S frames are padded to S frames by repeating the last frame; S is the set frame length of an expression sequence;
(1.2) normalizing the size of each frame in the sequence so that every frame is m × n pixels, where m and n are the set image width and height.
Further, normalizing the coordinates of the key points in step (2) comprises the following sub-steps:
(2.1) performing face key point detection on each frame of the preprocessed expression sequence to obtain the key point set V = { v_{t,i} | 1 ≤ t ≤ S, 1 ≤ i ≤ N }, where v_{t,i} = (x_{t,i}, y_{t,i}) denotes the coordinates of the key point numbered i in the t-th frame, S is the frame length of the expression sequence, and N is the number of key points per frame; the key points are distributed over the mouth, eyes, eyebrows and nose;
(2.2) subtracting the coordinates of the first frame's nose-tip key point from the coordinates of all key points to obtain the coordinate-normalized key point set V′ = { v′_{t,i} | 1 ≤ t ≤ S, 1 ≤ i ≤ N }.
Further, the local fusion feature vector in step (3) is obtained as follows:
denote by l_{t,i} the local texture feature vector extracted at the key point numbered i in the t-th frame; concatenating it with the key point's normalized coordinates v′_{t,i} gives the local fusion feature vector m_{t,i} = (v′_{t,i}, l_{t,i}); performing the same operation on all key points of the dynamic expression sequence yields the key point set M = { m_{t,i} | 1 ≤ t ≤ S, 1 ≤ i ≤ N }.
Further, the method of connecting key points within each frame of the expression sequence in step (4) to form spatial-domain edges is as follows: first, connect the key points distributed over the mouth, eyes, eyebrows and nose according to the geometric structure of the face to form the edges of each part's subgraph; then, to allow information to flow between the subgraphs of the parts, connect the subgraphs to one another to form edges between subgraphs.
Further, the spatio-temporal graph convolution block in step (5) is computed as follows:
(5.1) apply a dimension transformation to the input feature map:
f = g(f_in)
where f_in is the input feature map, of dimensions C_in × T × N, in which C_in is the number of channels of the node features, T is the temporal dimension of the feature map, and N is the number of spatial-domain nodes; g(·) is a dimension transformation function that reshapes f_in to N × C_in·T;
(5.2) compute the normalized similarity matrix B = { b_{i,j} | 1 ≤ i, j ≤ N }, where b_{i,j} measures the similarity of nodes i and j; this similarity measure generates new edges for the space-time topological graph:
b_{i,j} = |f_i f_j^T| / (‖f_i‖ · ‖f_j‖)
where f_i denotes the i-th row vector of the matrix f, |·| is the absolute-value operation, and ‖·‖ is the modulus operation;
(5.3) construct the spatial graph convolution, expressed as:
f_out = W · u( h(f_in) (Â + B) ), where Â = Λ^(−1/2) A Λ^(−1/2)
in which A = { a_{i,j} | 1 ≤ i, j ≤ N } is the adjacency matrix, of dimensions N × N, where a_{i,j} = 0 indicates that key points i and j are not connected, a_{i,j} = 1 indicates that they are connected, and a_{i,i} = 1; Λ is the diagonal degree matrix with diagonal elements Λ_i = Σ_j a_{i,j}; f_in, the input feature map of the spatial graph convolution, is identical to the input of step (5.1); h(·) and u(·) are dimension transformation functions: h(·) reshapes the input feature map to C_in·T × N and u(·) reshapes the computation result back to C_in × T × N; W is a 1 × 1 convolution kernel that transforms the number of channels of the node features to C_out; f_out is the output of the spatial graph convolution, of dimensions C_out × T × N;
(5.4) pass the output of step (5.3) through a batch normalization (BN) layer and a ReLU activation layer in sequence;
(5.5) construct a residual connection between the output feature map of step (5.4) and the input feature map f_in of step (5.1);
(5.6) construct the temporal convolution layer: the output feature map of step (5.5) has dimensions C_out × T × N; the temporal convolution kernel size is set to [m × 1], so each convolution covers m key frames of one node, with m chosen from the values 2, 3 and 4; the stride is s, so the kernel moves s frames at a time and proceeds to the next key point after finishing one; through a padding operation, the output feature map of the temporal convolution has dimensions C_out × (T/s) × N;
(5.7) pass the output of step (5.6) through the batch normalization (BN) layer and ReLU activation layer in sequence.
Based on the same inventive concept, the invention discloses a dynamic expression recognition system based on a spatio-temporal graph convolutional neural network, comprising:
the preprocessing module is used for preprocessing each expression sequence in the dynamic expression data set to obtain expression sequences with equal length;
the key point detection module is used for detecting the key points of the face of each frame of image in the preprocessed expression sequence to obtain the position coordinates and the serial numbers of each key point and normalizing the coordinates of the key points;
the key point feature fusion module is used for extracting a local texture feature vector of each key point in the expression sequence, and splicing the local texture feature vector with the normalized coordinates of the key points to obtain a local fusion feature vector of the key points;
the space-time topological graph construction module is used for connecting key points within each frame of the expression sequence to form spatial-domain edges and connecting identically numbered key points in adjacent frames to form temporal-domain edges, which together form the edge set of the space-time topological graph, and for constructing the space-time topological graph by taking the key point set of the expression sequence as its node set;
the spatio-temporal graph convolutional neural network module comprises a plurality of sequentially connected spatio-temporal graph convolution blocks, a global average pooling layer, two fully connected layers and a classification layer; in addition to fusing the features of adjacent nodes in the spatial domain, the spatial graph convolution in each block first computes the similarity between nodes to obtain a similarity matrix, then multiplies this matrix with the input features to fuse the features of similar nodes in the spatial domain;
the network training module is used for training the spatiotemporal graph convolutional neural network by utilizing the constructed spatiotemporal topological graph and the corresponding expression categories to obtain a trained spatiotemporal graph convolutional neural network model;
and the expression recognition module is used for taking a space-time topological graph generated based on the new expression sequence as input, recognizing by using a trained network model and outputting a final classification result.
Based on the same inventive concept, the disclosed dynamic expression recognition system based on a spatio-temporal graph convolutional neural network comprises at least one computing device, the computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when loaded into the processor, implementing the above dynamic expression recognition method based on a spatio-temporal graph convolutional neural network.
Beneficial effects: compared with the prior art, the invention has the following technical effects:
(1) a spatio-temporal graph convolutional neural network extracts the temporal-domain and spatial-domain features of the face key points, extending feature extraction from static images to image sequences; its parameters are adjusted adaptively during training, so dynamic features reflecting temporal information are extracted automatically and better characterize the dynamic changes of facial expressions. With this method, expressions such as happiness, sadness and anger can be recognized effectively, providing a new avenue for developing systems such as intelligent human-computer interaction.
(2) By adopting a similarity measure, the invention generates new edges for the space-time topological graph, effectively compensating for the limitations of a manually designed topology; each node can then fuse information both from its adjacent nodes and from similar nodes, improving the flexibility and applicability of the model.
(3) A space-time topological graph is first generated from the dynamic expression sequence, and the trained spatio-temporal graph convolutional neural network model then performs recognition. Compared with the whole dynamic expression sequence, the space-time topological graph formed by the face key points is a higher-level representation; since every frame of the sequence carries much interference information irrelevant to expression recognition, such as age, gender and illumination, the network model gains a degree of robustness to changes in skin color and illumination.
(4) Fusing each key point's local texture feature vector with its coordinate information enhances the expressive power of the key point features; key point features based purely on coordinates capture only the motion of the key points, so this fusion improves the accuracy of dynamic expression classification.
(5) The number of face key points is far smaller than the number of pixels in the whole image, so converting the network input from the whole dynamic expression sequence to a space-time topological graph formed by face key points greatly reduces the complexity of the model and improves its real-time performance.
(6) The key point detection algorithm can accurately locate the key points of a face deflected by a certain angle and construct the space-time topological graph, so dynamic expressions can still be recognized and the method has a degree of robustness to posture changes.
Drawings
FIG. 1 is a general flow diagram of a method of an embodiment of the invention.
Fig. 2 is the spatial topological graph constructed in an embodiment of the present invention.
Fig. 3 is the structure of the network model constructed in an embodiment of the present invention.
Fig. 4 is a screenshot of a partial sequence of images of a CK + expression data set used in an embodiment of the invention.
Detailed Description
The technical solution of the present invention is further explained with reference to the accompanying drawings and the specific embodiments.
As shown in fig. 1, the dynamic expression recognition method based on a spatio-temporal graph convolutional neural network disclosed in this embodiment of the present invention comprises the following steps:
Step (1): preprocess each expression sequence in the dynamic expression data set so that every sample is represented by a sequence of the same length, obtaining the preprocessed expression sequences. This step comprises the following sub-steps:
(1.1) converting each expression sequence into a frame sequence of length S: sequences longer than S frames are truncated to their last S frames, and sequences shorter than S frames are padded to S frames by repeating the last frame; S is the set frame length of an expression sequence;
(1.2) normalizing the size of each frame in the sequence so that every frame is m × n pixels, where m and n are the set image width and height.
This embodiment uses the CK+ dynamic expression data set; some of its samples are shown in fig. 4, and its expression categories are anger, disgust, fear, happiness, sadness, surprise and neutral. In practice, other video data sets may be used, or facial expression videos may be captured with a camera to build an expression video library with emotion category labels. The expression sequences in the data set are preprocessed: each sequence is converted into a frame sequence of length 16, keeping the last 16 frames of sequences longer than 16 frames and padding shorter sequences to 16 frames by repeating the last frame; meanwhile, each frame is resized to 64 × 64 pixels.
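The following minimal Python sketch illustrates this preprocessing step, assuming OpenCV is available and each sequence is given as a list of frames; the function name and defaults are illustrative, not taken from the patent.

```python
import cv2

def preprocess_sequence(frames, S=16, size=(64, 64)):
    """Truncate or pad an expression sequence to S frames, then resize
    every frame to `size` pixels (64 x 64 in this embodiment)."""
    if len(frames) >= S:
        frames = frames[-S:]                                 # keep the last S frames
    else:
        frames = frames + [frames[-1]] * (S - len(frames))   # repeat the last frame
    return [cv2.resize(frame, size) for frame in frames]
```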
Step (2): perform face key point detection on each frame of the preprocessed expression sequence, return the position coordinates and number of each key point, and normalize the key point coordinates. This step comprises the following sub-steps:
(2.1) performing face key point detection on each frame of the preprocessed expression sequence to obtain the key point set V = { v_{t,i} | 1 ≤ t ≤ S, 1 ≤ i ≤ N }, where v_{t,i} = (x_{t,i}, y_{t,i}) denotes the coordinates of the key point numbered i in the t-th frame, S is the frame length of the expression sequence, and N is the number of key points per frame; the key points are distributed over the mouth, eyes, eyebrows and nose;
(2.2) subtracting the coordinates of the first frame's nose-tip key point from the coordinates of all key points to obtain the coordinate-normalized key point set V′ = { v′_{t,i} | 1 ≤ t ≤ S, 1 ≤ i ≤ N }.
In this embodiment, the Dlib open-source toolkit detects the key points in each frame and returns the coordinates and numbers of 68 key points such as the nose tip, mouth corners and eye corners; to reduce complexity, the key points numbered 1-17 (the jaw contour) are deleted and the remaining 51 key points are renumbered in their original order.
Let V_t = { v_{t,i} | 1 ≤ i ≤ 51 } be the set of 51 key points detected in the t-th frame, and subtract the coordinates of the key point numbered 14 (the nose tip) in the first frame from the coordinates of all key points; performing the same operation on each frame gives the coordinate-normalized key point set V′ = { v_{t,i} − v_{1,14} | 1 ≤ t ≤ 16, 1 ≤ i ≤ 51 }.
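A sketch of key point detection and coordinate normalization with Dlib follows; the 68-point predictor file is the standard model distributed with Dlib, while the single-face-per-frame assumption and the helper names are illustrative.

```python
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# standard 68-point landmark model distributed with Dlib
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_keypoints(gray):
    """Return the 51 renumbered key points of one face (jaw points 1-17 dropped)."""
    rect = detector(gray, 1)[0]                      # assume one face per frame
    shape = predictor(gray, rect)
    pts = np.array([[p.x, p.y] for p in shape.parts()], dtype=np.float32)
    return pts[17:]                                  # drop the 17 jaw-contour points

def normalize_coordinates(seq_pts):
    """seq_pts: array of shape (16, 51, 2); subtract the first frame's
    nose tip (key point No. 14 after renumbering, i.e. index 13)."""
    return seq_pts - seq_pts[0, 13]
```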
Step (3): extract the local texture feature vector of each key point in the expression sequence and concatenate it with the key point's normalized coordinates to obtain the local fusion feature vector. Specifically, let l_{t,i} be the local texture feature vector extracted at the key point numbered i in the t-th frame; concatenating it with the normalized coordinates v′_{t,i} gives the local fusion feature vector m_{t,i} = (v′_{t,i}, l_{t,i}); performing the same operation on all key points of the dynamic expression sequence yields the key point set M = { m_{t,i} | 1 ≤ t ≤ S, 1 ≤ i ≤ N }. In this embodiment, a rotation-invariant LBP operator with radius 1 and 8 sampling points computes the minimum LBP value at each key point as its local texture feature vector.
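A sketch of this feature fusion using scikit-image's rotation-invariant LBP ('ror' method, radius 1, 8 sampling points) is given below; sampling the LBP map at the key point's pixel is an assumption about how the per-key-point value is read off.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def fuse_features(gray, coords, norm_coords):
    """Build 3-channel node features (x', y', lbp) for the 51 key points of
    one frame; `coords` are pixel positions, `norm_coords` the normalized ones."""
    lbp_map = local_binary_pattern(gray, 8, 1, method="ror")  # rotation-invariant LBP
    nodes = []
    for (x, y), (xn, yn) in zip(coords, norm_coords):
        nodes.append([xn, yn, lbp_map[int(y), int(x)]])       # (v'_{t,i}, l_{t,i})
    return np.asarray(nodes, dtype=np.float32)                # shape (51, 3)
```

Each node thus carries three channels (x′, y′, LBP), matching the 3 input channels of block C1 in the network described below.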
Step (4): construct the space-time topological graph from the key points of the expression sequence, comprising the following sub-steps:
(4.1) spatial-domain edges are formed by connecting key points within the same frame: first, the key points distributed over the eyebrows, eyes, nose and mouth are connected according to the geometric structure of the face to form the edges of each organ subgraph; then, to allow information to flow between the organ subgraphs, the subgraphs are connected to one another to form edges between them, as shown in fig. 2; temporal-domain edges are formed by connecting identically numbered key points in adjacent frames; together these edges form the edge set E of the space-time topological graph (a sketch of the adjacency construction follows step (4.3));
(4.2) the key point set of the dynamic expression sequence, M = { m_{t,i} | 1 ≤ t ≤ 16, 1 ≤ i ≤ 51 }, serves as the node set of the space-time topological graph;
(4.3) the edge set E and the node set M form the space-time topological graph Q = (M, E).
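A sketch of the spatial adjacency construction is shown below; the exact wiring of fig. 2 is not reproduced in the text, so the `edges` argument is left to the caller and only the self-loop convention a_ii = 1 is fixed.

```python
import numpy as np

def build_adjacency(edges, num_nodes=51):
    """Spatial adjacency matrix A with self-loops; `edges` is a list of
    1-based key point number pairs wired as in fig. 2 (organ subgraphs
    plus the links between subgraphs)."""
    A = np.eye(num_nodes, dtype=np.float32)          # a_ii = 1
    for i, j in edges:
        A[i - 1, j - 1] = A[j - 1, i - 1] = 1.0      # undirected spatial edge
    return A
```

Temporal-domain edges between identically numbered key points of adjacent frames are not stored in A; in the network sketched later they are realized by the [m × 1] temporal convolution.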
Step (5): construct the spatio-temporal graph convolutional neural network. The network comprises k sequentially connected spatio-temporal graph convolution blocks, with k chosen from the values 6, 8 and 10, followed by a global average pooling layer, two fully connected layers and a classification layer. In addition to fusing the features of adjacent nodes in the spatial domain, the spatial graph convolution in each block first computes the similarity between nodes to obtain a similarity matrix, then multiplies this matrix with the input features to fuse the features of similar nodes in the spatial domain.
The spatio-temporal graph convolutional neural network constructed in this embodiment comprises six sequentially connected spatio-temporal graph convolution blocks, a global average pooling layer, two fully connected layers and a softmax classification layer. The spatio-temporal graph convolution block is specified as follows:
the space domain edge of the space-time topological graph is artificially designed according to the natural structure of the face, and a new edge cannot be generated in the whole network; for example, in the smiling process, a left mouth corner key point and a right mouth corner key point of the face are deformed similarly, the characteristics of the two key points have higher similarity, and if an edge is constructed between the two key points, the fusion of key point information is facilitated; therefore, a new edge can be generated for the space-time topological graph by adopting a similarity measurement mode, and the flexibility of the model is improved; the calculation steps of the spatio-temporal map volume block are as follows:
(5.1) apply a dimension transformation to the input feature map:
f = g(f_in)
where f_in is the input feature map, of dimensions C_in × T × N, in which C_in is the number of channels of the node features, T is the temporal dimension of the feature map, and N is the number of spatial-domain nodes; g(·) is a dimension transformation function that reshapes f_in to N × C_in·T;
(5.2) compute the normalized similarity matrix B = { b_{i,j} | 1 ≤ i, j ≤ 51 }, where b_{i,j} measures the similarity of nodes i and j:
b_{i,j} = |f_i f_j^T| / (‖f_i‖ · ‖f_j‖)
where f_i denotes the i-th row vector of the matrix f, |·| is the absolute-value operation, and ‖·‖ is the modulus operation; because all key points are normalized by the first frame's nose-tip key point, the absolute-value operation gives face key points that are vertically symmetric about the nose tip similar position coordinates, so such pairs of key points obtain a high similarity;
(5.3) construct the spatial graph convolution, expressed as:
f_out = W · u( h(f_in) (Â + B) ), where Â = Λ^(−1/2) A Λ^(−1/2)
in which A = { a_{i,j} | 1 ≤ i, j ≤ 51 } is the adjacency matrix, of dimensions 51 × 51, where a_{i,j} = 0 indicates that key points i and j are not connected, a_{i,j} = 1 indicates that they are connected, and a_{i,i} = 1; Λ is the diagonal degree matrix with diagonal elements Λ_i = Σ_j a_{i,j}; f_in, the input feature map of the spatial graph convolution, is identical to the input of step (5.1); h(·) and u(·) are dimension transformation functions: h(·) reshapes the input feature map to C_in·T × 51 and u(·) reshapes the computation result back to C_in × T × 51; W is a 1 × 1 convolution kernel that transforms the number of channels of the node features to C_out; f_out is the output of the spatial graph convolution, of dimensions C_out × T × 51;
(5.4) pass the output of step (5.3) through a batch normalization (BN) layer and a ReLU activation layer in sequence;
(5.5) construct a residual connection: the output feature map of step (5.4) has dimensions C_out × T × 51 and the input feature map f_in of step (5.1) has dimensions C_in × T × 51; when C_in = C_out, the two feature maps are added directly as f_in + f_out; when C_in ≠ C_out, the channel number of f_in must first be transformed to C_out before the addition;
(5.6) construct the temporal convolution layer: the graph convolution above fuses only the information of a node's adjacent and similar nodes in the spatial domain and cannot fuse the information of its temporal-domain neighbors; by analogy with image convolution, the temporal convolution kernel has size [m × 1], so each step completes the convolution of m key frames for one node; the stride is s, so the kernel moves s frames at a time and proceeds to the next node after finishing one; through a padding operation, the output feature map of the temporal convolution has dimensions C_out × (T/s) × 51;
(5.7) pass the output of step (5.6) through the batch normalization (BN) layer and ReLU activation layer in sequence.
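The following PyTorch sketch implements steps (5.1) through (5.7) under the assumptions stated above: the similarity is taken as the absolute cosine similarity, and Â as the symmetrically normalized adjacency used in standard ST-GCN; class names and the choice m = 3 are illustrative.

```python
import torch
import torch.nn as nn

def similarity_matrix(f, eps=1e-8):
    """Step (5.2): b_ij = |f_i f_j^T| / (||f_i|| ||f_j||) for f of shape
    (batch, N, C_in*T), i.e. after the reshape g of step (5.1)."""
    dot = torch.abs(torch.matmul(f, f.transpose(1, 2)))
    norm = f.norm(dim=2, keepdim=True)
    return dot / torch.matmul(norm, norm.transpose(1, 2)).clamp(min=eps)

class SpatialGraphConv(nn.Module):
    """Steps (5.1)-(5.3): f_out = W * u(h(f_in)(A_hat + B))."""

    def __init__(self, c_in, c_out, A):
        super().__init__()
        lam = A.sum(dim=1)                                   # degree Lambda_i
        a_hat = torch.diag(lam.pow(-0.5)) @ A @ torch.diag(lam.pow(-0.5))
        self.register_buffer("a_hat", a_hat)                 # fixed normalized adjacency
        self.w = nn.Conv2d(c_in, c_out, kernel_size=1)       # 1x1 convolution W

    def forward(self, x):                                    # x: (batch, C_in, T, N)
        b, c, t, n = x.shape
        f = x.permute(0, 3, 1, 2).reshape(b, n, c * t)       # g: -> N x C_in*T
        B = similarity_matrix(f)                             # (batch, N, N)
        h = x.reshape(b, c * t, n)                           # h: -> C_in*T x N
        y = torch.matmul(h, self.a_hat + B)                  # h(f_in)(A_hat + B)
        y = y.reshape(b, c, t, n)                            # u: -> C_in x T x N
        return self.w(y)                                     # W: channels -> C_out

class TemporalConv(nn.Module):
    """Steps (5.6)-(5.7): [m x 1] convolution along time, stride s, padded
    so the output temporal length is T/s; m = 3 here, one of {2, 3, 4}."""

    def __init__(self, channels, m=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=(m, 1),
                              stride=(s, 1), padding=((m - 1) // 2, 0))
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):                                    # x: (batch, C_out, T, N)
        return torch.relu(self.bn(self.conv(x)))
```

The BN and ReLU layers after the spatial convolution, and the residual connection of step (5.5), are assembled in the block sketch given after the layer listing below.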
The spatio-temporal graph convolutional neural network of this embodiment is shown in fig. 3; the configuration of each layer is as follows:
Spatio-temporal graph convolution block C1: 3 input channels, 32 output channels, stride 1;
Spatio-temporal graph convolution block C2: 32 input channels, 32 output channels, stride 1;
Spatio-temporal graph convolution block C3: 32 input channels, 64 output channels, stride 2;
Spatio-temporal graph convolution block C4: 64 input channels, 64 output channels, stride 1;
Spatio-temporal graph convolution block C5: 64 input channels, 128 output channels, stride 2;
Spatio-temporal graph convolution block C6: 128 input channels, 128 output channels, stride 1;
Global average pooling layer: the feature map output by the six spatio-temporal graph convolution blocks has dimensions 128 × 4 × 51; averaging over all nodes yields a 128-dimensional vector;
Fully connected layers: the first fully connected layer has input dimension 128 and output dimension 64, and the second has input dimension 64 and output dimension 7.
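Assembling the pieces, a sketch of one full block and of the six-block network follows, reusing SpatialGraphConv and TemporalConv from the previous sketch; the 1 × 1 convolution used to match channel counts in the residual branch is an assumption consistent with step (5.5).

```python
import torch
import torch.nn as nn

class STGCBlock(nn.Module):
    """One spatio-temporal graph convolution block: spatial graph convolution
    -> BN -> ReLU -> residual with the block input -> temporal convolution."""

    def __init__(self, c_in, c_out, A, stride=1):
        super().__init__()
        self.gcn = SpatialGraphConv(c_in, c_out, A)
        self.bn = nn.BatchNorm2d(c_out)
        # 1x1 projection when channel counts differ, direct addition otherwise
        self.res = nn.Conv2d(c_in, c_out, 1) if c_in != c_out else nn.Identity()
        self.tcn = TemporalConv(c_out, m=3, s=stride)

    def forward(self, x):
        y = torch.relu(self.bn(self.gcn(x)))     # steps (5.3)-(5.4)
        y = y + self.res(x)                      # step (5.5) residual connection
        return self.tcn(y)                       # steps (5.6)-(5.7)

class STGCN(nn.Module):
    """Blocks C1-C6, global average pooling, two fully connected layers."""

    def __init__(self, A, num_classes=7):
        super().__init__()
        cfg = [(3, 32, 1), (32, 32, 1), (32, 64, 2),      # C1-C3
               (64, 64, 1), (64, 128, 2), (128, 128, 1)]  # C4-C6
        self.blocks = nn.Sequential(*[STGCBlock(i, o, A, s) for i, o, s in cfg])
        self.fc1 = nn.Linear(128, 64)
        self.fc2 = nn.Linear(64, num_classes)

    def forward(self, x):                        # x: (batch, 3, 16, 51)
        y = self.blocks(x)                       # -> (batch, 128, 4, 51)
        y = y.mean(dim=(2, 3))                   # global average pooling -> 128-d
        return self.fc2(torch.relu(self.fc1(y)))  # class scores for softmax
```

With strides 1, 1, 2, 1, 2, 1 the temporal dimension shrinks from 16 to 4, matching the 128 × 4 × 51 feature map stated above.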
Step (6): train the spatio-temporal graph convolutional neural network with the constructed space-time topological graphs and their corresponding expression categories to obtain the trained model. During training, the Adam method is adopted as the optimization strategy, and cross entropy is selected as the loss function for gradient back-propagation.
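A minimal training loop sketch under the stated choices (Adam, cross entropy) is shown below, reusing build_adjacency and STGCN from the earlier sketches; the spatial edge list, data loader, learning rate and epoch count are assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

A = torch.tensor(build_adjacency(edges=spatial_edges))     # fig. 2 wiring assumed given
model = STGCN(A)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # learning rate assumed
criterion = nn.CrossEntropyLoss()                          # cross-entropy loss

model.train()
for epoch in range(50):                                    # epoch count assumed
    for x, labels in train_loader:                         # x: (batch, 3, 16, 51)
        optimizer.zero_grad()
        loss = criterion(model(x), labels)                 # softmax + cross entropy
        loss.backward()                                    # gradient back-propagation
        optimizer.step()
```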
Step (7): take a space-time topological graph generated from a new expression sequence as input, perform recognition with the trained network model, and output the final classification result.
Based on the same inventive concept, this embodiment of the invention discloses a dynamic expression recognition system based on a spatio-temporal graph convolutional neural network, comprising:
the preprocessing module is used for preprocessing each expression sequence in the dynamic expression data set to obtain expression sequences with equal length;
the key point detection module is used for detecting the key points of the face of each frame of image in the preprocessed expression sequence to obtain the position coordinates and the serial numbers of each key point and normalizing the coordinates of the key points;
the key point feature fusion module is used for extracting a local texture feature vector of each key point in the expression sequence, and splicing the local texture feature vector with the normalized coordinates of the key points to obtain a local fusion feature vector of the key points;
the space-time topological graph construction module is used for connecting key points within each frame of the expression sequence to form spatial-domain edges and connecting identically numbered key points in adjacent frames to form temporal-domain edges, which together form the edge set of the space-time topological graph, and for constructing the space-time topological graph by taking the key point set of the expression sequence as its node set;
the spatio-temporal graph convolutional neural network module comprises a plurality of sequentially connected spatio-temporal graph convolution blocks, a global average pooling layer, two fully connected layers and a classification layer; in addition to fusing the features of adjacent nodes in the spatial domain, the spatial graph convolution in each block first computes the similarity between nodes to obtain a similarity matrix, then multiplies this matrix with the input features to fuse the features of similar nodes in the spatial domain;
the network training module is used for training the spatiotemporal graph convolutional neural network by utilizing the constructed spatiotemporal topological graph and the corresponding expression categories to obtain a trained spatiotemporal graph convolutional neural network model;
and the expression recognition module is used for taking a space-time topological graph generated based on the new expression sequence as input, recognizing by using a trained network model and outputting a final classification result.
Based on the same inventive concept, the disclosed dynamic expression recognition system based on a spatio-temporal graph convolutional neural network comprises at least one computing device, the computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when loaded into the processor, implementing the dynamic expression recognition method based on a spatio-temporal graph convolutional neural network of this embodiment.
The above description is only an embodiment of the present invention, and the protection scope of the present invention is not limited thereto; any modification or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention falls within the protection scope of the present invention, which is therefore defined by the claims.

Claims (8)

1. A dynamic expression recognition method based on a spatio-temporal graph convolutional neural network is characterized by comprising the following steps:
(1) preprocessing each expression sequence in the dynamic expression data set to obtain expression sequences with equal length;
(2) detecting key points of the face of each frame of image in the preprocessed expression sequence to obtain position coordinates and serial numbers of each key point, and normalizing the coordinates of the key points;
(3) extracting local texture feature vectors of each key point in the expression sequence, and splicing the local texture feature vectors with the normalized coordinates of the key points to obtain local fusion feature vectors of the key points;
(4) connecting key points within each frame of the expression sequence to form spatial-domain edges and connecting identically numbered key points in adjacent frames to form temporal-domain edges, which together form the edge set of a space-time topological graph; the space-time topological graph is constructed by taking the key point set of the expression sequence as its node set;
(5) constructing a spatio-temporal graph convolutional neural network comprising a plurality of sequentially connected spatio-temporal graph convolution blocks, a global average pooling layer, two fully connected layers and a classification layer; in addition to fusing the features of adjacent nodes in the spatial domain, the spatial graph convolution in each block first computes the similarity between nodes to obtain a similarity matrix, then multiplies this matrix with the input features to fuse the features of similar nodes in the spatial domain;
(6) training a space-time graph convolutional neural network by utilizing the constructed space-time topological graph and the corresponding expression categories to obtain a trained space-time graph convolutional neural network model;
(7) taking a space-time topological graph generated from a new expression sequence as input, performing recognition with the trained network model, and outputting the final classification result.
2. The dynamic expression recognition method based on a spatio-temporal graph convolutional neural network as claimed in claim 1, wherein the preprocessing in step (1) comprises the following sub-steps:
(1.1) converting each expression sequence into a frame sequence of length S: sequences longer than S frames are truncated to their last S frames, and sequences shorter than S frames are padded to S frames by repeating the last frame; S is the set frame length of an expression sequence;
(1.2) normalizing the size of each frame in the sequence so that every frame is m × n pixels, where m and n are the set image width and height.
3. The dynamic expression recognition method based on a spatio-temporal graph convolutional neural network as claimed in claim 1, wherein normalizing the coordinates of the key points in step (2) comprises the following sub-steps:
(2.1) performing face key point detection on each frame of the preprocessed expression sequence to obtain the key point set V = { v_{t,i} | 1 ≤ t ≤ S, 1 ≤ i ≤ N }, where v_{t,i} = (x_{t,i}, y_{t,i}) denotes the coordinates of the key point numbered i in the t-th frame, S is the frame length of the expression sequence, and N is the number of key points per frame; the key points are distributed over the mouth, eyes, eyebrows and nose;
(2.2) subtracting the coordinates of the first frame's nose-tip key point from the coordinates of all key points to obtain the coordinate-normalized key point set V′ = { v′_{t,i} | 1 ≤ t ≤ S, 1 ≤ i ≤ N }.
4. The dynamic expression recognition method based on a spatio-temporal graph convolutional neural network as claimed in claim 1, wherein the local fusion feature vector in step (3) is obtained as follows:
denote by l_{t,i} the local texture feature vector extracted at the key point numbered i in the t-th frame; concatenating it with the key point's normalized coordinates v′_{t,i} gives the local fusion feature vector m_{t,i} = (v′_{t,i}, l_{t,i}); performing the same operation on all key points of the dynamic expression sequence yields the key point set M = { m_{t,i} | 1 ≤ t ≤ S, 1 ≤ i ≤ N }.
5. The dynamic expression recognition method based on a spatio-temporal graph convolutional neural network as claimed in claim 1, wherein the method of connecting key points within each frame of the expression sequence in step (4) to form spatial-domain edges is as follows: first, connect the key points distributed over the mouth, eyes, eyebrows and nose according to the geometric structure of the face to form the edges of each part's subgraph; then, to allow information to flow between the subgraphs of the parts, connect the subgraphs to one another to form edges between subgraphs.
6. The dynamic expression recognition method based on a spatio-temporal graph convolutional neural network as claimed in claim 1, wherein the spatio-temporal graph convolution block in step (5) is computed as follows:
(5.1) apply a dimension transformation to the input feature map:
f = g(f_in)
where f_in is the input feature map, of dimensions C_in × T × N, in which C_in is the number of channels of the node features, T is the temporal dimension of the feature map, and N is the number of spatial-domain nodes; g(·) is a dimension transformation function that reshapes f_in to N × C_in·T;
(5.2) compute the normalized similarity matrix B = { b_{i,j} | 1 ≤ i, j ≤ N }, where b_{i,j} measures the similarity of nodes i and j; this similarity measure generates new edges for the space-time topological graph:
b_{i,j} = |f_i f_j^T| / (‖f_i‖ · ‖f_j‖)
where f_i denotes the i-th row vector of the matrix f, |·| is the absolute-value operation, and ‖·‖ is the modulus operation;
(5.3) construct the spatial graph convolution, expressed as:
f_out = W · u( h(f_in) (Â + B) ), where Â = Λ^(−1/2) A Λ^(−1/2)
in which A = { a_{i,j} | 1 ≤ i, j ≤ N } is the adjacency matrix, of dimensions N × N, where a_{i,j} = 0 indicates that key points i and j are not connected, a_{i,j} = 1 indicates that they are connected, and a_{i,i} = 1; Λ is the diagonal degree matrix with diagonal elements Λ_i = Σ_j a_{i,j}; f_in, the input feature map of the spatial graph convolution, is identical to the input of step (5.1); h(·) and u(·) are dimension transformation functions: h(·) reshapes the input feature map to C_in·T × N and u(·) reshapes the computation result back to C_in × T × N; W is a 1 × 1 convolution kernel that transforms the number of channels of the node features to C_out; f_out is the output of the spatial graph convolution, of dimensions C_out × T × N;
(5.4) pass the output of step (5.3) through a batch normalization (BN) layer and a ReLU activation layer in sequence;
(5.5) construct a residual connection between the output feature map of step (5.4) and the input feature map f_in of step (5.1);
(5.6) construct the temporal convolution layer: the output feature map of step (5.5) has dimensions C_out × T × N; the temporal convolution kernel size is set to [m × 1], so each convolution covers m key frames of one node, with m chosen from the values 2, 3 and 4; the stride is s, so the kernel moves s frames at a time and proceeds to the next key point after finishing one; through a padding operation, the output feature map of the temporal convolution has dimensions C_out × (T/s) × N;
(5.7) pass the output of step (5.6) through the batch normalization (BN) layer and ReLU activation layer in sequence.
7. A dynamic expression recognition system based on a space-time graph convolutional neural network is characterized by comprising:
the preprocessing module is used for preprocessing each expression sequence in the dynamic expression data set to obtain expression sequences with equal length;
the key point detection module is used for detecting the key points of the face of each frame of image in the preprocessed expression sequence to obtain the position coordinates and the serial numbers of each key point and normalizing the coordinates of the key points;
the key point feature fusion module is used for extracting a local texture feature vector of each key point in the expression sequence, and splicing the local texture feature vector with the normalized coordinates of the key points to obtain a local fusion feature vector of the key points;
the space-time topological graph construction module is used for connecting key points within each frame of the expression sequence to form spatial-domain edges and connecting identically numbered key points in adjacent frames to form temporal-domain edges, which together form the edge set of the space-time topological graph, and for constructing the space-time topological graph by taking the key point set of the expression sequence as its node set;
the spatio-temporal graph convolutional neural network module comprises a plurality of sequentially connected spatio-temporal graph convolution blocks, a global average pooling layer, two fully connected layers and a classification layer; in addition to fusing the features of adjacent nodes in the spatial domain, the spatial graph convolution in each block first computes the similarity between nodes to obtain a similarity matrix, then multiplies this matrix with the input features to fuse the features of similar nodes in the spatial domain;
the network training module is used for training the spatiotemporal graph convolutional neural network by utilizing the constructed spatiotemporal topological graph and the corresponding expression categories to obtain a trained spatiotemporal graph convolutional neural network model;
and the expression recognition module is used for taking a space-time topological graph generated based on the new expression sequence as input, recognizing by using a trained network model and outputting a final classification result.
8. A dynamic expression recognition system based on a space-time graph convolutional neural network, comprising at least one computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the method for dynamic expression recognition based on a space-time graph convolutional neural network according to any one of claims 1 to 6.
CN202110067161.2A 2021-01-19 2021-01-19 Dynamic expression recognition method and system based on space-time graph convolutional neural network Active CN112800903B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110067161.2A CN112800903B (en) 2021-01-19 2021-01-19 Dynamic expression recognition method and system based on space-time graph convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110067161.2A CN112800903B (en) 2021-01-19 2021-01-19 Dynamic expression recognition method and system based on space-time graph convolutional neural network

Publications (2)

Publication Number Publication Date
CN112800903A true CN112800903A (en) 2021-05-14
CN112800903B CN112800903B (en) 2022-08-26

Family

ID=75810344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110067161.2A Active CN112800903B (en) Dynamic expression recognition method and system based on space-time graph convolutional neural network

Country Status (1)

Country Link
CN (1) CN112800903B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684911A (en) * 2018-10-30 2019-04-26 百度在线网络技术(北京)有限公司 Expression recognition method, device, electronic equipment and storage medium
CN110796110A (en) * 2019-11-05 2020-02-14 西安电子科技大学 Human behavior identification method and system based on graph convolution network
CN111325099A (en) * 2020-01-21 2020-06-23 南京邮电大学 Sign language identification method and system based on double-current space-time diagram convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZOU Jiancheng et al.: "A facial expression recognition method based on an improved convolutional neural network", Journal of North China University of Technology *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113468980A (en) * 2021-06-11 2021-10-01 浙江大华技术股份有限公司 Human behavior recognition method and related device
CN113468980B (en) * 2021-06-11 2024-05-31 浙江大华技术股份有限公司 Human behavior recognition method and related device
CN113435576A (en) * 2021-06-24 2021-09-24 中国人民解放军陆军工程大学 Double-speed space-time graph convolution neural network architecture and data processing method
CN113159007A (en) * 2021-06-24 2021-07-23 之江实验室 Gait emotion recognition method based on adaptive graph convolution
CN113569675B (en) * 2021-07-15 2023-05-23 郑州大学 ConvLSTM network-based mouse open field experimental behavior analysis method
CN113569675A (en) * 2021-07-15 2021-10-29 郑州大学 Mouse open field experimental behavior analysis method based on ConvLSTM network
CN113469144A (en) * 2021-08-31 2021-10-01 北京文安智能技术股份有限公司 Video-based pedestrian gender and age identification method and model
CN113469144B (en) * 2021-08-31 2021-11-09 北京文安智能技术股份有限公司 Video-based pedestrian gender and age identification method and model
CN114050975A (en) * 2022-01-10 2022-02-15 苏州浪潮智能科技有限公司 Heterogeneous multi-node interconnection topology generation method and storage medium
CN115272943A (en) * 2022-09-29 2022-11-01 南通双和食品有限公司 Livestock and poultry feeding abnormity identification method based on data processing
CN115861822B (en) * 2023-02-07 2023-05-12 海豚乐智科技(成都)有限责任公司 Target local point and global structured matching method and device
CN115861822A (en) * 2023-02-07 2023-03-28 海豚乐智科技(成都)有限责任公司 Target local point and global structured matching method and device
CN116311472A (en) * 2023-04-07 2023-06-23 湖南工商大学 Micro-expression recognition method and device based on multi-level graph convolution network
CN116311472B (en) * 2023-04-07 2023-10-31 湖南工商大学 Micro-expression recognition method and device based on multi-level graph convolution network

Also Published As

Publication number Publication date
CN112800903B (en) 2022-08-26

Similar Documents

Publication Publication Date Title
CN112800903B (en) Dynamic expression recognition method and system based on space-time graph convolutional neural network
CN109359538B (en) Training method of convolutional neural network, gesture recognition method, device and equipment
Zhang et al. Multimodal learning for facial expression recognition
Youssif et al. Automatic facial expression recognition system based on geometric and appearance features
CN110728209A (en) Gesture recognition method and device, electronic equipment and storage medium
Murtaza et al. Analysis of face recognition under varying facial expression: a survey.
CN111553267B (en) Image processing method, image processing model training method and device
CN111401216A (en) Image processing method, model training method, image processing device, model training device, computer equipment and storage medium
Li et al. Learning symmetry consistent deep cnns for face completion
Yang et al. Facial expression recognition based on dual-feature fusion and improved random forest classifier
CN113989890A (en) Face expression recognition method based on multi-channel fusion and lightweight neural network
Zhao et al. Applying contrast-limited adaptive histogram equalization and integral projection for facial feature enhancement and detection
Yu Emotion monitoring for preschool children based on face recognition and emotion recognition algorithms
CN112906520A (en) Gesture coding-based action recognition method and device
Podder et al. Time efficient real time facial expression recognition with CNN and transfer learning
Jin et al. Learning facial expressions with 3D mesh convolutional neural network
Tautkutė et al. Classifying and visualizing emotions with emotional DAN
CN112800979B (en) Dynamic expression recognition method and system based on characterization flow embedded network
CN113076905B (en) Emotion recognition method based on context interaction relation
Ling et al. Human object inpainting using manifold learning-based posture sequence estimation
CN112016592B (en) Domain adaptive semantic segmentation method and device based on cross domain category perception
CN115862120B (en) Face action unit identification method and equipment capable of decoupling separable variation from encoder
Kale et al. Face age synthesis: A review on datasets, methods, and open research areas
Dembani et al. UNSUPERVISED FACIAL EXPRESSION DETECTION USING GENETIC ALGORITHM.
Moran Classifying emotion using convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant