CN116935465A - Micro-expression recognition method based on three-dimensional residual convolutional neural network and optical flow method


Info

Publication number: CN116935465A
Application number: CN202310808285.0A
Authority: CN (China)
Prior art keywords: micro, frame, optical flow, dimensional, layer
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN116935465B
Inventors: 李军 (Li Jun), 许静怡 (Xu Jingyi), 王有为 (Wang Youwei), 徐文涛 (Xu Wentao), 徐晓峰 (Xu Xiaofeng)
Current and original assignee: Nanjing University of Science and Technology
Application filed by Nanjing University of Science and Technology, with priority to CN202310808285.0A
Publication of CN116935465A; application granted and published as CN116935465B

Classifications

    • G06V40/174 Facial expression recognition
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/048 Activation functions
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06V40/168 Feature extraction; face representation
    • G06V40/172 Classification, e.g. identification
    • Y02T10/40 Engine management systems (under Y02T, climate change mitigation technologies related to transportation)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a micro-expression recognition method based on a three-dimensional residual convolutional neural network and the optical flow method, which comprises the following steps: preprocessing the original micro-expression video, where preprocessing comprises video framing, face alignment and cropping, and peak-frame localization to extract the key-frame sequence; converting the key-frame image sequence to grayscale and extracting optical-flow features from the grayscale image sequence to obtain a three-channel image sequence as the input of the network model; and improving a three-dimensional convolutional neural network with a residual module, obtaining recognition and classification results for micro-expression emotion through feature extraction and analysis. The application improves the recognition rate of micro-expressions in video clips and has good practical value.

Description

Micro-expression recognition method based on three-dimensional residual convolutional neural network and optical flow method
Technical Field
The application relates to the technical field of computer vision, in particular to a micro-expression recognition method based on a three-dimensional residual convolutional neural network and the optical flow method.
Background
Facial micro-expressions are brief, subtle facial movements that occur during emotional communication as the result of conscious or unconscious suppression. They usually appear when people try to hide their true feelings. Macro-expressions, by contrast, are ordinary emotional facial expressions that are easily perceived and interpreted by others in daily interaction. The main difference between macro- and micro-expressions lies in their intensity and duration: a macro-expression typically lasts 0.5 to 4 seconds, whereas a micro-expression lasts no more than 0.5 seconds. The short time span and fine-grained variation make micro-expressions more challenging to analyze. As cues for lie detection, micro-expressions have wide research and practical application in psychology, education, medical health, criminal investigation, and many other fields.
In the early days, the dominant micro-expression feature extraction methods were feature representations based on the Local Binary Pattern (LBP) and on optical flow. However, manual feature extraction is computationally heavy, time-consuming, and prone to producing redundant information. Compared with traditional features that must be designed by hand using prior knowledge, automatic learning with a neural network can both capture higher-level micro-expression semantic information and strengthen the generalization ability of the recognition model. As the technology developed, researchers introduced deep learning algorithms into micro-expression recognition, including convolutional neural networks, recurrent neural networks, and long short-term memory networks; these, however, generally attend only to spatial-domain features and ignore the temporal-domain information of the continuous motion itself, leaving the recognition performance to be improved.
Disclosure of Invention
The application provides a micro-expression recognition method based on a three-dimensional residual convolutional neural network and the optical flow method, which can solve the technical problem that the prior art ignores the temporal-domain information of continuous motion.
The application provides a micro-expression recognition method based on a three-dimensional residual convolutional neural network and the optical flow method, which comprises the following steps:
Step A: pre-process the original micro-expression video. Preprocessing includes video framing, face alignment and cropping, and peak-frame localization to extract the key-frame sequence;
Step B: convert the key-frame image sequence to grayscale, and extract optical-flow features from the grayscale image sequence to obtain a three-channel image sequence as the input of the network model;
Step C: improve the three-dimensional convolutional neural network with a residual module, and obtain recognition and classification results for micro-expression emotion through feature extraction and analysis.
Optionally, preprocessing the original micro-expression video includes:
locating the micro-expression peak frame with a frequency-domain method, and extracting the peak frame together with the 4 consecutive frames before and after it to form a 9-frame key-frame sequence of micro-expression images;
peak-frame localization is achieved as follows:
the video frame sequence is divided at a preset interval, the face region in each image frame is divided into 6×6 blocks, and, with a sliding window of length N, a three-dimensional fast Fourier transform (3D FFT) is applied to each frame interval in turn to compute the frequency values of the 36 blocks; denoting the blocks {b_{i1}, b_{i2}, …, b_{i36}}, the frequency value of the j-th block of the i-th interval is given by formula (1):

F_{ij}(x,y,z) = \sum_{h=0}^{L_b-1} \sum_{w=0}^{W_b-1} \sum_{l=0}^{N-1} b_{ij}(h,w,l)\, e^{-j 2\pi (xh/L_b + yw/W_b + zl/N)}    (1)

where (x, y, z) denotes the position in the frequency domain, L_b and W_b denote the height and width of the j-th block b_{ij} in the i-th interval, and j = {1, 2, …, 36};
after the frequency-domain signal is obtained, a high-pass filter is applied to remove the low-frequency signal; the high-pass filter H(x,y,z) is defined by formula (2), where D_0 is a threshold and D(x,y,z) is the distance from the frequency-domain origin:

H(x,y,z) = \begin{cases} 0, & D(x,y,z) \le D_0 \\ 1, & D(x,y,z) > D_0 \end{cases}    (2)

the frequency-domain signal of each video block is filtered according to formula (3):

\hat{F}_{ij}(x,y,z) = H(x,y,z) \cdot F_{ij}(x,y,z)    (3)

next, the frequency-domain amplitude of the 36 blocks of the i-th video interval is obtained according to formula (4):

A_i = \sum_{j=1}^{36} \sum_{x,y,z} |\hat{F}_{ij}(x,y,z)|    (4)

where A_i denotes the frequency amplitude of the i-th frame interval, i.e. the extent of rapid facial motion in the i-th interval;
once the frequency information of all video frame intervals is obtained, the peak interval with the largest frequency amplitude contains the highest-intensity frame of rapid facial motion, and the middle frame of that interval is taken as the micro-expression peak frame.
Optionally, converting the key-frame image sequence to grayscale and extracting optical-flow features from the grayscale image sequence to obtain a three-channel image sequence as the input of the network model includes:
extracting optical-flow features from the grayscale picture sequence: the horizontal optical-flow component u and the vertical optical-flow component v are obtained from formulas (5) and (6):

u = dx/dt    (5)
v = dy/dt    (6)

the optical strain ε is further extracted by differentiating the optical flow, as shown in formula (7):

\varepsilon = \frac{1}{2}\left[\nabla \vec{u} + (\nabla \vec{u})^T\right] = \begin{bmatrix} \varepsilon_{xx} & \varepsilon_{xy} \\ \varepsilon_{yx} & \varepsilon_{yy} \end{bmatrix} = \begin{bmatrix} \frac{\partial u}{\partial x} & \frac{1}{2}(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x}) \\ \frac{1}{2}(\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}) & \frac{\partial v}{\partial y} \end{bmatrix}    (7)

where the diagonal terms (ε_{xx}, ε_{yy}) are the normal strain components and (ε_{xy}, ε_{yx}) are the shear strain components;
then the optical-strain value of each pixel is computed from the root of the sum of squares of the normal and shear strain components, giving |ε|, as shown in formula (8):

|\varepsilon| = \sqrt{\varepsilon_{xx}^2 + \varepsilon_{xy}^2 + \varepsilon_{yx}^2 + \varepsilon_{yy}^2}    (8)

the horizontal optical-flow component u, the vertical optical-flow component v, and the optical strain |ε| are combined by channel concatenation into a new three-channel micro-expression image sequence.
Optionally, improving the three-dimensional convolutional neural network with a residual module and obtaining recognition and classification results for micro-expression emotion through feature extraction and analysis includes:
first, the three-dimensional convolutional neural network model is improved with a residual module to construct a three-dimensional residual convolutional neural network:
the 3D ResNet network comprises two 3D Conv modules, three 3D Res modules, 3 Dropout layers, 2 Dense layers, 1 Flatten layer, 1 batch-normalization layer, 1 ReLU activation function, and 1 Softmax layer;
each 3D Conv module comprises 1 three-dimensional convolution layer, 1 batch-normalization layer, 1 ReLU activation function, and 1 three-dimensional max-pooling layer;
for the input of the MaxPooling layer, the 3D Res module adds the original input x to the original ReLU-function output F(x);
in the three-dimensional residual convolution module, a shortcut connecting input and output is established by adding a direct edge across the nonlinear convolution layers;
then a Softmax classifier performs emotion classification of the micro-expressions, with cross entropy as the loss function;
as shown in formula (9):

L = -\sum_{i=1}^{n} y_i \log \hat{y}_i    (9)

where y denotes the true distribution, ŷ the network output distribution, and n the total number of categories;
the optimized micro-expression video is input into the improved three-dimensional convolutional neural network, and the recognition and classification results of micro-expression emotion are obtained through feature extraction and analysis.
Compared with the prior art, the application has the following notable advantages: (1) extracting optical-flow features of the key-frame sequence as the network-model input effectively removes redundant information, and the recognition effect is superior to micro-expression recognition based on appearance features; (2) using a three-dimensional convolutional neural network for micro-expression recognition allows micro-expression features in the temporal and spatial dimensions to be learned jointly, while the introduced residual module effectively mitigates network degradation and exploding gradients, providing a foundation for building deeper neural networks.
Drawings
FIG. 1 is a flow chart of the method of the application for recognizing facial micro-expressions in actual operation;
FIG. 2 is an overview of the overall structure of the facial micro-expression recognition process designed by the method of the application;
FIG. 3 is a diagram of the facial micro-expression feature extraction network designed by the method of the application;
FIG. 4 is a block diagram of the three-dimensional residual module in the feature extraction network of the method of the application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
An embodiment of the present application will be described first with reference to fig. 1.
Referring to fig. 1, the micro-expression recognition method based on the three-dimensional residual convolutional neural network and the optical flow method first performs video framing, face alignment, and cropping on the original micro-expression video, extracts the key-frame sequence, and extracts optical-flow features to obtain a three-channel optical-flow image sequence, which is then input into the constructed three-dimensional residual convolutional neural network model to recognize and classify the micro-expressions in the video. The method comprises the following steps:
Step A: preprocess the original micro-expression video. Preprocessing includes video framing, face alignment and cropping, and peak-frame localization to extract the key-frame sequence. Step A locates the micro-expression peak frame with a frequency-domain method and extracts the peak frame together with the 4 consecutive frames before and after it, forming a 9-frame key-frame sequence of micro-expression images.
Step A specifically comprises the following steps:
a1, selecting a micro expression data set. The application selects the micro expression video sequences of the SMIC and CASME II data sets for experiments. In order to alleviate the problem of emotion category imbalance between the employed datasets, each micro-expression video sample is re-labeled and mapped to three common expression tags, namely "Positive", "Negative" and "surrise", respectively. The emotion distribution of the dataset microexpressive sample is shown in table 1.
Table 1. Micro-expression sample distribution of the datasets

Emotion category   CASME II   SMIC   Total
Negative           88         70     158
Positive           32         51     83
Surprise           25         43     68
Total              145        164    309
A2: face alignment and cropping. First, the two center points of the eye regions are accurately located by an eye detector to determine the starting position from which the Active Shape Model (ASM) algorithm describes the face shape; the ASM then iteratively fits the face-shape position to determine 68 contour coordinate points of the face. The topmost, bottommost, leftmost, and rightmost coordinates of the face contour are used to crop the face region.
Based on the 68 face coordinate points, a local weighted mean (LWM) transformation is applied to each frame sequence i to align the cropped face regions. The transformed value of any coordinate (x, y) within an image frame is given by formula (10):

f(x,y) = \frac{\sum_i W\!\left(\sqrt{(x-x_i)^2 + (y-y_i)^2} / D_n\right) S_i(x,y)}{\sum_i W\!\left(\sqrt{(x-x_i)^2 + (y-y_i)^2} / D_n\right)}    (10)

where W is the weight function, D_n is the distance from the i-th control point (x_i, y_i) to its (n-1)-th nearest control point in the selected reference frame, and S_i(x, y) is the fitted polynomial with n parameters. With the LWM transformation, all images within a sequence can be aligned frame by frame.
After face alignment and cropping, the selected datasets are normalized to obtain video frame sequences with resolution 128×128×3 (3 being the RGB channels).
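As a concrete illustration of the cropping and normalization above, the following minimal sketch assumes the 68 contour points have already been fitted (by the ASM or any other landmark detector) and are supplied as a (68, 2) array of (x, y) coordinates; the function name and interface are illustrative, not from the patent:

import cv2
import numpy as np

def crop_and_normalize(frame, landmarks, size=128):
    # frame: H x W x 3 image; landmarks: (68, 2) array of (x, y) points.
    # The crop box spans the extreme left/right/top/bottom contour
    # coordinates (step A2), and the result is resized to size x size x 3.
    x_min, y_min = landmarks.min(axis=0).astype(int)
    x_max, y_max = landmarks.max(axis=0).astype(int)
    face = frame[max(y_min, 0):y_max, max(x_min, 0):x_max]
    return cv2.resize(face, (size, size))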
A3: locating the peak frame to extract the key-frame sequence. The CASME II micro-expression dataset provides the position of the peak frame, but the SMIC dataset is not annotated with peak-frame information, so a method based on the three-dimensional fast Fourier transform (3D-FFT) is adopted to locate the micro-expression peak frame.
The specific peak-frame localization procedure divides the video frame sequence at a preset interval and divides the face region in each image frame into 6×6 blocks; then, with a sliding window of length N, a three-dimensional fast Fourier transform (3D FFT) is applied to each frame interval in turn to compute the frequency values of the 36 blocks. Denoting the blocks {b_{i1}, b_{i2}, …, b_{i36}}, the frequency value of the j-th block of the i-th interval is given by formula (11):

F_{ij}(x,y,z) = \sum_{h=0}^{L_b-1} \sum_{w=0}^{W_b-1} \sum_{l=0}^{N-1} b_{ij}(h,w,l)\, e^{-j 2\pi (xh/L_b + yw/W_b + zl/N)}    (11)

where (x, y, z) denotes the position in the frequency domain, L_b and W_b denote the height and width of the j-th block b_{ij} in the i-th interval, and j = {1, 2, …, 36}.
After the frequency-domain signal is obtained, a high-pass filter removes the low-frequency signal to reduce the influence of unchanged pixels in the video frames. The high-pass filter H(x,y,z) is defined by formula (12), where D_0 is a threshold and D(x,y,z) is the distance from the frequency-domain origin:

H(x,y,z) = \begin{cases} 0, & D(x,y,z) \le D_0 \\ 1, & D(x,y,z) > D_0 \end{cases}    (12)

The frequency-domain signal of each video block is filtered according to formula (13):

\hat{F}_{ij}(x,y,z) = H(x,y,z) \cdot F_{ij}(x,y,z)    (13)

Next, the frequency-domain amplitude of the 36 blocks of the i-th video interval is obtained according to formula (14):

A_i = \sum_{j=1}^{36} \sum_{x,y,z} |\hat{F}_{ij}(x,y,z)|    (14)

where A_i denotes the frequency amplitude of the i-th frame interval, i.e. the extent of rapid facial motion in the i-th interval. In the same way, the frequency information of all video frame intervals is obtained; the peak interval with the largest frequency amplitude contains the highest-intensity frame of rapid facial motion, and the middle frame of that interval is taken as the micro-expression peak frame.
To remove redundant interference information, once the peak frame is located, the key-frame sequence of 9 micro-expression frames (the peak frame plus the 4 consecutive frames before and after it) is selected as the input to the subsequent steps.
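To make the procedure concrete, the following numpy sketch implements this 3D-FFT apex-frame localization under simplifying assumptions (an ideal high-pass radius d0, unit window stride, and a 9-frame window); the function name and default values are illustrative:

import numpy as np

def locate_apex_frame(gray_seq, window=9, grid=6, d0=2.0):
    # gray_seq: (T, H, W) grayscale face sequence. For each sliding interval
    # of `window` frames, the face is split into grid x grid blocks, each
    # block is transformed by a 3D FFT (formula (11)), frequencies within
    # radius d0 of the origin are suppressed (formulas (12)-(13)), and the
    # remaining magnitudes are summed to give A_i (formula (14)).
    T, H, W = gray_seq.shape
    bh, bw = H // grid, W // grid
    amplitudes = []
    for i in range(T - window + 1):
        clip, a_i = gray_seq[i:i + window], 0.0
        for r in range(grid):
            for c in range(grid):
                block = clip[:, r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
                f = np.fft.fftshift(np.fft.fftn(block))
                coords = np.indices(f.shape)
                center = (np.array(f.shape) // 2).reshape(3, 1, 1, 1)
                dist = np.sqrt(((coords - center) ** 2).sum(axis=0))
                a_i += np.abs(f[dist > d0]).sum()  # high-pass magnitude sum
        amplitudes.append(a_i)
    # middle frame of the interval with the largest frequency amplitude
    return int(np.argmax(amplitudes)) + window // 2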
A4: dataset expansion. Since dataset quality directly affects deep-learning results, the dataset must be expanded when the sample size is small; the application adopts an affine-transformation strategy for data augmentation, sketched below. Specifically, the acquired face image frames are shifted 15 pixels left, right, up, and down, and flipped vertically, which enlarges the dataset to 4 times its original size.
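A minimal sketch of this augmentation, assuming each key-frame sequence is shifted as a whole (np.roll wraps pixels around the border; a zero-filling cv2.warpAffine would be the stricter affine alternative):

import numpy as np

def augment_sequence(seq, shift=15):
    # seq: (T, H, W, C) key-frame sequence. Returns the shifted and
    # flipped variants described in step A4.
    out = [np.roll(seq, -shift, axis=2),   # 15 px left
           np.roll(seq, shift, axis=2),    # 15 px right
           np.roll(seq, -shift, axis=1),   # 15 px up
           np.roll(seq, shift, axis=1)]    # 15 px down
    out.append(seq[:, ::-1, :, :])         # vertical flip
    return out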
A5: training/test split. The micro-expression dataset, after normalization, key-frame extraction, and data augmentation, is divided into a training set and a test set at a ratio of 8:2.
Step B: convert the micro-expression key-frame image sequence to grayscale, extract the optical-flow features (the horizontal optical-flow component, the vertical optical-flow component, and the optical strain) from the grayscale image sequence, and combine the extracted optical-flow features by channel concatenation into a three-channel image sequence used as the input of the network model.
B1: grayscale conversion. The micro-expression key-frame image sequence obtained in step A is converted to grayscale: the three-channel RGB picture sequence is reduced to a single-channel grayscale picture sequence.
B2: optical-flow feature extraction. The TV-L1 energy functional, which is robust to noise, is applied to the single-channel grayscale picture sequence obtained in step B1 to extract optical-flow features.
The optical flow method uses the temporal variation of pixels in an image sequence and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame, and thereby computes the object motion between adjacent frames. The instantaneous rate of change of gray level at a given coordinate of the two-dimensional image plane is defined as the optical-flow vector. Optical-flow estimation is based on the brightness-constancy equation, as shown in formula (15):

I(x, y, t) = I(x + dx, y + dy, t + dt)    (15)

where I(x, y, t) is the image intensity of the pixel at coordinate (x, y) at time t. A first-order expansion gives formula (16):

\nabla I \cdot (p, q)^T + I_t = 0    (16)

where \nabla I = (I_x, I_y) is the spatially varying gradient, I_t the time-varying gradient, and p and q the horizontal and vertical motion vectors. The horizontal optical-flow component u and the vertical optical-flow component v are obtained from formulas (17) and (18):

u = dx/dt    (17)
v = dy/dt    (18)

Optical strain approximates the intensity of facial deformation; it is extracted by differentiating the optical flow, as shown in formula (19):

\varepsilon = \frac{1}{2}\left[\nabla \vec{u} + (\nabla \vec{u})^T\right] = \begin{bmatrix} \varepsilon_{xx} & \varepsilon_{xy} \\ \varepsilon_{yx} & \varepsilon_{yy} \end{bmatrix} = \begin{bmatrix} \frac{\partial u}{\partial x} & \frac{1}{2}(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x}) \\ \frac{1}{2}(\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}) & \frac{\partial v}{\partial y} \end{bmatrix}    (19)

where the diagonal terms (ε_{xx}, ε_{yy}) are the normal strain components and (ε_{xy}, ε_{yx}) the shear strain components. The optical-strain value of each pixel is then computed from the root of the sum of squares of the normal and shear strain components, giving |ε|, as shown in formula (20):

|\varepsilon| = \sqrt{\varepsilon_{xx}^2 + \varepsilon_{xy}^2 + \varepsilon_{yx}^2 + \varepsilon_{yy}^2}    (20)
each micro-expression sample comprises 9 frames of pictures, and after the operation of extracting the optical flow characteristics is carried out, each micro-expression sample can obtain 8 frames of horizontal optical flow sequences, 8 frames of vertical optical flow sequences and 8 frames of optical strain sequences.
B3: channel concatenation to form the input. The horizontal optical-flow component u, the vertical optical-flow component v, and the optical strain |ε| obtained in step B2 are combined by channel concatenation into a new three-channel micro-expression image sequence, which serves as the input to the subsequent feature extraction network; each sample has size 8×128×128×3.
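Steps B1 to B3 can be sketched end to end as follows, assuming the opencv-contrib-python package is available (it provides the TV-L1 solver under cv2.optflow); the function name is illustrative:

import cv2
import numpy as np

def flow_strain_channels(gray_seq):
    # gray_seq: (9, H, W) uint8 grayscale key frames (step B1 output;
    # cv2.cvtColor with COLOR_BGR2GRAY produces each frame).
    # Returns an (8, H, W, 3) float32 array whose channels are the
    # horizontal flow u, vertical flow v, and optical strain |epsilon|.
    tvl1 = cv2.optflow.createOptFlow_DualTVL1()
    frames = []
    for prev, nxt in zip(gray_seq[:-1], gray_seq[1:]):
        flow = tvl1.calc(prev, nxt, None)    # (H, W, 2) TV-L1 flow field
        u, v = flow[..., 0], flow[..., 1]
        du_dy, du_dx = np.gradient(u)        # spatial derivatives of u
        dv_dy, dv_dx = np.gradient(v)        # spatial derivatives of v
        e_xx, e_yy = du_dx, dv_dy            # normal strain components
        e_xy = 0.5 * (du_dy + dv_dx)         # shear strain, e_xy = e_yx
        strain = np.sqrt(e_xx ** 2 + e_yy ** 2 + 2.0 * e_xy ** 2)  # formula (20)
        frames.append(np.stack([u, v, strain], axis=-1))
    return np.stack(frames).astype(np.float32)  # 9 frames -> 8 flow frames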
Step C: improve the three-dimensional convolutional neural network with a residual module, and input the optimized dataset into the constructed three-dimensional residual convolutional network to obtain the recognition and classification results for micro-expression emotion.
C1: the three-dimensional convolutional neural network model is improved with a residual module to construct a three-dimensional residual convolutional neural network. As shown in fig. 3, the 3D ResNet network comprises two 3D Conv modules, three 3D Res modules, 3 Dropout layers, 2 Dense layers, 1 Flatten layer, 1 batch-normalization layer, 1 ReLU activation function, and 1 Softmax layer. Each 3D Conv module comprises 1 three-dimensional convolution layer, 1 batch-normalization layer, 1 ReLU activation function, and 1 three-dimensional max-pooling layer. As shown in fig. 4, compared with the 3D Conv module, the 3D Res module adds the original input x to the original ReLU-function output F(x) at the input of the MaxPooling layer.
In the three-dimensional residual convolution module, a direct edge is added across the nonlinear convolution layers to establish a shortcut connecting input and output, so that information is no longer transmitted only through the main path of the network layers; this effectively alleviates the difficulty of fitting identity mappings when the network becomes deeper.
C2: the two three-dimensional convolution modules extract shallow features, covering both the temporal and spatial domains, from the input image sequence obtained in step B; the convolution kernels are 3×3×3. Since the input image sequence has few frames, the pooling size of the max-pooling layer in both 3D Conv modules is set to 1×2×2 to preserve the temporal features. The convolution kernel of the first 3D Res module is 3×3×3, with 'same' padding and pooling size 1×2×2. The convolution kernels of the second and third 3D Res modules are 3×3×3, and the pooling size of their max-pooling layers is set to 2×2×2 to downsample the temporal features.
C3: as shown in fig. 4, in the three-dimensional residual convolution module the direct edge added across the nonlinear convolution layers establishes a shortcut connecting input and output, which effectively alleviates the difficulty of fitting identity mappings when the network becomes deeper.
C4: a batch-normalization (BN) layer is introduced to standardize the input of each intermediate layer of the network so that the output follows a normal distribution with mean 0 and variance 1, avoiding shifts in the variable distribution; the ReLU activation function is introduced to avoid gradient saturation for inputs x ≥ 0 and to improve the nonlinear fitting capacity; Dropout layers are introduced to reduce the number of intermediate features, with the drop rate set to 0.2.
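In concrete terms, a minimal PyTorch sketch of the C1 to C4 architecture could look as follows; since the concrete values of Table 2 are not reproduced in this text, the channel width, Dense size, and exact layer arrangement below are illustrative assumptions rather than the patent's settings:

import torch
import torch.nn as nn

class Res3D(nn.Module):
    # 3D Res module (FIG. 4): Conv3d + BN + ReLU with the identity
    # shortcut added before max pooling, i.e. pool(F(x) + x).
    def __init__(self, channels, pool):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, 3, padding=1)  # 'same' padding
        self.bn = nn.BatchNorm3d(channels)
        self.pool = nn.MaxPool3d(pool)

    def forward(self, x):
        return self.pool(torch.relu(self.bn(self.conv(x))) + x)

class Res3DNet(nn.Module):
    def __init__(self, n_classes=3, width=64):
        super().__init__()

        def conv_block(cin, cout):
            # 3D Conv module: conv + BN + ReLU + MaxPool(1, 2, 2)
            return nn.Sequential(nn.Conv3d(cin, cout, 3, padding=1),
                                 nn.BatchNorm3d(cout), nn.ReLU(),
                                 nn.MaxPool3d((1, 2, 2)))

        self.features = nn.Sequential(
            conv_block(3, width), conv_block(width, width),
            Res3D(width, (1, 2, 2)),   # first 3D Res module keeps time
            Res3D(width, (2, 2, 2)),   # second and third downsample time
            Res3D(width, (2, 2, 2)))
        # After pooling, an 8 x 128 x 128 input leaves 2 x 4 x 4 per channel.
        self.head = nn.Sequential(
            nn.Flatten(), nn.Dropout(0.2),
            nn.Linear(width * 2 * 4 * 4, 256), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(256, n_classes))  # Softmax is applied inside the loss

    def forward(self, x):               # x: (N, 3, 8, 128, 128)
        return self.head(self.features(x))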
C5: a Softmax classifier performs emotion classification of the micro-expressions, with cross entropy as the loss function, computed as shown in formula (21):

L = -\sum_{i=1}^{n} y_i \log \hat{y}_i    (21)

where y denotes the true distribution, ŷ the network output distribution, and n the total number of categories.
C6: the parameter settings of the 3D ResNet network model of the micro-expression feature extraction network are listed in Table 2.
Table 2. 3D ResNet network model parameter settings
C7: the adopted performance evaluation metrics are Accuracy (Acc), the Unweighted F1 score (UF1), and the Unweighted Average Recall (UAR). Acc is computed by formula (22), UF1 by formula (23), and UAR by formula (26):

Acc = N_{correct} / N_{total}    (22)

UF1 = \frac{1}{E} \sum_{\alpha=1}^{E} F1_\alpha    (23)

wherein

F1_\alpha = \frac{2 \cdot Precision_\alpha \cdot Recall_\alpha}{Precision_\alpha + Recall_\alpha}    (24)

Precision_\alpha = \frac{TP_\alpha}{TP_\alpha + FP_\alpha}, \quad Recall_\alpha = \frac{TP_\alpha}{TP_\alpha + FN_\alpha}    (25)

UAR = \frac{1}{E} \sum_{\alpha=1}^{E} Recall_\alpha    (26)

where E denotes the number of emotion categories, α indexes the category (with β indexing repeated experiments, over which the metrics are averaged), and Precision_α and Recall_α denote the precision and recall of the α-th category.
In summary, the method first processes the original micro-expression video with image-processing techniques combined with the optimization method, then improves the three-dimensional convolutional neural network with the characteristics of the residual module, constructing a three-dimensional residual convolutional network to extract micro-expression features. Passed through the improved feature extraction network, the optimized micro-expression video sequence effectively improves the recognition rate of micro-expressions in video clips; the method can be applied in psychology, education, medical health, criminal investigation, and many other fields, and has good practical value.
Compared with the prior art, the application has the following notable advantages: (1) extracting optical-flow features of the key-frame sequence as the network-model input effectively removes redundant information, and the recognition effect is superior to micro-expression recognition based on appearance features; (2) using a three-dimensional convolutional neural network for micro-expression recognition allows micro-expression features in the temporal and spatial dimensions to be learned jointly, while the introduced residual module effectively mitigates network degradation and exploding gradients, providing a foundation for building deeper neural networks.
It will be apparent to those skilled in the art that the techniques of the embodiments of the present application may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the embodiments of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, including several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the embodiments, or in certain parts of the embodiments, of the present application.
The embodiments in this specification refer to one another for their identical or similar parts. In particular, since the service building apparatus and service loading apparatus embodiments are substantially similar to the method embodiments, their description is relatively brief; for the relevant points, see the description in the method embodiments.
The embodiments of the present application described above do not limit the scope of the present application.

Claims (4)

1. A micro-expression recognition method based on a three-dimensional residual convolutional neural network and the optical flow method, characterized by comprising the following steps:
step A, pre-processing the original micro-expression video: preprocessing comprises video framing, face alignment and cropping, and peak-frame localization to extract the key-frame sequence;
step B, converting the key-frame image sequence to grayscale and extracting optical-flow features from the grayscale image sequence to obtain a three-channel image sequence as the input of the network model;
step C, improving the three-dimensional convolutional neural network with a residual module, and obtaining recognition and classification results for micro-expression emotion through feature extraction and analysis.
2. The micro-expression recognition method based on the three-dimensional residual convolutional neural network and the optical flow method according to claim 1, wherein preprocessing the original micro-expression video comprises:
locating the micro-expression peak frame with a frequency-domain method, and extracting the peak frame together with the 4 consecutive frames before and after it to form a 9-frame key-frame sequence of micro-expression images;
peak-frame localization is achieved as follows:
the video frame sequence is divided at a preset interval, the face region in each image frame is divided into 6×6 blocks, and, with a sliding window of length N, a three-dimensional fast Fourier transform (3D FFT) is applied to each frame interval in turn to compute the frequency values of the 36 blocks; denoting the blocks {b_{i1}, b_{i2}, …, b_{i36}}, the frequency value of the j-th block of the i-th interval is given by formula (1):

F_{ij}(x,y,z) = \sum_{h=0}^{L_b-1} \sum_{w=0}^{W_b-1} \sum_{l=0}^{N-1} b_{ij}(h,w,l)\, e^{-j 2\pi (xh/L_b + yw/W_b + zl/N)}    (1)

where (x, y, z) denotes the position in the frequency domain, L_b and W_b denote the height and width of the j-th block b_{ij} in the i-th interval, and j = {1, 2, …, 36};
after the frequency-domain signal is obtained, a high-pass filter removes the low-frequency signal; the high-pass filter H(x,y,z) is defined by formula (2), where D_0 is a threshold and D(x,y,z) is the distance from the frequency-domain origin:

H(x,y,z) = \begin{cases} 0, & D(x,y,z) \le D_0 \\ 1, & D(x,y,z) > D_0 \end{cases}    (2)

the frequency-domain signal of each video block is filtered according to formula (3):

\hat{F}_{ij}(x,y,z) = H(x,y,z) \cdot F_{ij}(x,y,z)    (3)

next, the frequency-domain amplitude of the 36 blocks of the i-th video interval is obtained according to formula (4):

A_i = \sum_{j=1}^{36} \sum_{x,y,z} |\hat{F}_{ij}(x,y,z)|    (4)

where A_i denotes the frequency amplitude of the i-th frame interval, i.e. the extent of rapid facial motion in the i-th interval;
once the frequency information of all video frame intervals is obtained, the peak interval with the largest frequency amplitude contains the highest-intensity frame of rapid facial motion, and the middle frame of that interval is taken as the micro-expression peak frame.
3. The micro-expression recognition method based on the three-dimensional residual convolutional neural network and the optical flow method according to claim 1, wherein converting the key-frame image sequence to grayscale and extracting optical-flow features from the grayscale image sequence to obtain a three-channel image sequence as the input of the network model comprises:
extracting optical-flow features from the grayscale picture sequence: the horizontal optical-flow component u and the vertical optical-flow component v are obtained from formulas (5) and (6):

u = dx/dt    (5)
v = dy/dt    (6)

the optical strain ε is further extracted by differentiating the optical flow, as shown in formula (7):

\varepsilon = \frac{1}{2}\left[\nabla \vec{u} + (\nabla \vec{u})^T\right] = \begin{bmatrix} \varepsilon_{xx} & \varepsilon_{xy} \\ \varepsilon_{yx} & \varepsilon_{yy} \end{bmatrix} = \begin{bmatrix} \frac{\partial u}{\partial x} & \frac{1}{2}(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x}) \\ \frac{1}{2}(\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}) & \frac{\partial v}{\partial y} \end{bmatrix}    (7)

where the diagonal terms (ε_{xx}, ε_{yy}) are the normal strain components and (ε_{xy}, ε_{yx}) are the shear strain components;
then the optical-strain value of each pixel is computed from the root of the sum of squares of the normal and shear strain components, giving |ε|, as shown in formula (8):

|\varepsilon| = \sqrt{\varepsilon_{xx}^2 + \varepsilon_{xy}^2 + \varepsilon_{yx}^2 + \varepsilon_{yy}^2}    (8)

the horizontal optical-flow component u, the vertical optical-flow component v, and the optical strain |ε| are combined by channel concatenation into a new three-channel micro-expression image sequence.
4. The micro-expression recognition method based on the three-dimensional residual convolutional neural network and the optical flow method according to claim 1, wherein improving the three-dimensional convolutional neural network with a residual module and obtaining recognition and classification results for micro-expression emotion through feature extraction and analysis comprises:
first, the three-dimensional convolutional neural network model is improved with a residual module to construct a three-dimensional residual convolutional neural network:
the 3D ResNet network comprises two 3D Conv modules, three 3D Res modules, 3 Dropout layers, 2 Dense layers, 1 Flatten layer, 1 batch-normalization layer, 1 ReLU activation function, and 1 Softmax layer;
each 3D Conv module comprises 1 three-dimensional convolution layer, 1 batch-normalization layer, 1 ReLU activation function, and 1 three-dimensional max-pooling layer;
for the input of the MaxPooling layer, the 3D Res module adds the original input x to the original ReLU-function output F(x);
in the three-dimensional residual convolution module, a shortcut connecting input and output is established by adding a direct edge across the nonlinear convolution layers;
then a Softmax classifier performs emotion classification of the micro-expressions, with cross entropy as the loss function;
as shown in formula (9):

L = -\sum_{i=1}^{n} y_i \log \hat{y}_i    (9)

where y denotes the true distribution, ŷ the network output distribution, and n the total number of categories;
the optimized micro-expression video is input into the improved three-dimensional convolutional neural network, and the recognition and classification results of micro-expression emotion are obtained through feature extraction and analysis.
CN202310808285.0A 2023-07-04 Micro-expression recognition method based on three-dimensional residual convolution neural network and optical flow method Active CN116935465B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310808285.0A CN116935465B (en) 2023-07-04 Micro-expression recognition method based on three-dimensional residual convolution neural network and optical flow method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310808285.0A CN116935465B (en) 2023-07-04 Micro-expression recognition method based on three-dimensional residual convolution neural network and optical flow method

Publications (2)

Publication Number Publication Date
CN116935465A true CN116935465A (en) 2023-10-24
CN116935465B CN116935465B (en) 2024-07-09




Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130300900A1 (en) * 2012-05-08 2013-11-14 Tomas Pfister Automated Recognition Algorithm For Detecting Facial Expressions
CN105590106A (en) * 2016-01-21 2016-05-18 合肥君达高科信息技术有限公司 Novel face 3D expression and action identification system
CN109389045A (en) * 2018-09-10 2019-02-26 广州杰赛科技股份有限公司 Micro- expression recognition method and device based on mixing space-time convolution model
WO2020103700A1 (en) * 2018-11-21 2020-05-28 腾讯科技(深圳)有限公司 Image recognition method based on micro facial expressions, apparatus and related device
CN110852271A (en) * 2019-11-12 2020-02-28 哈尔滨工程大学 Micro-expression recognition method based on peak frame and deep forest
CN112101306A (en) * 2020-11-10 2020-12-18 成都市谛视科技有限公司 Fine facial expression capturing method and device based on RGB image
CN112883896A (en) * 2021-03-10 2021-06-01 山东大学 Micro-expression detection method based on BERT network
CN113221639A (en) * 2021-04-01 2021-08-06 山东大学 Micro-expression recognition method for representative AU (AU) region extraction based on multitask learning
CN115937936A (en) * 2022-11-29 2023-04-07 西安理工大学 Micro-expression recognition method based on optical flow characteristics

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zeng Yiqi; Guan Shengxiao: "A facial expression recognition method based on an isolation loss function", Information Technology and Network Security, no. 06, 10 June 2018 (2018-06-10), pages 84-88 *
Wang Xin; Wang Yousheng: "A survey of facial expression recognition based on deep learning and traditional machine learning", Applied Science and Technology, no. 01, 26 October 2017 (2017-10-26), pages 69-76 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117456586A (en) * 2023-11-17 2024-01-26 江南大学 Micro expression recognition method, system, equipment and medium
CN117456586B (en) * 2023-11-17 2024-07-09 江南大学 Micro expression recognition method, system, equipment and medium

Similar Documents

Publication Publication Date Title
CN109389045B (en) Micro-expression identification method and device based on mixed space-time convolution model
CN111797683A (en) Video expression recognition method based on depth residual error attention network
CN104933414A (en) Living body face detection method based on WLD-TOP (Weber Local Descriptor-Three Orthogonal Planes)
CN112001241B (en) Micro-expression recognition method and system based on channel attention mechanism
CN111476178A (en) Micro-expression recognition method based on 2D-3D CNN
Zhao et al. Applying contrast-limited adaptive histogram equalization and integral projection for facial feature enhancement and detection
CN113537008A (en) Micro-expression identification method based on adaptive motion amplification and convolutional neural network
CN107194314B (en) Face recognition method fusing fuzzy 2DPCA and fuzzy 2DLDA
CN113591763B (en) Classification recognition method and device for face shapes, storage medium and computer equipment
KR20190128933A (en) Emotion recognition apparatus and method based on spatiotemporal attention
Sulistianingsih et al. Classification of batik image using grey level co-occurrence matrix feature extraction and correlation based feature selection
CN112766145B (en) Method and device for identifying dynamic facial expressions of artificial neural network
CN103235943A (en) Principal component analysis-based (PCA-based) three-dimensional (3D) face recognition system
Karamizadeh et al. Race classification using gaussian-based weight K-nn algorithm for face recognition
CN116935465B (en) Micro-expression recognition method based on three-dimensional residual convolution neural network and optical flow method
CN116935465A (en) Micro-expression recognition method based on three-dimensional residual convolution neural network and optical flow method
Zhang et al. No-reference image quality assessment using independent component analysis and convolutional neural network
Sang et al. MoNET: no-reference image quality assessment based on a multi-depth output network
Yang et al. Combining attention mechanism and dual-stream 3d convolutional neural network for micro-expression recognition
Al-Rawi et al. Feature Extraction of Human Facial Expressions Using Haar Wavelet and Neural network
CN118097360B (en) Image fusion method based on significant feature extraction and residual connection
Yin et al. Face Recognition System using Self-Organizing Feature Map and Appearance-Based Approach
Chihaoui et al. Implementation of skin color selection prior to Gabor filter and neural network to reduce execution time of face detection
Ahmed et al. Non-reference quality monitoring of digital images using gradient statistics and feedforward neural networks
CN109117867B (en) Multi-tested brain image prediction method based on gradient super-calibration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant