CN117197877A - Micro-expression recognition method and system based on regional weighted optical flow characteristics

Info

Publication number
CN117197877A
Authority
CN
China
Prior art keywords
frame; optical flow; micro; peak; weighted
Prior art date
Legal status
Granted
Application number
CN202311464800.4A
Other languages
Chinese (zh)
Other versions
CN117197877B (en)
Inventor
李新雨
周元峰
王宇
窦文涵
路骁
张圆成
Current Assignee
Shandong Institute Of Industrial Technology
Original Assignee
Shandong Institute Of Industrial Technology
Priority date
Filing date
Publication date
Application filed by Shandong Institute Of Industrial Technology filed Critical Shandong Institute Of Industrial Technology
Priority to CN202311464800.4A
Publication of CN117197877A
Application granted
Publication of CN117197877B
Legal status: Active


Abstract

The invention belongs to the technical field of computer vision, and particularly relates to a micro-expression recognition method and system based on regional weighted optical flow features. A start frame, a peak frame and an end frame are acquired from the micro-expression video sequence to be recognized, and the three frames are cropped to obtain images containing only the facial area. The facial area is then divided according to the movement characteristics of the facial muscles, and the optical flow of the start frame and of the end frame relative to the peak frame is calculated. The optical flow features in the different directions of each region are then weighted, and the weighted feature data are normalized to obtain normalized weighted optical flow maps, which are input into a dual-stream convolutional neural network for feature extraction and classification; finally, the corresponding micro-expression recognition result is output. The micro-expression recognition method achieves high recognition accuracy.

Description

Micro-expression recognition method and system based on regional weighted optical flow characteristics
Technical Field
The disclosure belongs to the technical field of computer vision, and particularly relates to a micro-expression recognition method and system based on regional weighted optical flow characteristics.
Background
Micro-expressions are facial expressions with short duration and slight facial muscle changes. As an interdisciplinary research topic of psychology and computer vision, micro-expression recognition has great potential application value in fields such as criminal investigation, clinical diagnosis, psychological treatment, emotion analysis and security. Although micro-expression recognition has considerable application value, the characteristics of micro-expressions still make recognition algorithms very challenging. The difficulties can be summarized as follows: (1) the duration is short, typically between 1/25 and 1/5 of a second; (2) the facial muscle movement intensity is low: when a micro-expression occurs, the facial muscle changes are local, and their amplitude is so slight that they are very difficult to perceive with the naked eye; (3) the available datasets are limited: collecting micro-expressions requires a certain professional skill, so the number of samples in existing micro-expression datasets is very limited, and the number of samples across categories is also unbalanced.
The convolutional neural network is a deep learning method capable of automatic feature learning and has been widely applied to micro-expression problems in recent years. However, if a complete micro-expression video sequence is input directly into the network for feature learning, the short duration and low facial muscle movement intensity of micro-expressions make it hard for the network to learn effective features. Preprocessing before feature learning is therefore necessary; for example, using the optical flow data generated by facial deformation when a micro-expression occurs as the network input facilitates the learning of the neural network. Because the change between adjacent frames in a micro-expression sequence is very weak, and because the number of dataset samples is limited and the sample categories are unbalanced, inputting the optical flow data of every frame into the neural network makes it difficult to train a robust deep learning model. Therefore, how to highlight the features of micro-expression changes and design an effective deep learning network model remains a very challenging task.
Disclosure of Invention
Aiming at the difficulties of micro-expression recognition, the disclosure provides a micro-expression recognition method based on region weighted optical flow features. It uses only the start frame, peak frame and end frame of a micro-expression sequence to calculate optical flow, and weights the optical flow features of the divided regions in different directions to emphasize the movement characteristics of the facial muscles in the key directions, thereby highlighting the micro-expression changes. A shallow dual-stream convolutional neural network is designed to extract and classify the micro-expression features.
In a first aspect, the present disclosure proposes a micro-expression recognition method based on regional weighted optical flow features, including the following steps:
step 1: acquiring a micro-expression video to be identified, and extracting a start frame, a peak frame and a termination frame from the micro-expression video to be identified;
step 2: identifying the facial areas of the initial frame, the peak frame and the end frame, cutting the three frames of images to obtain images only containing the facial areas, and dividing the initial frame, the peak frame and the end frame only containing the facial areas according to the movement characteristics of facial muscles;
step 3: calculating the optical flow of the peak frame relative to the start frame and of the peak frame relative to the end frame by using an optical flow method, and computing for each divided region the proportion of the optical flow in the horizontal and the vertical direction. For any region $a$ of the optical flow map, the weighted optical flow at a pixel point $(i, j)$ can be expressed as:

$u'^{\,a}_{i,j} = w^a_h \cdot u_{i,j}, \qquad v'^{\,a}_{i,j} = w^a_v \cdot v_{i,j}$

where $w^a_h$ and $w^a_v$ respectively denote the weight of all horizontal optical flow vectors and the weight of all vertical optical flow vectors in region $a$, and $u_{i,j}$ and $v_{i,j}$ denote the horizontal and vertical optical flow components.
Step 4: the optical flow components are weighted according to the areas and subjected to standardization processing, so that an area weighted optical flow diagram capable of highlighting micro expression changes is obtained, and the used standardization formula is as follows:
wherein,t is the optical flow intensity threshold, in order to eliminate the effect of abnormal optical flow generated by objective conditions.
By means of the normalization process, on the one hand the effect of false optical flow due to the compensating illumination can be reduced, and on the other hand the optical flow is mapped into the pixel range.
Step 5: and (3) inputting the weighted light flow diagrams obtained by the peak value frame relative to the initial frame and the peak value frame relative to the end frame into a pre-trained double-flow convolutional neural network model, sequentially extracting and classifying the features, and finally outputting a microexpressive classification result corresponding to the current association.
In an exemplary embodiment, in step 2, before clipping, the method further includes: and identifying facial key points of a starting frame, a peak frame and a termination frame in the micro-expression video sequence by using a Dlib face key point detection algorithm, selecting two points on the nose of the starting frame as datum points, and carrying out face alignment on the peak frame and the termination frame by referring to the datum points.
Further, the number of the face key points is 68.
In an exemplary embodiment, in step 2, the cropping is performed as follows: the start frame is taken as the reference image, the position and size of the cropping region are determined by the expression below, and the start frame, the peak frame and the end frame are all cropped to this position and size:

$x_{\min} = \min_{k} x_k, \quad x_{\max} = \max_{k} x_k, \quad y_{\min} = \min_{k} y_k, \quad y_{\max} = \max_{k} y_k$

where $x_k$ and $y_k$ ($k = 1, \dots, 68$) respectively denote the abscissa and ordinate of the 68 key points, and the cropping rectangle is bounded by $(x_{\min}, y_{\min})$ and $(x_{\max}, y_{\max})$.
In an exemplary embodiment, in step 2, the facial key points are used to divide the three frames of images containing only the face region into three regions: eyes, nose and mouth.
In an exemplary embodiment, in step 5, the dual-flow convolutional neural network includes two branches, where the input of the first branch is a normalized region weighted optical flow graph calculated by a start frame and a peak frame, the input of the second branch is a normalized region weighted optical flow graph calculated by a stop frame and a peak frame, and the outputs of the two branch softmax layers are fused in an average calculation manner to obtain a probability value of each classification, so as to obtain a classification result.
In an exemplary embodiment, in step 5, the two branches each include a first convolution layer, a first pooling layer, a second convolution layer, a second pooling layer, a third convolution layer, a third pooling layer, a first fully-connected layer, a second fully-connected layer, and a softmax layer.
Further, the first convolutional layer has a 9×9 kernel with stride 1, the second and third convolutional layers have 5×5 kernels with stride 1, and each pooling layer uses a 2×2 window with stride 2.
In an exemplary embodiment, in step 5, the dual-flow convolutional neural network is a shallow dual-flow convolutional neural network.
In an exemplary embodiment, in step 5, the pre-training of the dual-stream convolutional neural network model specifically includes:
constructing a double-flow convolutional neural network;
constructing a training set, wherein the training set comprises micro-expression videos with known micro-expression classification labels;
and inputting the training set into the double-flow convolutional neural network for training, and stopping training when the loss function reaches the minimum value to obtain a trained double-flow convolutional neural network model.
In the training stage, the cross-entropy loss function is used to measure the difference between the model's prediction and the true label. Assuming there are N classes, it can be expressed as:

$L = -\sum_{i=1}^{N} y_i \log \hat{y}_i$

where $y$ is the true class label vector of the sample and $\hat{y}$ is the predicted output vector of the model. The model parameters are optimized by minimizing the cross-entropy loss, so that the model's predictions come closer to the true class labels.
In the training stage, the two branches and the feature fusion module are jointly learned through back-propagation, so that the network model is optimized as a whole and the overall performance of the network is improved.
In a second aspect, the present disclosure provides a micro-expression recognition system based on region weighted optical flow features, comprising:
a data acquisition module configured to: acquiring a micro-expression video to be identified, and extracting a start frame, a peak frame and a termination frame from the micro-expression video to be identified;
a facial region extraction module configured to: identifying the facial areas of the initial frame, the peak frame and the end frame, cutting the three frames of images to obtain images only containing the facial areas, and dividing the initial frame, the peak frame and the end frame only containing the facial areas according to the movement characteristics of facial muscles;
an optical flow component extraction module configured to: calculate the optical flow of the peak frame relative to the start frame and of the peak frame relative to the end frame by using an optical flow method, and compute for each divided region the proportion of the optical flow in the horizontal and the vertical direction, wherein for any region $a$ of the optical flow map the weighted optical flow at a pixel point $(i, j)$ can be expressed as:

$u'^{\,a}_{i,j} = w^a_h \cdot u_{i,j}, \qquad v'^{\,a}_{i,j} = w^a_v \cdot v_{i,j}$

where $w^a_h$ and $w^a_v$ respectively denote the weight of all horizontal optical flow vectors and the weight of all vertical optical flow vectors in region $a$;
a weighting processing module configured to: weight the optical flow components by region and normalize them to obtain a region weighted optical flow map that highlights the micro-expression changes, the normalization formula used being:

$F_{i,j} = \dfrac{\min(\max(f_{i,j}, -T), T) + T}{2T} \times 255$

where $T$ is the optical flow intensity threshold, introduced to eliminate the influence of abnormal optical flow produced by objective conditions;
a feature extraction and classification module configured to: input the weighted optical flow maps obtained for the peak frame relative to the start frame and for the peak frame relative to the end frame into a pre-trained dual-stream convolutional neural network, sequentially perform feature extraction and classification, and finally output the micro-expression classification result for the current input.
Compared with the prior art, the beneficial effects of the present disclosure are:
1. Aiming at the difficulties of micro-expression recognition, the method uses only the start frame, peak frame and end frame of the micro-expression video sequence, divides the facial area according to the movement characteristics of the facial muscles, and applies region weighting to each area to obtain a region weighted optical flow map that highlights the movement characteristics of the micro-expression;
2. The present disclosure designs a shallow dual-stream convolutional neural network that takes the region weighted optical flow maps as input. Compared with a deep network, the shallow network can obtain a lightweight model with better performance on a small amount of data without data augmentation. In addition, by designing a dual-stream convolutional network, the two branches can reference each other during feature learning, which improves the recognition accuracy of the model.
Drawings
FIG. 1 is a flow chart of a method according to a first embodiment of the present disclosure;
FIG. 2 is a diagram of key labels according to a first embodiment of the present disclosure;
FIG. 3 is a motion area map of a first embodiment of the present disclosure;
fig. 4 is a network configuration diagram of a first embodiment of the present disclosure;
fig. 5 is a system block diagram of a second embodiment of the present disclosure.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
Term interpretation section:
A start frame, i.e., the frame in which the observed person's micro-expression begins to appear;
a peak frame, i.e., the frame in which the observed person's micro-expression is most pronounced;
an end frame, i.e., the frame in which the observed person's micro-expression disappears;
optical flow, which refers to the instantaneous speed of pixel motion of a spatially moving object on an observation imaging plane;
the optical flow method is a method for finding the corresponding relation between the previous frame and the current frame by utilizing the change of pixels in an image sequence in a time domain and the correlation between adjacent frames, so as to calculate the motion information of an object between the adjacent frames.
Example 1
The embodiment provides a micro-expression recognition method based on regional weighted optical flow characteristics, as shown in fig. 1, comprising the following steps:
step 1: acquiring a micro-expression video to be identified, and extracting a start frame, a peak frame and a termination frame from the micro-expression video to be identified;
For example, in step 1 only the start frame, peak frame and end frame of the micro-expression video sequence are used, which reduces the feature redundancy caused by the high similarity between adjacent frames. The extraction of these three frames falls into two cases:
(1) If the used data set provides the marked initial frame, peak frame and end frame, the data set can be directly used;
(2) If the used dataset does not provide annotated start, peak and end frames, the first frame is selected as the start frame and the last frame as the end frame by default; the peak frame can be determined by calculating the optical flow of each frame relative to the start frame and summing the optical flow intensities of each frame, and the frame with the largest sum is selected as the peak frame, as illustrated in the sketch below.
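A minimal sketch of case (2), assuming OpenCV is available; the function name select_key_frames is hypothetical, and Farneback flow is used here purely for illustration (the embodiment's final flow features are computed with TV-L1, described later):

```python
import cv2
import numpy as np

def select_key_frames(frames):
    """Pick start, peak and end frames from a micro-expression clip.

    frames: list of BGR images. The first and last frames serve as the
    start and end frames; the peak frame is the intermediate frame whose
    optical flow relative to the start frame has the largest total magnitude.
    """
    start_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    best_idx, best_sum = 1, -1.0
    for idx in range(1, len(frames) - 1):
        gray = cv2.cvtColor(frames[idx], cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(
            start_gray, gray, None, pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        total = np.linalg.norm(flow, axis=2).sum()  # sum of flow intensities
        if total > best_sum:
            best_idx, best_sum = idx, total
    return frames[0], frames[best_idx], frames[-1]
```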
Step 2: identifying the facial areas of the initial frame, the peak frame and the end frame, cutting the three frames of images to obtain images only containing the facial areas, and dividing the initial frame, the peak frame and the end frame only containing the facial areas according to the movement characteristics of facial muscles;
In particular, in step 2 the cropping operation removes the interference of the background area. In addition, a face alignment operation must be performed before cropping, to avoid deviations of the face position of the same sample between different frames, which would affect the subsequent optical flow calculation. The specific operations include:
(1) Detecting a start frame, a peak frame and a stop frame by using a Dlib face key point detection algorithm, wherein 68 key points are identified in each frame, as shown in fig. 2;
(2) Points 28 and 30 on the nose of the start frame are selected as reference points, since the displacement of these two points during a micro-expression is negligible;
(3) The peak frame and the end frame are aligned with the reference point to avoid unnecessary errors caused by the movement of the face;
(4) The start frame is taken as the reference image, the position and size of the cropping region are determined by the expression below, and the start frame, the peak frame and the end frame are all cropped to this position and size:

$x_{\min} = \min_{k} x_k, \quad x_{\max} = \max_{k} x_k, \quad y_{\min} = \min_{k} y_k, \quad y_{\max} = \max_{k} y_k$

where $x_k$ and $y_k$ ($k = 1, \dots, 68$) respectively denote the abscissa and ordinate of the 68 key points, and the cropping rectangle is bounded by $(x_{\min}, y_{\min})$ and $(x_{\max}, y_{\max})$;
(5) The cropped images are size-normalized, so that all sample images obtained by the above operations are uniformly resized to the size of the first sample image.
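A hedged sketch of operations (1)-(5) above, assuming the standard Dlib model file shape_predictor_68_face_landmarks.dat, one detected face per frame, a simple translation-only alignment to the nose reference points, and a placeholder output size of 128×128; note that Dlib indexes the key points from 0, so figure points 28 and 30 correspond to indices 27 and 29:

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmarks_68(image):
    """Detect the face and return its 68 key points as a (68, 2) int array."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    face = detector(gray)[0]  # assume exactly one face per frame
    shape = predictor(gray, face)
    return np.array([[p.x, p.y] for p in shape.parts()])

def align_to_start(start_pts, frame, frame_pts):
    """Translate `frame` so its nose reference points match the start frame."""
    nose = [27, 29]  # figure points 28 and 30, 0-based
    shift = (start_pts[nose] - frame_pts[nose]).mean(axis=0)
    m = np.float32([[1, 0, shift[0]], [0, 1, shift[1]]])
    h, w = frame.shape[:2]
    return cv2.warpAffine(frame, m, (w, h))

def crop_face(frame, pts, size=(128, 128)):
    """Crop the bounding box of the key points and resize uniformly."""
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    return cv2.resize(frame[y0:y1, x0:x1], size)
```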
In particular, the facial key points are used to divide each of the three frames containing only the face region into three regions, namely eyes, nose and mouth, as shown in fig. 3.
Specifically, the 28th point on the nose is taken as a reference point, and a horizontal line $l_1$ is drawn through it; the area above $l_1$ is the eye region.
In the same way, the midpoint between the 51st point on the mouth and the 33rd point on the nose tip is taken as a reference point, and a horizontal line $l_2$ is drawn through it; the area below $l_2$ is the mouth region.
The region contained between the horizontal lines $l_1$ and $l_2$ is the nose region.
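Continuing the sketch, the two horizontal cut lines $l_1$ and $l_2$ can be derived directly from the key points; the function names are hypothetical, and the 0-based index mapping (figure points 28, 33 and 51 becoming indices 27, 32 and 50) is an assumption about the figure's numbering:

```python
import numpy as np

def cut_lines(pts):
    """Return the y-coordinates of the horizontal lines l1 and l2."""
    y1 = pts[27, 1]                      # l1: through figure point 28
    y2 = (pts[32, 1] + pts[50, 1]) // 2  # l2: midpoint of figure points 33, 51
    return y1, y2

def region_masks(height, width, y1, y2):
    """Boolean masks for the eye (above l1), nose (between the lines)
    and mouth (below l2) regions."""
    rows = np.broadcast_to(np.arange(height)[:, None], (height, width))
    return [rows < y1, (rows >= y1) & (rows < y2), rows >= y2]
```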
Step 3: calculating the optical flow of the peak frame relative to the start frame and of the peak frame relative to the end frame by using an optical flow method, and computing for each divided region the proportion of the optical flow in the horizontal and the vertical direction. For any region $a$ of the optical flow map, the weighted optical flow at a pixel point $(i, j)$ can be expressed as:

$u'^{\,a}_{i,j} = w^a_h \cdot u_{i,j}, \qquad v'^{\,a}_{i,j} = w^a_v \cdot v_{i,j}$

where $w^a_h$ and $w^a_v$ respectively denote the weight of all horizontal optical flow vectors and the weight of all vertical optical flow vectors in region $a$.
Further, in step 3, optical flow refers to the change in pixel brightness between two consecutive frames, which reflects the direction and speed of motion between pixels, and the optical flow method computes the motion information of the pixels over this period, including the direction and speed of motion, from that brightness change. Specifically, consider a target point $P$ in the image with coordinates $(x, y)$, whose brightness at time $t$ is denoted $I(x, y, t)$. After a time $\Delta t$, the target point has moved to $(x + \Delta x, y + \Delta y)$, where its brightness is $I(x + \Delta x, y + \Delta y, t + \Delta t)$. According to the brightness constancy principle of the optical flow computation:

$I(x, y, t) = I(x + \Delta x, y + \Delta y, t + \Delta t)$

Expanding with the first-order Taylor formula and simplifying gives:

$I_x u + I_y v + I_t = 0$

where $I_x$, $I_y$ and $I_t$ are the partial derivatives of the brightness with respect to $x$, $y$ and $t$, and $u = \Delta x / \Delta t$ and $v = \Delta y / \Delta t$ respectively denote the motion vectors of the target point $P$ along the x-axis and the y-axis. The optical flow vector of point $P$ can then be expressed as $(u, v)$.
The optical flow can be computed with the TV-L1 method, which offers good robustness.
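As an illustrative sketch, the Dual TV-L1 algorithm is available in OpenCV's contrib package (opencv-contrib-python); the wrapper below uses default parameters, which are not necessarily those used in the experiments:

```python
import cv2

def tvl1_flow(prev_bgr, next_bgr):
    """Dense TV-L1 optical flow between two frames; returns an H x W x 2
    float32 array with the horizontal component u in channel 0 and the
    vertical component v in channel 1."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
    tvl1 = cv2.optflow.createOptFlow_DualTVL1()
    return tvl1.calc(prev_gray, next_gray, None)

# e.g. flow_onset = tvl1_flow(start_frame, peak_frame)
#      flow_offset = tvl1_flow(end_frame, peak_frame)
```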
Step 4: weighting the optical flow components by region and normalizing them to obtain a region weighted optical flow map that highlights the micro-expression changes, the normalization formula used being:

$F_{i,j} = \dfrac{\min(\max(f_{i,j}, -T), T) + T}{2T} \times 255$

where $f_{i,j}$ denotes the region weighted optical flow component at pixel $(i, j)$, and $T$ is the optical flow intensity threshold, introduced to eliminate the influence of abnormal optical flow produced by objective conditions.
Through this normalization, on the one hand the effect of spurious optical flow caused by compensating illumination is reduced, and on the other hand the optical flow values are mapped into the pixel range.
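A sketch of the weighting and normalization under the reconstruction above; the text fixes only that the weights are a per-region ratio of horizontal to vertical flow and that T clips abnormal flow, so the exact region_weights definition and the value T = 4.0 are assumptions:

```python
import numpy as np

def region_weights(flow, mask):
    """Assumed weight definition: each direction's share of the total
    flow magnitude inside one region."""
    u = np.abs(flow[..., 0][mask]).sum()
    v = np.abs(flow[..., 1][mask]).sum()
    total = u + v + 1e-8
    return u / total, v / total

def weighted_normalized_flow(flow, masks, T=4.0):
    """Weight each region's flow components, clip at the intensity
    threshold T and map the result into the pixel range [0, 255]."""
    out = flow.astype(np.float32).copy()
    for mask in masks:  # eye, nose and mouth masks from region_masks()
        w_h, w_v = region_weights(flow, mask)
        out[..., 0][mask] *= w_h
        out[..., 1][mask] *= w_v
    out = np.clip(out, -T, T)  # suppress abnormal optical flow
    return ((out + T) / (2 * T) * 255).astype(np.uint8)
```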
Step 5: inputting the weighted optical flow maps obtained for the peak frame relative to the start frame and for the peak frame relative to the end frame into a pre-trained shallow dual-stream convolutional neural network model, sequentially performing feature extraction and feature fusion, and finally outputting the micro-expression classification result for the current input.
The shallow dual-stream convolutional neural network comprises two branches; as shown in fig. 4, each branch comprises a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a third pooling layer, a first fully-connected layer, a second fully-connected layer and a softmax layer.
The input of the first branch is a normalized area weighted optical flow diagram calculated by a start frame and a peak frame, the input of the second branch is a normalized area weighted optical flow diagram calculated by a stop frame and a peak frame, and the outputs of the two branch softmax layers are fused in an average calculation mode to obtain the probability value of each category, so that the classification result is obtained.
Specifically, the first convolutional layer has a 9×9 kernel with stride 1, the second and third convolutional layers have 5×5 kernels with stride 1, and each pooling layer uses a 2×2 window with stride 2.
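A PyTorch sketch of this architecture; the kernel sizes, strides and pooling windows follow the text, while the input resolution (assumed 128×128 two-channel flow maps), channel widths, fully-connected sizes, ReLU activations and the class count are unspecified in the source and are placeholders here:

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    """One stream: conv 9x9 -> pool -> conv 5x5 -> pool -> conv 5x5 -> pool
    -> two fully-connected layers producing class logits."""
    def __init__(self, num_classes, in_ch=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=9, stride=1, padding=4), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 256), nn.ReLU(),  # assumes 128x128 input
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

class DualStreamNet(nn.Module):
    """Two branches whose softmax outputs are fused by averaging."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.onset_branch = Branch(num_classes)   # start-vs-peak flow map
        self.offset_branch = Branch(num_classes)  # end-vs-peak flow map

    def forward(self, flow_onset, flow_offset):
        p1 = torch.softmax(self.onset_branch(flow_onset), dim=1)
        p2 = torch.softmax(self.offset_branch(flow_offset), dim=1)
        return (p1 + p2) / 2  # averaged class probabilities
```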
As one or more embodiments, the training of the pre-trained network model specifically includes the following steps:
constructing a double-flow convolutional neural network;
constructing a training set, wherein the training set comprises micro-expression videos with known micro-expression classification labels;
and inputting the training set into the double-flow convolutional neural network for training, and stopping training when the loss function reaches the minimum value to obtain a trained double-flow convolutional neural network model.
In the training stage, the cross-entropy loss function is used to measure the difference between the model's prediction and the true label. Assuming there are N classes, it can be expressed as:

$L = -\sum_{i=1}^{N} y_i \log \hat{y}_i$

where $y$ is the true class label vector of the sample and $\hat{y}$ is the predicted output vector of the model. The model parameters are optimized by minimizing the cross-entropy loss, so that the model's predictions come closer to the true class labels.
In the training stage, the two branches and the feature fusion module are jointly learned through back-propagation, so that the network model is optimized as a whole and the overall performance of the network is improved.
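A minimal training-loop sketch for the DualStreamNet sketch above; the optimizer, learning rate, epoch count and a train_loader yielding (onset flow map, offset flow map, label index) batches are all assumptions; only the cross-entropy objective is fixed by the text:

```python
import torch
import torch.nn as nn

model = DualStreamNet(num_classes=5)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed settings
criterion = nn.NLLLoss()  # cross-entropy applied to the fused probabilities

for epoch in range(100):
    # train_loader: a DataLoader over (flow_onset, flow_offset, label) tuples
    for flow_onset, flow_offset, labels in train_loader:
        probs = model(flow_onset, flow_offset)             # averaged softmax
        loss = criterion(torch.log(probs + 1e-8), labels)  # -sum y_i log(p_i)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```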
The micro-expression recognition method based on region weighted optical flow features was evaluated on the public dataset CASME II, with accuracy and F1 score as evaluation metrics and leave-one-subject-out cross-validation; the experimental results are shown in the following table:
table 1 experimental results
The ablation experiment differs from the disclosed micro-expression recognition algorithm in that it applies no region weighting or normalization, and uses only the optical flow intensity features of the start frame and the end frame relative to the peak frame as input to the dual-stream convolutional network. The results show that the dual-stream convolutional neural network serves as an effective feature extractor, and that dividing the facial motion regions and weighting and normalizing the optical flow features in different directions significantly improves the model's micro-expression recognition performance.
Example two
There is provided a micro-expression recognition system based on regional weighted optical flow features, as shown in fig. 5, comprising:
a data acquisition module configured to: acquiring a micro-expression video to be identified, and extracting a start frame, a peak frame and a termination frame from the micro-expression video to be identified;
a facial region extraction module configured to: identifying the facial areas of the initial frame, the peak frame and the end frame, cutting the three frames of images to obtain images only containing the facial areas, and dividing the initial frame, the peak frame and the end frame only containing the facial areas according to the movement characteristics of facial muscles;
an optical flow component extraction module configured to: calculate the optical flow of the peak frame relative to the start frame and of the peak frame relative to the end frame by using an optical flow method, and compute for each divided region the proportion of the optical flow in the horizontal and the vertical direction, wherein for any region $a$ of the optical flow map the weighted optical flow at a pixel point $(i, j)$ can be expressed as:

$u'^{\,a}_{i,j} = w^a_h \cdot u_{i,j}, \qquad v'^{\,a}_{i,j} = w^a_v \cdot v_{i,j}$

where $w^a_h$ and $w^a_v$ respectively denote the weight of all horizontal optical flow vectors and the weight of all vertical optical flow vectors in region $a$;
a weighting processing module configured to: weight the optical flow components by region and normalize them to obtain a region weighted optical flow map that highlights the micro-expression changes, the normalization formula used being:

$F_{i,j} = \dfrac{\min(\max(f_{i,j}, -T), T) + T}{2T} \times 255$

where $T$ is the optical flow intensity threshold, introduced to eliminate the influence of abnormal optical flow produced by objective conditions;
a feature extraction and classification module configured to: input the weighted optical flow maps obtained for the peak frame relative to the start frame and for the peak frame relative to the end frame into a pre-trained dual-stream convolutional neural network, sequentially perform feature extraction and feature fusion, and finally output the micro-expression classification result for the current input.
It should be noted that the data acquisition module, the facial region extraction module, the optical flow component extraction module, the weighting processing module and the feature extraction and classification module correspond to steps 1 to 5 of embodiment one; the examples and application scenarios implemented by these modules are the same as those of the corresponding steps, but are not limited to the disclosure of embodiment one. It should also be noted that the above modules may be implemented as part of a system in a computer system, for example as a set of computer-executable instructions.
Each of the foregoing embodiments is described with its own emphasis; for details not elaborated in one embodiment, reference may be made to the related description of another embodiment.
The proposed system may also be implemented in other ways. The system embodiments described above are merely exemplary; the division into modules is merely a logical function division, and other divisions may be used in practice: for example, multiple modules may be combined or integrated into another system, or some features may be omitted or not performed.
The foregoing descriptions are merely preferred embodiments of the present disclosure and are not intended to limit the disclosure; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present disclosure shall fall within the protection scope of the present disclosure.

Claims (10)

1. The micro-expression recognition method based on the regional weighted optical flow characteristics is characterized by comprising the following steps of:
step 1: acquiring a micro-expression video to be identified, and extracting a start frame, a peak frame and a termination frame from the micro-expression video to be identified;
step 2: identifying the facial areas of the initial frame, the peak frame and the end frame, cutting the three frames of images to obtain images only containing the facial areas, and dividing the initial frame, the peak frame and the end frame only containing the facial areas according to the movement characteristics of facial muscles;
step 3: calculating the optical flow of the peak frame relative to the start frame and of the peak frame relative to the end frame by using an optical flow method, and computing for each divided region the proportion of the optical flow in the horizontal and the vertical direction, wherein for any region $a$ of the optical flow map the weighted optical flow at a pixel point $(i, j)$ can be expressed as:

$u'^{\,a}_{i,j} = w^a_h \cdot u_{i,j}, \qquad v'^{\,a}_{i,j} = w^a_v \cdot v_{i,j}$

where $w^a_h$ and $w^a_v$ respectively denote the weight of all horizontal optical flow vectors and the weight of all vertical optical flow vectors in region $a$;
step 4: weighting the optical flow components by region and normalizing them to obtain a region weighted optical flow map that highlights the micro-expression changes, the normalization formula used being:

$F_{i,j} = \dfrac{\min(\max(f_{i,j}, -T), T) + T}{2T} \times 255$

where $T$ is the optical flow intensity threshold;
step 5: inputting the weighted optical flow maps obtained for the peak frame relative to the start frame and for the peak frame relative to the end frame into a pre-trained dual-stream convolutional neural network model, sequentially performing feature extraction and classification, and finally outputting the micro-expression classification result for the current input.
2. The method for identifying micro-expressions based on regional weighted optical flow features according to claim 1, wherein in step 2, before clipping, the method further comprises: and identifying facial key points of a starting frame, a peak frame and a termination frame in the micro-expression video sequence by using a Dlib face key point detection algorithm, selecting two points on the nose of the starting frame as datum points, and carrying out face alignment on the peak frame and the termination frame by referring to the datum points.
3. The method for identifying micro-expressions based on regional weighted optical flow features according to claim 2, wherein the number of the facial key points is 68.
4. The method for identifying micro-expressions based on regional weighted optical flow features according to claim 1, wherein in step 2, the clipping is performed in the following manner:
the start frame is taken as the reference image, the position and size of the cropping region are determined by the expression below, and the start frame, the peak frame and the end frame are all cropped to this position and size:

$x_{\min} = \min_{k} x_k, \quad x_{\max} = \max_{k} x_k, \quad y_{\min} = \min_{k} y_k, \quad y_{\max} = \max_{k} y_k$

where $x_k$ and $y_k$ ($k = 1, \dots, 68$) respectively denote the abscissa and ordinate of the 68 key points, and the cropping rectangle is bounded by $(x_{\min}, y_{\min})$ and $(x_{\max}, y_{\max})$.
5. The method according to claim 1, wherein in step 2, three frames of images including only the face region are divided into three regions of eyes, nose and mouth using the face key points.
6. The method for identifying micro-expressions based on regional weighted optical flow features according to claim 1, wherein in step 5, the dual-flow convolutional neural network comprises two branches, the input of the first branch is a normalized regional weighted optical flow graph calculated by a start frame and a peak frame, the input of the second branch is a normalized regional weighted optical flow graph calculated by a stop frame and a peak frame, and the outputs of the two branch softmax layers are fused in an average calculation manner to obtain a probability value of each classification, thereby obtaining a classification result.
7. The method of claim 6, wherein in step 5, the two branches each include a first convolution layer, a first pooling layer, a second convolution layer, a second pooling layer, a third convolution layer, a third pooling layer, a first full-connection layer, a second full-connection layer, and a softmax layer.
8. The method for identifying micro-expressions based on regional weighted optical flow features according to claim 1, wherein in step 5, the dual-flow convolutional neural network is a shallow dual-flow convolutional neural network.
9. The method for identifying micro-expressions based on regional weighted optical flow features according to claim 1, wherein in step 5, the pre-trained dual-flow convolutional neural network model specifically comprises the following steps:
constructing a double-flow convolutional neural network;
constructing a training set, wherein the training set comprises micro-expression videos with known micro-expression classification labels;
inputting the training set into a double-flow convolutional neural network for training, and stopping training when the loss function reaches the minimum value to obtain a trained double-flow convolutional neural network model;
in the training stage, the cross-entropy loss function is used to measure the difference between the model's prediction and the true label, and assuming there are N classes it can be expressed as:

$L = -\sum_{i=1}^{N} y_i \log \hat{y}_i$

where $y$ is the true class label vector of the sample and $\hat{y}$ is the predicted output vector of the model; the model parameters are optimized by minimizing the cross-entropy loss, so that the model's prediction comes closer to the true class label.
10. A micro-expression recognition system based on regional weighted optical flow features, comprising:
a data acquisition module configured to: acquiring a micro-expression video to be identified, and extracting a start frame, a peak frame and a termination frame from the micro-expression video to be identified;
a facial region extraction module configured to: identifying the facial areas of the initial frame, the peak frame and the end frame, cutting the three frames of images to obtain images only containing the facial areas, and dividing the initial frame, the peak frame and the end frame only containing the facial areas according to the movement characteristics of facial muscles;
an optical flow component extraction module configured to: calculate the optical flow of the peak frame relative to the start frame and of the peak frame relative to the end frame by using an optical flow method, and compute for each divided region the proportion of the optical flow in the horizontal and the vertical direction, wherein for any region $a$ of the optical flow map the weighted optical flow at a pixel point $(i, j)$ can be expressed as:

$u'^{\,a}_{i,j} = w^a_h \cdot u_{i,j}, \qquad v'^{\,a}_{i,j} = w^a_v \cdot v_{i,j}$

where $w^a_h$ and $w^a_v$ respectively denote the weight of all horizontal optical flow vectors and the weight of all vertical optical flow vectors in region $a$;
a weighting processing module configured to: weight the optical flow components by region and normalize them to obtain a region weighted optical flow map that highlights the micro-expression changes, the normalization formula used being:

$F_{i,j} = \dfrac{\min(\max(f_{i,j}, -T), T) + T}{2T} \times 255$

where $T$ is the optical flow intensity threshold;
a feature extraction and classification module configured to: input the weighted optical flow maps obtained for the peak frame relative to the start frame and for the peak frame relative to the end frame into a pre-trained dual-stream convolutional neural network, sequentially perform feature extraction and classification, and finally output the micro-expression classification result for the current input.
CN202311464800.4A 2023-11-07 2023-11-07 Micro-expression recognition method and system based on regional weighted optical flow characteristics Active CN117197877B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311464800.4A CN117197877B (en) 2023-11-07 2023-11-07 Micro-expression recognition method and system based on regional weighted optical flow characteristics


Publications (2)

Publication Number Publication Date
CN117197877A (application publication): 2023-12-08
CN117197877B (granted publication): 2024-03-19

Family

ID=88992815

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311464800.4A Active CN117197877B (en) 2023-11-07 2023-11-07 Micro-expression recognition method and system based on regional weighted optical flow characteristics

Country Status (1)

Country Link
CN (1) CN117197877B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190303677A1 (en) * 2018-03-30 2019-10-03 Naver Corporation System and method for training a convolutional neural network and classifying an action performed by a subject in a video using the trained convolutional neural network
CN109815785A (en) * 2018-12-05 2019-05-28 四川大学 A kind of face Emotion identification method based on double-current convolutional neural networks
CN111597864A (en) * 2019-02-21 2020-08-28 中南大学 Micro-expression recognition method based on cumulative light stream weighting characteristics
CN110516571A (en) * 2019-08-16 2019-11-29 东南大学 Inter-library micro- expression recognition method and device based on light stream attention neural network
CN111652159A (en) * 2020-06-05 2020-09-11 山东大学 Micro-expression recognition method and system based on multi-level feature combination
CN112232191A (en) * 2020-10-15 2021-01-15 南京邮电大学 Depression recognition system based on micro-expression analysis
CN112434608A (en) * 2020-11-24 2021-03-02 山东大学 Human behavior identification method and system based on double-current combined network
CN114511912A (en) * 2022-03-14 2022-05-17 东南大学 Cross-library micro-expression recognition method and device based on double-current convolutional neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAO CHENG: "Research on Intelligent Algorithm for Regional Detection and Statistics of Sports Videos", 2021 13th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), 23 April 2021, pages 309-312 *
ZHANG CONGCONG; HE NING: "Human action recognition method based on key-frame dual-stream convolutional networks", Journal of Nanjing University of Information Science & Technology (Natural Science Edition), no. 06, 28 November 2019 *

Also Published As

Publication number Publication date
CN117197877B (en) 2024-03-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant