CN115331289A - Micro-expression recognition method based on video motion amplification and optical flow characteristics - Google Patents

Micro-expression recognition method based on video motion amplification and optical flow characteristics

Info

Publication number
CN115331289A
Authority
CN
China
Prior art keywords
optical flow
micro
channel
image frame
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210948759.7A
Other languages
Chinese (zh)
Inventor
赵明华
董爽爽
都双丽
胡静
李鹏
王琳
王理
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology filed Critical Xian University of Technology
Priority to CN202210948759.7A priority Critical patent/CN115331289A/en
Publication of CN115331289A publication Critical patent/CN115331289A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G06V40/176 Dynamic expression
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a micro-expression recognition method based on video motion amplification and optical flow features, comprising the following steps: selecting a data set and classifying its samples by emotion; preprocessing all original image frame sequences of the selected data set to obtain single-channel grayscale image sequences as one part of the network model input; computing the optical flow features of all obtained image frame sequences with a deep-learning-based RAFT network structure and taking the visualized optical flow maps as the other part of the network model input; and stacking each single-channel grayscale image with its visualized RGB optical flow image into a four-channel image, inputting the four-channel images into a designed VGG16 network, extracting the spatial features of the micro-expression, and classifying them to obtain the final recognition accuracy. The method addresses the key difficulty of prior micro-expression recognition methods: because facial motion intensity is low and duration is short, extracting the subtle facial motion changes in video frames is hard.

Description

Micro-expression recognition method based on video motion amplification and optical flow characteristics
Technical Field
The invention belongs to the technical field of digital image processing and recognition, and particularly relates to a micro-expression recognition method based on video motion amplification and optical flow characteristics.
Background
In recent years, micro-expression recognition has shown important research value in fields such as criminal investigation, lie detection, and depression analysis. However, because micro-expressions have small motion amplitude and short duration, they are difficult to identify manually: even professionally trained psychology researchers identify micro-expressions with only about 47% accuracy. Relying on the human eye to identify micro-expressions requires professional training and a large amount of time, yet yields low accuracy, which seriously hinders the large-scale adoption of micro-expression recognition. With the rapid development of computer vision and deep learning, more and more researchers have applied machine learning algorithms to micro-expression recognition, overcoming many of the difficulties of manual recognition and markedly improving recognition accuracy. Nevertheless, given the characteristics of micro-expressions, how to extract the subtle facial motion changes in video frames remains a key problem in the field. Micro-expression recognition is therefore still in a stage of rapid development and is gradually becoming an important research topic in affective computing.
Disclosure of Invention
The invention aims to provide a micro-expression recognition method based on video motion amplification and optical flow features, addressing the key difficulty of prior micro-expression recognition methods: because facial motion intensity is low and duration is short, extracting the subtle facial motion changes in video frames is difficult.
The invention adopts the following technical scheme: the micro-expression recognition method based on video motion amplification and optical flow features is implemented according to the following steps:
step 1, selecting a data set and classifying its samples by emotion;
step 2, preprocessing all original image frame sequences of the selected data set to obtain single-channel grayscale image sequences as one part of the network model input;
step 3, adopting a deep-learning-based RAFT network structure to compute the optical flow features of all image frame sequences obtained in step 2 and taking the visualized optical flow maps as the other part of the network model input;
and step 4, stacking each single-channel grayscale image obtained in step 2 with the corresponding visualized RGB optical flow image obtained in step 3 into a four-channel image, inputting the four-channel images into the designed VGG16 network, extracting the spatial features of the micro-expression, and classifying them to obtain the final recognition accuracy.
The present invention is also characterized in that,
the step 2 is implemented according to the following steps:
step 2.1, amplifying fine facial muscle motion amplitudes in all original image frame sequences of the selected data set by adopting a learning-based video motion amplification method;
step 2.2, a model for detecting 68 key point information provided by the dlib library is used for realizing face alignment operation, a face area is obtained by cutting, and the resolution is uniformly adjusted to be 224 pixels multiplied by 224 pixels;
step 2.3, selecting a peak frame and 4 frames before and after the peak frame of each micro expression image sequence, and taking 9 frames of images as key frames in total to reduce the influence of redundant information in all image frame sequences obtained in the step 2.2 on identification;
and 2.4, carrying out graying treatment on all the image frame sequences obtained in the step 2.3 by utilizing a cv2.Imread () function to obtain a single-channel gray-scale image which is used as a part of network model input.
Step 2.1 is specifically implemented according to the following steps:
First, all adjacent frame pairs (X_{t-1}, X_t) in the original image frame sequences are passed through an encoder H_e(·) to obtain their respective shape features (M_{t-1}, M_t) and texture features (V_{t-1}, V_t);
Then, the shape features (M_{t-1}, M_t) of the two frames are fed into an amplifier to magnify the motion amplitude; the amplifier H_m(·) is expressed as:
H_m(M_{t-1}, M_t, α) = M_{t-1} + h(α × g(M_t − M_{t-1}))   (1)
In formula (1), g(·) is a 3 × 3 convolution followed by a ReLU activation function, and h(·) is a 3 × 3 convolution followed by a 3 × 3 residual block;
Finally, the decoder reconstructs the changed shape information together with the unchanged texture information to generate the magnified image frame sequence.
In step 3, a deep-learning-based RAFT network structure is adopted to compute the optical flow features of all image frame sequences obtained in step 2.3. The specific steps are as follows:
First, a feature encoder P_θ extracts per-pixel optical flow features from each adjacent frame pair (T_1, T_2) of the image frame sequences obtained in step 2.3 and outputs them at 1/8 resolution, with D = 256 channels in the output feature map; at the same time, a context encoder C_θ extracts features from T_1 only. The feature encoder P_θ and the context encoder C_θ together form the RAFT feature extraction stage, which only needs to be executed once;
Then, given the image features P_θ(T_1) and P_θ(T_2) obtained by feature extraction, a full correlation volume Q for computing visual similarity is obtained by taking the dot product of all pairs of feature vectors (P_θ(T_1), P_θ(T_2)); the correlation volume Q is expressed as:
Q_{ijkl} = Σ_h P_θ(T_1)_{ijh} · P_θ(T_2)_{klh}   (2)
Finally, the optical flow is iteratively updated with a recurrent update structure based on gated recurrent units to generate the final visualized optical flow map.
Step 4 is specifically implemented according to the following steps:
Step 4.1, designing a VGG16 network model in which 13 convolutional layers and 5 max-pooling layers extract features, with zero padding applied before each convolutional layer to pad the feature edges; the last 3 fully connected layers complete the classification task, and dropout is applied to the fully connected layers with the ratio set to 0.5, i.e., dropout = 0.5;
Step 4.2, training the VGG16 network model designed in step 4.1 with an initial learning rate of 10^-5, a decay of 10^-6, 100 epochs, and a batch size of 3. After all parameters are set, stacking each single-channel grayscale image obtained in step 2.4 with the corresponding visualized RGB optical flow image obtained in step 3 into a four-channel image sequence, inputting it into the designed VGG16 network model, extracting its spatial features, and performing emotion classification with softmax;
Step 4.3, first, randomly dividing the obtained four-channel image sequences into two parts, an 80% training set and a 20% test set;
then, training the model on the training set and testing its accuracy on the test set, computed as shown in formula (3), to verify the effectiveness of the model;
Accuracy = (number of correctly classified test samples / total number of test samples) × 100%   (3)
then, to reduce error, adopting 10 rounds of simple cross-validation: the samples are shuffled, the training and test sets are re-selected, and the model is trained and validated again; this is repeated 10 times to obtain the accuracies of 10 models, which are averaged to obtain the final accuracy of the model.
The invention has the beneficial effects that:
the method disclosed by the invention is used for processing the micro expression by combining LVMM and RAFT, and extracting the spatial domain characteristics of the micro expression by adopting a VGG16 network and classifying to obtain a micro expression recognition result. Meanwhile, in order to reduce the influence of redundant information in the sequence of the microexpressive image frames on identification, a key 9 frames of the microexpressive sequence are selected from a CASME II data set to be tested, and compared with other 7 mainstream methods. Experimental results show that the method obtains better performance, and the identification precision reaches 67.98%.
Drawings
FIG. 1 is a flow chart of an algorithm framework used in the micro-expression recognition method based on video motion amplification and optical flow characteristics according to the present invention;
FIG. 2 compares the effect of LVMM under different magnification factors α in the method of the present invention.
Detailed Description
The invention is described in detail below with reference to the drawings and the detailed description.
The invention provides a micro-expression recognition method based on video motion amplification and optical flow characteristics, which is implemented according to the following steps as shown in figure 1:
Step 1, selecting a data set and classifying its samples into 5 emotion categories (happiness, disgust, surprise, repression, and others). The publicly available spontaneous micro-expression data set CASME II, released by the team of Fu Xiaolan at the Institute of Psychology, Chinese Academy of Sciences, is used; the partition of the data set is shown in Table 1:
TABLE 1 Partition of the CASME II data set
Step 2, preprocessing all original image frame sequences of the selected data set. This is implemented according to the following steps:
Step 2.1, amplifying the subtle facial muscle motion in all original image frame sequences of the selected data set by adopting a learning-based video motion magnification method (LVMM) to enhance the visual features. LVMM mainly consists of three parts: an encoder H_e(·), an amplifier H_m(·), and a decoder H_d(·). During the magnification experiments with LVMM, first, all adjacent frame pairs (X_{t-1}, X_t) in the original image frame sequences are passed through the encoder H_e(·) to obtain their respective shape features (M_{t-1}, M_t) and texture features (V_{t-1}, V_t); the texture features are not motion-amplified and are mainly used to suppress the noise introduced by the subsequent intensity amplification. Then, the shape features (M_{t-1}, M_t) of the two frames are fed into the amplifier to magnify the motion amplitude, where the amplifier H_m(·) can be expressed as:
H_m(M_{t-1}, M_t, α) = M_{t-1} + h(α × g(M_t − M_{t-1}))   (1)
In formula (1), g(·) is a 3 × 3 convolution followed by a ReLU activation function, and h(·) is a 3 × 3 convolution followed by a 3 × 3 residual block; finally, the decoder reconstructs the changed shape information together with the unchanged texture information to generate the magnified image frame sequence.
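As a rough illustration of formula (1), the following PyTorch sketch implements the amplifier H_m(·) under stated assumptions: the channel count, the residual-block layout, and the encoder/decoder that would surround it are assumptions, not the trained LVMM model used here.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A 3 x 3 residual block (assumed layout: two 3 x 3 convolutions with a skip)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.conv2(self.relu(self.conv1(x)))

class ShapeAmplifier(nn.Module):
    """Amplifier H_m of formula (1): M_{t-1} + h(alpha * g(M_t - M_{t-1}))."""
    def __init__(self, channels=32):
        super().__init__()
        # g(.): a 3 x 3 convolution followed by ReLU
        self.g = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # h(.): a 3 x 3 convolution followed by a 3 x 3 residual block
        self.h = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            ResidualBlock(channels),
        )

    def forward(self, m_prev, m_curr, alpha=15.0):
        return m_prev + self.h(alpha * self.g(m_curr - m_prev))

# Dummy shape features; the channel count and spatial size are assumptions.
amp = ShapeAmplifier(channels=32)
m_prev, m_curr = torch.randn(1, 32, 56, 56), torch.randn(1, 32, 56, 56)
print(amp(m_prev, m_curr, alpha=15.0).shape)  # torch.Size([1, 32, 56, 56])
```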
Through repeated experimental comparison, a magnification factor of α = 15 was finally selected. FIG. 2 shows the result of applying different magnification factors (α = 5, α = 10, α = 15, α = 20, α = 25) to one frame of the original image frame sequences; we found that at α = 15 the motion in the image frame is clearly magnified without degrading the image quality.
Step 2.2, to minimize the influence of the non-face regions in the amplified image frame sequences obtained in step 2.1 on micro-expression recognition, the 68-landmark detection model provided by the dlib library is used to perform face alignment, the face region is cropped, and its resolution is uniformly adjusted to 224 × 224 pixels so that the input spatial dimensions match the VGG16 network model;
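A rough sketch of the face detection and cropping in step 2.2, based on dlib's standard 68-landmark model, is shown below. It crops a landmark-based bounding box and resizes it to 224 × 224 pixels; the geometric alignment itself is omitted, and the model file name and the crop_face helper are illustrative assumptions rather than the authors' exact code.

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# Standard dlib 68-landmark model file (an assumption about the exact file used).
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def crop_face(image_bgr, size=224):
    """Detect the face, locate its 68 landmarks, crop the landmark bounding box,
    and resize it to 224 x 224 pixels (a rough stand-in for step 2.2)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    xs = [shape.part(i).x for i in range(68)]
    ys = [shape.part(i).y for i in range(68)]
    x0, x1 = max(min(xs), 0), min(max(xs), image_bgr.shape[1])
    y0, y1 = max(min(ys), 0), min(max(ys), image_bgr.shape[0])
    return cv2.resize(image_bgr[y0:y1, x0:x1], (size, size))
```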
and 2.3, considering the change and subtlety of the facial motion in the micro-expression image sequence, the change between two continuous frames is hardly perceived. If all image frame sequences obtained in step 2.2 are directly input into the network model training, a large number of redundant features are included. Meanwhile, the shortest duration time of the micro expression is about 1/25 second, the frame rate of the samples in the CASME II data set is 200 frames/second, and the shortest micro expression duration frame sequence can be obtained by conversion and is 8 frames, so that the peak frame and the front and rear 4 frames of each micro expression image sequence are selected, and 9 frames of images are used as key frames, so that the influence of redundant information in all the image frame sequences obtained in the step 2.2 on identification is reduced;
and 2.4, carrying out graying treatment on all the image frame sequences obtained in the step 2.3 by utilizing a cv2.Imread () function to obtain a single-channel gray-scale image which is used as a part of network model input.
Step 3, optical flow captures representative motion features between adjacent image frames of a micro-expression, offers a higher signal-to-noise ratio, and provides rich, discriminative input features for the network. Therefore, the deep-learning-based RAFT network structure is used, for the first time in this setting, to compute the optical flow features of all image frame sequences obtained in step 2.3, and the visualized optical flow map is taken as the other part of the network model input. RAFT extracts optical flow in three steps. First, a feature encoder P_θ extracts per-pixel optical flow features from each adjacent frame pair (T_1, T_2) of the image frame sequences obtained in step 2.3 and outputs them at 1/8 resolution, with D = 256 channels in the output feature map; at the same time, a context encoder C_θ extracts features from T_1 only. The feature encoder P_θ and the context encoder C_θ together form the RAFT feature extraction stage, which only needs to be executed once. Then, given the image features P_θ(T_1) and P_θ(T_2) obtained by feature extraction, a full correlation volume Q for computing visual similarity is obtained by taking the dot product of all pairs of feature vectors (P_θ(T_1), P_θ(T_2)):
Q_{ijkl} = Σ_h P_θ(T_1)_{ijh} · P_θ(T_2)_{klh}   (2)
Finally, the optical flow is iteratively updated with a recurrent update structure based on gated recurrent units (GRU) to generate the final visualized optical flow map.
Step 4, stacking each single-channel grayscale image obtained in step 2.4 with the corresponding visualized RGB optical flow image obtained in step 3 into a four-channel image, inputting the four-channel images into the designed VGG16 network, extracting the spatial features of the micro-expression, and classifying them to obtain the final recognition accuracy. This is implemented according to the following steps:
Step 4.1, after the three preceding steps, all single-channel grayscale image sequences from step 2.4 and all visualized RGB optical flow image sequences from step 3 are available. To complete the classification and recognition of micro-expressions, a VGG16 network model is designed. The VGG16 network is simple and compact: 13 convolutional layers and 5 max-pooling layers extract features, and zero padding is applied before each convolutional layer so that the spatial size of the input does not shrink. The last 3 fully connected layers complete the classification task; to reduce overfitting, dropout is applied to the fully connected layers, randomly disabling neurons according to the set ratio, which improves the generalization ability of the network model and speeds up training. Following common practice, the ratio is set to 0.5, i.e., dropout = 0.5.
Step 4.2, the designed VGG16 network model is trained with an initial learning rate (lr) of 10^-5, a decay of 10^-6, 100 epochs, and a batch size of 3. After all parameters are set, each single-channel grayscale image obtained in step 2.4 is stacked with the corresponding visualized RGB optical flow image obtained in step 3 into a four-channel image sequence, which is input into the VGG16 network model; its spatial features are extracted, and emotion classification is performed with softmax.
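The four-channel VGG16 of steps 4.1 and 4.2 might be set up as follows in PyTorch; this is a sketch under stated assumptions (the reported decay is treated here as weight decay, and the classifier layout mirrors the standard VGG16 head), not the authors' exact network.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

def build_four_channel_vgg16(num_classes=5, dropout=0.5):
    """VGG16 adapted to 4-channel input (grayscale frame + RGB flow map) and 5 classes."""
    model = vgg16(weights=None)
    # Replace the first convolution so the network accepts 4 input channels.
    model.features[0] = nn.Conv2d(4, 64, kernel_size=3, padding=1)
    # Rebuild the classifier head with dropout = 0.5 and 5 output classes.
    model.classifier = nn.Sequential(
        nn.Linear(512 * 7 * 7, 4096), nn.ReLU(True), nn.Dropout(dropout),
        nn.Linear(4096, 4096), nn.ReLU(True), nn.Dropout(dropout),
        nn.Linear(4096, num_classes),
    )
    return model

model = build_four_channel_vgg16()
# lr = 1e-5 as in step 4.2; "decay" is interpreted here as weight decay (an assumption).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=1e-6)
criterion = nn.CrossEntropyLoss()  # applies softmax internally for classification

# One dummy training step with a batch of three 4-channel 224 x 224 inputs.
x = torch.randn(3, 4, 224, 224)
y = torch.randint(0, 5, (3,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```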
Step 4.3, cross-validating the model avoids, to some extent, errors caused by a particular data set split, so our experiments use 10 rounds of simple cross-validation to reduce this error. Specifically: first, the obtained four-channel image sequences are randomly divided into two parts (80% training set, 20% test set); then the model is trained on the training set and its accuracy is tested on the test set (computed as in formula (3)) to verify the effectiveness of the model; then the samples are shuffled, the training and test sets are re-selected, and the model is trained and validated again, for 10 rounds of simple cross-validation in total. As shown in Table 2, the final recognition result, averaged over the 10 experiments, is 67.98%.
Accuracy = (number of correctly classified test samples / total number of test samples) × 100%   (3)
TABLE 2 10 training results
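The 10 rounds of simple cross-validation in step 4.3 can be sketched as repeated random 80/20 splits; the train_and_eval callback is a placeholder assumption standing in for training and testing the four-channel VGG16.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def simple_cross_validation(features, labels, train_and_eval, rounds=10):
    """Repeat a random 80/20 split `rounds` times and average the accuracies.

    `train_and_eval` is a caller-supplied function (a placeholder assumption)
    that trains a fresh model and returns its test accuracy in percent."""
    accuracies = []
    for r in range(rounds):
        x_tr, x_te, y_tr, y_te = train_test_split(
            features, labels, test_size=0.2, shuffle=True, random_state=r)
        accuracies.append(train_and_eval(x_tr, y_tr, x_te, y_te))
    return float(np.mean(accuracies)), accuracies
```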
Experimental comparison and verification: the average of the 10 experimental results obtained in step 4.3 (Table 2) is compared with existing methods in Table 3, including the conventional methods LBP-TOP, STLBP-IP, and Bi-WOOF and the deep learning methods ELRCN-SE, CNN+LSTM, CNNCapsNet, and MSCNN. The results show that the recognition accuracy of the proposed method is 3.35% higher than that of the second-best method, CNNCapsNet, achieving a better micro-expression recognition result and addressing the key difficulty that, because facial motion intensity is low and duration is short, extracting the subtle facial motion changes in video frames is difficult.
TABLE 3 Performance comparison of the proposed method with existing methods on CASME II

Claims (5)

1. The micro-expression recognition method based on video motion amplification and optical flow features is characterized by comprising the following steps:
step 1, selecting a data set and classifying its samples by emotion;
step 2, preprocessing all original image frame sequences of the selected data set to obtain single-channel grayscale image sequences as one part of the network model input;
step 3, adopting a deep-learning-based RAFT network structure to compute the optical flow features of all image frame sequences obtained in step 2 and taking the visualized optical flow maps as the other part of the network model input;
and step 4, stacking each single-channel grayscale image obtained in step 2 with the corresponding visualized RGB optical flow image obtained in step 3 into a four-channel image, inputting the four-channel images into the designed VGG16 network, extracting the spatial features of the micro-expression, and classifying them to obtain the final recognition accuracy.
2. The micro-expression recognition method based on video motion amplification and optical flow features according to claim 1, characterized in that step 2 is implemented according to the following steps:
step 2.1, amplifying the subtle facial muscle motion in all original image frame sequences of the selected data set by adopting a learning-based video motion magnification method;
step 2.2, using the 68-landmark detection model provided by the dlib library to perform face alignment, cropping the face region, and uniformly adjusting its resolution to 224 × 224 pixels;
step 2.3, selecting, for each micro-expression image sequence, the apex frame and the 4 frames before and after it, 9 frames in total, as key frames, so as to reduce the influence of redundant information in the image frame sequences obtained in step 2.2 on recognition;
and step 2.4, converting all image frame sequences obtained in step 2.3 to grayscale using the cv2.imread() function to obtain single-channel grayscale images, which serve as one part of the network model input.
3. The micro-expression recognition method based on video motion amplification and optical flow features according to claim 2, characterized in that step 2.1 is implemented according to the following steps:
first, all adjacent frame pairs (X_{t-1}, X_t) in the original image frame sequences are passed through an encoder H_e(·) to obtain their respective shape features (M_{t-1}, M_t) and texture features (V_{t-1}, V_t);
then, the shape features (M_{t-1}, M_t) of the two frames are fed into an amplifier to magnify the motion amplitude, wherein the amplifier H_m(·) is expressed as:
H_m(M_{t-1}, M_t, α) = M_{t-1} + h(α × g(M_t − M_{t-1}))   (1)
in formula (1), g(·) is a 3 × 3 convolution followed by a ReLU activation function, and h(·) is a 3 × 3 convolution followed by a 3 × 3 residual block;
finally, the decoder reconstructs the changed shape information together with the unchanged texture information to generate the magnified image frame sequence.
4. The micro-expression recognition method based on video motion amplification and optical flow features according to claim 3, characterized in that in step 3, a deep-learning-based RAFT network structure is adopted to compute the optical flow features of all image frame sequences obtained in step 2.3, and the specific steps are as follows:
first, a feature encoder P_θ extracts per-pixel optical flow features from each adjacent frame pair (T_1, T_2) of the image frame sequences obtained in step 2.3 and outputs them at 1/8 resolution, with D = 256 channels in the output feature map; at the same time, a context encoder C_θ extracts features from T_1 only; the feature encoder P_θ and the context encoder C_θ together form the RAFT feature extraction stage, which only needs to be executed once;
then, given the image features P_θ(T_1) and P_θ(T_2) obtained by feature extraction, a full correlation volume Q for computing visual similarity is obtained by taking the dot product of all pairs of feature vectors (P_θ(T_1), P_θ(T_2)), the correlation volume Q being expressed as:
Q_{ijkl} = Σ_h P_θ(T_1)_{ijh} · P_θ(T_2)_{klh}   (2)
finally, the optical flow is iteratively updated with a recurrent update structure based on gated recurrent units to generate the final visualized optical flow map.
5. The micro-expression recognition method based on video motion amplification and optical flow features according to claim 4, characterized in that step 4 is implemented according to the following steps:
step 4.1, designing a VGG16 network model in which 13 convolutional layers and 5 max-pooling layers are responsible for extracting features, with zero padding applied before each convolutional layer to pad the feature edges; the last 3 fully connected layers are responsible for completing the classification task, and dropout is applied to the fully connected layers with the ratio set to 0.5, i.e., dropout = 0.5;
step 4.2, training the VGG16 network model designed in step 4.1 with an initial learning rate of 10^-5, a decay of 10^-6, 100 epochs, and a batch size of 3; after all parameters are set, stacking each single-channel grayscale image obtained in step 2.4 with the corresponding visualized RGB optical flow image obtained in step 3 into a four-channel image sequence, inputting it into the designed VGG16 network model, extracting its spatial features, and realizing emotion classification with softmax;
step 4.3, first, randomly dividing the obtained four-channel image sequences into two parts, an 80% training set and a 20% test set;
then, training the model on the training set and testing its accuracy on the test set, computed as shown in formula (3), to verify the effectiveness of the model;
Accuracy = (number of correctly classified test samples / total number of test samples) × 100%   (3)
then, adopting 10 rounds of simple cross-validation: shuffling the samples, re-selecting the training set and the test set, and continuing to train and validate the model; this is repeated 10 times to obtain the accuracies of 10 models, which are averaged to obtain the final accuracy of the model.
CN202210948759.7A 2022-08-09 2022-08-09 Micro-expression recognition method based on video motion amplification and optical flow characteristics Pending CN115331289A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210948759.7A CN115331289A (en) 2022-08-09 2022-08-09 Micro-expression recognition method based on video motion amplification and optical flow characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210948759.7A CN115331289A (en) 2022-08-09 2022-08-09 Micro-expression recognition method based on video motion amplification and optical flow characteristics

Publications (1)

Publication Number Publication Date
CN115331289A true CN115331289A (en) 2022-11-11

Family

ID=83922004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210948759.7A Pending CN115331289A (en) 2022-08-09 2022-08-09 Micro-expression recognition method based on video motion amplification and optical flow characteristics

Country Status (1)

Country Link
CN (1) CN115331289A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766035A (en) * 2020-12-01 2021-05-07 华南理工大学 Bus-oriented system and method for recognizing violent behavior of passenger on driver
CN112766035B (en) * 2020-12-01 2023-06-23 华南理工大学 System and method for identifying violence behaviors of passengers on drivers facing buses

Similar Documents

Publication Publication Date Title
CN110532900B (en) Facial expression recognition method based on U-Net and LS-CNN
WO2023185243A1 (en) Expression recognition method based on attention-modulated contextual spatial information
CN110287805B (en) Micro-expression identification method and system based on three-stream convolutional neural network
CN110399821B (en) Customer satisfaction acquisition method based on facial expression recognition
CN109753950B (en) Dynamic facial expression recognition method
CN113869229B (en) Deep learning expression recognition method based on priori attention mechanism guidance
CN113537008B (en) Micro expression recognition method based on self-adaptive motion amplification and convolutional neural network
CN108416780A (en) A kind of object detection and matching process based on twin-area-of-interest pond model
CN112446891A (en) Medical image segmentation method based on U-Net network brain glioma
CN112150476A (en) Coronary artery sequence vessel segmentation method based on space-time discriminant feature learning
CN110428364B (en) Method and device for expanding Parkinson voiceprint spectrogram sample and computer storage medium
CN112580521B (en) Multi-feature true and false video detection method based on MAML (maximum likelihood markup language) element learning algorithm
CN114038037B (en) Expression label correction and identification method based on separable residual error attention network
CN116311483B (en) Micro-expression recognition method based on local facial area reconstruction and memory contrast learning
Ahmed et al. Improve of contrast-distorted image quality assessment based on convolutional neural networks.
CN112668486A (en) Method, device and carrier for identifying facial expressions of pre-activated residual depth separable convolutional network
CN115331289A (en) Micro-expression recognition method based on video motion amplification and optical flow characteristics
Zhi et al. Micro-expression recognition with supervised contrastive learning
Jaymon et al. Real time emotion detection using deep learning
CN111626197B (en) Recognition method based on human behavior recognition network model
Mozaffari et al. Irisnet: Deep learning for automatic and real-time tongue contour tracking in ultrasound video data using peripheral vision
Yao [Retracted] Application of Higher Education Management in Colleges and Universities by Deep Learning
CN113963427B (en) Method and system for rapid in-vivo detection
Wang et al. Curiosity-driven salient object detection with fragment attention
CN115346259A (en) Multi-granularity academic emotion recognition method combined with context information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination