CN114202787A - Multiframe micro-expression emotion recognition method based on deep learning and two-dimensional attention mechanism - Google Patents

Multiframe micro-expression emotion recognition method based on deep learning and two-dimensional attention mechanism

Info

Publication number
CN114202787A
Authority
CN
China
Prior art keywords
neural network
frame
attention mechanism
micro
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111421041.4A
Other languages
Chinese (zh)
Inventor
李俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202111421041.4A priority Critical patent/CN114202787A/en
Publication of CN114202787A publication Critical patent/CN114202787A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-frame micro-expression emotion recognition method based on deep learning and a two-dimensional attention mechanism, comprising four steps: micro-expression video picture preprocessing, convolutional neural network feature extraction, two-dimensional attention mechanism weight calculation, and recurrent neural network result prediction. The design is reasonable and the conception ingenious. Based on the idea that the recognition of each frame should be integrated into the analysis of the whole video segment, with frames influencing one another's results, a multi-frame micro-expression emotion recognition algorithm based on deep learning and a two-dimensional attention mechanism is proposed. In the prediction stage, the feature vector extracted from each frame image is used as the input of the recurrent neural network; the attention mechanism computes the similarity between the current frame's features and those of the other frames, gives higher weight to features with high similarity, and incorporates the other frames' features into the prediction, so that a more accurate recognition result can be obtained.

Description

Multiframe micro-expression emotion recognition method based on deep learning and two-dimensional attention mechanism
Technical Field
The invention relates to the technical field of computer intelligent recognition, in particular to a multi-frame micro-expression emotion recognition method based on deep learning and a two-dimensional attention mechanism.
Background
With the development of technology in the field of computer information technology, micro-expression recognition remains a very challenging task. First, the life cycle of an expression in nature is very short, and compared with ordinary expressions a micro expression lasts an extremely short time, usually less than one second. Second, its motion amplitude is low: unlike ordinary expressions, micro expressions are not easily perceived. They are behaviors produced in an unconscious state, and the movements are difficult to disguise or conceal, which makes micro expressions highly significant for criminal investigation and security.
A complete computer-based system requires feature modeling along both the spatial dimension and the dynamic time dimension, so the micro-expression recognition task is highly complex. Existing micro-expression recognition systems first correct the face image, then cut it into several small image blocks, and finally obtain a micro-expression recognition result using a pre-established neural network model.
In the patent "Micro-expression recognition method" (application (patent) No. CN202110347510.6), the detected face is corrected, cut into several blocks, and then passed directly through a pre-established convolutional neural network to obtain a recognition result. This prior-art method, however, first ignores the connections across the whole face image: when a person smiles, not only the mouth smiles but also the facial muscles change, and if the face image is divided into pieces, the connections between the parts are difficult to capture. Second, it ignores the relations among multiple frames of pictures in the time dimension: if a person cries from excitement, analyzing only a single frame easily produces a wrong judgment.
Disclosure of Invention
Aiming at the problems and defects of the prior art, the invention provides a multi-frame micro-expression emotion recognition method and system based on deep learning and a two-dimensional attention mechanism, to solve the poor recognition performance that results when micro-expression recognition cannot exploit the relations among multiple frames in the time dimension.
The technical scheme of the invention is as follows:
A multi-frame micro-expression emotion recognition method based on deep learning and a two-dimensional attention mechanism comprises four steps: micro-expression video picture preprocessing, convolutional neural network feature extraction, two-dimensional attention mechanism weight calculation, and recurrent neural network result prediction. Specifically, the method comprises the following steps:
s1, preprocessing a micro expression video picture, specifically comprising face registration of a human face, time dimension frame interpolation and micro expression action amplification;
s2, a convolutional neural network feature extraction step, wherein a feature extraction network is used for extracting features of a plurality of preprocessed micro-expression pictures by adopting a rolling machine neural network, wherein the feature extraction network is used for finding the significant features of slight muscle movement and slight change of the face, so that a feature vector for representing original information is obtained;
s3, a two-dimensional attention mechanism weight calculation step, wherein the similarity between the current frame feature and all frames of the whole video is calculated through a two-dimensional attention mechanism to obtain the attention weight;
and S4, a recurrent neural network result prediction step, wherein the gated recurrent neural network classifies the characteristics of each frame according to the attention weight obtained in the previous step to obtain a prediction result.
In the above technical solution, in step S1,
the face registration step specifically comprises selecting the first frame of the video segment as a reference, mapping the feature points in the first frame to the feature points of a standard image through a mapping function, and mapping all subsequent frames by the same method, so as to reduce the differences between different faces;
the time-dimension frame interpolation step extends the temporal length of the whole video by interpolating frames in the time domain: the whole video is regarded as a network in which each frame represents a node, adjacent frames in the video being adjacent nodes in the network; a high-dimensional continuous curve is obtained by network embedding, and sampling this curve yields the interpolated image sequence;
and the micro-expression motion magnification step performs motion-magnification preprocessing on the face images using the linear Eulerian video magnification method; the multiple frames of face images obtained through the above steps are uniformly enlarged or reduced to the same height and width.
In the above technical scheme, in step S2, the three-dimensional feature matrix obtained by the convolutional neural network is denoted F, F ∈ R^{C×H×W}, where C, H and W are respectively the depth, height, and width of the matrix, the unit being a single pixel; the feature matrix F represents the information in the original picture. The convolutional neural network uses three max-pooling layers with 2×2 windows, which reduces the height and width of the extracted features and the complexity of the model. The convolutional neural network adopts the residual-block idea from ResNet.
As a preferred aspect of the above technical solution, the parameters of the convolutional neural network are:
[Table: convolutional neural network parameters, reproduced only as an image (BDA0003377406670000031) in the original filing; see Table 1 in the description.]
In the above technical solution, in step S3, the two-dimensional attention mechanism is built on the prediction of step S4: the upper output of each step of the gated recurrent neural network is used as the input of the two-dimensional attention mechanism. The two-dimensional attention calculation specifically comprises: F contains all the information of a single input frame; let the total number of frames of the whole video be T; after the T face pictures pass through the convolutional neural network, T three-dimensional feature matrices F are obtained and concatenated along the width dimension into a larger three-dimensional feature matrix denoted H, H ∈ R^{C×H×(T×W)}, whose depth and height are the same as F's and whose width is T times that of F.
In the above technical solution, the input of the two-dimensional attention mechanism comprises two parts: the first is the upper output of each step of the gated recurrent neural network, which undergoes a 1×1 convolution and is then copied across the spatial dimensions to obtain a feature map with the same dimensionality as H;
and the second is the three-dimensional feature matrix H extracted from the whole video; the two feature matrices are added element-wise and passed through a Tanh operation, a softmax operation then yields the attention weight matrix α, the element-wise product of α and H is summed to obtain a matrix, this matrix is concatenated with the input matrix along the depth dimension, and the output result is finally obtained through a fully connected layer.
In the above technical solution, in step S4, the whole gated recurrent neural network comprises T gated recurrent units; each unit has two inputs, on the left and bottom, and two outputs, on the right and top. The left input of the first unit is a feature vector G, obtained by max-pooling the feature matrix H; its dimension is 1×C, where C is the depth of the recurrent neural network. Its bottom input is the matrix F1 extracted from the first frame by the convolutional neural network. The bottom input of the second gated recurrent unit is the matrix F2 extracted from the second frame by the convolutional neural network, and its left input is the right output of the first unit. By analogy, the whole gated sequence-decoding prediction consists of T such small units. The top output of each recurrent unit is not only passed to the next unit but is also the input of the attention mechanism module, and passing it through the attention mechanism module yields the prediction result for that frame.
By adopting the above scheme, the invention provides a multi-frame micro-expression emotion recognition method based on deep learning and a two-dimensional attention mechanism. The design is reasonable and the conception ingenious. Based on the idea that the recognition of each frame should be integrated into the analysis of the whole video clip, with frames influencing one another's results, a multi-frame micro-expression emotion recognition algorithm based on deep learning and a two-dimensional attention mechanism is proposed. In the prediction stage, the feature vector extracted from each frame image is used as the input of a recurrent neural network; the attention mechanism computes the similarity between the current frame's features and those of the other frames, gives higher weight to features with high similarity, and adds the other frames' features into the prediction, so that a more accurate recognition result can be obtained.
Drawings
Fig. 1 is a schematic flow chart illustrating steps of a multi-frame micro-expression emotion recognition method based on deep learning and a two-dimensional attention mechanism.
Fig. 2 is a schematic structural diagram of a gated recurrent neural network unit of the multi-frame micro-expression emotion recognition method based on deep learning and a two-dimensional attention mechanism.
Fig. 3 is a schematic diagram of a two-dimensional attention mechanism of a multi-frame micro-expression emotion recognition method based on deep learning and the two-dimensional attention mechanism.
Fig. 4 is a schematic diagram of a sequence prediction structure of a gated recurrent neural network of a multi-frame micro-expression emotion recognition method based on deep learning and a two-dimensional attention mechanism.
Detailed Description
In order to facilitate an understanding of the invention, the invention is described in more detail below with reference to the accompanying drawings and specific examples. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
As shown in fig. 1, the multi-frame micro-expression emotion recognition method based on deep learning and two-dimensional attention mechanism includes the following steps:
(1) Micro-expression picture preprocessing:
after the whole video is subjected to face detection and cutting, three steps of face registration, time dimension frame insertion and micro-expression action amplification of a face are required to be sequentially carried out.
Because face position, head deviation, and other movements in the video enlarge the differences between the detected pictures, and the distribution of facial features differs between people, keypoint-based face registration must first be performed so that the multi-frame micro-expression pictures share the same reference. The first frame of the video clip is selected as the reference; the feature points of the first frame are mapped to the feature points of a standard image through a mapping function, and all subsequent frames are mapped by the same method, which reduces the differences between different faces.
Because the duration of a micro expression is extremely short, the temporal length of the whole video must be extended by frame interpolation in the time domain: the whole video is regarded as a network in which each frame represents a node, adjacent frames in the video being adjacent nodes in the network; a high-dimensional continuous curve is obtained by network embedding, and sampling this curve yields the interpolated image sequence.
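The network-embedding interpolation described above is involved; as a rough stand-in that only illustrates the "sample a continuous curve" idea, here is a minimal sketch of plain linear temporal interpolation in Python. This is an assumption for illustration, not the patent's embedding method.

```python
# A minimal sketch: resample a clip of T frames to t_out frames by linear
# interpolation along the time axis. This is a simplification of the
# network-embedding interpolation described above, not the patent's method.
import numpy as np

def interpolate_frames(frames: np.ndarray, t_out: int) -> np.ndarray:
    """frames: (T, H, W, C) float array; returns (t_out, H, W, C)."""
    t_in = frames.shape[0]
    # positions of the output samples on the original time axis
    ts = np.linspace(0, t_in - 1, t_out)
    lo = np.floor(ts).astype(int)
    hi = np.minimum(lo + 1, t_in - 1)
    w = (ts - lo)[:, None, None, None]   # fractional position per output sample
    return (1 - w) * frames[lo] + w * frames[hi]
```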
Because the motion amplitude of micro expressions is small, motion magnification must be applied to the face images, and the linear Eulerian video magnification method is used for this preprocessing. Through the above steps, multiple frames of face images are obtained and uniformly enlarged or reduced to the same height and width.
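A heavily simplified sketch of linear Eulerian magnification follows, assuming NumPy/SciPy: it band-passes each pixel's temporal signal at full resolution and amplifies the filtered component, omitting the spatial pyramid of the full method. The band edges and amplification factor are illustrative assumptions.

```python
# A heavily simplified sketch of linear Eulerian video magnification.
# The full method filters each level of a spatial pyramid; here the temporal
# band-pass is applied at full resolution. The 0.4-3 Hz band and the factor
# alpha are illustrative, and the clip must span more than a few frames.
import numpy as np
from scipy.signal import butter, filtfilt

def magnify(frames: np.ndarray, fps: float, low=0.4, high=3.0, alpha=10.0):
    """frames: (T, H, W, C) float array in [0, 255]; returns magnified frames."""
    nyq = fps / 2.0
    b, a = butter(1, [low / nyq, high / nyq], btype="band")
    bandpassed = filtfilt(b, a, frames, axis=0)   # temporal band-pass per pixel
    return np.clip(frames + alpha * bandpassed, 0, 255)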
(2) Convolutional neural network feature extraction:
Feature extraction is performed on the preprocessed pictures by a specially designed convolutional neural network module. The parameters of the whole convolutional neural network are shown in Table 1. The obtained three-dimensional feature matrix is denoted F, F ∈ R^{C×H×W}, where C, H and W respectively represent the depth, height, and width of the matrix, the unit being a single pixel; that is, the feature matrix F can be considered to represent the information in the original picture. Three max-pooling layers with 2×2 windows are used in the network, which reduces the height and width of the extracted features and the complexity of the model. The network design adopts the residual-block idea from ResNet, which also ensures the stability of the feature extraction network.
[Table 1: feature extraction module operating parameters, reproduced only as an image (BDA0003377406670000071) in the original filing.]
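Since Table 1 survives only as an image, a minimal PyTorch sketch of a feature extractor matching the description is given below: three residual stages, each followed by 2×2 max pooling, mapping a 3×32×32 crop to a 128×4×4 matrix. The intermediate channel widths are illustrative assumptions, not the patent's exact Table 1 values.

```python
# A minimal sketch of the feature extractor described above, assuming PyTorch;
# layer widths are illustrative since Table 1 is only an image in the filing.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic ResNet-style block: two 3x3 convolutions with a skip connection."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.relu(self.conv1(x))
        y = self.conv2(y)
        return self.relu(y + self.skip(x))

class FrameFeatureExtractor(nn.Module):
    """Maps a 3x32x32 face crop to a 128x4x4 feature matrix F via three
    residual stages, each followed by 2x2 max pooling (three poolings total)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            ResidualBlock(3, 32),   nn.MaxPool2d(2),   # -> 32 x 16 x 16
            ResidualBlock(32, 64),  nn.MaxPool2d(2),   # -> 64 x 8 x 8
            ResidualBlock(64, 128), nn.MaxPool2d(2),   # -> 128 x 4 x 4
        )

    def forward(self, x):           # x: (B, 3, 32, 32)
        return self.net(x)          # F: (B, 128, 4, 4)
```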
(3) Two-dimensional attention mechanism weight calculation:
F contains all the information of a single input frame. Let the total number of frames of the whole video be T. After the T face pictures pass through the convolutional neural network, T three-dimensional feature matrices F are obtained; they are concatenated along the width dimension into a larger three-dimensional feature matrix denoted H, H ∈ R^{C×H×(T×W)}, whose depth and height are the same as F's and whose width is T times that of F.
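Assembling H is a plain concatenation; a small illustration in PyTorch, with shapes taken from the example in this document (C = 128 and a 4×4 spatial grid per frame), might look like:

```python
# Small illustration of building H from the per-frame feature matrices F,
# assuming PyTorch tensors; T = 8 is an arbitrary example frame count.
import torch

T = 8
frames_F = [torch.randn(128, 4, 4) for _ in range(T)]   # T per-frame matrices F
H = torch.cat(frames_F, dim=-1)                          # concatenate along width
assert H.shape == (128, 4, 4 * T)                        # C x h x (T*w)
```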
The two-dimensional attention mechanism is built on the step (4) prediction, with the upper output of each step of the gated recurrent neural network used as its input; the structure of the gated recurrent neural network unit is shown in Fig. 2.
The two-dimensional attention mechanism is shown schematically in Fig. 3; its input comprises two parts. The first is the upper output of each step of the gated recurrent neural network, which undergoes a 1×1 convolution and is then copied across the spatial dimensions to obtain a feature map with the same dimensionality as H. The second is the three-dimensional feature matrix H extracted from the whole video. The two feature matrices are added element-wise and passed through a Tanh operation; a softmax operation then yields the attention weight matrix α. The element-wise product of α and H is summed to obtain a matrix, which is concatenated with the input matrix along the depth dimension, and the output result is finally obtained through a fully connected layer.
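A minimal PyTorch sketch of such a module, under the reading above, could be the following; the exact fusion and normalization details are interpretive assumptions, since the patent describes them only in prose.

```python
# A minimal sketch of the two-dimensional attention module described above,
# assuming PyTorch; the fusion details are an interpretation of the prose.
import torch
import torch.nn as nn

class TwoDAttention(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.query_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.fc = nn.Linear(2 * channels, channels)

    def forward(self, hidden, H):
        # hidden: (B, C), upper output of one GRU step
        # H:      (B, C, h, T*w), feature matrix of the whole video
        B, C, h, w = H.shape
        q = self.query_conv(hidden.view(B, C, 1, 1))        # 1x1 convolution
        q = q.expand(B, C, h, w)                            # copy over spatial dims
        scores = torch.tanh(q + H)                          # matrix addition + Tanh
        alpha = torch.softmax(scores.view(B, C, -1), -1)    # attention weights
        ctx = (alpha * H.view(B, C, -1)).sum(-1)            # dot-multiply and sum
        out = torch.cat([ctx, hidden], dim=1)               # splice along depth
        return self.fc(out)                                 # fully connected layer
```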
(4) Recurrent neural network result prediction:
the whole gated recurrent neural network result comprises T gated neural network units, and each gated recurrent neural network unit has two inputs on the left side and the lower side and two outputs on the right side and the upper side, which is schematically shown in FIG. 2.
In this patent application, the first gated recurrent unit has two inputs. The left input is the feature vector G, obtained by max-pooling the feature matrix H; its dimension is 1×C, where C is the recurrent neural network depth. The bottom input is the matrix F1 extracted from the first frame by the convolutional neural network. The bottom input of the second gated recurrent unit is the matrix F2 extracted from the second frame by the convolutional neural network, and its left input is the right output of the first unit. By analogy, the whole gated sequence-decoding prediction consists of T such small units. Meanwhile, the top output of each unit is recorded: it is not only passed on to the next unit but is also the input of the attention mechanism module, which yields the prediction result for that frame. The entire gated recurrent neural network sequence is represented in Fig. 4.
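A minimal PyTorch sketch of this sequence stage, reusing the TwoDAttention module sketched earlier, might look as follows; the pooling of each per-frame matrix F_t down to a vector, the classifier head, and the class count are illustrative assumptions.

```python
# A minimal sketch of the sequence-prediction stage, assuming PyTorch and the
# TwoDAttention module sketched above; head and num_classes are illustrative.
import torch
import torch.nn as nn

class SequencePredictor(nn.Module):
    def __init__(self, channels=128, num_classes=5):
        super().__init__()
        self.cell = nn.GRUCell(channels, channels)  # one gated unit per frame
        self.attn = TwoDAttention(channels)
        self.head = nn.Linear(channels, num_classes)

    def forward(self, frame_feats, H):
        # frame_feats: list of T tensors (B, C); each is a per-frame matrix F_t
        #              assumed already pooled from (C, 4, 4) to a C-vector
        # H:           (B, C, h, T*w), concatenated video feature matrix
        B, C = frame_feats[0].shape
        # left input of the first unit: G = max pooling of H, dimension 1 x C
        state = H.view(B, C, -1).max(dim=-1).values
        logits = []
        for f_t in frame_feats:                # bottom input: per-frame features
            state = self.cell(f_t, state)      # right/top output of the unit
            enhanced = self.attn(state, H)     # top output feeds the attention
            logits.append(self.head(enhanced)) # per-frame prediction
        return torch.stack(logits, dim=1)      # (B, T, num_classes)
```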
Example 1
The following is a specific embodiment of the present invention:
the invention provides a method and a system for identifying multi-frame micro-expression emotions based on deep learning and a two-dimensional attention mechanism, which comprises the following specific processes:
(1) Micro-expression picture preprocessing:
the input to the explicit model is first a succession of video frames, each of which has a facial expression. In the first step, the face needs to be subjected to face registration based on key points, so that the multiple frames of image micro-expression pictures have the same reference. The method is characterized in that a first frame of a video segment is selected as a reference, feature points in the first frame are mapped to feature points of a standard image through a mapping function, all the subsequent frames are mapped through the same method, the difference between different human faces can be reduced, and the human face feature points and the facial feature calibration are identified through a dlib library of python.
Because the duration of a micro expression is extremely short, the temporal length of the whole video must be extended by frame interpolation in the time domain: the whole video is regarded as a network in which each frame represents a node, adjacent frames in the video being adjacent nodes in the network; a high-dimensional continuous curve is obtained by network embedding, and sampling this curve yields the interpolated image sequence.
Because the motion amplitude of micro expressions is small, motion magnification must be applied to the face images, and the linear Eulerian video magnification method is used for this preprocessing. Through the above steps, multiple frames of face images are obtained; in this example, the images are uniformly enlarged or reduced to 32 × 32 pixels in height and width.
(2) Convolutional neural network feature extraction:
Feature extraction is performed on the preprocessed pictures by a specially designed convolutional neural network module. The parameters of the whole convolutional neural network are shown in Table 1. The obtained three-dimensional feature matrix is denoted F, F ∈ R^{C×H×W}, where C, H and W respectively represent the depth, height, and width of the matrix, the unit being a single pixel; in this example C = 128, H = 4 and W = 4, so the feature matrix F can be considered to represent the information in the original picture. Three max-pooling layers with 2×2 windows are used in the network, which reduces the height and width of the extracted features and the complexity of the model: the original 32 × 32 × 3 image becomes a 4 × 4 × 128 matrix. The network design adopts the residual-block idea from ResNet, which also ensures the stability of the feature extraction network.
(3) Two-dimensional attention mechanism weight calculation:
The input of the two-dimensional attention mechanism consists of two parts: the first is the upper output of each step of the gated recurrent neural network, which undergoes a 1×1 convolution and is then copied across the spatial dimensions to obtain a feature map with the same dimensionality as H, of size 4 × 4 × 128 per frame position; the second is the three-dimensional feature matrix H extracted from the whole video. The two feature matrices are added element-wise and passed through a Tanh operation; a softmax operation then yields the attention weight matrix α. The element-wise product of α and H is summed to obtain a matrix, which is concatenated with the input matrix along the depth dimension; the output is finally obtained through a fully connected layer, and this result is the feature enhanced by the two-dimensional attention mechanism.
(4) Recurrent neural network result prediction: in this patent, the first gated recurrent unit has two inputs. The left input is the feature vector G, obtained by max-pooling the feature matrix H; its dimension is 1×C, where C is the recurrent neural network depth (in this example, C = 128). The bottom input is the matrix F1 extracted from the first frame by the convolutional neural network. The bottom input of the second gated recurrent unit is the matrix F2 extracted from the second frame by the convolutional neural network, and its left input is the right output of the first unit. By analogy, the whole gated sequence-decoding prediction consists of T such small units. Meanwhile, the top output of each unit is recorded: it is not only passed on to the next unit but is also the input of the attention mechanism module, which yields the prediction result for that frame.
The technical features mentioned above may be combined with one another to form various embodiments not listed above, all of which are regarded as within the scope of the invention described in this specification; likewise, modifications and variations may be suggested to those skilled in the art in light of the above teachings, and all such modifications and variations are intended to fall within the scope of the invention as defined by the appended claims.

Claims (7)

1. A multi-frame micro-expression emotion recognition method based on deep learning and a two-dimensional attention mechanism, characterized by comprising four steps: micro-expression video picture preprocessing, convolutional neural network feature extraction, two-dimensional attention mechanism weight calculation, and recurrent neural network result prediction, and specifically comprising the following steps:
s1, preprocessing a micro expression video picture, specifically comprising face registration of a human face, time dimension frame interpolation and micro expression action amplification;
s2, a convolutional neural network feature extraction step, wherein a feature extraction network is used for extracting features of a plurality of preprocessed micro-expression pictures by adopting a rolling machine neural network, wherein the feature extraction network is used for finding the significant features of slight muscle movement and slight change of the face, so that a feature vector for representing original information is obtained;
s3, a two-dimensional attention mechanism weight calculation step, wherein the similarity between the current frame feature and all frames of the whole video is calculated through a two-dimensional attention mechanism to obtain the attention weight;
and S4, a recurrent neural network result prediction step, wherein the gated recurrent neural network classifies the characteristics of each frame according to the attention weight obtained in the previous step to obtain a prediction result.
2. The multi-frame micro-expression emotion recognition method based on deep learning and a two-dimensional attention mechanism as claimed in claim 1, wherein, in step S1,
the face registration step specifically comprises selecting the first frame of the video segment as a reference, mapping the feature points in the first frame to the feature points of a standard image through a mapping function, and mapping all subsequent frames by the same method, so as to reduce the differences between different faces;
the time-dimension frame interpolation step extends the temporal length of the whole video by interpolating frames in the time domain: the whole video is regarded as a network in which each frame represents a node, adjacent frames in the video being adjacent nodes in the network; a high-dimensional continuous curve is obtained by network embedding, and sampling this curve yields the interpolated image sequence;
and the micro-expression motion magnification step performs motion-magnification preprocessing on the face images using the linear Eulerian video magnification method; the multiple frames of face images obtained through the above steps are uniformly enlarged or reduced to the same height and width.
3. The multi-frame micro-expression emotion recognition method based on deep learning and a two-dimensional attention mechanism as claimed in claim 1, wherein in step S2, the three-dimensional feature matrix obtained by the convolutional neural network is denoted F, F ∈ R^{C×H×W}, where C, H and W are respectively the depth, height, and width of the matrix, the unit being a single pixel; the feature matrix F represents the information in the original picture; three max-pooling layers with 2×2 windows are used in the convolutional neural network.
4. The method for recognizing the multi-frame micro-expression emotion based on the deep learning and two-dimensional attention mechanism as claimed in claim 1, wherein the parameters of the convolutional neural network are as follows:
[Table: convolutional neural network parameters, reproduced only as an image (FDA0003377406660000021) in the original filing.]
5. The multi-frame micro-expression emotion recognition method based on deep learning and a two-dimensional attention mechanism as claimed in claim 1, wherein in step S3, the two-dimensional attention mechanism is built on the prediction of step S4, the upper output of each step of the gated recurrent neural network being used as the input of the two-dimensional attention mechanism; the two-dimensional attention calculation specifically comprises: F contains all the information of a single input frame; the total number of frames of the whole video is T; after the T face pictures pass through the convolutional neural network, T three-dimensional feature matrices F are obtained and concatenated along the width dimension into a larger three-dimensional feature matrix denoted H, H ∈ R^{C×H×(T×W)}, whose depth and height are the same as F's and whose width is T times that of F.
6. The multi-frame micro-expression emotion recognition method based on deep learning and a two-dimensional attention mechanism as claimed in claim 1, wherein the input of the two-dimensional attention mechanism comprises two parts: the first is the upper output of each step of the gated recurrent neural network, which undergoes a 1×1 convolution and is then copied across the spatial dimensions to obtain a feature map with the same dimensionality as H;
and the second is the three-dimensional feature matrix H extracted from the whole video; the two feature matrices are added element-wise and passed through a Tanh operation, a softmax operation then yields the attention weight matrix α, the element-wise product of α and H is summed to obtain a matrix, this matrix is concatenated with the input matrix along the depth dimension, and the output result is finally obtained through a fully connected layer.
7. The multi-frame micro-expression emotion recognition method based on deep learning and a two-dimensional attention mechanism as claimed in claim 1, wherein in step S4, the whole gated recurrent neural network comprises T gated recurrent units, each unit having two inputs, on the left and bottom, and two outputs, on the right and top; the left input of the first unit is a feature vector G, obtained by max-pooling the feature matrix H, with dimension 1×C, where C is the depth of the recurrent neural network; its bottom input is the matrix F1 extracted from the first frame by the convolutional neural network; the bottom input of the second gated recurrent unit is the matrix F2 extracted from the second frame by the convolutional neural network, and its left input is the right output of the first unit; by analogy, the whole gated sequence-decoding prediction consists of T such small units; the top output of each recurrent unit is not only passed to the next unit but is also the input of the attention mechanism module, which yields the prediction result for that frame.
CN202111421041.4A 2021-11-26 2021-11-26 Multiframe micro-expression emotion recognition method based on deep learning and two-dimensional attention mechanism Pending CN114202787A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111421041.4A CN114202787A (en) 2021-11-26 2021-11-26 Multiframe micro-expression emotion recognition method based on deep learning and two-dimensional attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111421041.4A CN114202787A (en) 2021-11-26 2021-11-26 Multiframe micro-expression emotion recognition method based on deep learning and two-dimensional attention mechanism

Publications (1)

Publication Number Publication Date
CN114202787A true CN114202787A (en) 2022-03-18

Family

ID=80649135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111421041.4A Pending CN114202787A (en) 2021-11-26 2021-11-26 Multiframe micro-expression emotion recognition method based on deep learning and two-dimensional attention mechanism

Country Status (1)

Country Link
CN (1) CN114202787A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842539A (en) * 2022-05-30 2022-08-02 山东大学 Micro-expression discovery method and system based on attention mechanism and one-dimensional convolution sliding window
CN114842539B (en) * 2022-05-30 2023-04-07 山东大学 Micro-expression discovery method and system based on attention mechanism and one-dimensional convolution sliding window
CN115375665A (en) * 2022-08-31 2022-11-22 河南大学 Early Alzheimer disease development prediction method based on deep learning strategy
CN115375665B (en) * 2022-08-31 2024-04-16 河南大学 Advanced learning strategy-based early Alzheimer disease development prediction method

Similar Documents

Publication Publication Date Title
CN111639692B (en) Shadow detection method based on attention mechanism
Wang et al. Adaptive fusion for RGB-D salient object detection
CN113158723B (en) End-to-end video motion detection positioning system
CN109948692B (en) Computer-generated picture detection method based on multi-color space convolutional neural network and random forest
CN114202787A (en) Multiframe micro-expression emotion recognition method based on deep learning and two-dimensional attention mechanism
CN113642634A (en) Shadow detection method based on mixed attention
CN108921032B (en) Novel video semantic extraction method based on deep learning model
CN109063626B (en) Dynamic face recognition method and device
CN110532959B (en) Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network
CN111488805B (en) Video behavior recognition method based on salient feature extraction
CN111738054A (en) Behavior anomaly detection method based on space-time self-encoder network and space-time CNN
CN111160356A (en) Image segmentation and classification method and device
CN112597824A (en) Behavior recognition method and device, electronic equipment and storage medium
CN111507138A (en) Image recognition method and device, computer equipment and storage medium
CN116012930B (en) Dimension expression recognition method based on deep learning convolutional neural network
CN112446348A (en) Behavior identification method based on characteristic spectrum flow
CN108376234B (en) Emotion recognition system and method for video image
Diyasa et al. Multi-face Recognition for the Detection of Prisoners in Jail using a Modified Cascade Classifier and CNN
CN113076905B (en) Emotion recognition method based on context interaction relation
CN110852271A (en) Micro-expression recognition method based on peak frame and deep forest
CN112528077B (en) Video face retrieval method and system based on video embedding
CN111310516A (en) Behavior identification method and device
CN111046213B (en) Knowledge base construction method based on image recognition
CN111144220B (en) Personnel detection method, device, equipment and medium suitable for big data
CN116403152A (en) Crowd density estimation method based on spatial context learning network

Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination