CN114005157B - Micro-expression recognition method for pixel displacement vector based on convolutional neural network - Google Patents
Micro-expression recognition method for pixel displacement vector based on convolutional neural network
- Publication number
- CN114005157B CN114005157B CN202111204917.XA CN202111204917A CN114005157B CN 114005157 B CN114005157 B CN 114005157B CN 202111204917 A CN202111204917 A CN 202111204917A CN 114005157 B CN114005157 B CN 114005157B
- Authority
- CN
- China
- Prior art keywords
- displacement vector
- image
- pixel displacement
- maximum frame
- frame image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention provides a micro-expression recognition method based on pixel displacement vectors and a convolutional neural network. An end-to-end micro-expression recognition network is established around a pixel displacement generation module, and its processing flow is as follows: selecting a maximum frame, wherein during training a frame near the original maximum frame is randomly selected as the maximum frame image; inputting the selected maximum frame image and the start frame image together into the pixel displacement generation module, which outputs a pixel displacement vector feature map between the two images; calculating the related loss functions, which comprises first upsampling the generated displacement vector feature map to obtain a displacement feature map, then sampling the start frame according to it to generate an approximate maximum frame image, and computing the reconstruction loss and the regularization loss; a normalization operation, comprising normalizing the generated pixel displacement vector feature map; and feature learning and micro-expression classification, namely concatenating the maximum frame image with the normalized pixel displacement vector feature map and inputting the result into a classification network to obtain a classification prediction result.
Description
Technical Field
The invention belongs to the technical field of micro-expression recognition, and relates to a micro-expression recognition technology based on dynamic feature representation.
Background
Currently, the mainstream deep learning methods for micro-expression recognition are divided into two main categories:
The first major category sequentially performs feature extraction on each frame of an image sequence and feeds the features into a temporal neural network, learning spatial distribution and time-varying features simultaneously. For example, the ELRCN network (Document 1) was proposed in recent years; its experimental results indicate that temporal and spatial features play different roles in micro-expression recognition, and that good recognition performance depends on an effective combination of both.
The second major category extracts the variation characteristics of the whole expression sequence as a feature map, which is directly input into a classification network for prediction; classification is generally based on the difference features between the start frame and the maximum frame of a micro-expression segment. The feature extraction methods have been continuously improved. LBP-TOP (Document 2) was widely used early on to extract the spatio-temporal variation features of micro-expressions and serves as a reference method in the field; a series of LBP variants have since been proposed to improve the quality and robustness of the extracted features. These were gradually replaced by optical flow (Document 3), which estimates the change of object position between two frames, characterizes the direction and magnitude of pixel movement, and extracts inter-frame motion information more robustly. Bi-WOOF (Document 4) additionally computes optical strain on top of optical flow. Methods for extracting the variation characteristics of a micro-expression segment also include the Dynamic Imaging method (Document 5) from the action recognition field, which compresses a picture sequence into a single RGB image containing the spatial features and temporal dynamics of the whole sequence.
However, the extraction of the variation characteristics of the expression sequence is currently carried out during training preprocessing. It is confined to a separate processing step, is not fused with the deep learning network used for classification, cannot adjust the generated dynamic features according to feedback from the classification performance, and therefore lacks flexibility and adaptability.
Related literature:
[Document 1] H. Khor, J. See, R. C. Phan, W. Lin, "Enriched Long-term Recurrent Convolutional Network for Facial Micro-Expression Recognition," Proceedings of the 2018 International Conference on Automatic Face & Gesture Recognition (FG), 2018, pp. 667–674.
[Document 2] G. Zhao, M. Pietikainen, "Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, pp. 915–928.
[Document 3] D. Fleet, Y. Weiss, "Optical Flow Estimation," Springer US, 2006.
[Document 4] Liong S. T., See J., Wong K., Phan R. C., "Less is more: Micro-expression recognition from video using apex frame," Signal Processing: Image Communication, 2018, 62:82–92.
[Document 5] H. Bilen, B. Fernando, E. Gavves, A. Vedaldi and S. Gould, "Dynamic image networks for action recognition," In Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., 2016, pp. 3034–3042.
Disclosure of Invention
Addressing the shortcomings of existing micro-expression recognition methods, the invention provides a deep-learning-based end-to-end micro-expression recognition network built on a pixel displacement generation module, giving the displacement feature extraction and expression classification modules more room to adapt automatically to the data and increasing the overall fitness of the model.
The technical solution of the invention is a micro-expression recognition method based on pixel displacement vectors and a convolutional neural network. An end-to-end micro-expression recognition network based on a pixel displacement generation module is established, and the processing flow of this network comprises the following steps:
selecting a maximum frame, wherein during training a frame near the original maximum frame is randomly selected as the maximum frame image;
Generating a pixel displacement vector feature map, which comprises inputting a selected maximum frame image and a starting frame image into a pixel displacement generating module, and outputting the pixel displacement vector feature map between the two images through the learning and feature fusion of each convolution layer;
calculating the related loss functions, which comprises first performing bilinear interpolation upsampling on the generated displacement vector feature map to obtain a displacement feature map of the same size as the maximum frame, then sampling the original start frame image according to the displacement feature map to generate an approximate maximum frame image, and calculating the reconstruction loss and regularization loss from the generated approximate maximum frame image and the originally selected maximum frame image;
Normalizing operation, including normalizing the generated pixel displacement vector feature map;
and performing feature learning and micro-expression classification, namely concatenating the previously selected maximum frame image with the normalized pixel displacement vector feature map and inputting the result into a classification network to obtain a classification prediction result.
In the training process, the selection of the maximum frame is realized through a randomization process: a frame within a certain range before or after the original maximum frame is randomly selected, which increases the number of image pairs actually used for training; in the verification or test stage, the original maximum frame image is used directly.
The generated pixel displacement vector features are normalized before being input into the classification network: each displacement vector feature map is divided by the average of its n largest absolute values.
Moreover, for the generated pixel displacement vector feature map, the loss function comprises a reconstruction loss between the originally selected maximum frame and the maximum frame reconstructed from the start frame and the displacement vectors, and an L1 regularization loss calculated on the displacement vector feature map itself.
Furthermore, the selected maximum frame image and the generated pixel displacement vector feature map are input together into a classification network for learning; after the classification prediction result is obtained, the classification loss and the relevant evaluation indexes are calculated as required.
Compared with the prior art, the invention has the following advantages and positive effects:
(1) The pixel displacement generation module provided by the invention can be trained end to end together with the classification network, and the classification loss can be back-propagated into the module, so that it automatically adjusts its parameters according to the classification performance and generates displacement features that are easier to classify, while the overall model attains higher fitness.
(2) The random maximum frame selection increases the number of image pairs actually used for training, enhances the robustness of the network and its sensitivity to subtle changes, and improves both displacement feature generation and classification.
(3) The normalization operation effectively reduces expression displacements of larger amplitude and amplifies those of smaller amplitude, providing adaptive amplitude adjustment, reducing the influence of amplitude differences between image pairs on the classification network, and making the classification network easier to train.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate and explain the application and, together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic diagram of an overall structure of an end-to-end micro-expression recognition network based on a pixel displacement generation module according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a pixel displacement generating module according to an embodiment of the present invention;
Detailed Description
The following describes specific embodiments of the present invention in detail with reference to the drawings. It should be understood that the detailed description is presented by way of illustration or example only, and is not intended to limit the invention.
The invention discloses a fully convolutional pixel Displacement Generation Module (DGM) that generates a pixel displacement vector feature map (displacements) between two frames, replacing the dynamic features produced by traditional Optical Flow or Dynamic Imaging methods. The module is combined with the existing LEARNet classification network to form an end-to-end micro-expression recognition model. A randomization operation on the maximum frame is also disclosed, which increases the number of training sample pairs and the sensitivity of the network to fine expression changes. The invention takes a start frame of the expression sequence and a maximum frame selected by the randomization operation directly as input, generates a pixel displacement vector feature map with the pixel displacement generation module, normalizes the feature map, and then concatenates it with the maximum frame image as input to the LEARNet classification network for learning and prediction. The model back-propagates the gradient of the classification loss to the DGM, so that the DGM adjusts its parameters according to the classification results and generates displacement features that are easier to classify.
As shown in fig. 1, an embodiment of the present invention provides an end-to-end micro-expression recognition method based on a pixel displacement generation module. The end-to-end micro-expression recognition network comprises two neural network parts, one for generating the pixel displacement vector features and one for learning and classification; the network model is trained end to end, and the dynamic features of micro-expressions are represented by the pixel displacement vectors between the start frame and the maximum frame, generated by the convolutional module DGM (the Displacement Generating Module shown in fig. 2).
In an embodiment, the main flow based on the micro-expression recognition network is as follows:
(1) Selecting the maximum frame: in the training process, the maximum frame is selected through a randomization process; a frame within a certain range before or after the original maximum frame is randomly chosen as the maximum frame image, increasing the number of image pairs actually used for training. In the verification or test stage, the original maximum frame image is used directly.
Let the start frame index be i, the maximum frame index be j, and the index of the last frame in the image sequence be I_last. The randomized frame index I_select is then calculated by the following formula:
I_select = random[ MAX(i+1, round(j-(j-i)*0.2)), MIN(round(j+(j-i)*0.2), I_last) ]
where MAX selects the larger of its two arguments and MIN the smaller, ensuring that the selected maximum frame lies after the start frame and does not exceed the last frame of the sequence, and random[a, b] returns a uniformly chosen integer within the interval.
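The randomized selection above can be sketched in Python. The function name and the `ratio` parameter are illustrative (the patent's formula fixes the window ratio at 0.2):

```python
import random

def select_max_frame(i, j, i_last, ratio=0.2, training=True):
    """Randomly pick a maximum-frame index in a +/- ratio*(j-i) window
    around the original maximum frame j, clamped so the result lies
    strictly after the start frame i and no later than i_last.

    During verification/testing the original maximum frame is returned."""
    if not training:
        return j
    lo = max(i + 1, round(j - (j - i) * ratio))
    hi = min(round(j + (j - i) * ratio), i_last)
    return random.randint(lo, hi)
```

With i = 0, j = 10 and I_last = 20 this yields a uniform choice from {8, ..., 12}.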
(2) Generating a pixel displacement vector feature map: the selected maximum frame image and the initial frame image are input into a pixel displacement generating module together, and a pixel displacement vector feature diagram between the two images is output through the learning and feature fusion of each convolution layer.
(3) Calculating the related loss functions: the generated displacement vector feature map is upsampled by bilinear interpolation to obtain a displacement feature map of the same size as the maximum frame, and the original start frame image is then sampled according to this displacement feature map to generate an approximate maximum frame image. Where a displacement vector value is not an integer, bilinear interpolation is used to compute the pixel value of the corresponding point. The L_rec reconstruction loss and the L_1 regularization loss are then calculated from the generated approximate maximum frame image and the originally selected maximum frame image.
(4) Normalization operation: the generated pixel displacement vector feature map is normalized. Let the displacement feature map produced by the network be I_f, and let M(I, n) denote the average of the n largest absolute values of image I. The normalized image I_n is obtained by the following formula:
I_n = I_f / MAX( M(|I_f|, n), 0.0001 )
where the comparison with 0.0001 avoids division by zero.
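A minimal sketch of this normalization on a plain nested-list feature map (the function name and the default n are illustrative):

```python
def normalize_displacement(feature_map, n=10, eps=1e-4):
    """Divide a displacement feature map by the average of its n largest
    absolute values; the divisor is clamped below by eps to avoid
    division by zero."""
    flat = [abs(v) for row in feature_map for v in row]
    top_n = sorted(flat, reverse=True)[:n]
    scale = max(sum(top_n) / len(top_n), eps)
    return [[v / scale for v in row] for row in feature_map]
```

Dividing by an average of the top-n values, rather than the single maximum, matches the patent's stated goal of reducing the influence of isolated noise points.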
(5) Feature learning and micro-expression classification: the previously selected maximum frame image is concatenated (concat) with the normalized pixel displacement vector feature map and input into the classification network to obtain the classification prediction result (micro-expressions are divided into three categories: negative, positive and surprise); the classification loss (Softmax Loss) and evaluation indexes such as UF1 and UAR are calculated as required.
The invention can be considered to provide an end-to-end micro-expression recognition model based on a pixel displacement generation module, which comprises a displacement vector feature generation module, a randomization processing module, a normalization processing module and a classification network module. Wherein:
The displacement vector feature generation module takes a start frame of the expression sequence and a maximum frame selected by the randomization operation as inputs; the convolutional neural network of the pixel displacement generation module generates a pixel displacement vector feature map (displacements) between the two frames, replacing the dynamic image generated by traditional Optical Flow or Dynamic Imaging methods. The pixel displacement vector feature represents the displacement of each pixel of the maximum frame image relative to the start frame image, with values in the range (-1, 1); to make the network concentrate on the features around each pixel, the generated pixel displacement vector feature map is multiplied by a scaling factor α ∈ (0, 1) to limit the range to (-α, α). The related loss functions comprise: the reconstruction loss between the originally selected maximum frame and the maximum frame reconstructed from the start frame and the displacement vectors; and the L1 regularization loss calculated on the displacement vector feature map itself.
In the randomization processing module, because different image sequences within and across data sets have different expression amplitudes, a frame within a certain range before or after the original maximum frame is randomly selected as the maximum frame when the data is loaded, making the network that generates the displacement vector feature map more robust.
In the normalization processing module, to normalize pixel displacement vector feature maps of different magnitudes, each displacement vector feature map is divided by the average of its n largest absolute values. The average is used instead of the maximum to reduce the interference of larger noise points that may occur.
The classification network can be chosen from different existing network structures; the pixel displacement vector feature maps obtained above are normalized and then input, together with the selected maximum frame image, into the classification network for learning and prediction, so that features of both the temporal and the spatial dimension are retained. The embodiment of the invention selects LEARNet (Verma Monu, Vipparthi Santosh Kumar, Singh Girdhari, Murala Subrahmanyam, "LEARNet: Dynamic Imaging Network for Micro Expression Recognition," IEEE Transactions on Image Processing, 2019) as the classification network; compared with classical ResNet and VGG structures, this network retains more detail and better learns to distinguish the characteristics of different expression categories.
As shown in fig. 2, the invention provides a schematic structure of the pixel displacement generation module, in which the layers Conv, Conv1, Conv2, Conv3, Conv4, Up, Conv5 and Conv6 are arranged in sequence, and the output of Up is concatenated (Concat) with the output of the Conv1 layer as the input of Conv5. The module thus comprises two downsampling steps (implemented by the Conv1 and Conv3 layers with stride 2) and one upsampling step (implemented by the Up layer); the specific parameter configuration of each convolution layer is shown in the following table. Each of the convolution layers Conv, Conv1, Conv2, Conv3, Conv4 and Conv5 is followed by a BN (batch normalization) layer and a Leaky ReLU activation layer, while the final Conv6 layer is followed by a BN layer and a Tanh activation layer. The Up layer is an upsampling layer using bilinear interpolation, whose output is concatenated with the output of the Conv1 layer as the input of the next layer.
For an input image of width w and height h, the final output is a pixel displacement vector feature map with 2 channels and width and height w/2 and h/2, which is used for classification. The first channel represents displacement in the X direction and the second channel displacement in the Y direction. At the same time, the generated pixel displacement vector feature map is upsampled by bilinear interpolation to obtain a displacement feature map of width w and height h, used for calculating the loss function: the start frame image is first grid-sampled according to the upsampled displacement feature map to generate an approximate maximum frame image, then the L_rec loss between the approximate maximum frame and the originally selected maximum frame is calculated, and the L_1 regularization loss of the displacement feature map is calculated at the same time. The related loss functions are set as follows:
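Assuming 'same'-padded convolutions (so spatial size shrinks only by the stride), the spatial bookkeeping implied by the described layout — two stride-2 downsamples at Conv1 and Conv3, one 2× bilinear upsample at Up, and the Concat of Up with Conv1 — can be checked with a small illustrative helper; the function names are not from the patent:

```python
def conv_out(size, stride):
    # 'same'-padded convolution: output size is input size // stride
    return size // stride

def dgm_output_size(w, h):
    """Track spatial size through the DGM layer sequence:
    Conv -> Conv1(s2) -> Conv2 -> Conv3(s2) -> Conv4 -> Up(2x) -> Conv5 -> Conv6."""
    sw, sh = conv_out(w, 1), conv_out(h, 1)    # Conv
    sw, sh = conv_out(sw, 2), conv_out(sh, 2)  # Conv1: first downsample (skip saved)
    skip = (sw, sh)
    sw, sh = conv_out(sw, 1), conv_out(sh, 1)  # Conv2
    sw, sh = conv_out(sw, 2), conv_out(sh, 2)  # Conv3: second downsample
    sw, sh = conv_out(sw, 1), conv_out(sh, 1)  # Conv4
    sw, sh = sw * 2, sh * 2                    # Up: 2x bilinear upsample
    assert (sw, sh) == skip                    # Concat with Conv1 output is size-consistent
    return sw, sh                              # Conv5/Conv6 keep size; output has 2 channels
```

This confirms the text above: a w×h input yields a w/2 × h/2 displacement map, and the Up output matches the Conv1 output so the Concat is well defined.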
(1) Let the original start frame image be I_s, the selected maximum frame image be I_t, and T(I_s) denote the approximate maximum frame image obtained by sampling the start frame according to the displacement feature map. The L_rec reconstruction loss is then calculated as:
L_rec = ||T(I_s) - I_t||_1
(2) To further constrain the generated displacement features, let T_xy denote the pixel displacement vector at (x, y). The L_1 regularization loss of the pixel displacement vector feature map is the mean L1 norm of all displacement vectors:
L_1 = (1/N) Σ_(x,y) ||T_xy||_1
where N is the number of positions in the feature map.
(3) The classification loss of the micro-expressions uses the classical cross-entropy loss, denoted L_c. The overall loss of the network is calculated as follows:
L = w1 × L_c + w2 × L_rec + w3 × L_1
where w1, w2 and w3 are the weights of the L_c, L_rec and L_1 loss functions, respectively. The three loss functions are each back-propagated to the displacement generation module, and the weight coefficients are chosen according to the differences in magnitude: since the module takes the reconstruction loss L_rec as its main loss, the weights are set so that its gradient has a higher magnitude than those of L_c and L_1. The embodiment preferably takes the experimental values w1 = 0.0001, w2 = 1000 and w3 = 1.
The pixel displacement values are expressed as fractions of the image width and height: if T_xy = (Δx, Δy) is the pixel displacement vector at (x, y), the pixel at (x, y) of the original start frame moves to (x + w×Δx, y + h×Δy) in the approximate maximum frame image. To make the network concentrate on the displacement features around each pixel, the displacement features are multiplied by a scaling factor α ∈ (0, 1), giving a final pixel displacement vector feature map in the range [-α, α], i.e., limiting the actual X-direction displacement component to [-w×α, w×α] and the Y-direction component to [-h×α, h×α].
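A pure-Python sketch of the sampling-and-loss computation described above, assuming clamped bilinear sampling at the image border and mean-reduced losses. Function names, the reduction choices, and the `lc` placeholder are illustrative; the weight defaults follow the embodiment's w1 = 0.0001, w2 = 1000, w3 = 1:

```python
def bilinear_sample(img, x, y):
    """Sample a grayscale image (list of rows) at fractional (x, y),
    clamping coordinates to the image border."""
    h, w = len(img), len(img[0])
    x = max(0.0, min(x, w - 1.0))
    y = max(0.0, min(y, h - 1.0))
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def warp_and_losses(start, target, disp, w1=0.0001, w2=1000.0, w3=1.0, lc=0.0):
    """Reconstruct an approximate maximum frame from the start frame and a
    per-pixel displacement field disp[y][x] = (dx, dy) given as fractions of
    width/height, then compute L_rec (mean absolute reconstruction error),
    L_1 (mean absolute displacement) and L = w1*Lc + w2*Lrec + w3*L1."""
    h, w = len(start), len(start[0])
    rec_err, l1_sum, n = 0.0, 0.0, h * w
    for y in range(h):
        for x in range(w):
            dx, dy = disp[y][x]
            approx = bilinear_sample(start, x + w * dx, y + h * dy)
            rec_err += abs(approx - target[y][x])
            l1_sum += abs(dx) + abs(dy)
    l_rec, l_1 = rec_err / n, l1_sum / n
    return l_rec, l_1, w1 * lc + w2 * l_rec + w3 * l_1
```

With a zero displacement field the warp reproduces the start frame exactly, so both L_rec and L_1 vanish; a nonzero field shifts each pixel by (w×Δx, h×Δy) before comparison with the selected maximum frame.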
For the convenience of understanding the technical effects of the present invention, the following experimental results are attached:
TABLE 1: Ablation experiment results of the network proposed in this patent
TABLE 2: Comparison of UF1 and UAR results between networks using conventional dynamic feature extraction methods and the network proposed in this patent
In particular, the method according to the technical solution of the present invention may be implemented by those skilled in the art as an automatic operation flow using computer software technology. System apparatus implementing the method, such as a computer-readable storage medium storing the corresponding computer program and computer equipment running that program, should also fall within the protection scope of the present invention.
In some possible embodiments, a micro-expression recognition system based on a pixel displacement vector of a convolutional neural network is provided, and the micro-expression recognition system comprises a processor and a memory, wherein the memory is used for storing program instructions, and the processor is used for calling the stored instructions in the memory to execute a micro-expression recognition method based on the pixel displacement vector of the convolutional neural network.
In some possible embodiments, a micro-expression recognition system based on a pixel displacement vector of a convolutional neural network is provided, which comprises a readable storage medium, wherein a computer program is stored on the readable storage medium, and the computer program is executed to realize the micro-expression recognition method based on the pixel displacement vector of the convolutional neural network.
It should be understood that parts of the specification not specifically set forth herein are all prior art.
It should be understood that the foregoing description of the embodiments is not intended to limit the scope of the invention, which is defined by the appended claims and covers all alternatives and modifications falling within that scope.
Claims (5)
1. A micro-expression recognition method based on pixel displacement vectors and a convolutional neural network, characterized in that: an end-to-end micro-expression recognition network based on a pixel displacement generation module is established, and the processing flow of the micro-expression recognition network comprises the following steps: selecting a maximum frame, wherein during training a frame near the original maximum frame is randomly selected as the maximum frame image;
generating a pixel displacement vector feature map, which comprises inputting the selected maximum frame image and the starting frame image into the pixel displacement generation module, and outputting the pixel displacement vector feature map between the two images through the learning and feature fusion of the convolution layers;
calculating a correlation loss function, which comprises first performing bilinear interpolation up-sampling on the generated displacement vector feature map to obtain a displacement feature map of the same size as the maximum frame, then sampling the original starting frame image according to the displacement feature map to generate an approximate maximum frame image, and calculating a reconstruction loss and a regularization loss from the generated approximate maximum frame image and the originally selected maximum frame image;
performing a normalization operation, which comprises normalizing the generated pixel displacement vector feature map;
and performing feature learning and micro-expression classification, which comprises concatenating the previously selected maximum frame image with the normalized pixel displacement vector feature map and inputting them into a classification network to obtain a classification prediction result.
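The sampling step described in claim 1 (warping the starting frame by the up-sampled displacement vectors to reconstruct an approximate maximum frame) can be sketched as follows. This is an illustrative NumPy implementation, not the patented code; the `(dy, dx)` displacement layout and border clipping are assumptions.

```python
import numpy as np

def warp_by_displacement(onset, disp):
    """Sample the onset (starting) frame at coordinates shifted by the
    per-pixel displacement vectors, producing an approximate maximum
    (apex) frame via bilinear interpolation.
    onset: (H, W) grayscale image; disp: (H, W, 2) per-pixel (dy, dx)."""
    H, W = onset.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Target sampling coordinates, clipped to the image border.
    sy = np.clip(ys + disp[..., 0], 0, H - 1)
    sx = np.clip(xs + disp[..., 1], 0, W - 1)
    # Bilinear interpolation at the fractional coordinates.
    y0, x0 = np.floor(sy).astype(int), np.floor(sx).astype(int)
    y1, x1 = np.minimum(y0 + 1, H - 1), np.minimum(x0 + 1, W - 1)
    wy, wx = sy - y0, sx - x0
    top = onset[y0, x0] * (1 - wx) + onset[y0, x1] * wx
    bot = onset[y1, x0] * (1 - wx) + onset[y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

Because the sampling is differentiable in the displacement values, the same operation implemented in a deep-learning framework lets the reconstruction loss train the pixel displacement generation module end to end.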
2. The micro-expression recognition method for pixel displacement vectors based on a convolutional neural network according to claim 1, characterized in that: during training, the selection of the maximum frame is randomized, with a frame within a certain range before or after the original maximum frame being randomly selected, which increases the number of image pairs actually used for training; in the verification or test stage, the original maximum frame image is used directly.
3. The micro-expression recognition method for pixel displacement vectors based on a convolutional neural network according to claim 1, characterized in that: the generated pixel displacement vector features are normalized before being input into the classification network, each displacement vector feature map being divided by the mean of its several largest absolute values.
4. The micro-expression recognition method for pixel displacement vectors based on a convolutional neural network according to claim 1, characterized in that: for the generated pixel displacement vector feature map, the loss function comprises a reconstruction loss between the original maximum frame and the maximum frame reconstructed from the starting frame and the displacement vectors, and an L1 regularization loss computed on the displacement vector feature map.
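The two loss terms in claim 4 can be sketched as follows. The patent does not specify the form of the reconstruction term or the weighting between the two losses, so the mean-squared reconstruction error and the weight `lam` are assumptions.

```python
import numpy as np

def displacement_losses(apex, apex_recon, disp, lam=0.1):
    """Combine a reconstruction loss between the original maximum (apex)
    frame and the frame rebuilt from the starting frame plus displacement
    vectors, with an L1 penalty on the displacement field itself.
    lam is a hypothetical regularization weight."""
    recon_loss = np.mean((apex - apex_recon) ** 2)  # assumed mean-squared error
    l1_loss = np.mean(np.abs(disp))                 # L1 regularization loss
    return recon_loss + lam * l1_loss
```

The L1 term discourages spurious large displacements, reflecting the assumption that micro-expression motion is small and sparse.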
5. The micro-expression recognition method for pixel displacement vectors based on a convolutional neural network according to claim 1, 2, 3 or 4, characterized in that: the selected maximum frame image and the generated pixel displacement vector feature map are input together into the classification network for learning, and after the classification prediction result is obtained, the classification loss is calculated as needed so as to correlate with the evaluation index.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111204917.XA CN114005157B (en) | 2021-10-15 | 2021-10-15 | Micro-expression recognition method for pixel displacement vector based on convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111204917.XA CN114005157B (en) | 2021-10-15 | 2021-10-15 | Micro-expression recognition method for pixel displacement vector based on convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114005157A CN114005157A (en) | 2022-02-01 |
CN114005157B true CN114005157B (en) | 2024-05-10 |
Family
ID=79923097
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111204917.XA Active CN114005157B (en) | 2021-10-15 | 2021-10-15 | Micro-expression recognition method for pixel displacement vector based on convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114005157B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114627218B (en) * | 2022-05-16 | 2022-08-12 | 成都市谛视无限科技有限公司 | Human face fine expression capturing method and device based on virtual engine |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020037965A1 (en) * | 2018-08-21 | 2020-02-27 | 北京大学深圳研究生院 | Method for multi-motion flow deep convolutional network model for video prediction |
CN112183419A (en) * | 2020-10-09 | 2021-01-05 | 福州大学 | Micro-expression classification method based on optical flow generation network and reordering |
CN112766159A (en) * | 2021-01-20 | 2021-05-07 | 重庆邮电大学 | Cross-database micro-expression identification method based on multi-feature fusion |
CN112800891A (en) * | 2021-01-18 | 2021-05-14 | 南京邮电大学 | Discriminative feature learning method and system for micro-expression recognition |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020037965A1 (en) * | 2018-08-21 | 2020-02-27 | 北京大学深圳研究生院 | Method for multi-motion flow deep convolutional network model for video prediction |
CN112183419A (en) * | 2020-10-09 | 2021-01-05 | 福州大学 | Micro-expression classification method based on optical flow generation network and reordering |
CN112800891A (en) * | 2021-01-18 | 2021-05-14 | 南京邮电大学 | Discriminative feature learning method and system for micro-expression recognition |
CN112766159A (en) * | 2021-01-20 | 2021-05-07 | 重庆邮电大学 | Cross-database micro-expression identification method based on multi-feature fusion |
Non-Patent Citations (1)
Title |
---|
Wu Jin; Min Yu; Ma Simin; Zhang Weihua. A micro-expression recognition algorithm based on the combination of CNN and LSTM. Telecommunication Engineering. 2020, (01), full text. * |
Also Published As
Publication number | Publication date |
---|---|
CN114005157A (en) | 2022-02-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111062872B (en) | Image super-resolution reconstruction method and system based on edge detection | |
CN111639692A (en) | Shadow detection method based on attention mechanism | |
CN112507617B (en) | Training method of SRFlow super-resolution model and face recognition method | |
CN113688723A (en) | Infrared image pedestrian target detection method based on improved YOLOv5 | |
Cai et al. | Residual channel attention generative adversarial network for image super-resolution and noise reduction | |
CN112149500B (en) | Face recognition small sample learning method with partial shielding | |
CN111047543A (en) | Image enhancement method, device and storage medium | |
CN111291669A (en) | Two-channel depression angle human face fusion correction GAN network and human face fusion correction method | |
CN112507920A (en) | Examination abnormal behavior identification method based on time displacement and attention mechanism | |
CN114022506A (en) | Image restoration method with edge prior fusion multi-head attention mechanism | |
CN114005157B (en) | Micro-expression recognition method for pixel displacement vector based on convolutional neural network | |
CN110570375B (en) | Image processing method, device, electronic device and storage medium | |
CN117351542A (en) | Facial expression recognition method and system | |
CN118212463A (en) | Target tracking method based on fractional order hybrid network | |
CN117893409A (en) | Face super-resolution reconstruction method and system based on illumination condition constraint diffusion model | |
Hua et al. | An Efficient Multiscale Spatial Rearrangement MLP Architecture for Image Restoration | |
CN115860113B (en) | Training method and related device for self-countermeasure neural network model | |
CN114582002B (en) | Facial expression recognition method combining attention module and second-order pooling mechanism | |
CN116977200A (en) | Processing method and device of video denoising model, computer equipment and storage medium | |
CN115797646A (en) | Multi-scale feature fusion video denoising method, system, device and storage medium | |
CN111047537A (en) | System for recovering details in image denoising | |
CN113012072A (en) | Image motion deblurring method based on attention network | |
CN114596609A (en) | Audio-visual counterfeit detection method and device | |
CN114240778A (en) | Video denoising method and device and terminal | |
Maity et al. | A survey on super resolution for video enhancement using gan |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||