CN116645717A - Microexpressive recognition method and system based on PCANet+ and LSTM - Google Patents

Microexpressive recognition method and system based on PCANet+ and LSTM

Info

Publication number
CN116645717A
CN116645717A (application CN202310681811.1A)
Authority
CN
China
Prior art keywords
image
feature
micro
pcanet
lstm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310681811.1A
Other languages
Chinese (zh)
Inventor
姚俊峰
王仕琪
龙飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202310681811.1A priority Critical patent/CN116645717A/en
Publication of CN116645717A publication Critical patent/CN116645717A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a micro-expression recognition method and system based on PCANet+ and LSTM. The method comprises the following steps: step S10, detecting face key points in each frame of head image of a portrait video; step S20, preprocessing the head images based on the key points to obtain a plurality of face images; step S30, normalizing the frame count of the face images; step S40, performing optical flow calculation on the face images to obtain an optical flow image sequence; step S50, inputting the optical flow image sequence into a PCANet+ network to obtain a plurality of feature maps, and computing a weighted average of the feature maps to obtain a two-dimensional feature map; step S60, cropping feature regions from the two-dimensional feature map based on the key points, inputting the feature regions into an LSTM network to extract temporal features, and combining these by weighting to obtain scores for the different micro-expression categories; step S70, mapping the scores into probabilities of the different micro-expression categories with a Softmax function. The advantage of the application is that the micro-expression recognition effect is greatly improved.

Description

Microexpressive recognition method and system based on PCANet+ and LSTM
Technical Field
The application relates to the technical field of expression recognition, in particular to a microexpressive recognition method and system based on PCANet+ and LSTM.
Background
With the progress of technology, artificial intelligence has developed continuously, and it includes expression recognition technology: by automatically recognizing a person's micro-expressions in a video, the person's current emotion and psychological activity can be judged rapidly. However, conventional deep-learning-based micro-expression recognition methods often use deep networks with too many parameters, so overfitting easily occurs when recognizing micro-expressions from the small amount of available data, and a two-dimensional convolutional neural network cannot extract the complete temporal information of micro-expressions, so the recognition effect is poor.
Therefore, how to provide a microexpressive recognition method and system based on PCANet+ and LSTM to achieve the effect of improving microexpressive recognition becomes a technical problem to be solved urgently.
Disclosure of Invention
The application aims to solve the technical problem of providing a microexpressive recognition method and a microexpressive recognition system based on PCANet+ and LSTM, so as to achieve the effect of improving microexpressive recognition.
In a first aspect, the present application provides a microexpressive recognition method based on pcanet+ and LSTM, including the steps of:
step S10, acquiring a portrait video, and detecting key points of faces of head images of frames in the portrait video;
step S20, preprocessing of alignment, clipping and scaling is carried out on each frame of human head image based on each key point, so as to obtain a plurality of human face images;
step S30, normalizing the frame number of the face image;
step S40, carrying out optical flow calculation on each face image to obtain an optical flow image sequence;
s50, inputting the optical flow image sequence into a PCANet+network for spatial feature extraction to obtain a plurality of feature images, and carrying out weighted average on each feature image to obtain a two-dimensional feature image;
step S60, intercepting feature areas from a two-dimensional feature map based on the key points, inputting the feature areas into an LSTM network to extract time sequence features, and carrying out weighted summation on the time sequence features to obtain scores of different categories of micro expressions;
and step S70, mapping the score into probabilities of different categories of micro-expressions by using a Softmax function so as to complete identification of the micro-expressions.
Further, the step S10 specifically includes:
and acquiring a portrait video, and detecting 68 key points of the face of each frame of head image in the portrait video through an active shape model.
Further, the step S20 specifically includes:
and acquiring left-eye inner corner points and right-eye inner corner points in each frame of human head image from each key point, carrying out rotation alignment on each frame of human head image based on connecting lines of the left-eye inner corner points and the right-eye inner corner points, cutting face areas in the human head image based on each key point, and scaling each cut face area to a uniform size so as to finish preprocessing of each frame of human head image, thereby obtaining a plurality of human face images.
Further, the step S30 specifically includes:
and after carrying out gray level processing on the face image of each frame, normalizing the frame number of the face image by using a time interpolation algorithm.
Further, in the step S60, the characteristic region includes at least an eyebrow region, an eye region, a mouth region, and a nose wing region.
In a second aspect, the present application provides a microexpressive recognition system based on pcanet+ and LSTM, comprising the following modules:
the key point detection module is used for acquiring a portrait video and detecting key points of faces of head images of frames in the portrait video;
the human head image preprocessing module is used for preprocessing the alignment, cutting and scaling of human head images of each frame based on each key point to obtain a plurality of human face images;
the frame number normalization module is used for normalizing the frame number of the face image;
the optical flow calculation module is used for carrying out optical flow calculation on each face image to obtain an optical flow image sequence;
the PCANet+ feature extraction module is used for inputting the optical flow image sequence into a PCANet+ network to perform space feature extraction to obtain a plurality of feature images, and performing weighted average on each feature image to obtain a two-dimensional feature image;
the LSTM feature extraction module is used for intercepting feature areas from the two-dimensional feature map based on the key points, inputting the feature areas into an LSTM network to extract time sequence features, and carrying out weighted summation on the time sequence features to obtain scores of different categories of micro expressions;
and the score mapping module is used for mapping the score into probabilities of different categories of micro expressions by using a Softmax function so as to complete the identification of the micro expressions.
Further, the keypoint detection module is specifically configured to:
and acquiring a portrait video, and detecting 68 key points of the face of each frame of head image in the portrait video through an active shape model.
Further, the human head image preprocessing module is specifically configured to:
and acquiring left-eye inner corner points and right-eye inner corner points in each frame of human head image from each key point, carrying out rotation alignment on each frame of human head image based on connecting lines of the left-eye inner corner points and the right-eye inner corner points, cutting face areas in the human head image based on each key point, and scaling each cut face area to a uniform size so as to finish preprocessing of each frame of human head image, thereby obtaining a plurality of human face images.
Further, the frame number normalization module is specifically configured to:
and after carrying out gray level processing on the face image of each frame, normalizing the frame number of the face image by using a time interpolation algorithm.
Further, in the LSTM feature extraction module, the feature region includes at least an eyebrow region, an eye region, a mouth region, and a nose wing region.
The application has the advantages that:
the method comprises the steps of carrying out key point detection on faces of all frames of head images in an acquired face video, carrying out pretreatment of alignment, clipping and scaling on all frames of head images based on all key points to obtain a plurality of face images, and normalizing the frames of the face images; performing optical flow calculation on each face image to obtain an optical flow image sequence, and inputting the optical flow image sequence into a PCANet+ network to perform space feature extraction and weighted average to obtain a two-dimensional feature map; intercepting a feature area from a two-dimensional feature map, inputting the feature area into an LSTM network to extract time sequence features, weighting and combining the time sequence features to obtain scores of different categories of micro-expressions, and finally mapping the scores into probabilities of the different categories of micro-expressions by using a Softmax function so as to complete identification of the micro-expressions; the PCANet+ network and the LSTM network are combined, the PCANet+ network is used for extracting space features, the LSTM network is used for extracting time sequence features, space-time features of the micro-expression are effectively extracted, the PCANet+ network can directly calculate network parameters of the current layer through input of the current layer, parameters and calculated amount of the network are reduced, fitting is avoided, and finally the recognition effect of the micro-expression is greatly improved.
Drawings
The application will be further described with reference to examples of embodiments with reference to the accompanying drawings.
Fig. 1 is a flowchart of a microexpressive recognition method based on pcanet+ and LSTM according to the present application.
Fig. 2 is a schematic structural diagram of a microexpressive recognition system based on pcanet+ and LSTM according to the present application.
Detailed Description
The technical scheme in the embodiment of the application has the following overall thought: the PCANet+ network and the LSTM network are combined, space features are extracted through the PCANet+ network, time sequence features are extracted through the LSTM network, so that space-time features of the micro-expressions are effectively extracted, the PCANet+ network can directly calculate network parameters of the current layer through input of the current layer, parameters and calculated amount of the network are reduced, and overfitting is avoided, so that the recognition effect of the micro-expressions is improved.
Referring to fig. 1 to 2, a preferred embodiment of a microexpressive recognition method based on pcanet+ and LSTM according to the present application includes the following steps:
step S10, acquiring a portrait video, and detecting key points of faces of head images of frames in the portrait video;
step S20, preprocessing of alignment, clipping and scaling is carried out on each frame of human head image based on each key point, so as to obtain a plurality of human face images;
step S30, normalizing the frame number of the face image;
step S40, carrying out optical flow calculation on each face image to obtain an optical flow image sequence;
s50, inputting the optical flow image sequence into a PCANet+network for spatial feature extraction to obtain a plurality of feature images, and carrying out weighted average on each feature image to obtain a two-dimensional feature image;
step S60, intercepting feature areas from a two-dimensional feature map based on the key points, inputting the feature areas into an LSTM network to extract time sequence features, and carrying out weighted summation on the time sequence features to obtain scores of different categories of micro expressions;
Considering that micro-expressions show obvious motion only in some regions of the face, not all facial regions are helpful for classifying micro-expressions; therefore, feature regions are cropped from the two-dimensional feature map.
And step S70, mapping the score into probabilities of different categories of micro-expressions by using a Softmax function so as to complete identification of the micro-expressions.
The step S10 specifically includes:
and acquiring a portrait video, and detecting 68 key points of the face of each frame of head image in the portrait video through an active shape model.
The active shape model is built on a point distribution model. When detecting a face it comprehensively considers prior knowledge such as the gray level, size, shape and approximate position of the image: a statistical model of the feature point distribution is learned from the feature points annotated on the training set samples, this statistical model is taken as the initial position and refined by continuous iteration into the shape model of the target image, and finally the shape constraint is applied on the test set and the best matching points are searched, thereby localizing the facial feature points (key points).
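As an illustration of the key point detection step, the sketch below uses dlib's publicly available 68-point landmark predictor as a stand-in for the active shape model described above (the ASM itself is not implemented here); the model file path is an assumed local file.

```python
# Illustrative sketch only: the patent uses an Active Shape Model (ASM) for the
# 68 facial key points; dlib's pre-trained 68-point predictor is used here as a
# commonly available stand-in, not as the patented ASM implementation.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Assumed local path to dlib's public 68-landmark model file.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_landmarks(frame_bgr):
    """Return a (68, 2) array of facial key points for the first detected face."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)
    if len(faces) == 0:
        return None
    shape = predictor(gray, faces[0])
    return np.array([[p.x, p.y] for p in shape.parts()], dtype=np.float32)
```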
The step S20 specifically includes:
and acquiring left-eye inner corner points and right-eye inner corner points in each frame of human head image from each key point, carrying out rotation alignment on each frame of human head image based on connecting lines of the left-eye inner corner points and the right-eye inner corner points, cutting face areas in the human head image based on each key point, and scaling each cut face area to a uniform size so as to finish preprocessing of each frame of human head image, thereby obtaining a plurality of human face images.
To eliminate the influence of head rotation and offset on micro-expression recognition, the head images must be face-aligned according to the detected key points. Because the relative positions of the inner corners of the left and right eyes are comparatively stable in the face and are not changed by facial muscle movement, the application rotates each head image so that the line connecting the two inner eye corners is horizontal, then crops the face region according to the aligned image and key points, removing regions irrelevant to micro-expressions such as clothes, background and hair. Because of head rotation and variation in the distance to the camera, the cropped face regions are not all the same size, so they are scaled to a uniform size to meet the input-size requirement of the subsequent network. A minimal sketch of this preprocessing is given below.
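A minimal sketch of the alignment, cropping and scaling, assuming the 68-point landmark convention in which indices 39 and 42 are the inner eye corners, and an assumed output size of 128 × 128:

```python
# Sketch of the alignment/crop/scale preprocessing described above. The landmark
# indices (39, 42 = inner eye corners) and the 128x128 output size are assumptions.
import cv2
import numpy as np

def align_crop_scale(frame_bgr, landmarks, out_size=128):
    """Rotate so the inner-eye-corner line is horizontal, crop the face, resize."""
    left_inner, right_inner = landmarks[39], landmarks[42]
    dy = right_inner[1] - left_inner[1]
    dx = right_inner[0] - left_inner[0]
    angle = np.degrees(np.arctan2(dy, dx))            # tilt of the eye line
    cx, cy = (left_inner + right_inner) / 2.0
    rot = cv2.getRotationMatrix2D((float(cx), float(cy)), angle, 1.0)
    aligned = cv2.warpAffine(frame_bgr, rot, (frame_bgr.shape[1], frame_bgr.shape[0]))

    # Rotate the landmarks with the same transform, then crop their bounding box.
    pts = np.hstack([landmarks, np.ones((len(landmarks), 1))]) @ rot.T
    x0, y0 = pts.min(axis=0).astype(int)
    x1, y1 = pts.max(axis=0).astype(int)
    face = aligned[max(y0, 0):y1, max(x0, 0):x1]
    return cv2.resize(face, (out_size, out_size))     # unify the input size
```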
The step S30 specifically includes:
and after carrying out gray level processing on the face image of each frame, normalizing the frame number of the face image by using a time interpolation algorithm.
Because the durations of micro-expressions vary during acquisition and different data sets use different camera frame rates, the obtained micro-expression image sequences have inconsistent lengths. Since an LSTM network is used in this application to extract micro-expression features, a temporal interpolation algorithm is applied during data preprocessing to normalize the frame count of each micro-expression sample sequence (face images), which facilitates the subsequent LSTM processing. The temporal interpolation algorithm was first used for lip reading; it is a manifold-based interpolation method that maps the whole micro-expression sequence onto a curve, with each frame in the sequence corresponding to a point on the curve, and then resamples the curve to obtain an interpolated image sequence. A simplified sketch of this frame-count normalization is shown below.
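The temporal interpolation model itself is not reproduced here; the sketch below only illustrates frame-count normalization with a simplified per-pixel linear interpolation in time, which is a stand-in for, not an implementation of, the interpolation algorithm described above.

```python
# Simplified stand-in for the temporal interpolation step: linearly resample a
# grayscale face sequence to a fixed number of frames. The actual algorithm maps
# the sequence onto a curve in a low-dimensional space; this per-pixel linear
# interpolation only illustrates the frame-number normalization.
import numpy as np

def normalize_frame_count(frames, target_len):
    """frames: (L, H, W) grayscale sequence -> (target_len, H, W)."""
    frames = np.asarray(frames, dtype=np.float32)
    src_len = frames.shape[0]
    src_pos = np.linspace(0.0, 1.0, src_len)
    dst_pos = np.linspace(0.0, 1.0, target_len)
    out = np.empty((target_len,) + frames.shape[1:], dtype=np.float32)
    for i, t in enumerate(dst_pos):
        j = np.searchsorted(src_pos, t, side="right") - 1
        j = min(j, src_len - 2)                         # clamp at the last interval
        w = (t - src_pos[j]) / (src_pos[j + 1] - src_pos[j])
        out[i] = (1.0 - w) * frames[j] + w * frames[j + 1]
    return out
```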
In step S40, the optical flow is a two-dimensional vector field on the image plane that describes the instantaneous velocity of pixel motion between two consecutive frames of a video sequence. To improve the effect of PCANet+ network feature learning, the application performs dense optical flow calculation on the micro-expression image sequence (face images) to enhance the facial motion information. Optical flow computation relies on two basic assumptions: (1) brightness constancy, i.e. the brightness of the pixels at the corresponding positions of the moving object remains unchanged across the image sequence; (2) temporal continuity, i.e. the motion of the target pixel between adjacent frames cannot be too large, which ensures that the pixel coordinates change continuously and smoothly over time.
The optical flow calculation process is as follows:
The optical flow method represents an image sequence as a three-dimensional volume in which the brightness of a pixel is denoted I(x, y, t), where x, y, t are the spatio-temporal coordinates of the pixel. After a time Δt the pixel reaches the next frame, having moved by Δx and Δy in the image. Under the brightness-constancy assumption the pixel intensity before and after the motion is the same, which gives:

I(x, y, t) = I(x + Δx, y + Δy, t + Δt)    (1)

Based on the temporal-continuity assumption, a Taylor expansion of the right-hand side of equation (1) gives:

I(x + Δx, y + Δy, t + Δt) = I(x, y, t) + (∂I/∂x)Δx + (∂I/∂y)Δy + (∂I/∂t)Δt + ε    (2)

where ε is a higher-order infinitesimal that can be ignored. Substituting equation (2) into equation (1) and dividing by Δt yields:

(∂I/∂x)(Δx/Δt) + (∂I/∂y)(Δy/Δt) + ∂I/∂t = 0    (3)

Let u and v denote the velocity components of the pixel along the x-axis and y-axis, i.e. u = Δx/Δt and v = Δy/Δt. Substituting them into equation (3) gives the optical flow constraint equation:

I_x u + I_y v + I_t = 0    (4)
(u, v) is the optical flow field generated by the pixel within the time Δt and can be solved once constraint conditions are added; adding different constraints yields different optical flow computation methods. This application applies the TV-L1 algorithm to the optical flow field calculation. The TV-L1 algorithm introduces a subspace trajectory model to ensure the temporal consistency of the optical flow and preserves the edge features of the image. For a pixel in the micro-expression sequence, the continuous optical flow field is first computed with the optical flow estimation loss function shown in formula (5) (not reproduced here). In that formula, L denotes the length of the micro-expression image sequence; the R-basis trajectories are used to construct the trajectory space; the spatial domain of the image is also defined; and a mapping function maps the optical flow field u(t), v(t) into the new space constructed by the R-basis trajectories. The first term of formula (5) is the penalty term for the brightness-constancy constraint, the second term places the derived optical flow on the basis trajectories, and the third term is the total-variation spatial regularization of the trajectory model coefficients.
The micro-expression image sequence is taken with its first frame as the reference frame, and the horizontal and vertical optical flow components U and V of all remaining frames are computed with the optical flow method above; the computed optical flow sequences are then stacked frame by frame and fed into the PCANet+ network. The optical flow sequences U and V are sampled with a sliding window of size T and step length s, giving two subsequence sets μ and ν defined in formula (6) (not reproduced here).
The elements at corresponding positions in μ and ν are connected to form an input sequence Γ consisting of stacked optical flow components. That is, the optical flow components are concatenated along the channel dimension: through this multi-channel stacking operation, the horizontal and vertical optical flow sequences of each T-frame video segment are stacked to obtain an optical flow image sequence with 2T channels.
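A sketch of this flow computation and multi-channel stacking, using OpenCV's stock TV-L1 optical flow (from the opencv-contrib package) as a stand-in — the subspace-trajectory variant described above is not part of that stock implementation — and assuming 8-bit grayscale frames:

```python
# Sketch of dense TV-L1 optical flow w.r.t. the first (reference) frame, followed
# by the sliding-window stacking that produces 2T-channel clips.
import cv2
import numpy as np

def flow_sequence(gray_frames):
    """gray_frames: (L, H, W) uint8 grayscale frames -> (L-1, H, W, 2) flow maps."""
    tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()   # requires opencv-contrib
    ref = gray_frames[0]
    flows = [tvl1.calc(ref, f, None) for f in gray_frames[1:]]
    return np.stack(flows)

def stack_clips(flows, T, s=1):
    """Sliding window of size T, step s, over the flow sequence -> 2T-channel clips."""
    clips = []
    for start in range(0, flows.shape[0] - T + 1, s):
        win = flows[start:start + T]                  # (T, H, W, 2)
        # stack horizontal and vertical components along the channel axis
        clip = win.transpose(1, 2, 0, 3).reshape(win.shape[1], win.shape[2], 2 * T)
        clips.append(clip)
    return np.stack(clips)                            # (K, H, W, 2T)
```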
The step S50 specifically includes:
The portrait video is divided into K segments, each containing T frames of face images. After the optical flow calculation and stacking of each segment, a set of multi-channel optical flow images Γ = {I_1, I_2, ..., I_K} is obtained, where I_i is the multi-channel image corresponding to the i-th video segment. The optical flow images in Γ are then fed in turn into a two-layer PCANet+ network, in which the first PCA convolution layer has D1 filters of size k1 × k1 and the second PCA convolution layer has D2 filters of size k2 × k2. Each convolution layer is followed by a pooling layer: the first is an average pooling layer whose filter size is fixed at 3 × 3, and the second is a max pooling layer whose filter size is set to 3 × 3. For a multi-channel image I_i, passing through the two PCANet+ layers yields a set of D2 two-dimensional feature maps O² = {O²_1, O²_2, ..., O²_D2}.
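For reference, the sketch below shows how the PCA filters of one such convolution layer can be learned from image patches (mean-removed patches, covariance, leading eigenvectors reshaped into kernels); patch extraction and the full two-layer wiring are simplified, and this illustrates the general PCANet filter-learning idea rather than the exact PCANet+ procedure.

```python
# Minimal sketch of PCA filter learning for one PCANet-style convolution layer.
import numpy as np

def learn_pca_filters(images, k, num_filters):
    """images: list of (H, W, C) arrays; k: filter size; returns filters and eigenvalues."""
    patches = []
    for img in images:
        H, W, C = img.shape
        for y in range(H - k + 1):
            for x in range(W - k + 1):
                p = img[y:y + k, x:x + k, :].reshape(-1)
                patches.append(p - p.mean())           # remove the patch mean
    X = np.stack(patches, axis=1)                      # (k*k*C, num_patches)
    cov = X @ X.T / X.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)             # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:num_filters]    # keep the leading ones
    filters = eigvecs[:, order].T.reshape(num_filters, k, k, C)
    return filters, eigvals[order]
```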
The feature maps output by the second layer of the PCANet+ network (restricted to the key local regions of the face) are then used directly as the input of the subsequent LSTM network. As known from the PCANet+ filter learning process, the PCA filters of the second layer are obtained by reshaping the eigenvectors corresponding to the D2 largest eigenvalues computed for that layer. The larger a filter's eigenvalue, the more important the classification information contained in the feature map it outputs after convolution; therefore, the feature maps output by the second layer are averaged with weights given by the eigenvalues of their filters, as shown in formula (8), to obtain a single two-dimensional feature map O_i'. This also unifies the input size of the LSTM network so that it is not affected by the number of PCA filters.
O_i' = Σ_{j=1..D2} λ_j O²_j / Σ_{j=1..D2} λ_j    (8)

where O²_j denotes the feature map output by the j-th filter of layer 2, and λ_j denotes the eigenvalue corresponding to the j-th filter.
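A small sketch of this eigenvalue-weighted averaging (the normalization by the sum of eigenvalues is assumed from the description of formula (8) as a weighted average):

```python
# Fuse the D2 second-layer feature maps into one map, weighting each filter's
# output by its eigenvalue so more informative filters contribute more.
import numpy as np

def weighted_average_maps(feature_maps, eigvals):
    """feature_maps: (D2, H, W); eigvals: (D2,). Returns the fused (H, W) map."""
    w = np.asarray(eigvals, dtype=np.float64)
    w = w / w.sum()                                    # normalize weights to sum to 1
    return np.tensordot(w, feature_maps, axes=([0], [0]))
```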
In the step S60, the feature area includes at least an eyebrow area, an eye area, a mouth area, and a nose wing area.
A sample in the micro-expression data set is preprocessed into a sequence of L images. During optical flow calculation, the first frame is used as the reference frame and the optical flow of all other frames is computed, giving L-1 dual-channel optical flow feature maps containing the horizontal and vertical optical flow components; these optical flow feature maps are stacked in groups of T frames to obtain K = L-T+1 multi-channel images, where T is an odd number. PCANet+ performs feature extraction and key-region segmentation on each multi-channel optical flow image to obtain two-dimensional features near the eyebrows, the mouth and the nose wings; these are converted into one-dimensional vectors and concatenated to form the input of one time step of the LSTM network. Since the LSTM network input is a time series of K feature vectors and the output is a score for each micro-expression category, a many-to-one unrolled model is used for training. A sketch of this region cropping and concatenation is given below.
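In the sketch, the landmark indices used as region centers and the window half-size are illustrative assumptions, not values fixed by the description:

```python
# Build one LSTM time-step input: crop fixed-size windows around eyebrow, eye,
# mouth and nose-wing landmarks on the fused 2D feature map, flatten, concatenate.
import numpy as np

REGION_CENTERS = {"eyebrows": 21, "eyes": 39, "nose_wing": 31, "mouth": 62}  # assumed 68-pt indices

def region_feature_vector(feature_map, landmarks, half=8):
    """feature_map: (H, W); landmarks: (68, 2) scaled to feature-map coordinates."""
    H, W = feature_map.shape
    parts = []
    for idx in REGION_CENTERS.values():
        cx, cy = landmarks[idx].astype(int)
        y0, y1 = np.clip([cy - half, cy + half], 0, H)
        x0, x1 = np.clip([cx - half, cx + half], 0, W)
        parts.append(feature_map[y0:y1, x0:x1].reshape(-1))
    return np.concatenate(parts)      # one time-step input for the LSTM
```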
The LSTM-based feature learning model mainly comprises two parts: the first is a temporal feature extraction network formed by two LSTM layers, and the second is a classifier formed by a fully connected layer and a Softmax function. Each LSTM layer is unrolled into K LSTM units, corresponding to the K time-series inputs generated from a micro-expression sample; by adjusting the parameters of the gating units, the sequence information is selectively memorized and the temporal features are extracted. The fully connected layer then computes a weighted sum of the extracted features to obtain the score of each micro-expression category, and the Softmax function finally maps the scores into probabilities, which constitute the final classification result of the micro-expression. The model is optimized with a cross-entropy loss function:

Loss = -Σ_k y_k log(ŷ_k)

where ŷ_k denotes the predicted value, output by the Softmax function, for the k-th emotion category of the micro-expression sample; y denotes the one-hot encoding vector of the sample's true label; and y_k is the value of the sample for the k-th emotion category, equal to 1 when the true label of the sample is k and 0 otherwise.
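A sketch of this two-layer LSTM classifier in PyTorch; the hidden size, feature dimension and number of micro-expression categories are illustrative assumptions:

```python
# Two stacked LSTM layers consume the K region-feature vectors of one sample
# (many-to-one); a fully connected layer scores the categories, and cross-entropy
# (which applies LogSoftmax internally) is used for training.
import torch
import torch.nn as nn

class MicroExpressionLSTM(nn.Module):
    def __init__(self, feat_dim, hidden_dim=128, num_classes=5):  # sizes are assumptions
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)   # weighted sum -> class scores

    def forward(self, x):                # x: (batch, K, feat_dim)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])    # many-to-one: score from the last time step

model = MicroExpressionLSTM(feat_dim=1024)
criterion = nn.CrossEntropyLoss()
logits = model(torch.randn(4, 10, 1024))               # dummy batch: 4 samples, K = 10 steps
loss = criterion(logits, torch.tensor([0, 1, 2, 3]))
```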
The application discloses a preferred embodiment of a microexpressive recognition system based on PCANet+ and LSTM, which comprises the following modules:
the key point detection module is used for acquiring a portrait video and detecting key points of faces of head images of frames in the portrait video;
the human head image preprocessing module is used for preprocessing the alignment, cutting and scaling of human head images of each frame based on each key point to obtain a plurality of human face images;
the frame number normalization module is used for normalizing the frame number of the face image;
the optical flow calculation module is used for carrying out optical flow calculation on each face image to obtain an optical flow image sequence;
the PCANet+ feature extraction module is used for inputting the optical flow image sequence into a PCANet+ network to perform space feature extraction to obtain a plurality of feature images, and performing weighted average on each feature image to obtain a two-dimensional feature image;
the LSTM feature extraction module is used for intercepting feature areas from the two-dimensional feature map based on the key points, inputting the feature areas into an LSTM network to extract time sequence features, and carrying out weighted summation on the time sequence features to obtain scores of different categories of micro expressions;
Considering that micro-expressions show obvious motion only in some regions of the face, not all facial regions are helpful for classifying micro-expressions; therefore, feature regions are cropped from the two-dimensional feature map.
And the score mapping module is used for mapping the score into probabilities of different categories of micro expressions by using a Softmax function so as to complete the identification of the micro expressions.
The key point detection module is specifically configured to:
and acquiring a portrait video, and detecting 68 key points of the face of each frame of head image in the portrait video through an active shape model.
The active shape model is built on a point distribution model. When detecting a face it comprehensively considers prior knowledge such as the gray level, size, shape and approximate position of the image: a statistical model of the feature point distribution is learned from the feature points annotated on the training set samples, this statistical model is taken as the initial position and refined by continuous iteration into the shape model of the target image, and finally the shape constraint is applied on the test set and the best matching points are searched, thereby localizing the facial feature points (key points).
The human head image preprocessing module is specifically used for:
and acquiring left-eye inner corner points and right-eye inner corner points in each frame of human head image from each key point, carrying out rotation alignment on each frame of human head image based on connecting lines of the left-eye inner corner points and the right-eye inner corner points, cutting face areas in the human head image based on each key point, and scaling each cut face area to a uniform size so as to finish preprocessing of each frame of human head image, thereby obtaining a plurality of human face images.
To eliminate the influence of head rotation and offset on micro-expression recognition, the head images must be face-aligned according to the detected key points. Because the relative positions of the inner corners of the left and right eyes are comparatively stable in the face and are not changed by facial muscle movement, the application rotates each head image so that the line connecting the two inner eye corners is horizontal, then crops the face region according to the aligned image and key points, removing regions irrelevant to micro-expressions such as clothes, background and hair. Because of head rotation and variation in the distance to the camera, the cropped face regions are not all the same size, so they are scaled to a uniform size to meet the input-size requirement of the subsequent network.
The frame number normalization module is specifically configured to:
and after carrying out gray level processing on the face image of each frame, normalizing the frame number of the face image by using a time interpolation algorithm.
Because the durations of micro-expressions vary during acquisition and different data sets use different camera frame rates, the obtained micro-expression image sequences have inconsistent lengths. Since an LSTM network is used in this application to extract micro-expression features, a temporal interpolation algorithm is applied during data preprocessing to normalize the frame count of each micro-expression sample sequence (face images), which facilitates the subsequent LSTM processing. The temporal interpolation algorithm was first used for lip reading; it is a manifold-based interpolation method that maps the whole micro-expression sequence onto a curve, with each frame in the sequence corresponding to a point on the curve, and then resamples the curve to obtain an interpolated image sequence.
In the optical flow calculation module, the optical flow is a two-dimensional vector field on the image plane that describes the instantaneous velocity of pixel motion between two consecutive frames of a video sequence. To improve the effect of PCANet+ network feature learning, the application performs dense optical flow calculation on the micro-expression image sequence (face images) to enhance the facial motion information. Optical flow computation relies on two basic assumptions: (1) brightness constancy, i.e. the brightness of the pixels at the corresponding positions of the moving object remains unchanged across the image sequence; (2) temporal continuity, i.e. the motion of the target pixel between adjacent frames cannot be too large, which ensures that the pixel coordinates change continuously and smoothly over time.
The optical flow calculation process is as follows:
The optical flow method represents an image sequence as a three-dimensional volume in which the brightness of a pixel is denoted I(x, y, t), where x, y, t are the spatio-temporal coordinates of the pixel. After a time Δt the pixel reaches the next frame, having moved by Δx and Δy in the image. Under the brightness-constancy assumption the pixel intensity before and after the motion is the same, which gives:

I(x, y, t) = I(x + Δx, y + Δy, t + Δt)    (1)

Based on the temporal-continuity assumption, a Taylor expansion of the right-hand side of equation (1) gives:

I(x + Δx, y + Δy, t + Δt) = I(x, y, t) + (∂I/∂x)Δx + (∂I/∂y)Δy + (∂I/∂t)Δt + ε    (2)

where ε is a higher-order infinitesimal that can be ignored. Substituting equation (2) into equation (1) and dividing by Δt yields:

(∂I/∂x)(Δx/Δt) + (∂I/∂y)(Δy/Δt) + ∂I/∂t = 0    (3)

Let u and v denote the velocity components of the pixel along the x-axis and y-axis, i.e. u = Δx/Δt and v = Δy/Δt. Substituting them into equation (3) gives the optical flow constraint equation:

I_x u + I_y v + I_t = 0    (4)
(u, v) is the optical flow field generated by the pixel within the time Δt and can be solved once constraint conditions are added; adding different constraints yields different optical flow computation methods. This application applies the TV-L1 algorithm to the optical flow field calculation. The TV-L1 algorithm introduces a subspace trajectory model to ensure the temporal consistency of the optical flow and preserves the edge features of the image. For a pixel in the micro-expression sequence, the continuous optical flow field is first computed with the optical flow estimation loss function shown in formula (5) (not reproduced here). In that formula, L denotes the length of the micro-expression image sequence; the R-basis trajectories are used to construct the trajectory space; the spatial domain of the image is also defined; and a mapping function maps the optical flow field u(t), v(t) into the new space constructed by the R-basis trajectories. The first term of formula (5) is the penalty term for the brightness-constancy constraint, the second term places the derived optical flow on the basis trajectories, and the third term is the total-variation spatial regularization of the trajectory model coefficients.
The micro-expression image sequence is taken with its first frame as the reference frame, and the horizontal and vertical optical flow components U and V of all remaining frames are computed with the optical flow method above; the computed optical flow sequences are then stacked frame by frame and fed into the PCANet+ network. The optical flow sequences U and V are sampled with a sliding window of size T and step length s, giving two subsequence sets μ and ν defined in formula (6) (not reproduced here).
The elements at corresponding positions in μ and ν are connected to form an input sequence Γ consisting of stacked optical flow components. That is, the optical flow components are concatenated along the channel dimension: through this multi-channel stacking operation, the horizontal and vertical optical flow sequences of each T-frame video segment are stacked to obtain an optical flow image sequence with 2T channels.
The PCANet+ feature extraction module is specifically configured to:
The portrait video is divided into K segments, each containing T frames of face images. After the optical flow calculation and stacking of each segment, a set of multi-channel optical flow images Γ = {I_1, I_2, ..., I_K} is obtained, where I_i is the multi-channel image corresponding to the i-th video segment. The optical flow images in Γ are then fed in turn into a two-layer PCANet+ network, in which the first PCA convolution layer has D1 filters of size k1 × k1 and the second PCA convolution layer has D2 filters of size k2 × k2. Each convolution layer is followed by a pooling layer: the first is an average pooling layer whose filter size is fixed at 3 × 3, and the second is a max pooling layer whose filter size is set to 3 × 3. For a multi-channel image I_i, passing through the two PCANet+ layers yields a set of D2 two-dimensional feature maps O² = {O²_1, O²_2, ..., O²_D2}.
The feature maps output by the second layer of the PCANet+ network (restricted to the key local regions of the face) are then used directly as the input of the subsequent LSTM network. As known from the PCANet+ filter learning process, the PCA filters of the second layer are obtained by reshaping the eigenvectors corresponding to the D2 largest eigenvalues computed for that layer. The larger a filter's eigenvalue, the more important the classification information contained in the feature map it outputs after convolution; therefore, the feature maps output by the second layer are averaged with weights given by the eigenvalues of their filters, as shown in formula (8), to obtain a single two-dimensional feature map O_i'. This also unifies the input size of the LSTM network so that it is not affected by the number of PCA filters.
O_i' = Σ_{j=1..D2} λ_j O²_j / Σ_{j=1..D2} λ_j    (8)

where O²_j denotes the feature map output by the j-th filter of layer 2, and λ_j denotes the eigenvalue corresponding to the j-th filter.
In the LSTM feature extraction module, the feature area includes at least an eyebrow area, an eye area, a mouth area, and a nose wing area.
A sample in the micro-expression data set is preprocessed into a sequence of L images. During optical flow calculation, the first frame is used as the reference frame and the optical flow of all other frames is computed, giving L-1 dual-channel optical flow feature maps containing the horizontal and vertical optical flow components; these optical flow feature maps are stacked in groups of T frames to obtain K = L-T+1 multi-channel images, where T is an odd number. PCANet+ performs feature extraction and key-region segmentation on each multi-channel optical flow image to obtain two-dimensional features near the eyebrows, the mouth and the nose wings; these are converted into one-dimensional vectors and concatenated to form the input of one time step of the LSTM network. Since the LSTM network input is a time series of K feature vectors and the output is a score for each micro-expression category, a many-to-one unrolled model is used for training.
The LSTM-based feature learning model mainly comprises two parts: the first is a temporal feature extraction network formed by two LSTM layers, and the second is a classifier formed by a fully connected layer and a Softmax function. Each LSTM layer is unrolled into K LSTM units, corresponding to the K time-series inputs generated from a micro-expression sample; by adjusting the parameters of the gating units, the sequence information is selectively memorized and the temporal features are extracted. The fully connected layer then computes a weighted sum of the extracted features to obtain the score of each micro-expression category, and the Softmax function finally maps the scores into probabilities, which constitute the final classification result of the micro-expression. The model is optimized with a cross-entropy loss function:

Loss = -Σ_k y_k log(ŷ_k)

where ŷ_k denotes the predicted value, output by the Softmax function, for the k-th emotion category of the micro-expression sample; y denotes the one-hot encoding vector of the sample's true label; and y_k is the value of the sample for the k-th emotion category, equal to 1 when the true label of the sample is k and 0 otherwise.
In summary, the application has the advantages that:
the method comprises the steps of carrying out key point detection on faces of all frames of head images in an acquired face video, carrying out pretreatment of alignment, clipping and scaling on all frames of head images based on all key points to obtain a plurality of face images, and normalizing the frames of the face images; performing optical flow calculation on each face image to obtain an optical flow image sequence, and inputting the optical flow image sequence into a PCANet+ network to perform space feature extraction and weighted average to obtain a two-dimensional feature map; intercepting a feature area from a two-dimensional feature map, inputting the feature area into an LSTM network to extract time sequence features, weighting and combining the time sequence features to obtain scores of different categories of micro-expressions, and finally mapping the scores into probabilities of the different categories of micro-expressions by using a Softmax function so as to complete identification of the micro-expressions; the PCANet+ network and the LSTM network are combined, the PCANet+ network is used for extracting space features, the LSTM network is used for extracting time sequence features, space-time features of the micro-expression are effectively extracted, the PCANet+ network can directly calculate network parameters of the current layer through input of the current layer, parameters and calculated amount of the network are reduced, fitting is avoided, and finally the recognition effect of the micro-expression is greatly improved.
While specific embodiments of the application have been described above, it will be appreciated by those skilled in the art that the specific embodiments described are illustrative only and not intended to limit the scope of the application, and that equivalent modifications and variations of the application in light of the spirit of the application will be covered by the claims of the present application.

Claims (10)

1. A microexpressive recognition method based on PCANet+ and LSTM is characterized in that: the method comprises the following steps:
step S10, acquiring a portrait video, and detecting key points of faces of head images of frames in the portrait video;
step S20, preprocessing of alignment, clipping and scaling is carried out on each frame of human head image based on each key point, so as to obtain a plurality of human face images;
step S30, normalizing the frame number of the face image;
step S40, carrying out optical flow calculation on each face image to obtain an optical flow image sequence;
s50, inputting the optical flow image sequence into a PCANet+network for spatial feature extraction to obtain a plurality of feature images, and carrying out weighted average on each feature image to obtain a two-dimensional feature image;
step S60, intercepting feature areas from a two-dimensional feature map based on the key points, inputting the feature areas into an LSTM network to extract time sequence features, and carrying out weighted summation on the time sequence features to obtain scores of different categories of micro expressions;
and step S70, mapping the score into probabilities of different categories of micro-expressions by using a Softmax function so as to complete identification of the micro-expressions.
2. The micro-expression recognition method based on PCANet+ and LSTM as recited in claim 1, wherein: the step S10 specifically includes:
and acquiring a portrait video, and detecting 68 key points of the face of each frame of head image in the portrait video through an active shape model.
3. The micro-expression recognition method based on PCANet+ and LSTM as recited in claim 1, wherein: the step S20 specifically includes:
and acquiring left-eye inner corner points and right-eye inner corner points in each frame of human head image from each key point, carrying out rotation alignment on each frame of human head image based on connecting lines of the left-eye inner corner points and the right-eye inner corner points, cutting face areas in the human head image based on each key point, and scaling each cut face area to a uniform size so as to finish preprocessing of each frame of human head image, thereby obtaining a plurality of human face images.
4. The micro-expression recognition method based on PCANet+ and LSTM as recited in claim 1, wherein: the step S30 specifically includes:
and after carrying out gray level processing on the face image of each frame, normalizing the frame number of the face image by using a time interpolation algorithm.
5. The micro-expression recognition method based on PCANet+ and LSTM as recited in claim 1, wherein: in the step S60, the feature area includes at least an eyebrow area, an eye area, a mouth area, and a nose wing area.
6. A microexpressive recognition system based on pcanet+ and LSTM, characterized in that: the device comprises the following modules:
the key point detection module is used for acquiring a portrait video and detecting key points of faces of head images of frames in the portrait video;
the human head image preprocessing module is used for preprocessing the alignment, cutting and scaling of human head images of each frame based on each key point to obtain a plurality of human face images;
the frame number normalization module is used for normalizing the frame number of the face image;
the optical flow calculation module is used for carrying out optical flow calculation on each face image to obtain an optical flow image sequence;
the PCANet+ feature extraction module is used for inputting the optical flow image sequence into a PCANet+ network to perform space feature extraction to obtain a plurality of feature images, and performing weighted average on each feature image to obtain a two-dimensional feature image;
the LSTM feature extraction module is used for intercepting feature areas from the two-dimensional feature map based on the key points, inputting the feature areas into an LSTM network to extract time sequence features, and carrying out weighted summation on the time sequence features to obtain scores of different categories of micro expressions;
and the score mapping module is used for mapping the score into probabilities of different categories of micro expressions by using a Softmax function so as to complete the identification of the micro expressions.
7. The micro-expression recognition system based on pcanet+ and LSTM of claim 6 wherein: the key point detection module is specifically configured to:
and acquiring a portrait video, and detecting 68 key points of the face of each frame of head image in the portrait video through an active shape model.
8. The micro-expression recognition system based on pcanet+ and LSTM of claim 6 wherein: the human head image preprocessing module is specifically used for:
and acquiring left-eye inner corner points and right-eye inner corner points in each frame of human head image from each key point, carrying out rotation alignment on each frame of human head image based on connecting lines of the left-eye inner corner points and the right-eye inner corner points, cutting face areas in the human head image based on each key point, and scaling each cut face area to a uniform size so as to finish preprocessing of each frame of human head image, thereby obtaining a plurality of human face images.
9. The micro-expression recognition system based on pcanet+ and LSTM of claim 6 wherein: the frame number normalization module is specifically configured to:
and after carrying out gray level processing on the face image of each frame, normalizing the frame number of the face image by using a time interpolation algorithm.
10. The micro-expression recognition system based on pcanet+ and LSTM of claim 6 wherein: in the LSTM feature extraction module, the feature area includes at least an eyebrow area, an eye area, a mouth area, and a nose wing area.
CN202310681811.1A 2023-06-09 2023-06-09 Microexpressive recognition method and system based on PCANet+ and LSTM Pending CN116645717A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310681811.1A CN116645717A (en) 2023-06-09 2023-06-09 Microexpressive recognition method and system based on PCANet+ and LSTM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310681811.1A CN116645717A (en) 2023-06-09 2023-06-09 Microexpressive recognition method and system based on PCANet+ and LSTM

Publications (1)

Publication Number Publication Date
CN116645717A true CN116645717A (en) 2023-08-25

Family

ID=87622886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310681811.1A Pending CN116645717A (en) 2023-06-09 2023-06-09 Microexpressive recognition method and system based on PCANet+ and LSTM

Country Status (1)

Country Link
CN (1) CN116645717A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117275070A (en) * 2023-10-11 2023-12-22 中邮消费金融有限公司 Video facial mask processing method and system based on micro-expressions


Similar Documents

Publication Publication Date Title
WO2022111236A1 (en) Facial expression recognition method and system combined with attention mechanism
CN109472198B (en) Gesture robust video smiling face recognition method
CN109919977B (en) Video motion person tracking and identity recognition method based on time characteristics
KR102174595B1 (en) System and method for identifying faces in unconstrained media
CN112800903B (en) Dynamic expression recognition method and system based on space-time diagram convolutional neural network
CN112766160A (en) Face replacement method based on multi-stage attribute encoder and attention mechanism
Kumar et al. Learning-based approach to real time tracking and analysis of faces
CN109389045B (en) Micro-expression identification method and device based on mixed space-time convolution model
CN111046734B (en) Multi-modal fusion sight line estimation method based on expansion convolution
CN112801043A (en) Real-time video face key point detection method based on deep learning
Zhao et al. Applying contrast-limited adaptive histogram equalization and integral projection for facial feature enhancement and detection
JP2014211719A (en) Apparatus and method for information processing
CN106529441B (en) Depth motion figure Human bodys' response method based on smeared out boundary fragment
Vadlapati et al. Facial recognition using the OpenCV Libraries of Python for the pictures of human faces wearing face masks during the COVID-19 pandemic
CN116645717A (en) Microexpressive recognition method and system based on PCANet+ and LSTM
CN108647605B (en) Human eye gaze point extraction method combining global color and local structural features
CN112488165A (en) Infrared pedestrian identification method and system based on deep learning model
CN112560618A (en) Behavior classification method based on skeleton and video feature fusion
CN116645718A (en) Micro-expression recognition method and system based on multi-stream architecture
Gürel Development of a face recognition system
Wang et al. An attention self-supervised contrastive learning based three-stage model for hand shape feature representation in cued speech
Kostov et al. Method for simple extraction of paralinguistic features in human face
Yamamoto et al. Algorithm optimizations for low-complexity eye tracking
Meshram et al. Convolution Neural Network based Hand Gesture Recognition System
Gottumukkal et al. Real time face detection from color video stream based on PCA method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination