CN116645717A - Microexpressive recognition method and system based on PCANet+ and LSTM - Google Patents

Microexpressive recognition method and system based on PCANet+ and LSTM

Info

Publication number
CN116645717A
CN116645717A (application CN202310681811.1A)
Authority
CN
China
Prior art keywords
image
feature
micro
pcanet
lstm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310681811.1A
Other languages
Chinese (zh)
Inventor
姚俊峰
王仕琪
龙飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202310681811.1A priority Critical patent/CN116645717A/en
Publication of CN116645717A publication Critical patent/CN116645717A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a micro-expression recognition method and system based on PCANet+ and LSTM. The method comprises the following steps: step S10, detecting face key points in each frame of head image of a portrait video; step S20, preprocessing the head images based on the key points to obtain a plurality of face images; step S30, normalizing the frame count of the face images; step S40, performing optical flow calculation on the face images to obtain an optical flow image sequence; step S50, inputting the optical flow image sequence into a PCANet+ network to obtain a plurality of feature maps, and computing a weighted average of the feature maps to obtain a two-dimensional feature map; step S60, cropping feature regions from the two-dimensional feature map based on the key points, inputting the feature regions into an LSTM network to extract temporal features, and combining these by weighting to obtain scores for the different micro-expression categories; step S70, mapping the scores into probabilities of the different micro-expression categories with a Softmax function. The advantage of the application is that the micro-expression recognition effect is greatly improved.

Description

Microexpressive recognition method and system based on PCANet+ and LSTM
Technical Field
The application relates to the technical field of expression recognition, in particular to a microexpressive recognition method and system based on PCANet+ and LSTM.
Background
With the progress of technology, artificial intelligence has developed continuously, and it includes expression recognition technology: by automatically recognizing a person's micro-expressions in a video, the person's current emotion and psychological activity can be judged rapidly. However, conventional deep-learning-based micro-expression recognition methods often use deep networks with too many parameters, so overfitting easily occurs when recognizing micro-expressions from the small amount of available data, and a two-dimensional convolutional neural network cannot extract the complete temporal information of micro-expressions, so the recognition effect is poor.
Therefore, how to provide a microexpressive recognition method and system based on PCANet+ and LSTM to achieve the effect of improving microexpressive recognition becomes a technical problem to be solved urgently.
Disclosure of Invention
The application aims to solve the technical problem of providing a microexpressive recognition method and a microexpressive recognition system based on PCANet+ and LSTM, so as to achieve the effect of improving microexpressive recognition.
In a first aspect, the present application provides a microexpressive recognition method based on pcanet+ and LSTM, including the steps of:
step S10, acquiring a portrait video, and detecting key points of faces of head images of frames in the portrait video;
step S20, preprocessing of alignment, clipping and scaling is carried out on each frame of human head image based on each key point, so as to obtain a plurality of human face images;
step S30, normalizing the frame number of the face image;
step S40, carrying out optical flow calculation on each face image to obtain an optical flow image sequence;
s50, inputting the optical flow image sequence into a PCANet+network for spatial feature extraction to obtain a plurality of feature images, and carrying out weighted average on each feature image to obtain a two-dimensional feature image;
step S60, intercepting feature areas from a two-dimensional feature map based on the key points, inputting the feature areas into an LSTM network to extract time sequence features, and carrying out weighted summation on the time sequence features to obtain scores of different categories of micro expressions;
and step S70, mapping the score into probabilities of different categories of micro-expressions by using a Softmax function so as to complete identification of the micro-expressions.
Further, the step S10 specifically includes:
and acquiring a portrait video, and detecting 68 key points of the face of each frame of head image in the portrait video through an active shape model.
Further, the step S20 specifically includes:
and acquiring left-eye inner corner points and right-eye inner corner points in each frame of human head image from each key point, carrying out rotation alignment on each frame of human head image based on connecting lines of the left-eye inner corner points and the right-eye inner corner points, cutting face areas in the human head image based on each key point, and scaling each cut face area to a uniform size so as to finish preprocessing of each frame of human head image, thereby obtaining a plurality of human face images.
Further, the step S30 specifically includes:
and after carrying out gray level processing on the face image of each frame, normalizing the frame number of the face image by using a time interpolation algorithm.
Further, in the step S60, the characteristic region includes at least an eyebrow region, an eye region, a mouth region, and a nose wing region.
In a second aspect, the present application provides a microexpressive recognition system based on pcanet+ and LSTM, comprising the following modules:
the key point detection module is used for acquiring a portrait video and detecting key points of faces of head images of frames in the portrait video;
the human head image preprocessing module is used for preprocessing the alignment, cutting and scaling of human head images of each frame based on each key point to obtain a plurality of human face images;
the frame number normalization module is used for normalizing the frame number of the face image;
the optical flow calculation module is used for carrying out optical flow calculation on each face image to obtain an optical flow image sequence;
the PCANet+ feature extraction module is used for inputting the optical flow image sequence into a PCANet+ network to perform space feature extraction to obtain a plurality of feature images, and performing weighted average on each feature image to obtain a two-dimensional feature image;
the LSTM feature extraction module is used for intercepting feature areas from the two-dimensional feature map based on the key points, inputting the feature areas into an LSTM network to extract time sequence features, and carrying out weighted summation on the time sequence features to obtain scores of different categories of micro expressions;
and the score mapping module is used for mapping the score into probabilities of different categories of micro expressions by using a Softmax function so as to complete the identification of the micro expressions.
Further, the keypoint detection module is specifically configured to:
and acquiring a portrait video, and detecting 68 key points of the face of each frame of head image in the portrait video through an active shape model.
Further, the human head image preprocessing module is specifically configured to:
and acquiring left-eye inner corner points and right-eye inner corner points in each frame of human head image from each key point, carrying out rotation alignment on each frame of human head image based on connecting lines of the left-eye inner corner points and the right-eye inner corner points, cutting face areas in the human head image based on each key point, and scaling each cut face area to a uniform size so as to finish preprocessing of each frame of human head image, thereby obtaining a plurality of human face images.
Further, the frame number normalization module is specifically configured to:
and after carrying out gray level processing on the face image of each frame, normalizing the frame number of the face image by using a time interpolation algorithm.
Further, in the LSTM feature extraction module, the feature region includes at least an eyebrow region, an eye region, a mouth region, and a nose wing region.
The application has the advantages that:
the method comprises the steps of carrying out key point detection on faces of all frames of head images in an acquired face video, carrying out pretreatment of alignment, clipping and scaling on all frames of head images based on all key points to obtain a plurality of face images, and normalizing the frames of the face images; performing optical flow calculation on each face image to obtain an optical flow image sequence, and inputting the optical flow image sequence into a PCANet+ network to perform space feature extraction and weighted average to obtain a two-dimensional feature map; intercepting a feature area from a two-dimensional feature map, inputting the feature area into an LSTM network to extract time sequence features, weighting and combining the time sequence features to obtain scores of different categories of micro-expressions, and finally mapping the scores into probabilities of the different categories of micro-expressions by using a Softmax function so as to complete identification of the micro-expressions; the PCANet+ network and the LSTM network are combined, the PCANet+ network is used for extracting space features, the LSTM network is used for extracting time sequence features, space-time features of the micro-expression are effectively extracted, the PCANet+ network can directly calculate network parameters of the current layer through input of the current layer, parameters and calculated amount of the network are reduced, fitting is avoided, and finally the recognition effect of the micro-expression is greatly improved.
Drawings
The application will be further described with reference to examples of embodiments with reference to the accompanying drawings.
Fig. 1 is a flowchart of a microexpressive recognition method based on pcanet+ and LSTM according to the present application.
Fig. 2 is a schematic structural diagram of a microexpressive recognition system based on pcanet+ and LSTM according to the present application.
Detailed Description
The technical scheme in the embodiment of the application has the following overall thought: the PCANet+ network and the LSTM network are combined, space features are extracted through the PCANet+ network, time sequence features are extracted through the LSTM network, so that space-time features of the micro-expressions are effectively extracted, the PCANet+ network can directly calculate network parameters of the current layer through input of the current layer, parameters and calculated amount of the network are reduced, and overfitting is avoided, so that the recognition effect of the micro-expressions is improved.
Referring to fig. 1 to 2, a preferred embodiment of a microexpressive recognition method based on pcanet+ and LSTM according to the present application includes the following steps:
step S10, acquiring a portrait video, and detecting key points of faces of head images of frames in the portrait video;
step S20, preprocessing of alignment, clipping and scaling is carried out on each frame of human head image based on each key point, so as to obtain a plurality of human face images;
step S30, normalizing the frame number of the face image;
step S40, carrying out optical flow calculation on each face image to obtain an optical flow image sequence;
s50, inputting the optical flow image sequence into a PCANet+network for spatial feature extraction to obtain a plurality of feature images, and carrying out weighted average on each feature image to obtain a two-dimensional feature image;
step S60, intercepting feature areas from a two-dimensional feature map based on the key points, inputting the feature areas into an LSTM network to extract time sequence features, and carrying out weighted summation on the time sequence features to obtain scores of different categories of micro expressions;
Considering that micro-expressions show obvious motion only in some regions of the face, not all facial regions are helpful for classifying micro-expressions; therefore, feature regions are cropped from the two-dimensional feature map.
And step S70, mapping the score into probabilities of different categories of micro-expressions by using a Softmax function so as to complete identification of the micro-expressions.
The step S10 specifically includes:
and acquiring a portrait video, and detecting 68 key points of the face of each frame of head image in the portrait video through an active shape model.
The active shape model is built on a point distribution model. When detecting a face it comprehensively considers prior knowledge such as the gray level, size, shape and approximate position of the image: a statistical model of the feature point distribution is learned from the feature points annotated on the training set samples, this statistical model is taken as the initial position and refined by continuous iteration into the shape model of the target image, and finally the shape constraint is applied on the test set and the best matching points are searched, thereby localizing the facial feature points (key points).
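As an illustration of the key point detection step, the sketch below uses dlib's publicly available 68-point landmark predictor as a stand-in for the active shape model described above (the ASM itself is not implemented here); the model file path is an assumed local file.

```python
# Illustrative sketch only: the patent uses an Active Shape Model (ASM) for the
# 68 facial key points; dlib's pre-trained 68-point predictor is used here as a
# commonly available stand-in, not as the patented ASM implementation.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Assumed local path to dlib's public 68-landmark model file.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_landmarks(frame_bgr):
    """Return a (68, 2) array of facial key points for the first detected face."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)
    if len(faces) == 0:
        return None
    shape = predictor(gray, faces[0])
    return np.array([[p.x, p.y] for p in shape.parts()], dtype=np.float32)
```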
The step S20 specifically includes:
and acquiring left-eye inner corner points and right-eye inner corner points in each frame of human head image from each key point, carrying out rotation alignment on each frame of human head image based on connecting lines of the left-eye inner corner points and the right-eye inner corner points, cutting face areas in the human head image based on each key point, and scaling each cut face area to a uniform size so as to finish preprocessing of each frame of human head image, thereby obtaining a plurality of human face images.
To eliminate the influence of head rotation and offset on micro-expression recognition, the head images must be face-aligned according to the detected key points. Because the relative positions of the inner corners of the left and right eyes are comparatively stable in the face and are not changed by facial muscle movement, the application rotates each head image so that the line connecting the two inner eye corners is horizontal, then crops the face region according to the aligned image and key points, removing regions irrelevant to micro-expressions such as clothes, background and hair. Because of head rotation and variation in the distance to the camera, the cropped face regions are not all the same size, so they are scaled to a uniform size to meet the input-size requirement of the subsequent network. A minimal sketch of this preprocessing is given below.
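A minimal sketch of the alignment, cropping and scaling, assuming the 68-point landmark convention in which indices 39 and 42 are the inner eye corners, and an assumed output size of 128 × 128:

```python
# Sketch of the alignment/crop/scale preprocessing described above. The landmark
# indices (39, 42 = inner eye corners) and the 128x128 output size are assumptions.
import cv2
import numpy as np

def align_crop_scale(frame_bgr, landmarks, out_size=128):
    """Rotate so the inner-eye-corner line is horizontal, crop the face, resize."""
    left_inner, right_inner = landmarks[39], landmarks[42]
    dy = right_inner[1] - left_inner[1]
    dx = right_inner[0] - left_inner[0]
    angle = np.degrees(np.arctan2(dy, dx))            # tilt of the eye line
    cx, cy = (left_inner + right_inner) / 2.0
    rot = cv2.getRotationMatrix2D((float(cx), float(cy)), angle, 1.0)
    aligned = cv2.warpAffine(frame_bgr, rot, (frame_bgr.shape[1], frame_bgr.shape[0]))

    # Rotate the landmarks with the same transform, then crop their bounding box.
    pts = np.hstack([landmarks, np.ones((len(landmarks), 1))]) @ rot.T
    x0, y0 = pts.min(axis=0).astype(int)
    x1, y1 = pts.max(axis=0).astype(int)
    face = aligned[max(y0, 0):y1, max(x0, 0):x1]
    return cv2.resize(face, (out_size, out_size))     # unify the input size
```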
The step S30 specifically includes:
and after carrying out gray level processing on the face image of each frame, normalizing the frame number of the face image by using a time interpolation algorithm.
Because the durations of micro-expressions vary during acquisition and different data sets use different camera frame rates, the obtained micro-expression image sequences have inconsistent lengths. Since an LSTM network is used in this application to extract micro-expression features, a temporal interpolation algorithm is applied during data preprocessing to normalize the frame count of each micro-expression sample sequence (face images), which facilitates the subsequent LSTM processing. The temporal interpolation algorithm was first used for lip reading; it is a manifold-based interpolation method that maps the whole micro-expression sequence onto a curve, with each frame in the sequence corresponding to a point on the curve, and then resamples the curve to obtain an interpolated image sequence. A simplified sketch of this frame-count normalization is shown below.
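The temporal interpolation model itself is not reproduced here; the sketch below only illustrates frame-count normalization with a simplified per-pixel linear interpolation in time, which is a stand-in for, not an implementation of, the interpolation algorithm described above.

```python
# Simplified stand-in for the temporal interpolation step: linearly resample a
# grayscale face sequence to a fixed number of frames. The actual algorithm maps
# the sequence onto a curve in a low-dimensional space; this per-pixel linear
# interpolation only illustrates the frame-number normalization.
import numpy as np

def normalize_frame_count(frames, target_len):
    """frames: (L, H, W) grayscale sequence -> (target_len, H, W)."""
    frames = np.asarray(frames, dtype=np.float32)
    src_len = frames.shape[0]
    src_pos = np.linspace(0.0, 1.0, src_len)
    dst_pos = np.linspace(0.0, 1.0, target_len)
    out = np.empty((target_len,) + frames.shape[1:], dtype=np.float32)
    for i, t in enumerate(dst_pos):
        j = np.searchsorted(src_pos, t, side="right") - 1
        j = min(j, src_len - 2)                         # clamp at the last interval
        w = (t - src_pos[j]) / (src_pos[j + 1] - src_pos[j])
        out[i] = (1.0 - w) * frames[j] + w * frames[j + 1]
    return out
```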
In step S40, the optical flow is a two-dimensional vector field on the image plane that describes the instantaneous velocity of pixel motion between two consecutive frames of a video sequence. To improve the effect of PCANet+ network feature learning, the application performs dense optical flow calculation on the micro-expression image sequence (face images) to enhance the facial motion information. Optical flow computation relies on two basic assumptions: (1) brightness constancy, i.e. the brightness of the pixels at the corresponding positions of the moving object remains unchanged across the image sequence; (2) temporal continuity, i.e. the motion of the target pixel between adjacent frames cannot be too large, which ensures that the pixel coordinates change continuously and smoothly over time.
The optical flow calculation process is as follows:
The optical flow method represents an image sequence as a three-dimensional volume in which the brightness of a pixel is denoted I(x, y, t), where x, y, t are the spatio-temporal coordinates of the pixel. After a time Δt the pixel reaches the next frame, having moved by Δx and Δy in the image. Under the brightness-constancy assumption the pixel intensity before and after the motion is the same, which gives:

I(x, y, t) = I(x + Δx, y + Δy, t + Δt)    (1)

Based on the temporal-continuity assumption, a Taylor expansion of the right-hand side of equation (1) gives:

I(x + Δx, y + Δy, t + Δt) = I(x, y, t) + (∂I/∂x)Δx + (∂I/∂y)Δy + (∂I/∂t)Δt + ε    (2)

where ε is a higher-order infinitesimal that can be ignored. Substituting equation (2) into equation (1) and dividing by Δt yields:

(∂I/∂x)(Δx/Δt) + (∂I/∂y)(Δy/Δt) + ∂I/∂t = 0    (3)

Let u and v denote the velocity components of the pixel along the x-axis and y-axis, i.e. u = Δx/Δt and v = Δy/Δt. Substituting them into equation (3) gives the optical flow constraint equation:

I_x u + I_y v + I_t = 0    (4)
(u, v) is the optical flow field generated by the pixel within the time Δt and can be solved once constraint conditions are added; adding different constraints yields different optical flow computation methods. This application applies the TV-L1 algorithm to the optical flow field calculation. The TV-L1 algorithm introduces a subspace trajectory model to ensure the temporal consistency of the optical flow and preserves the edge features of the image. For a pixel in the micro-expression sequence, the continuous optical flow field is first computed with the optical flow estimation loss function shown in formula (5) (not reproduced here). In that formula, L denotes the length of the micro-expression image sequence; the R-basis trajectories are used to construct the trajectory space; the spatial domain of the image is also defined; and a mapping function maps the optical flow field u(t), v(t) into the new space constructed by the R-basis trajectories. The first term of formula (5) is the penalty term for the brightness-constancy constraint, the second term places the derived optical flow on the basis trajectories, and the third term is the total-variation spatial regularization of the trajectory model coefficients.
The micro-expression image sequence is taken with its first frame as the reference frame, and the horizontal and vertical optical flow components U and V of all remaining frames are computed with the optical flow method above; the computed optical flow sequences are then stacked frame by frame and fed into the PCANet+ network. The optical flow sequences U and V are sampled with a sliding window of size T and step length s, giving two subsequence sets μ and ν defined in formula (6) (not reproduced here).
The elements at corresponding positions in μ and ν are connected to form an input sequence Γ consisting of stacked optical flow components. That is, the optical flow components are concatenated along the channel dimension: through this multi-channel stacking operation, the horizontal and vertical optical flow sequences of each T-frame video segment are stacked to obtain an optical flow image sequence with 2T channels.
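A sketch of this flow computation and multi-channel stacking, using OpenCV's stock TV-L1 optical flow (from the opencv-contrib package) as a stand-in — the subspace-trajectory variant described above is not part of that stock implementation — and assuming 8-bit grayscale frames:

```python
# Sketch of dense TV-L1 optical flow w.r.t. the first (reference) frame, followed
# by the sliding-window stacking that produces 2T-channel clips.
import cv2
import numpy as np

def flow_sequence(gray_frames):
    """gray_frames: (L, H, W) uint8 grayscale frames -> (L-1, H, W, 2) flow maps."""
    tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()   # requires opencv-contrib
    ref = gray_frames[0]
    flows = [tvl1.calc(ref, f, None) for f in gray_frames[1:]]
    return np.stack(flows)

def stack_clips(flows, T, s=1):
    """Sliding window of size T, step s, over the flow sequence -> 2T-channel clips."""
    clips = []
    for start in range(0, flows.shape[0] - T + 1, s):
        win = flows[start:start + T]                  # (T, H, W, 2)
        # stack horizontal and vertical components along the channel axis
        clip = win.transpose(1, 2, 0, 3).reshape(win.shape[1], win.shape[2], 2 * T)
        clips.append(clip)
    return np.stack(clips)                            # (K, H, W, 2T)
```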
The step S50 specifically includes:
The portrait video is divided into K segments, each containing T frames of face images. After the optical flow calculation and stacking of each segment, a set of multi-channel optical flow images Γ = {I_1, I_2, ..., I_K} is obtained, where I_i is the multi-channel image corresponding to the i-th video segment. The optical flow images in Γ are then fed in turn into a two-layer PCANet+ network, in which the first PCA convolution layer has D1 filters of size k1 × k1 and the second PCA convolution layer has D2 filters of size k2 × k2. Each convolution layer is followed by a pooling layer: the first is an average pooling layer whose filter size is fixed at 3 × 3, and the second is a max pooling layer whose filter size is set to 3 × 3. For a multi-channel image I_i, passing through the two PCANet+ layers yields a set of D2 two-dimensional feature maps O² = {O²_1, O²_2, ..., O²_D2}.
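For reference, the sketch below shows how the PCA filters of one such convolution layer can be learned from image patches (mean-removed patches, covariance, leading eigenvectors reshaped into kernels); patch extraction and the full two-layer wiring are simplified, and this illustrates the general PCANet filter-learning idea rather than the exact PCANet+ procedure.

```python
# Minimal sketch of PCA filter learning for one PCANet-style convolution layer.
import numpy as np

def learn_pca_filters(images, k, num_filters):
    """images: list of (H, W, C) arrays; k: filter size; returns filters and eigenvalues."""
    patches = []
    for img in images:
        H, W, C = img.shape
        for y in range(H - k + 1):
            for x in range(W - k + 1):
                p = img[y:y + k, x:x + k, :].reshape(-1)
                patches.append(p - p.mean())           # remove the patch mean
    X = np.stack(patches, axis=1)                      # (k*k*C, num_patches)
    cov = X @ X.T / X.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)             # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:num_filters]    # keep the leading ones
    filters = eigvecs[:, order].T.reshape(num_filters, k, k, C)
    return filters, eigvals[order]
```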
The feature maps output by the second layer of the PCANet+ network (restricted to the key local regions of the face) are then used directly as the input of the subsequent LSTM network. As known from the PCANet+ filter learning process, the PCA filters of the second layer are obtained by reshaping the eigenvectors corresponding to the D2 largest eigenvalues computed for that layer. The larger a filter's eigenvalue, the more important the classification information contained in the feature map it outputs after convolution; therefore, the feature maps output by the second layer are averaged with weights given by the eigenvalues of their filters, as shown in formula (8), to obtain a single two-dimensional feature map O_i'. This also unifies the input size of the LSTM network so that it is not affected by the number of PCA filters.
O_i' = Σ_{j=1..D2} λ_j O²_j / Σ_{j=1..D2} λ_j    (8)

where O²_j denotes the feature map output by the j-th filter of layer 2, and λ_j denotes the eigenvalue corresponding to the j-th filter.
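A small sketch of this eigenvalue-weighted averaging (the normalization by the sum of eigenvalues is assumed from the description of formula (8) as a weighted average):

```python
# Fuse the D2 second-layer feature maps into one map, weighting each filter's
# output by its eigenvalue so more informative filters contribute more.
import numpy as np

def weighted_average_maps(feature_maps, eigvals):
    """feature_maps: (D2, H, W); eigvals: (D2,). Returns the fused (H, W) map."""
    w = np.asarray(eigvals, dtype=np.float64)
    w = w / w.sum()                                    # normalize weights to sum to 1
    return np.tensordot(w, feature_maps, axes=([0], [0]))
```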
In the step S60, the feature area includes at least an eyebrow area, an eye area, a mouth area, and a nose wing area.
A sample in the micro-expression data set is preprocessed into a sequence of L images. During optical flow calculation, the first frame is used as the reference frame and the optical flow of all other frames is computed, giving L-1 dual-channel optical flow feature maps containing the horizontal and vertical optical flow components; these optical flow feature maps are stacked in groups of T frames to obtain K = L-T+1 multi-channel images, where T is an odd number. PCANet+ performs feature extraction and key-region segmentation on each multi-channel optical flow image to obtain two-dimensional features near the eyebrows, the mouth and the nose wings; these are converted into one-dimensional vectors and concatenated to form the input of one time step of the LSTM network. Since the LSTM network input is a time series of K feature vectors and the output is a score for each micro-expression category, a many-to-one unrolled model is used for training. A sketch of this region cropping and concatenation is given below.
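In the sketch, the landmark indices used as region centers and the window half-size are illustrative assumptions, not values fixed by the description:

```python
# Build one LSTM time-step input: crop fixed-size windows around eyebrow, eye,
# mouth and nose-wing landmarks on the fused 2D feature map, flatten, concatenate.
import numpy as np

REGION_CENTERS = {"eyebrows": 21, "eyes": 39, "nose_wing": 31, "mouth": 62}  # assumed 68-pt indices

def region_feature_vector(feature_map, landmarks, half=8):
    """feature_map: (H, W); landmarks: (68, 2) scaled to feature-map coordinates."""
    H, W = feature_map.shape
    parts = []
    for idx in REGION_CENTERS.values():
        cx, cy = landmarks[idx].astype(int)
        y0, y1 = np.clip([cy - half, cy + half], 0, H)
        x0, x1 = np.clip([cx - half, cx + half], 0, W)
        parts.append(feature_map[y0:y1, x0:x1].reshape(-1))
    return np.concatenate(parts)      # one time-step input for the LSTM
```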
The LSTM-based feature learning model mainly comprises two parts: the first is a temporal feature extraction network formed by two LSTM layers, and the second is a classifier formed by a fully connected layer and a Softmax function. Each LSTM layer is unrolled into K LSTM units, corresponding to the K time-series inputs generated from a micro-expression sample; by adjusting the parameters of the gating units, the sequence information is selectively memorized and the temporal features are extracted. The fully connected layer then computes a weighted sum of the extracted features to obtain the score of each micro-expression category, and the Softmax function finally maps the scores into probabilities, which constitute the final classification result of the micro-expression. The model is optimized with a cross-entropy loss function:

Loss = -Σ_k y_k log(ŷ_k)

where ŷ_k denotes the predicted value, output by the Softmax function, for the k-th emotion category of the micro-expression sample; y denotes the one-hot encoding vector of the sample's true label; and y_k is the value of the sample for the k-th emotion category, equal to 1 when the true label of the sample is k and 0 otherwise.
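A sketch of this two-layer LSTM classifier in PyTorch; the hidden size, feature dimension and number of micro-expression categories are illustrative assumptions:

```python
# Two stacked LSTM layers consume the K region-feature vectors of one sample
# (many-to-one); a fully connected layer scores the categories, and cross-entropy
# (which applies LogSoftmax internally) is used for training.
import torch
import torch.nn as nn

class MicroExpressionLSTM(nn.Module):
    def __init__(self, feat_dim, hidden_dim=128, num_classes=5):  # sizes are assumptions
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)   # weighted sum -> class scores

    def forward(self, x):                # x: (batch, K, feat_dim)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])    # many-to-one: score from the last time step

model = MicroExpressionLSTM(feat_dim=1024)
criterion = nn.CrossEntropyLoss()
logits = model(torch.randn(4, 10, 1024))               # dummy batch: 4 samples, K = 10 steps
loss = criterion(logits, torch.tensor([0, 1, 2, 3]))
```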
The application discloses a preferred embodiment of a microexpressive recognition system based on PCANet+ and LSTM, which comprises the following modules:
the key point detection module is used for acquiring a portrait video and detecting key points of faces of head images of frames in the portrait video;
the human head image preprocessing module is used for preprocessing the alignment, cutting and scaling of human head images of each frame based on each key point to obtain a plurality of human face images;
the frame number normalization module is used for normalizing the frame number of the face image;
the optical flow calculation module is used for carrying out optical flow calculation on each face image to obtain an optical flow image sequence;
the PCANet+ feature extraction module is used for inputting the optical flow image sequence into a PCANet+ network to perform space feature extraction to obtain a plurality of feature images, and performing weighted average on each feature image to obtain a two-dimensional feature image;
the LSTM feature extraction module is used for intercepting feature areas from the two-dimensional feature map based on the key points, inputting the feature areas into an LSTM network to extract time sequence features, and carrying out weighted summation on the time sequence features to obtain scores of different categories of micro expressions;
Considering that micro-expressions show obvious motion only in some regions of the face, not all facial regions are helpful for classifying micro-expressions; therefore, feature regions are cropped from the two-dimensional feature map.
And the score mapping module is used for mapping the score into probabilities of different categories of micro expressions by using a Softmax function so as to complete the identification of the micro expressions.
The key point detection module is specifically configured to:
and acquiring a portrait video, and detecting 68 key points of the face of each frame of head image in the portrait video through an active shape model.
The active shape model is built on a point distribution model. When detecting a face it comprehensively considers prior knowledge such as the gray level, size, shape and approximate position of the image: a statistical model of the feature point distribution is learned from the feature points annotated on the training set samples, this statistical model is taken as the initial position and refined by continuous iteration into the shape model of the target image, and finally the shape constraint is applied on the test set and the best matching points are searched, thereby localizing the facial feature points (key points).
The human head image preprocessing module is specifically used for:
and acquiring left-eye inner corner points and right-eye inner corner points in each frame of human head image from each key point, carrying out rotation alignment on each frame of human head image based on connecting lines of the left-eye inner corner points and the right-eye inner corner points, cutting face areas in the human head image based on each key point, and scaling each cut face area to a uniform size so as to finish preprocessing of each frame of human head image, thereby obtaining a plurality of human face images.
To eliminate the influence of head rotation and offset on micro-expression recognition, the head images must be face-aligned according to the detected key points. Because the relative positions of the inner corners of the left and right eyes are comparatively stable in the face and are not changed by facial muscle movement, the application rotates each head image so that the line connecting the two inner eye corners is horizontal, then crops the face region according to the aligned image and key points, removing regions irrelevant to micro-expressions such as clothes, background and hair. Because of head rotation and variation in the distance to the camera, the cropped face regions are not all the same size, so they are scaled to a uniform size to meet the input-size requirement of the subsequent network.
The frame number normalization module is specifically configured to:
and after carrying out gray level processing on the face image of each frame, normalizing the frame number of the face image by using a time interpolation algorithm.
Because the durations of micro-expressions vary during acquisition and different data sets use different camera frame rates, the obtained micro-expression image sequences have inconsistent lengths. Since an LSTM network is used in this application to extract micro-expression features, a temporal interpolation algorithm is applied during data preprocessing to normalize the frame count of each micro-expression sample sequence (face images), which facilitates the subsequent LSTM processing. The temporal interpolation algorithm was first used for lip reading; it is a manifold-based interpolation method that maps the whole micro-expression sequence onto a curve, with each frame in the sequence corresponding to a point on the curve, and then resamples the curve to obtain an interpolated image sequence.
In the optical flow calculation module, the optical flow is a two-dimensional vector field on the image plane that describes the instantaneous velocity of pixel motion between two consecutive frames of a video sequence. To improve the effect of PCANet+ network feature learning, the application performs dense optical flow calculation on the micro-expression image sequence (face images) to enhance the facial motion information. Optical flow computation relies on two basic assumptions: (1) brightness constancy, i.e. the brightness of the pixels at the corresponding positions of the moving object remains unchanged across the image sequence; (2) temporal continuity, i.e. the motion of the target pixel between adjacent frames cannot be too large, which ensures that the pixel coordinates change continuously and smoothly over time.
The optical flow calculation process is as follows:
The optical flow method represents an image sequence as a three-dimensional volume in which the brightness of a pixel is denoted I(x, y, t), where x, y, t are the spatio-temporal coordinates of the pixel. After a time Δt the pixel reaches the next frame, having moved by Δx and Δy in the image. Under the brightness-constancy assumption the pixel intensity before and after the motion is the same, which gives:

I(x, y, t) = I(x + Δx, y + Δy, t + Δt)    (1)

Based on the temporal-continuity assumption, a Taylor expansion of the right-hand side of equation (1) gives:

I(x + Δx, y + Δy, t + Δt) = I(x, y, t) + (∂I/∂x)Δx + (∂I/∂y)Δy + (∂I/∂t)Δt + ε    (2)

where ε is a higher-order infinitesimal that can be ignored. Substituting equation (2) into equation (1) and dividing by Δt yields:

(∂I/∂x)(Δx/Δt) + (∂I/∂y)(Δy/Δt) + ∂I/∂t = 0    (3)

Let u and v denote the velocity components of the pixel along the x-axis and y-axis, i.e. u = Δx/Δt and v = Δy/Δt. Substituting them into equation (3) gives the optical flow constraint equation:

I_x u + I_y v + I_t = 0    (4)
(u, v) is the optical flow field generated by the pixel within the time Δt and can be solved once constraint conditions are added; adding different constraints yields different optical flow computation methods. This application applies the TV-L1 algorithm to the optical flow field calculation. The TV-L1 algorithm introduces a subspace trajectory model to ensure the temporal consistency of the optical flow and preserves the edge features of the image. For a pixel in the micro-expression sequence, the continuous optical flow field is first computed with the optical flow estimation loss function shown in formula (5) (not reproduced here). In that formula, L denotes the length of the micro-expression image sequence; the R-basis trajectories are used to construct the trajectory space; the spatial domain of the image is also defined; and a mapping function maps the optical flow field u(t), v(t) into the new space constructed by the R-basis trajectories. The first term of formula (5) is the penalty term for the brightness-constancy constraint, the second term places the derived optical flow on the basis trajectories, and the third term is the total-variation spatial regularization of the trajectory model coefficients.
The micro-expression image sequence is taken with its first frame as the reference frame, and the horizontal and vertical optical flow components U and V of all remaining frames are computed with the optical flow method above; the computed optical flow sequences are then stacked frame by frame and fed into the PCANet+ network. The optical flow sequences U and V are sampled with a sliding window of size T and step length s, giving two subsequence sets μ and ν defined in formula (6) (not reproduced here).
The elements at corresponding positions in μ and ν are connected to form an input sequence Γ consisting of stacked optical flow components. That is, the optical flow components are concatenated along the channel dimension: through this multi-channel stacking operation, the horizontal and vertical optical flow sequences of each T-frame video segment are stacked to obtain an optical flow image sequence with 2T channels.
The PCANet+ feature extraction module is specifically configured to:
The portrait video is divided into K segments, each containing T frames of face images. After the optical flow calculation and stacking of each segment, a set of multi-channel optical flow images Γ = {I_1, I_2, ..., I_K} is obtained, where I_i is the multi-channel image corresponding to the i-th video segment. The optical flow images in Γ are then fed in turn into a two-layer PCANet+ network, in which the first PCA convolution layer has D1 filters of size k1 × k1 and the second PCA convolution layer has D2 filters of size k2 × k2. Each convolution layer is followed by a pooling layer: the first is an average pooling layer whose filter size is fixed at 3 × 3, and the second is a max pooling layer whose filter size is set to 3 × 3. For a multi-channel image I_i, passing through the two PCANet+ layers yields a set of D2 two-dimensional feature maps O² = {O²_1, O²_2, ..., O²_D2}.
The feature maps output by the second layer of the PCANet+ network (restricted to the key local regions of the face) are then used directly as the input of the subsequent LSTM network. As known from the PCANet+ filter learning process, the PCA filters of the second layer are obtained by reshaping the eigenvectors corresponding to the D2 largest eigenvalues computed for that layer. The larger a filter's eigenvalue, the more important the classification information contained in the feature map it outputs after convolution; therefore, the feature maps output by the second layer are averaged with weights given by the eigenvalues of their filters, as shown in formula (8), to obtain a single two-dimensional feature map O_i'. This also unifies the input size of the LSTM network so that it is not affected by the number of PCA filters.
O_i' = Σ_{j=1..D2} λ_j O²_j / Σ_{j=1..D2} λ_j    (8)

where O²_j denotes the feature map output by the j-th filter of layer 2, and λ_j denotes the eigenvalue corresponding to the j-th filter.
In the LSTM feature extraction module, the feature area includes at least an eyebrow area, an eye area, a mouth area, and a nose wing area.
A sample in the micro-expression data set is preprocessed into a sequence of L images. During optical flow calculation, the first frame is used as the reference frame and the optical flow of all other frames is computed, giving L-1 dual-channel optical flow feature maps containing the horizontal and vertical optical flow components; these optical flow feature maps are stacked in groups of T frames to obtain K = L-T+1 multi-channel images, where T is an odd number. PCANet+ performs feature extraction and key-region segmentation on each multi-channel optical flow image to obtain two-dimensional features near the eyebrows, the mouth and the nose wings; these are converted into one-dimensional vectors and concatenated to form the input of one time step of the LSTM network. Since the LSTM network input is a time series of K feature vectors and the output is a score for each micro-expression category, a many-to-one unrolled model is used for training.
The LSTM-based feature learning model mainly comprises two parts: the first is a temporal feature extraction network formed by two LSTM layers, and the second is a classifier formed by a fully connected layer and a Softmax function. Each LSTM layer is unrolled into K LSTM units, corresponding to the K time-series inputs generated from a micro-expression sample; by adjusting the parameters of the gating units, the sequence information is selectively memorized and the temporal features are extracted. The fully connected layer then computes a weighted sum of the extracted features to obtain the score of each micro-expression category, and the Softmax function finally maps the scores into probabilities, which constitute the final classification result of the micro-expression. The model is optimized with a cross-entropy loss function:

Loss = -Σ_k y_k log(ŷ_k)

where ŷ_k denotes the predicted value, output by the Softmax function, for the k-th emotion category of the micro-expression sample; y denotes the one-hot encoding vector of the sample's true label; and y_k is the value of the sample for the k-th emotion category, equal to 1 when the true label of the sample is k and 0 otherwise.
In summary, the application has the advantages that:
the method comprises the steps of carrying out key point detection on faces of all frames of head images in an acquired face video, carrying out pretreatment of alignment, clipping and scaling on all frames of head images based on all key points to obtain a plurality of face images, and normalizing the frames of the face images; performing optical flow calculation on each face image to obtain an optical flow image sequence, and inputting the optical flow image sequence into a PCANet+ network to perform space feature extraction and weighted average to obtain a two-dimensional feature map; intercepting a feature area from a two-dimensional feature map, inputting the feature area into an LSTM network to extract time sequence features, weighting and combining the time sequence features to obtain scores of different categories of micro-expressions, and finally mapping the scores into probabilities of the different categories of micro-expressions by using a Softmax function so as to complete identification of the micro-expressions; the PCANet+ network and the LSTM network are combined, the PCANet+ network is used for extracting space features, the LSTM network is used for extracting time sequence features, space-time features of the micro-expression are effectively extracted, the PCANet+ network can directly calculate network parameters of the current layer through input of the current layer, parameters and calculated amount of the network are reduced, fitting is avoided, and finally the recognition effect of the micro-expression is greatly improved.
While specific embodiments of the application have been described above, it will be appreciated by those skilled in the art that the specific embodiments described are illustrative only and not intended to limit the scope of the application, and that equivalent modifications and variations of the application in light of the spirit of the application will be covered by the claims of the present application.

Claims (10)

1. A microexpressive recognition method based on PCANet+ and LSTM is characterized in that: the method comprises the following steps:
step S10, acquiring a portrait video, and detecting key points of faces of head images of frames in the portrait video;
step S20, preprocessing of alignment, clipping and scaling is carried out on each frame of human head image based on each key point, so as to obtain a plurality of human face images;
step S30, normalizing the frame number of the face image;
step S40, carrying out optical flow calculation on each face image to obtain an optical flow image sequence;
s50, inputting the optical flow image sequence into a PCANet+network for spatial feature extraction to obtain a plurality of feature images, and carrying out weighted average on each feature image to obtain a two-dimensional feature image;
step S60, intercepting feature areas from a two-dimensional feature map based on the key points, inputting the feature areas into an LSTM network to extract time sequence features, and carrying out weighted summation on the time sequence features to obtain scores of different categories of micro expressions;
and step S70, mapping the score into probabilities of different categories of micro-expressions by using a Softmax function so as to complete identification of the micro-expressions.
2. The micro-expression recognition method based on PCANet+ and LSTM as recited in claim 1, wherein: the step S10 specifically includes:
and acquiring a portrait video, and detecting 68 key points of the face of each frame of head image in the portrait video through an active shape model.
3. The micro-expression recognition method based on PCANet+ and LSTM as recited in claim 1, wherein: the step S20 specifically includes:
and acquiring left-eye inner corner points and right-eye inner corner points in each frame of human head image from each key point, carrying out rotation alignment on each frame of human head image based on connecting lines of the left-eye inner corner points and the right-eye inner corner points, cutting face areas in the human head image based on each key point, and scaling each cut face area to a uniform size so as to finish preprocessing of each frame of human head image, thereby obtaining a plurality of human face images.
4. The micro-expression recognition method based on PCANet+ and LSTM as recited in claim 1, wherein: the step S30 specifically includes:
and after carrying out gray level processing on the face image of each frame, normalizing the frame number of the face image by using a time interpolation algorithm.
5. The micro-expression recognition method based on PCANet+ and LSTM as recited in claim 1, wherein: in the step S60, the feature area includes at least an eyebrow area, an eye area, a mouth area, and a nose wing area.
6. A microexpressive recognition system based on pcanet+ and LSTM, characterized in that: the device comprises the following modules:
the key point detection module is used for acquiring a portrait video and detecting key points of faces of head images of frames in the portrait video;
the human head image preprocessing module is used for preprocessing the alignment, cutting and scaling of human head images of each frame based on each key point to obtain a plurality of human face images;
the frame number normalization module is used for normalizing the frame number of the face image;
the optical flow calculation module is used for carrying out optical flow calculation on each face image to obtain an optical flow image sequence;
the PCANet+ feature extraction module is used for inputting the optical flow image sequence into a PCANet+ network to perform space feature extraction to obtain a plurality of feature images, and performing weighted average on each feature image to obtain a two-dimensional feature image;
the LSTM feature extraction module is used for intercepting feature areas from the two-dimensional feature map based on the key points, inputting the feature areas into an LSTM network to extract time sequence features, and carrying out weighted summation on the time sequence features to obtain scores of different categories of micro expressions;
and the score mapping module is used for mapping the score into probabilities of different categories of micro expressions by using a Softmax function so as to complete the identification of the micro expressions.
7. The micro-expression recognition system based on pcanet+ and LSTM of claim 6 wherein: the key point detection module is specifically configured to:
and acquiring a portrait video, and detecting 68 key points of the face of each frame of head image in the portrait video through an active shape model.
8. The micro-expression recognition system based on pcanet+ and LSTM of claim 6 wherein: the human head image preprocessing module is specifically used for:
and acquiring left-eye inner corner points and right-eye inner corner points in each frame of human head image from each key point, carrying out rotation alignment on each frame of human head image based on connecting lines of the left-eye inner corner points and the right-eye inner corner points, cutting face areas in the human head image based on each key point, and scaling each cut face area to a uniform size so as to finish preprocessing of each frame of human head image, thereby obtaining a plurality of human face images.
9. The micro-expression recognition system based on pcanet+ and LSTM of claim 6 wherein: the frame number normalization module is specifically configured to:
and after carrying out gray level processing on the face image of each frame, normalizing the frame number of the face image by using a time interpolation algorithm.
10. The micro-expression recognition system based on pcanet+ and LSTM of claim 6 wherein: in the LSTM feature extraction module, the feature area includes at least an eyebrow area, an eye area, a mouth area, and a nose wing area.
CN202310681811.1A 2023-06-09 2023-06-09 Microexpressive recognition method and system based on PCANet+ and LSTM Pending CN116645717A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310681811.1A CN116645717A (en) 2023-06-09 2023-06-09 Microexpressive recognition method and system based on PCANet+ and LSTM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310681811.1A CN116645717A (en) 2023-06-09 2023-06-09 Microexpressive recognition method and system based on PCANet+ and LSTM

Publications (1)

Publication Number Publication Date
CN116645717A true CN116645717A (en) 2023-08-25

Family

ID=87622886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310681811.1A Pending CN116645717A (en) 2023-06-09 2023-06-09 Microexpressive recognition method and system based on PCANet+ and LSTM

Country Status (1)

Country Link
CN (1) CN116645717A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117275070A (en) * 2023-10-11 2023-12-22 中邮消费金融有限公司 Video facial mask processing method and system based on micro-expressions


Similar Documents

Publication Publication Date Title
WO2022111236A1 (en) Facial expression recognition method and system combined with attention mechanism
CN109472198B (en) Gesture robust video smiling face recognition method
CN109919977B (en) Video motion person tracking and identity recognition method based on time characteristics
KR102174595B1 (en) System and method for identifying faces in unconstrained media
CN112800903B (en) Dynamic expression recognition method and system based on space-time diagram convolutional neural network
CN112766160A (en) Face replacement method based on multi-stage attribute encoder and attention mechanism
Kumar et al. Learning-based approach to real time tracking and analysis of faces
CN109389045B (en) Micro-expression identification method and device based on mixed space-time convolution model
CN111046734B (en) Multi-modal fusion sight line estimation method based on expansion convolution
CN112801043A (en) Real-time video face key point detection method based on deep learning
Zhao et al. Applying contrast-limited adaptive histogram equalization and integral projection for facial feature enhancement and detection
JP2014211719A (en) Apparatus and method for information processing
CN106529441B (en) Depth motion figure Human bodys' response method based on smeared out boundary fragment
Vadlapati et al. Facial recognition using the OpenCV Libraries of Python for the pictures of human faces wearing face masks during the COVID-19 pandemic
CN116645717A (en) Microexpressive recognition method and system based on PCANet+ and LSTM
CN108647605B (en) Human eye gaze point extraction method combining global color and local structural features
CN112488165A (en) Infrared pedestrian identification method and system based on deep learning model
CN112560618A (en) Behavior classification method based on skeleton and video feature fusion
CN116645718A (en) Micro-expression recognition method and system based on multi-stream architecture
Gürel Development of a face recognition system
Wang et al. An attention self-supervised contrastive learning based three-stage model for hand shape feature representation in cued speech
Kostov et al. Method for simple extraction of paralinguistic features in human face
Yamamoto et al. Algorithm optimizations for low-complexity eye tracking
Meshram et al. Convolution Neural Network based Hand Gesture Recognition System
Gottumukkal et al. Real time face detection from color video stream based on PCA method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination