CN109614927B - Micro expression recognition based on difference of front and rear frames and characteristic dimension reduction - Google Patents


Info

Publication number
CN109614927B
Authority
CN
China
Prior art keywords
frame
difference
value
face
color
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811499959.9A
Other languages
Chinese (zh)
Other versions
CN109614927A (en)
Inventor
张延良
郭辉
李赓
桂伟峰
王俊峰
蒋涵笑
卢冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Technology
Original Assignee
Henan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University of Technology filed Critical Henan University of Technology
Priority to CN201811499959.9A priority Critical patent/CN109614927B/en
Publication of CN109614927A publication Critical patent/CN109614927A/en
Application granted granted Critical
Publication of CN109614927B publication Critical patent/CN109614927B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

The application provides a micro-expression recognition method, which performs face recognition on each frame of a video and extracts the face region; extracts the pixel number, the background color and the face brightness of each frame in the video; sequentially selects each non-first frame and calculates the face region area difference, pixel number difference, background color difference and face brightness difference between the selected frame and the preceding frame; calculates a difference value for each non-first frame; determines the frames whose difference values are larger than a preset threshold, together with the first frame of the video, as candidate frames; among the candidate frames, determines the consecutively numbered frames as micro-expression frames; and extracts the expression features of the micro-expression frames, reduces their dimensionality through a pre-trained dimension reduction model, and recognizes the reduced features to obtain a recognition result. Because the micro-expression frames are selected for recognition according to the face region area difference, pixel number difference, background color difference and face brightness difference, the frames related to the micro-expression in the face video can be extracted accurately, and the efficiency and accuracy of micro-expression frame recognition are improved.

Description

Micro expression recognition based on difference of front and rear frames and characteristic dimension reduction
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a micro-expression identification method.
Background
Micro-expressions are non-verbal behaviors that can reveal a person's own emotions.
Current research mainly focuses on ordinary expressions. Besides ordinary facial expressions, however, there are micro expressions, which are produced by uncontrolled contractions of the facial muscles while a person psychologically suppresses an emotion.
Micro expressions last only a short time and involve very small motions, which makes them considerably difficult to observe and identify correctly. The success rate of capturing and identifying micro expressions with the naked eye is low; even after professional training, the recognition rate only reaches about 47%.
Therefore, recognition methods of micro-expressions are receiving increasing attention from researchers.
Disclosure of Invention
In order to solve the above problems, the embodiments of the present application provide a micro expression recognition method.
Acquiring a face video;
carrying out face recognition on each frame in the video, and extracting a face area;
extracting the pixel number, the background color and the face brightness of each frame in the video;
sequentially selecting a non-first frame, and calculating the area difference of the face area, the pixel number difference, the background color difference and the face brightness difference of the selected frame and the previous frame;
calculating a difference value of each non-first frame, wherein the difference value = (face region area difference ^ face brightness difference + background color difference) ^ pixel number difference, ^ being a power operator;
determining the frames whose difference values are larger than a preset threshold, together with the first frame of the video, as candidate frames;
among the candidate frames, determining the consecutively numbered frames as micro-expression frames;
and extracting the expression features of the micro-expression frames, performing dimensionality reduction on the expression features through a pre-trained dimensionality reduction model, and identifying the dimensionality-reduced features to obtain an identification result.
Optionally, extracting a background color of each frame in the video includes:
for any one of the frames in the video,
determining a non-face area in any frame as a background area;
determining RGB color values of all pixel points in the background area of any frame, wherein the RGB color values comprise red color values, green color values and blue color values;
calculating the RGB color mean of the background region of any frame by the following formula, wherein the RGB color mean comprises a red color mean, a green color mean and a blue color mean:
c̄_l = (1/n_1) · Σ_{j=1}^{n_1} c_lj, l = 1, 2, 3
wherein j is the pixel point identification of the background region of any frame, c̄_1, c̄_2 and c̄_3 are respectively the red, green and blue color means of the background region of any frame, c_1j, c_2j and c_3j are respectively the red, green and blue color values of the j-th pixel point in the background region of any frame, and n_1 is the total number of pixel points in the background region of any frame;
calculating the RGB color mean square error of the background region of any frame, wherein the RGB color mean square error comprises a red color mean square error, a green color mean square error and a blue color mean square error:
σ_l1 = sqrt( (1/n_1) · Σ_{j=1}^{n_1} (c_lj − c̄_l)^2 ), l = 1, 2, 3
wherein σ_11 is the red color mean square error, σ_21 is the green color mean square error, and σ_31 is the blue color mean square error;
determining the RGB color intervals of the background region of any frame, wherein the RGB color intervals comprise the red color interval [c̄_1 − σ_11, c̄_1 + σ_11], the green color interval [c̄_2 − σ_21, c̄_2 + σ_21] and the blue color interval [c̄_3 − σ_31, c̄_3 + σ_31];
Determining the number n2 of pixel points of which the red color values are located in a red color interval of an RGB color interval, the green color values are located in a green color interval, and the blue color values are located in a blue color interval, in all the pixel points of the background area of any frame;
and determining the background color of any frame according to the n 2.
Optionally, the background color is represented by RGB color values;
the determining the background color of any frame according to n2 comprises:
calculating the pixel number ratio n_3 = n_2 / n_1 of any frame;
the red, green and blue color values of the background color of any frame are then obtained by adjusting the color means c̄_1, c̄_2 and c̄_3, respectively, according to the ratio n_3.
Optionally, extracting the face brightness of each frame in the video includes:
for any frame in the video,
determining the brightness value of each pixel point in the face region of any frame, wherein k is the pixel point identification of the face region of any frame, h_k is the brightness value of the k-th pixel point of the face region of any frame, and R_k, G_k and B_k are respectively the red, green and blue color values among the RGB color values of the k-th pixel point;
determining the maximum brightness value and the minimum brightness value among the brightness values of all pixel points of the face region of any frame;
calculating the brightness mean h̄ = (1/n_4) · Σ_{k=1}^{n_4} h_k of the face region of any frame, wherein n_4 is the total number of pixel points in the face region of any frame;
and determining the face brightness of any frame in the video according to the maximum brightness value, the minimum brightness value and h̄.
Optionally, determining the face brightness of any frame in the video according to the maximum brightness value, the minimum brightness value and h̄ includes:
calculating a first difference d1 = maximum brightness value − minimum brightness value;
calculating a second difference d2 and a third difference d3 from h̄, the maximum brightness value and the minimum brightness value;
calculating a brightness ratio d4 = |d1 − d2| / |d1 − d3|;
calculating the brightness mean square error σ_h = sqrt( (1/n_4) · Σ_{k=1}^{n_4} (h_k − h̄)^2 ) of the face region of any frame;
the face brightness of any frame in the video is then obtained by adjusting h̄ according to d4 and σ_h.
Optionally, before calculating the difference value of each non-first frame, the method further includes:
performing primary screening on the non-first frame according to the area difference of the face area, the pixel number difference, the background color difference and the face brightness difference of each non-first frame;
the calculating a difference value of each non-first frame includes:
and calculating the difference value of each frame after the initial screening.
Optionally, the preliminary screening of the non-first frames according to the face region area difference, the pixel number difference, the background color difference and the face brightness difference of each non-first frame includes:
for any non-first frame,
if the face region area difference of the non-first frame is not larger than a first value, the pixel number difference is not larger than a second value, the background color difference is not larger than a third value, and the face brightness difference is not larger than a fourth value, the non-first frame passes the preliminary screening; or,
if the face region area difference of the non-first frame is not larger than the first value and the pixel number difference, the background color difference and the face brightness difference are all 0, the non-first frame passes the preliminary screening; or,
if the face brightness difference of the non-first frame is not larger than the fourth value and the face region area difference, the pixel number difference and the background color difference are all 0, the non-first frame passes the preliminary screening;
the first value is (the sum of the face region area differences of all non-first frames + the face region area of the first frame − avg1) / the total number of frames of the face video, the second value is (the sum of the pixel number differences of all non-first frames + the pixel number of the first frame − avg2) / the total number of frames of the face video, the third value is (the sum of the background color differences of all non-first frames + the background color of the first frame − avg3) / the total number of frames of the face video, and the fourth value is (the sum of the face brightness differences of all non-first frames + the face brightness of the first frame − avg4) / the total number of frames of the face video, where avg1 = the sum of the face region areas of all frames / the total number of frames of the face video, avg2 = the sum of the pixel numbers of all frames / the total number of frames of the face video, avg3 = the sum of the background colors of all frames / the total number of frames of the face video, and avg4 = the sum of the face brightness values of all frames / the total number of frames of the face video.
Optionally, before the performing the dimension reduction processing on the expression features through the pre-trained dimension reduction model, the method further includes:
obtaining a sample set X, wherein the total number of samples in the X is m, each sample comprises a plurality of expression features, and each sample belongs to one category;
classifying all samples according to categories;
calculating the mean vector of each class: μ_i = (1/b_i) · Σ_{j=1}^{b_i} x_ij, where i is the class identification, μ_i is the mean vector of class i, b_i is the number of samples of the i-th class, j is the sample identification, and x_ij is the vector formed by the expression features of the j-th sample of the i-th class;
determining the total mean vector μ_0 = (1/E) · Σ_{i=1}^{E} μ_i according to the mean vectors of all classes, where μ_0 is the total mean vector and E is the total number of different classes to which the samples in X belong;
calculating an inter-class variance vector and an intra-class variance vector according to the total mean vector;
and determining the expression features after dimensionality reduction according to the inter-class variance vector and the intra-class variance vector to form a dimensionality reduction model.
Optionally, calculating the inter-class variance vector and the intra-class variance vector according to the total mean vector includes:
S_w = Σ_{i=1}^{E} Σ_{x_ij ∈ X_i} (x_ij − μ_i)(x_ij − μ_i)^T
S_b = Σ_{i=1}^{E} b_i (μ_i − μ_0)(μ_i − μ_0)^T
where S_w is the intra-class variance vector, S_b is the inter-class variance vector, and X_i is the set composed of the samples of the i-th class.
Optionally, the determining the expression features after the dimensionality reduction according to the between-class variance vector and the within-class variance vector includes:
calculating a weight vector W = diag (S) composed of the weights of the expression features b ·/S w ) Wherein diag () is a function for taking the elements on the diagonal of the matrix, ·/is an operator for applying S w And S b Is divided by the corresponding element;
sorting the expression features according to the sequence of the weights of the expression features from big to small;
and determining a preset number of expression features which are ranked in the front as the expression features after dimension reduction.
The beneficial effects are as follows:
the micro expression frames are selected for recognition according to the area difference of the face area, the pixel number difference, the background color difference and the face brightness difference, the frames related to the micro expression in the face video can be accurately extracted, and the recognition efficiency and accuracy of the micro expression frames are improved.
Drawings
Specific embodiments of the present application will be described below with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram illustrating a principle of a dimension reduction model classified into 2 classes according to an embodiment of the present application;
FIG. 2 is a flow chart of a micro expression recognition method according to an embodiment of the present application;
fig. 3 is a schematic diagram illustrating LBP descriptor calculation according to an embodiment of the present application;
fig. 4 shows a schematic diagram of feature extraction provided in an embodiment of the present application.
Detailed Description
To make the technical solutions and advantages of the present application clearer, exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application, not an exhaustive list of all embodiments, and the embodiments and the features of the embodiments in the present description may be combined with each other when no conflict arises.
Because micro expressions last only a short time and involve very small motions, they are considerably difficult to observe and identify correctly. The application therefore provides a micro-expression frame identification method: each frame is compared with its neighbouring frame to obtain a difference value for that frame, and the micro-expression frames are determined according to the difference values of the frames.
The expression recognition method provided by the application comprises 2 major processes, wherein the first major process is a dimension reduction model training process, and the other major process is an actual micro expression recognition process based on a trained dimension reduction model.
The dimension-reduction model training process is not executed every time the expression recognition method provided by the application is run. It is executed only when the method is run for the first time, when the expression recognition scene changes, when the dimension-reduction effect on the expression features during actual micro-expression recognition is unsatisfactory, or for other reasons, so as to improve the dimension-reduction effect on the expression features and thereby the accuracy of the actual micro-expression recognition result.
The method and the device do not limit the execution triggering conditions of the process of training the dimension reduction model.
The specific implementation method of the process of training the dimension reduction model is as follows:
step 1, a sample set X is obtained.
Wherein the total number of samples in X is m, each sample comprises a plurality of expression features, and each sample belongs to one category.
For example, suppose the samples in X belong to E different classes: class 1, class 2, ……, class i, ……, class E. Class 1 has b_1 samples, which form the set X_1; class 2 has b_2 samples, which form the set X_2; and so on.
And 2, classifying all samples according to categories.
Taking the example in step 1, this step sorts all samples into the E classes: samples belonging to class 1 are grouped into class 1, samples belonging to class 2 are grouped into class 2, and so on.
And 3, calculating the mean vectors of all types.
Specifically, for any class (e.g., class i), the mean vector is calculated by the following formula:
μ_i = (1/b_i) · Σ_{j=1}^{b_i} x_ij
where i is the class identification, μ_i is the mean vector of class i, b_i is the number of samples of the i-th class, j is the sample identification, and x_ij is the vector formed by the expression features of the j-th sample of the i-th class.
And 4, determining a total mean vector according to the various mean vectors.
Specifically, the total mean vector is determined by the following formula:
μ_0 = (1/E) · Σ_{i=1}^{E} μ_i
where μ_0 is the total mean vector and E is the total number of different classes to which the samples in X belong.
And 5, calculating an inter-class variance vector and an intra-class variance vector according to the total mean vector.
The specific calculation formulas are as follows:
S_w = Σ_{i=1}^{E} Σ_{x_ij ∈ X_i} (x_ij − μ_i)(x_ij − μ_i)^T
S_b = Σ_{i=1}^{E} b_i (μ_i − μ_0)(μ_i − μ_0)^T
where S_w is the intra-class variance vector, S_b is the inter-class variance vector, and X_i is the set composed of the samples of the i-th class.
And 6, determining the expression characteristics after dimensionality reduction according to the between-class variance vector and the within-class variance vector to form a dimensionality reduction model.
The specific calculation method is as follows:
1) Calculating a weight vector W = diag(S_b ./ S_w) composed of the weights of the expression features.
Here diag() is a function that takes the elements on the diagonal of a matrix, and ./ is an operator that divides S_b by S_w element by element.
2) And sorting the expression features according to the sequence of the weights of the expression features from large to small.
3) And determining a preset number of expression features which are ranked in the front as the expression features after dimension reduction.
The reduced expressive features may form a feature subset F. The larger the weight is, the more suitable the feature component corresponding to the weight is for the micro expression classification.
The feature subset is output to form the dimension reduction model.
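By way of illustration only, the following Python sketch (using NumPy) shows one possible realisation of steps 1 to 6; the function name train_dim_reduction, the integer-label convention and the use of per-feature (diagonal) variance terms are assumptions of this example, not a definitive implementation of the application.

import numpy as np

def train_dim_reduction(X, y, keep):
    # X: (m, D) matrix of expression features; y: (m,) integer class labels;
    # keep: preset number of features to retain. Returns the indices of the
    # retained (dimension-reduced) expression features.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    mu = np.array([X[y == c].mean(axis=0) for c in classes])  # class mean vectors mu_i (step 3)
    mu0 = mu.mean(axis=0)                                     # total mean vector mu_0 (step 4)
    s_w = np.zeros(X.shape[1])                                # per-feature intra-class variance term
    s_b = np.zeros(X.shape[1])                                # per-feature inter-class variance term
    for idx, c in enumerate(classes):                         # step 5
        Xi = X[y == c]
        s_w += ((Xi - mu[idx]) ** 2).sum(axis=0)
        s_b += len(Xi) * (mu[idx] - mu0) ** 2
    w = s_b / (s_w + 1e-12)                                   # weight vector W (step 6)
    order = np.argsort(w)[::-1]                               # sort weights from largest to smallest
    return order[:keep]                                       # indices of the feature subset F

A larger weight means the corresponding feature component separates the classes more strongly, which is exactly the ordering used to pick the preset number of features.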
FIG. 1 shows a schematic diagram of the dimension reduction model for the case of 2 classes.
The method for implementing the actual micro expression recognition process based on the trained dimension reduction model is shown in fig. 2:
s101, acquiring a face video.
Because the duration of a micro expression is short and its motion amplitude is very small, the face video acquired in this step only needs to contain a face in every frame; it does not need to correspond precisely to the micro expression.
And S102, carrying out face recognition on each frame in the video and extracting a face area.
The present embodiment does not limit the extraction method of the face region, and any existing extraction method may be used.
And S103, extracting the pixel number, the background color and the face brightness of each frame in the video.
The pixel counts of video files produced by differently configured video acquisition equipment differ, and a difference in the number of pixels between consecutive frames affects micro expression recognition; therefore the number of pixels of each frame in the video is extracted.
The number of pixels can be expressed as a single number, such as a "0.3 megapixel" digital camera, which has a nominal 300 thousand pixels; it may also be expressed as a pair of numbers, for example "640 x 480 display", which means 640 pixels horizontally and 480 pixels vertically (e.g. a VGA display). A pair of numbers can also be converted into a single number, for example a 640 x 480 display has 640 x 480 = 307200 pixels.
The number of pixels of each frame in this step is the total number of pixels in the frame, and can be calculated by the resolution of the image. If the image resolution of one frame is 1280 × 960, the number of pixels of the frame =1280 × 960=1228800.
This embodiment does not limit how the number of pixels is extracted; any existing extraction method may be used.
For the implementation method of extracting the background color of each frame in the video, the method includes but is not limited to:
for any frame in the video,
step 1.1, determining a non-face area in any frame as a background area.
Step 1.2, determining the RGB color value of each pixel point in any frame background area.
Wherein, the RGB color value comprises a red color value, a green color value and a blue color value.
And 1.3, calculating the RGB color mean of the background region of any frame by the following formula.
The RGB color mean comprises a red color mean, a green color mean and a blue color mean:
c̄_l = (1/n_1) · Σ_{j=1}^{n_1} c_lj, l = 1, 2, 3
Here j is the pixel point identification of the background region of any frame, c̄_1, c̄_2 and c̄_3 are the red, green and blue color means of the background region of any frame, c_1j, c_2j and c_3j are the red, green and blue color values of the j-th pixel point of the background region of any frame, and n_1 is the total number of pixel points in the background region of any frame.
And 1.4, calculating the RGB color mean square error of the background region of any frame.
The RGB color mean square error comprises a red color mean square error, a green color mean square error and a blue color mean square error:
σ_l1 = sqrt( (1/n_1) · Σ_{j=1}^{n_1} (c_lj − c̄_l)^2 ), l = 1, 2, 3
Here σ_11 is the red color mean square error, σ_21 is the green color mean square error, and σ_31 is the blue color mean square error.
Step 1.5, determining the RGB color intervals of the background region of any frame.
The RGB color intervals comprise the red color interval [c̄_1 − σ_11, c̄_1 + σ_11], the green color interval [c̄_2 − σ_21, c̄_2 + σ_21] and the blue color interval [c̄_3 − σ_31, c̄_3 + σ_31].
Step 1.6, determining, among all pixel points of the background region of any frame, the number n_2 of pixel points whose red color value lies in the red color interval, whose green color value lies in the green color interval, and whose blue color value lies in the blue color interval.
Step 1.7, determining the background color of any frame according to n_2.
The background color is represented by RGB color values, which comprise a red color value, a green color value and a blue color value.
Specifically, the pixel number ratio n_3 = n_2 / n_1 of any frame is calculated; the red, green and blue color values of the background color of any frame are then obtained by adjusting the color means c̄_1, c̄_2 and c̄_3, respectively, according to the ratio n_3.
The background color extraction method provided in this embodiment does not simply take the per-channel mean of the RGB color values of the background pixels as the background color; instead, it dynamically adjusts the mean according to the distribution of the values of each color channel over the background pixels and uses the adjusted value as the background color, so that the determined background color better matches the actual situation.
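For illustration, a possible NumPy sketch of steps 1.1 to 1.7 follows; the function name background_color, the boolean face_mask input, the mean plus/minus mean-square-error form of the colour intervals and the final scaling of the colour mean by the ratio n_3 are assumptions of this example, since the application does not spell out those formulas in the text here.

import numpy as np

def background_color(frame, face_mask):
    # frame: (H, W, 3) RGB image; face_mask: (H, W) boolean array, True inside the face region.
    bg = frame[~face_mask].astype(np.float64)      # background pixels (steps 1.1-1.2)
    n1 = bg.shape[0]                               # total number of background pixels
    mean = bg.mean(axis=0)                         # RGB colour means (step 1.3)
    sigma = bg.std(axis=0)                         # RGB colour mean square errors (step 1.4)
    low, high = mean - sigma, mean + sigma         # assumed colour intervals (step 1.5)
    inside = np.all((bg >= low) & (bg <= high), axis=1)
    n2 = int(inside.sum())                         # pixels inside all three intervals (step 1.6)
    n3 = n2 / n1                                   # pixel number ratio (step 1.7)
    return mean * n3                               # assumed adjustment of the mean by n3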
For the implementation scheme of extracting the face brightness of each frame in the video, the implementation schemes include but are not limited to:
for any frame in the video,
and 2.1, determining the brightness value of each pixel point in the face area of any frame through the following formula.
Figure BDA0001897945250000098
Wherein k is the pixel point identification of any frame of face region, h k The brightness value, R, of the kth pixel point of any frame of face region k Is a red color value, G, of the RGB color values of the k-th pixel point k Is a green color value, B, of the RGB color values of the k-th pixel point k RGB color value of k-th pixel pointBlue color value of (1).
And 2.2, determining the maximum brightness value and the minimum brightness value in the brightness values of all pixel points in any frame of human face area.
Step 2.3, calculating the brightness mean h̄ = (1/n_4) · Σ_{k=1}^{n_4} h_k of the face region of any frame.
Here n_4 is the total number of pixel points in the face region of any frame.
Step 2.4, determining the face brightness of any frame in the video according to the maximum brightness value, the minimum brightness value and h̄.
In particular, the method comprises the following steps:
1) Calculating the first difference d1 = maximum brightness value − minimum brightness value.
2) Calculating a second difference d2 and a third difference d3 from h̄, the maximum brightness value and the minimum brightness value.
3) Calculating the brightness ratio d4 = |d1 − d2| / |d1 − d3|.
4) Calculating the brightness mean square error σ_h = sqrt( (1/n_4) · Σ_{k=1}^{n_4} (h_k − h̄)^2 ) of the face region of any frame.
5) The face brightness of any frame in the video is then obtained by adjusting h̄ according to d4 and σ_h.
The face brightness extraction method provided in this embodiment does not simply take the mean brightness of the pixels in the face region as the face brightness; instead, it dynamically adjusts the mean according to the differences between the pixel brightness values and the maximum and minimum brightness, and uses the adjusted value as the face brightness, so that the determined face brightness better matches the actual situation.
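A comparable NumPy sketch of steps 2.1 to 2.4 is given below; the Rec.601 luma weights used for h_k and the particular second and third differences are assumptions of this example, and since the final combination of the quantities is not fixed in the text here, the sketch simply returns the brightness mean, the brightness ratio and the mean square error.

import numpy as np

def face_brightness_stats(frame, face_mask):
    # frame: (H, W, 3) RGB image; face_mask: (H, W) boolean array marking the face region.
    face = frame[face_mask].astype(np.float64)
    # Assumed per-pixel brightness h_k (Rec.601 luma weights).
    h = 0.299 * face[:, 0] + 0.587 * face[:, 1] + 0.114 * face[:, 2]   # step 2.1
    h_max, h_min, h_mean = h.max(), h.min(), h.mean()                  # steps 2.2-2.3
    d1 = h_max - h_min                          # first difference
    d2 = h_max - h_mean                         # assumed second difference
    d3 = h_mean - h_min                         # assumed third difference
    d4 = abs(d1 - d2) / (abs(d1 - d3) + 1e-12)  # brightness ratio
    sigma = h.std()                             # brightness mean square error
    # The face brightness of the frame is an adjustment of h_mean by d4 and sigma;
    # the exact combination is not given here, so the three quantities are returned.
    return h_mean, d4, sigma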
S104, sequentially selecting a non-first frame, and calculating the area difference, the pixel number difference, the background color difference and the face brightness difference of the face area of the selected frame and the previous frame.
Starting from the second frame and ending with the last frame, one frame is selected at a time; the difference between the face region area of the selected frame and that of the preceding frame is taken as the face region area difference, the difference in the number of pixels as the pixel number difference, the difference in background color as the background color difference, and the difference in face brightness as the face brightness difference.
For example, face region area difference = face region area of the selected frame-face region area of the frame immediately preceding it. Pixel number difference = the number of pixels of the selected frame-the number of pixels of the frame preceding it. Background color difference = background color of selected frame-background color of its previous frame. Face luminance difference = face luminance of the selected frame-face luminance of the frame immediately preceding it.
And S105, calculating the difference value of each non-first frame.
Wherein, the difference value = (human face area difference ^ human face brightness difference + background color difference) ^ pixel number difference.
And ^ is a power operator.
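The per-frame differences and difference values of S104-S105 can be sketched as follows; taking absolute differences and a Euclidean distance between the RGB background colours are assumptions of this example, made only so that the power operations stay real-valued.

import numpy as np

def difference_values(areas, pixel_counts, bg_colors, brightnesses):
    # areas, pixel_counts, brightnesses: per-frame scalars; bg_colors: (N, 3) per-frame RGB values.
    # Returns the difference value of every non-first frame.
    areas = np.asarray(areas, dtype=float)
    pixel_counts = np.asarray(pixel_counts, dtype=float)
    bg_colors = np.asarray(bg_colors, dtype=float)
    brightnesses = np.asarray(brightnesses, dtype=float)
    d_area = np.abs(np.diff(areas))                             # face region area difference
    d_pix = np.abs(np.diff(pixel_counts))                       # pixel number difference
    d_col = np.linalg.norm(np.diff(bg_colors, axis=0), axis=1)  # background colour difference
    d_bri = np.abs(np.diff(brightnesses))                       # face brightness difference
    # difference value = (area difference ^ brightness difference + colour difference) ^ pixel difference
    return (d_area ** d_bri + d_col) ** d_pix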
In addition, in order to speed up the scheme provided by this embodiment, the non-first frames may be preliminarily screened before their difference values are calculated, so as to remove frames that obviously do not show the same person or obviously do not belong to a micro expression.
Namely, the specific execution process of S105 is: and performing primary screening on the non-first frame according to the face area difference, the pixel number difference, the background color difference and the face brightness difference of each non-first frame, and calculating the difference value of each frame after primary screening.
The scheme for primarily screening the non-first frame according to the face area difference, the pixel number difference, the background color difference and the face brightness difference of each non-first frame includes but is not limited to:
for any non-first frame, if the area difference of the face region of any non-first frame is not greater than a first value, the pixel number difference is not greater than a second value, the background color difference is not greater than a third value, and the face brightness difference is not greater than a fourth value, passing the primary screening for any non-first frame; or if the area difference of the face region of any non-first frame is not larger than a first value, but the pixel number difference, the background color difference and the face brightness difference are all 0, then any non-first frame passes through the primary screening; or if the face brightness difference of any non-first frame is not larger than the fourth value, but the face area difference, the pixel number difference and the background color difference are all 0, then any non-first frame passes through the preliminary screening.
The first value is (the sum of the face region area differences of all non-first frames + the face region area of the first frame − avg1) / the total number of frames of the face video; the second value is (the sum of the pixel number differences of all non-first frames + the pixel number of the first frame − avg2) / the total number of frames of the face video; the third value is (the sum of the background color differences of all non-first frames + the background color of the first frame − avg3) / the total number of frames of the face video; the fourth value is (the sum of the face brightness differences of all non-first frames + the face brightness of the first frame − avg4) / the total number of frames of the face video; where avg1 = the sum of the face region areas of all frames / the total number of frames of the face video, avg2 = the sum of the pixel numbers of all frames / the total number of frames of the face video, avg3 = the sum of the background colors of all frames / the total number of frames of the face video, and avg4 = the sum of the face brightness values of all frames / the total number of frames of the face video.
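One way to sketch the preliminary screening in Python is shown below; representing the background colour by a single scalar per frame and using absolute differences are assumptions of this example.

import numpy as np

def preliminary_screen(per_frame):
    # per_frame: dict with equal-length arrays 'area', 'pixels', 'color', 'brightness',
    # one value per frame. Returns the indices (>= 1) of the non-first frames that pass.
    diffs, limits = {}, {}
    n = len(per_frame['area'])                      # total number of frames of the face video
    for key, v in per_frame.items():
        v = np.asarray(v, dtype=float)
        d = np.abs(np.diff(v))                      # differences of all non-first frames
        avg = v.sum() / n                           # avg1 .. avg4
        diffs[key] = d
        limits[key] = (d.sum() + v[0] - avg) / n    # first .. fourth values
    passed = []
    for i in range(n - 1):                          # i indexes non-first frame i + 1
        a, p, c, b = (diffs[k][i] for k in ('area', 'pixels', 'color', 'brightness'))
        ta, tp, tc, tb = (limits[k] for k in ('area', 'pixels', 'color', 'brightness'))
        if (a <= ta and p <= tp and c <= tc and b <= tb) \
                or (a <= ta and p == 0 and c == 0 and b == 0) \
                or (b <= tb and a == 0 and p == 0 and c == 0):
            passed.append(i + 1)
    return passed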
And S106, determining the frames with the difference values larger than a preset threshold value and the first frame of the video as candidate frames.
The preset threshold guarantees a sufficient difference. Since different micro expressions differ in amplitude, the preset threshold is chosen adaptively according to the application field of the method, which ensures the generality of the expression recognition method provided by the application.
And S107, in the candidate frames, determining the consecutively numbered frames as the micro expression frames.
For example, if the candidate frames are frame 3, frame 5, frame 6, frame 8 and frame 9, then the consecutively numbered frames (frame 5, frame 6, frame 8 and frame 9) are all determined as micro expression frames.
It is likely that frames 5 and 6, and frames 8 and 9, each capture a micro expression at that moment.
The foregoing is merely exemplary and does not represent an actual situation.
The present application does not restrict what counts as "consecutive", as long as more than a single isolated frame is involved. For example, if 2 consecutively numbered candidate frames exist, both are determined as micro expression frames; if 3 consecutively numbered candidate frames exist, all 3 are determined as micro expression frames.
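S106 and S107 amount to keeping the first frame plus every frame whose difference value exceeds the threshold, and then retaining only candidates that sit in a run of consecutive frame numbers; a small sketch (with assumed function and variable names) is:

def micro_expression_frames(diff_values, threshold):
    # diff_values[t] is the difference value of frame t + 1 (the non-first frames).
    # Candidate frames: the first frame plus frames whose difference value exceeds the threshold.
    candidates = [0] + [t + 1 for t, d in enumerate(diff_values) if d > threshold]
    kept = []
    for i, f in enumerate(candidates):
        prev_adjacent = i > 0 and candidates[i - 1] == f - 1
        next_adjacent = i + 1 < len(candidates) and candidates[i + 1] == f + 1
        if prev_adjacent or next_adjacent:          # frame lies in a run of consecutive candidates
            kept.append(f)
    return kept

With candidate frames 3, 5, 6, 8 and 9, the function returns frames 5, 6, 8 and 9, matching the example above.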
And S108, extracting the expression features of the micro-expression frames, carrying out dimension reduction processing on the expression features through a dimension reduction model trained in advance, and identifying the features subjected to dimension reduction to obtain an identification result.
In the step, a micro expression recognition model can be trained in advance, the expression characteristics of the micro expression frame are extracted, the dimension reduction processing is carried out on the expression characteristics through the dimension reduction model trained in advance, and then the feature after dimension reduction is recognized by the micro expression recognition model to obtain a recognition result.
The training process of the micro expression recognition model comprises but is not limited to:
and 3.1, acquiring a plurality of sample videos.
The sample video may be obtained from an existing microexpression dataset.
Micro expressions are tiny facial movements that a person makes while trying to mask an emotion. Strictly speaking, a micro expression that a person deliberately simulates cannot be called a micro expression, so the way micro expressions are induced determines how reliable the data are.
This step may obtain multiple sample videos from one or 2 of the following 2 existing micro-expression datasets:
the micro-expression dataset SMIC, established by the university of orlu, finland, requires the subject to watch a video with large mood swings and to try to mask his mood from being exposed, and the recorder observes the subject's expression without watching the video. If the recorder observes the facial expression of the subject, the subject gets a penalty. Under the induction mechanism, 164 video sequences of 16 persons are formed, the micro-expression categories are 3, namely positive (positive), surprise (negative) and negative (negative), and the number of the video sequences is 70, 51 and 43 respectively.
The micro-expression dataset CASME2, established by the Institute of Psychology of the Chinese Academy of Sciences, adopts a similar induction mechanism to ensure the reliability of the data: a subject is rewarded only if the facial expression is successfully suppressed and not discovered by the recorder. The dataset consists of 247 video sequences of 26 subjects in 5 micro-expression categories (happiness, disgust, surprise, suppression and others), with 32, 64, 25, 27 and 99 video sequences respectively.
And 3.2, for each sample video, extracting corresponding expression features by adopting a local binary pattern.
A Local Binary Pattern (LBP) descriptor is defined on a central pixel and its surrounding rectangular neighborhood, as shown in fig. 3. With the gray value of the central pixel as a threshold, the neighboring pixels around it are binary-quantized: a neighbor greater than or equal to the central pixel value is coded as 1, and a neighbor smaller than the central pixel value is coded as 0, forming a local binary pattern.
The binary codes are concatenated clockwise, starting from the upper left corner, to obtain a string of binary digits; the corresponding decimal number uniquely identifies the central pixel point. In this way a local binary pattern can be computed for every pixel in the image.
As shown in fig. 3, the central pixel value in the left table is 178. The upper left corner is 65, and 65 < 178, so the corresponding code is 0; 188 > 178, so the corresponding code is 1. Proceeding in the same way gives the table on the right of fig. 3, and the resulting binary pattern value is 01000100.
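A minimal sketch of this 3 x 3 LBP computation, with the neighbours read clockwise from the upper left corner as described above (the function name lbp_code is assumed), is:

import numpy as np

def lbp_code(patch):
    # patch: 3 x 3 grayscale neighbourhood; returns the binary string and its decimal value.
    c = patch[1, 1]
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],   # clockwise from upper left
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    bits = ''.join('1' if p >= c else '0' for p in neighbours)          # threshold at the centre value
    return bits, int(bits, 2)

# With a centre value of 178, an upper-left neighbour of 65 contributes a 0 (65 < 178)
# and a neighbour of 188 contributes a 1 (188 > 178), as in fig. 3.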
In addition, extending the static LBP texture descriptor to the time-space domain forms 2-dimensional local binary patterns on 3 orthogonal planes. As shown in fig. 4, the LBP features of the video sequence are extracted on the three orthogonal planes XY, XT and YT, and the feature vectors of the planes are concatenated to form the LBP-TOP feature vector. This describes not only the local texture information of the images but also how the video changes over time.
However, the vector dimension of LBP-TOP is 3 x 2^L, where L is the number of neighborhood points. If the expression features extracted in step 3.2 were used for modeling directly, the large feature dimension would make model training slow and degrade its effect. Therefore, after the expression features are extracted in step 3.2, their dimensionality is reduced before they are actually used to train the model in step 3.3, which improves model training efficiency.
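To make the LBP-TOP construction concrete, the sketch below computes 8-neighbour LBP histograms on one XY, one XT and one YT slice of a grayscale video block and concatenates them into a 3 x 2^8 = 768 dimensional vector; using only the central slice of each plane is a simplification of this example, whereas full LBP-TOP accumulates histograms over all slices of each plane.

import numpy as np

def lbp_image(img):
    # 8-neighbour LBP code of every interior pixel of a 2-D array.
    c = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(c.shape, dtype=np.int32)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        codes |= (nb >= c).astype(np.int32) << bit
    return codes

def lbp_top(volume):
    # volume: (T, H, W) grayscale video block; returns a 768-dimensional LBP-TOP style vector.
    T, H, W = volume.shape
    planes = [volume[T // 2].astype(float),        # XY plane (middle frame)
              volume[:, H // 2, :].astype(float),  # XT plane (middle row)
              volume[:, :, W // 2].astype(float)]  # YT plane (middle column)
    hists = [np.bincount(lbp_image(p).ravel(), minlength=256) for p in planes]
    return np.concatenate(hists).astype(float)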
And 3.3, carrying out recognition training on each sample video to form a micro expression recognition model.
There may be various training methods in this step, and this embodiment provides that the following training methods are adopted:
3.3.1, clustering each sample video based on the expression characteristics by adopting any clustering algorithm (such as a k-means algorithm) to form micro expression classes to which each sample video belongs.
3.3.2, adjusting the parameters in the clustering algorithm according to the second standard classification result of each sample video.
Since each sample video has a label for identifying the micro-expression category, the label is obtained in the step and is used as a second standard classification result of each sample video.
3.3.3, repeating the steps of 3.3.1 and 3.3.2, finishing training and forming the micro expression recognition model.
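Read one way, steps 3.3.1 to 3.3.3 can be sketched with scikit-learn's k-means as follows; mapping each cluster to the majority standard class and looping over n_init as the "parameter adjustment" are assumptions of this example, since the application does not fix how the clustering parameters are adjusted.

import numpy as np
from sklearn.cluster import KMeans

def cluster_train(features, labels, n_init_values=(10, 20, 50)):
    # features: (m, D) reduced expression features; labels: (m,) non-negative integer
    # standard (second standard) classification results. Returns the best model found.
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    n_classes = len(np.unique(labels))
    best_acc, best_model = -1.0, None
    for n_init in n_init_values:                                  # 3.3.2: adjust a clustering parameter
        km = KMeans(n_clusters=n_classes, n_init=n_init, random_state=0)
        assign = km.fit_predict(features)                         # 3.3.1: cluster the sample videos
        mapped = np.empty_like(labels)
        for c in range(n_classes):                                # map clusters to majority classes
            members = labels[assign == c]
            mapped[assign == c] = np.bincount(members).argmax() if len(members) else labels[0]
        acc = float((mapped == labels).mean())
        if acc > best_acc:                                        # 3.3.3: keep the best run
            best_acc, best_model = acc, km
    return best_model, best_acc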
The micro expression recognition model in the application is a classifier.
For example: a Support Vector Machine (SVM) method is employed. The key of the SVM is the kernel function, and different SVM classification effects can be achieved by adopting different kernel functions.
For example, the following kernel functions may be employed: linear Kernel, chi-square Kernel, histogram Intersection Kernel.
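As an illustration, a linear-kernel SVM and a chi-square-kernel SVM can be set up with scikit-learn as sketched below; sklearn's chi2_kernel (an exponential chi-square kernel) is used here as a stand-in for the chi-square kernel mentioned above, and a histogram intersection kernel would have to be supplied as a custom precomputed kernel in the same way.

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

def train_svms(train_feats, train_labels):
    # train_feats must be non-negative (e.g. LBP-TOP histograms) for the chi-square kernel.
    linear_svm = SVC(kernel='linear').fit(train_feats, train_labels)
    gram = chi2_kernel(train_feats, train_feats)              # chi-square kernel matrix
    chi2_svm = SVC(kernel='precomputed').fit(gram, train_labels)
    return linear_svm, chi2_svm

# To classify test samples with chi2_svm, pass chi2_kernel(test_feats, train_feats) to its predict().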
In addition, in order to improve the classification recognition rate of the finally trained classification model, cross validation can be used to test the performance of the micro expression recognition model. Specifically, all sample videos are divided into two subsets: one subset, called the training set, is used to train the classifier, and the other subset, called the test set, is used to verify the effectiveness of the classifier. The trained classifier is tested with the test set, and the result serves as the performance index of the classifier. Common methods are simple cross validation, K-fold cross validation and leave-one-out cross validation.
The leave-one-subject-out cross validation method is used to perform micro expression classification training on SVM classifiers with different kernel functions: all video sequences of one subject are selected as the test samples and all video sequences of the remaining I subjects as the training samples; the experiment is repeated I + 1 times (once per subject), and the average classification recognition rate over the I + 1 runs is computed.
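Leave-one-subject-out evaluation can be sketched with scikit-learn's LeaveOneGroupOut splitter, using the subject identifier of each video sequence as the group; the linear kernel below is only an example choice.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def loso_recognition_rate(features, labels, subjects):
    # features: (m, D) reduced features; labels: (m,) classes; subjects: (m,) subject ids.
    # Each fold holds out every video sequence of one subject and trains on the rest.
    scores = cross_val_score(SVC(kernel='linear'), features, labels,
                             groups=subjects, cv=LeaveOneGroupOut())
    return scores.mean()                                      # average classification recognition rate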
Based on the above, the training of the micro expression recognition model is completed.
After the micro expression recognition model is trained, the expression features are subjected to dimensionality reduction through a pre-trained dimensionality reduction model, and the characteristics subjected to dimensionality reduction are recognized by the micro expression recognition model to obtain a recognition result.
Because the dimension reduction processing is carried out on the expression characteristics before the micro expression recognition is carried out through the micro expression recognition model, the recognition rate and the recognition accuracy rate of the micro expression recognition model can be improved.
It should be noted that "first", "second", "third", and the like in this embodiment and the subsequent embodiments are only used for distinguishing preset thresholds, classification results, standard classification results, and the like in different steps, and do not have any other special meaning.
Beneficial effects:
the micro expression frame is selected for recognition according to the area difference of the face area, the pixel number difference, the background color difference and the face brightness difference, the frame related to the micro expression in the face video can be accurately extracted, and the recognition efficiency and accuracy of the micro expression frame are improved.

Claims (10)

1. A micro-expression recognition method, the method comprising:
acquiring a face video;
carrying out face recognition on each frame in the video, and extracting a face area;
extracting the pixel number, the background color and the face brightness of each frame in the video;
sequentially selecting a non-first frame, and calculating the area difference of the face area, the pixel number difference, the background color difference and the face brightness difference of the selected frame and the previous frame;
calculating a difference value of each non-first frame, wherein the difference value = (face region area difference ^ face brightness difference + background color difference) ^ pixel number difference, ^ being a power operator;
determining the frames whose difference values are larger than a preset threshold, together with the first frame of the video, as candidate frames;
among the candidate frames, determining the consecutively numbered frames as micro-expression frames;
and extracting the expression features of the micro-expression frames, performing dimensionality reduction on the expression features through a pre-trained dimensionality reduction model, and identifying the dimensionality-reduced features to obtain an identification result.
2. The method of claim 1, wherein extracting the background color of each frame in the video comprises:
for any one frame in the video, the video is,
determining a non-face area in any frame as a background area;
determining RGB color values of all pixel points in the background area of any frame, wherein the RGB color values comprise red color values, green color values and blue color values;
calculating the RGB color mean of the background region of any frame by the following formula, wherein the RGB color mean comprises a red color mean, a green color mean and a blue color mean:
c̄_l = (1/n_1) · Σ_{j=1}^{n_1} c_lj, l = 1, 2, 3
wherein j is the pixel point identification of the background region of any frame, c̄_1, c̄_2 and c̄_3 are respectively the red, green and blue color means of the background region of any frame, c_1j, c_2j and c_3j are respectively the red, green and blue color values of the j-th pixel point in the background region of any frame, and n_1 is the total number of pixel points in the background region of any frame;
calculating the RGB color mean square error of the background region of any frame, wherein the RGB color mean square error comprises a red color mean square error, a green color mean square error and a blue color mean square error:
σ_l1 = sqrt( (1/n_1) · Σ_{j=1}^{n_1} (c_lj − c̄_l)^2 ), l = 1, 2, 3
wherein σ_11 is the red color mean square error, σ_21 is the green color mean square error, and σ_31 is the blue color mean square error;
determining the RGB color intervals of the background region of any frame, wherein the RGB color intervals comprise the red color interval [c̄_1 − σ_11, c̄_1 + σ_11], the green color interval [c̄_2 − σ_21, c̄_2 + σ_21], and the blue color interval [c̄_3 − σ_31, c̄_3 + σ_31];
Determining the number n2 of pixel points of which the red color values are located in a red color interval of an RGB color interval, the green color values are located in a green color interval, and the blue color values are located in a blue color interval, in all the pixel points of the background area of any frame;
and determining the background color of any frame according to the n 2.
3. The method of claim 2, wherein the background color is represented by RGB color values;
the determining the background color of any frame according to n2 comprises:
calculating the pixel number ratio n_3 = n_2 / n_1 of any frame;
the red, green and blue color values of the background color of any frame being obtained by adjusting the color means c̄_1, c̄_2 and c̄_3, respectively, according to the ratio n_3.
4. The method of claim 1, wherein extracting the face luminance of each frame in the video comprises:
for any one frame in the video, the video is,
determining the brightness value of each pixel point in the face region of any frame, wherein k is the pixel point identification of the face region of any frame, h_k is the brightness value of the k-th pixel point of the face region of any frame, and R_k, G_k and B_k are respectively the red, green and blue color values among the RGB color values of the k-th pixel point;
determining a maximum brightness value and a minimum brightness value in the brightness values of all pixel points of any frame of human face region;
calculating the brightness mean h̄ = (1/n_4) · Σ_{k=1}^{n_4} h_k of the face region of any frame, wherein n_4 is the total number of pixel points in the face region of any frame;
and determining the face brightness of any frame in the video according to the maximum brightness value, the minimum brightness value and h̄.
5. The method of claim 4, wherein determining the face brightness of any frame in the video according to the maximum brightness value, the minimum brightness value and h̄ comprises:
calculating a first difference d1 = maximum brightness value − minimum brightness value;
calculating a second difference d2 and a third difference d3 from h̄, the maximum brightness value and the minimum brightness value;
calculating a brightness ratio d4 = |d1 − d2| / |d1 − d3|;
calculating the brightness mean square error σ_h = sqrt( (1/n_4) · Σ_{k=1}^{n_4} (h_k − h̄)^2 ) of the face region of any frame;
the face brightness of any frame in the video being obtained by adjusting h̄ according to d4 and σ_h.
6. The method of claim 1, wherein before calculating the difference value of each non-first frame, the method further comprises:
performing primary screening on the non-first frame according to the area difference, the pixel number difference, the background color difference and the face brightness difference of the face area of each non-first frame;
the calculating a difference value of each non-first frame includes:
and calculating the difference value of each frame after the initial screening.
7. The method of claim 6, wherein the preliminary screening of the non-first frame according to the face area difference, the pixel number difference, the background color difference, and the face brightness difference of each non-first frame comprises:
for any non-first frame,
if the face region area difference of the non-first frame is not larger than a first value, the pixel number difference is not larger than a second value, the background color difference is not larger than a third value, and the face brightness difference is not larger than a fourth value, the non-first frame passes the preliminary screening; or,
if the face region area difference of the non-first frame is not larger than the first value and the pixel number difference, the background color difference and the face brightness difference are all 0, the non-first frame passes the preliminary screening; or,
if the face brightness difference of the non-first frame is not larger than the fourth value and the face region area difference, the pixel number difference and the background color difference are all 0, the non-first frame passes the preliminary screening;
wherein the first value is (the sum of the face region area differences of all non-first frames + the face region area of the first frame − avg1) / the total number of frames of the face video, the second value is (the sum of the pixel number differences of all non-first frames + the pixel number of the first frame − avg2) / the total number of frames of the face video, the third value is (the sum of the background color differences of all non-first frames + the background color of the first frame − avg3) / the total number of frames of the face video, the fourth value is (the sum of the face brightness differences of all non-first frames + the face brightness of the first frame − avg4) / the total number of frames of the face video, avg1 = the sum of the face region areas of all frames / the total number of frames of the face video, avg2 = the sum of the pixel numbers of all frames / the total number of frames of the face video, avg3 = the sum of the background colors of all frames / the total number of frames of the face video, and avg4 = the sum of the face brightness values of all frames / the total number of frames of the face video.
8. The method according to any one of claims 1 to 7, wherein before the dimensionality reduction processing is performed on the expression features through the pre-trained dimensionality reduction model, the method further comprises:
obtaining a sample set X, wherein the total number of samples in the X is m, each sample comprises a plurality of expression features, and each sample belongs to one category;
classifying all samples according to categories;
calculating the mean vector of each class: μ_i = (1/b_i) · Σ_{j=1}^{b_i} x_ij, wherein i is the class identification, μ_i is the mean vector of class i, b_i is the number of samples of the i-th class, j is the sample identification, and x_ij is the vector formed by the expression features of the j-th sample of the i-th class;
determining the total mean vector μ_0 = (1/E) · Σ_{i=1}^{E} μ_i according to the mean vectors of all classes, wherein μ_0 is the total mean vector and E is the total number of different classes to which the samples in X belong;
calculating an inter-class variance vector and an intra-class variance vector according to the total mean vector;
and determining the expression characteristics after dimensionality reduction according to the between-class variance vector and the within-class variance vector to form a dimensionality reduction model.
9. The method of claim 8, wherein calculating the inter-class variance vector and the intra-class variance vector according to the total mean vector comprises:
S_w = Σ_{i=1}^{E} Σ_{x_ij ∈ X_i} (x_ij − μ_i)(x_ij − μ_i)^T
S_b = Σ_{i=1}^{E} b_i (μ_i − μ_0)(μ_i − μ_0)^T
wherein S_w is the intra-class variance vector, S_b is the inter-class variance vector, and X_i is the set composed of the samples of the i-th class.
10. The method of claim 9, wherein determining the reduced-dimension expression features according to the inter-class variance vector and the intra-class variance vector comprises:
calculating a weight vector W = diag(S_b ./ S_w) composed of the weights of the expression features, wherein diag() is a function that takes the elements on the diagonal of a matrix and ./ is an operator that divides S_b by S_w element by element;
sorting the expression features according to the sequence of the weights of the expression features from big to small;
and determining a preset number of expression features which are ranked in the front as the expression features after dimension reduction.
CN201811499959.9A 2018-12-10 2018-12-10 Micro expression recognition based on difference of front and rear frames and characteristic dimension reduction Active CN109614927B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811499959.9A CN109614927B (en) 2018-12-10 2018-12-10 Micro expression recognition based on difference of front and rear frames and characteristic dimension reduction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811499959.9A CN109614927B (en) 2018-12-10 2018-12-10 Micro expression recognition based on difference of front and rear frames and characteristic dimension reduction

Publications (2)

Publication Number Publication Date
CN109614927A CN109614927A (en) 2019-04-12
CN109614927B true CN109614927B (en) 2022-11-08

Family

ID=66007965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811499959.9A Active CN109614927B (en) 2018-12-10 2018-12-10 Micro expression recognition based on difference of front and rear frames and characteristic dimension reduction

Country Status (1)

Country Link
CN (1) CN109614927B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991458B (en) * 2019-11-25 2023-05-23 创新奇智(北京)科技有限公司 Image feature-based artificial intelligent recognition result sampling system and sampling method
CN112528945B (en) * 2020-12-24 2024-04-26 上海寒武纪信息科技有限公司 Method and device for processing data stream
CN113076813B (en) * 2021-03-12 2024-04-12 首都医科大学宣武医院 Training method and device for mask face feature recognition model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104796683B (en) * 2014-01-22 2018-08-14 南京中兴软件有限责任公司 A kind of method and system of calibration image color
CN105139039B (en) * 2015-09-29 2018-05-29 河北工业大学 The recognition methods of the micro- expression of human face in video frequency sequence
CN108268859A (en) * 2018-02-08 2018-07-10 南京邮电大学 A kind of facial expression recognizing method based on deep learning

Also Published As

Publication number Publication date
CN109614927A (en) 2019-04-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant