CN109190582B - Novel micro-expression recognition method - Google Patents

Novel micro-expression recognition method

Info

Publication number
CN109190582B
CN109190582B (application CN201811085510.8A)
Authority
CN
China
Prior art keywords
frame
expression
channel value
pixel
classification result
Prior art date
Legal status: Expired - Fee Related
Application number
CN201811085510.8A
Other languages
Chinese (zh)
Other versions
CN109190582A (en)
Inventor
张延良
桂伟峰
王俊峰
李赓
蒋涵笑
Current Assignee
Henan University of Technology
Original Assignee
Henan University of Technology
Priority date
Filing date
Publication date
Application filed by Henan University of Technology filed Critical Henan University of Technology
Priority to CN201811085510.8A priority Critical patent/CN109190582B/en
Publication of CN109190582A publication Critical patent/CN109190582A/en
Application granted granted Critical
Publication of CN109190582B publication Critical patent/CN109190582B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a micro-expression recognition method, which comprises the following steps: extracting each frame of a face video; sequentially comparing each frame, except the first frame and the last frame, with its next frame and with its previous frame; sequentially determining, for each frame except the first frame and the last frame, the difference between its difference from the next frame and its difference from the previous frame as the difference value of that frame; determining the micro-expression frames among the frames other than the first frame and the last frame; and extracting the expression features of the micro-expression frames through a pre-trained micro-expression recognition model, reducing the dimension of the expression features, and recognizing the reduced-dimension features to obtain a recognition result. According to the method, the difference between each frame and its next frame is compared with the difference between that frame and its previous frame to obtain the difference value of the frame, and the micro-expression frames are determined according to the difference values of the frames.

Description

Novel micro-expression recognition method
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a micro-expression identification method.
Background
Micro-expressions are non-verbal behaviors that can reveal a person's own emotions.
Most current research focuses on ordinary expressions; besides ordinary facial expressions, faces also exhibit micro-expressions, which are produced by involuntary contraction of facial muscles when emotions are psychologically suppressed.
Micro-expressions are brief and their motion amplitude is very small, which makes them considerably difficult to observe and identify correctly. The success rate of accurately capturing and identifying micro-expressions with the naked eye is low; even after professional training, the recognition rate reaches only 47%.
Therefore, the recognition method of micro-expressions is receiving increasing attention from researchers.
Disclosure of Invention
In order to solve the above problem, an embodiment of the present application provides a micro expression recognition method.
The method comprises the following steps:
acquiring a face video;
extracting each frame of the face video;
sequentially comparing the difference between each frame, except the first frame and the last frame, and its next frame, and the difference between that frame and its previous frame;
sequentially determining, for each frame except the first frame and the last frame, the difference between its difference from the next frame and its difference from the previous frame as the difference value of that frame;
selecting frames with difference values of which the absolute values are not 0 and are smaller than a first preset threshold value from the frames except the first frame and the last frame;
determining the mark of each selected frame;
determining frames with continuous marks as micro-expression frames in the selected frames;
and extracting the expression characteristics of the micro expression frame through a pre-trained micro expression recognition model, reducing the dimension of the expression characteristics, and recognizing the reduced dimension characteristics to obtain a recognition result.
Optionally, the sequentially comparing differences between each frame except the first frame and the last frame and the next frame thereof includes:
for any frame i other than the first and last frames,
acquiring a red channel value, a green channel value and a blue channel value of each pixel in any frame i;
acquiring a red channel value, a green channel value and a blue channel value of each pixel in a frame i +1 subsequent to the frame i;
determining the difference between any frame i and the subsequent frame i +1 according to the red channel value, the green channel value and the blue channel value of each pixel in any frame i, and the red channel value, the green channel value and the blue channel value of each pixel in the subsequent frame i +1 of any frame i;
the sequentially comparing the difference between each frame except the first frame and the last frame and the previous frame comprises:
for any frame i except the first frame and the last frame;
acquiring a red channel value, a green channel value and a blue channel value of each pixel in any frame i;
acquiring a red channel value, a green channel value and a blue channel value of each pixel in a previous frame i-1 of any frame i;
and determining the difference between any frame i and the previous frame i-1 thereof according to the red channel value, the green channel value and the blue channel value of each pixel in any frame i and the red channel value, the green channel value and the blue channel value of each pixel in the previous frame i-1 of any frame i.
Optionally, the determining the difference between the any frame i and the subsequent frame i+1 according to the red channel value, the green channel value and the blue channel value of each pixel in the any frame i, and the red channel value, the green channel value and the blue channel value of each pixel in the subsequent frame i+1 of the any frame i, includes:
calculating the difference between any frame i and the subsequent frame i +1 by the following formula:
D_{i,i+1} = Σ_{j=1}^{M} ( W_R·|R_{i,j} - R_{i+1,j}| + W_G·|G_{i,j} - G_{i+1,j}| + W_B·|B_{i,j} - B_{i+1,j}| )
wherein D_{i,i+1} is the difference between the any frame i and the subsequent frame i+1, j is a pixel identifier, 1 ≤ j ≤ M, M is the total number of pixels of any frame of the face video, W_R is the red channel weight, W_G is the green channel weight, W_B is the blue channel weight, R_{i,j}, G_{i,j} and B_{i,j} are the red, green and blue channel values of pixel j in the any frame i, and R_{i+1,j}, G_{i+1,j} and B_{i+1,j} are the red, green and blue channel values of pixel j in the subsequent frame i+1;
determining a difference between any frame i and a previous frame i-1 thereof according to the red channel value, the green channel value, and the blue channel value of each pixel in any frame i, and the red channel value, the green channel value, and the blue channel value of each pixel in a previous frame i-1 of any frame i, includes:
calculating the difference between any frame i and the previous frame i-1 by the following formula:
D_{i-1,i} = Σ_{j=1}^{M} ( W_R·|R_{i,j} - R_{i-1,j}| + W_G·|G_{i,j} - G_{i-1,j}| + W_B·|B_{i,j} - B_{i-1,j}| )
wherein D_{i-1,i} is the difference between the any frame i and its previous frame i-1, and R_{i-1,j}, G_{i-1,j} and B_{i-1,j} are the red, green and blue channel values of pixel j in the previous frame i-1 of the any frame i.
Optionally, W_R is determined as follows:
in the face video, acquiring any frame comprising a face;
identifying the face region in that frame;
determining the red channel value of the face skin color according to the face region;
W_R = red channel value of the face skin color / 255;
W_G is determined as follows:
in the face video, acquiring any frame comprising a face;
identifying the face region in that frame;
determining the green channel value of the face skin color according to the face region;
W_G = green channel value of the face skin color / 255;
W_B is determined as follows:
in the face video, acquiring any frame comprising a face;
identifying the face region in that frame;
determining the blue channel value of the face skin color according to the face region;
W_B = blue channel value of the face skin color / 255.
Optionally, before obtaining the recognition result, the method further includes:
training a micro expression recognition model;
the training micro-expression recognition model comprises:
acquiring a plurality of sample videos;
for each sample video, extracting corresponding expression features by adopting a local binary pattern;
performing dimensionality reduction processing according to the expression features of all sample videos;
and carrying out recognition training on each sample video based on the features after the dimension reduction processing to form a micro expression recognition model.
Optionally, the performing, according to the expression features of all sample videos, the dimension reduction processing includes:
determining the information increment of each expression feature according to the following formula:
G_s = E(s) - Σ_{v=1}^{C} ( N_v / N_A ) · E(v)
determining the expression features whose information increment is larger than a second preset threshold as the features after the dimension-reduction processing;
wherein s is an expression feature identifier, 1 ≤ s ≤ N_A, N_A is the total number of expression features of all sample videos, G_s is the information increment of the expression feature s, E() represents entropy, E(s) is the entropy of the expression feature s, v is the identifier of a sample video containing the expression feature s, 1 ≤ v ≤ C, C is the total number of sample videos containing the expression feature s, E(v) is the entropy of the sample video v containing the expression feature s, and N_v is the number of expression features corresponding to the sample video v containing the expression feature s.
Optionally, the calculation formula of E() is:
E() = -p+·log2(p+) - p-·log2(p-);
wherein p+ is the probability of correct classification and p- is the probability of classification error.
Optionally, the calculation formula of E(s) is:
E(s) = -p+(s)·log2 p+(s) - p-(s)·log2 p-(s);
wherein p+(s) is the probability that the expression feature s is classified correctly, and p-(s) is the probability that the expression feature s is classified incorrectly;
p+(s) and p-(s) are calculated as follows:
S1-1, setting q = 0;
S1-2, sequentially selecting q expression features as first auxiliary expression features from the expression features other than the expression feature s;
classifying each sample video based on the expression feature s and the first auxiliary expression features each time first auxiliary expression features are selected, to obtain a first classification result of each sample video for that selection;
obtaining a first standard classification result of each sample video;
comparing the first classification result of each sample video with the first standard classification result, and determining the number N+(s) of sample videos whose first classification result is consistent with the first standard classification result and the number N-(s) of sample videos whose first classification result is inconsistent with the first standard classification result;
determining the correct probability p+ = N+(s)/C for that selection;
determining the error probability p- = N-(s)/C for that selection;
S1-3, after the q expression features have been sequentially selected, determining whether q+1 equals N_A; if q+1 is not N_A, executing S1-4; if q+1 is N_A, executing S1-5;
S1-4, setting q = q+1, and repeatedly executing S1-2 and S1-3;
S1-5, determining the average of all the p+ values obtained above as p+(s), and the average of all the p- values obtained above as p-(s).
Optionally, the calculation formula of E(v) is:
E(v) = -p+(v)·log2 p+(v) - p-(v)·log2 p-(v);
wherein p+(v) is the probability that the sample video v is classified correctly, and p-(v) is the probability that the sample video v is classified incorrectly;
p+(v) and p-(v) are calculated as follows:
S2-1, setting y = 1;
S2-2, sequentially selecting y expression features as second auxiliary expression features from the expression features corresponding to the sample video v;
each time second auxiliary expression features are selected, setting x1 = x1 + 1 and classifying the sample video v based on the second auxiliary expression features to obtain a second classification result of the sample video v, wherein x1 is a first counter with an initial value of 0;
obtaining a second standard classification result of the sample video v;
comparing the second classification result with the second standard classification result; if the second classification result is consistent with the second standard classification result, setting x2 = x2 + 1; if the second classification result is inconsistent with the second standard classification result, setting x3 = x3 + 1; wherein x2 is a second counter with an initial value of 0, and x3 is a third counter with an initial value of 0;
S2-3, after the y expression features have been sequentially selected, determining whether y+1 is larger than N_v; if y+1 is not larger than N_v, executing S2-4; if y+1 is larger than N_v, executing S2-5;
S2-4, setting y = y+1, and repeatedly executing S2-2 and S2-3;
S2-5, determining the value of x2/x1 as p+(v), and determining the value of x3/x1 as p-(v).
Beneficial effects:
The difference between each frame and its next frame and the difference between that frame and its previous frame are compared to obtain the difference value of the frame, and the micro-expression frames are determined according to the difference values of the frames, so that the frames that actually contain micro-expressions can be located accurately in the face video.
Drawings
Specific embodiments of the present application will be described below with reference to the accompanying drawings, in which:
fig. 1 is a schematic diagram illustrating LBP descriptor calculation according to an embodiment of the present application;
FIG. 2 illustrates a schematic diagram of feature extraction provided by an embodiment of the present application;
fig. 3 is a schematic flow chart illustrating a micro expression recognition method according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions and advantages of the present application clearer, exemplary embodiments of the present application are further described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not an exhaustive list of all embodiments. The embodiments and the features of the embodiments in this description may be combined with each other without conflict.
Because micro-expressions are of short duration and their motion amplitude is very small, there is considerable difficulty in observing and identifying them correctly. Based on this, the application provides a micro-expression recognition method that compares, for each frame, its difference from the next frame with its difference from the previous frame to obtain the difference value of the frame, and determines the micro-expression frames according to the difference values of the frames.
The expression recognition method provided by the application comprises 2 major processes, wherein the first major process is a micro expression recognition model training process, and the other major process is an actual micro expression recognition process based on a trained micro expression recognition model.
The micro-expression recognition model training process is not executed every time the expression recognition method provided by the application is performed. It is executed only when the method is performed for the first time, when the expression recognition scene changes, when the recognition result of the actual recognition process based on the trained model is not satisfactory, or for other reasons. Executing the training process in these cases improves the accuracy of the micro-expression recognition model, and therefore the accuracy of the results of the actual micro-expression recognition process based on the trained model.
The method and the device do not limit the execution triggering conditions of the process of training the micro-expression recognition model.
The specific implementation method for the process of training the micro-expression recognition model is as follows:
step 1, obtaining a plurality of sample videos.
The sample video may be obtained from an existing microexpression dataset.
Micro-expressions are the tiny facial movements a person makes when trying to mask his or her emotions. Strictly speaking, micro-expressions deliberately simulated by a subject cannot be called micro-expressions, so the way micro-expressions are induced determines the reliability of the data.
This step may obtain multiple sample videos from one or both of the following two existing micro-expression datasets:
The micro-expression dataset SMIC, established by the University of Oulu, Finland, requires the subject to watch a video that causes large emotional swings and to try to keep his or her emotions from being exposed, while a recorder, who does not watch the video, observes the subject's expression. If the recorder notices the subject's facial expression, the subject receives a penalty. Under this induction mechanism, 164 video sequences of 16 subjects were collected in 3 micro-expression categories, namely positive, surprise and negative, containing 70, 51 and 43 video sequences respectively.
The micro-expression dataset CASME2, established by the Institute of Psychology of the Chinese Academy of Sciences, uses a similar induction mechanism to ensure the reliability of the data, but the subject is rewarded if he or she successfully suppresses the facial expression without being noticed by the recorder. The dataset consists of 247 video sequences of 26 subjects in 5 micro-expression categories, namely happiness, disgust, surprise, repression and others, with 32, 64, 25, 27 and 99 video sequences respectively.
Step 2, for each sample video, extracting corresponding expression features by adopting a local binary pattern.
A Local Binary Pattern (LBP) descriptor is defined on a central pixel and its surrounding rectangular neighborhood. As shown in fig. 1, the gray value of the central pixel is used as a threshold to binarize the neighboring pixels around it: a neighbor whose value is greater than or equal to the central pixel value is coded as 1, and a smaller one is coded as 0, forming a local binary pattern.
The binary codes are concatenated clockwise, taking the upper-left corner as the starting point, to obtain a string of binary digits; the corresponding decimal number uniquely identifies the central pixel. In this way, a local binary pattern can be computed for every pixel in the image.
As shown in fig. 1, the central pixel value in the left table is 178; the upper-left neighbor is 65, and 65 < 178, so the corresponding code is 0; 188 > 178, so the corresponding code is 1. Proceeding in the same way yields the table on the right side of fig. 1, and the resulting binary pattern value is 01000100.
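The following Python sketch (not part of the original disclosure; the function name, the use of NumPy, and all neighbor values other than 65, 188 and the center 178 are assumptions) illustrates the LBP computation just described:

import numpy as np

def lbp_code(patch):
    """Compute the LBP code of the center pixel of a 3x3 patch.

    Neighbors >= center are coded 1, smaller ones 0; the bits are read
    clockwise starting from the upper-left corner, as in fig. 1.
    """
    center = patch[1, 1]
    # clockwise order starting at the upper-left neighbor
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2],
                 patch[1, 2], patch[2, 2], patch[2, 1],
                 patch[2, 0], patch[1, 0]]
    bits = ''.join('1' if n >= center else '0' for n in neighbors)
    return int(bits, 2)  # the decimal value uniquely identifies the center pixel

# Example patch with center value 178 (cf. fig. 1); other values are arbitrary.
patch = np.array([[ 65, 188, 102],
                  [ 30, 178, 201],
                  [ 99, 177,  12]])
print(lbp_code(patch))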
In addition, the extension of the static LBP texture descriptor to the spatio-temporal domain forms local binary patterns on three orthogonal planes. As shown in fig. 2, LBP features of the video sequence are extracted on the three orthogonal planes XY, XT and YT, and the feature vectors of the three planes are concatenated to form the LBP-TOP feature vector. This takes the local texture information of the image into account and also describes how the video changes over time.
However, the dimension of the LBP-TOP feature vector is 3 × 2^L, where L is the number of neighborhood points. If the expression features extracted in step 2 were used directly for modeling, the large feature dimension would make model training inefficient and degrade its performance. Therefore, after the expression features are extracted in step 2, step 3 is executed to reduce the dimension of the expression features actually used to train the model and to improve the training efficiency of the model.
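A rough sketch of how an LBP-TOP feature vector could be assembled from the three orthogonal planes is given below (a naive illustration, not the original implementation; the helper names, the 8-neighbor sampling with L = 8, and the use of NumPy are assumptions, and lbp_code() is the function from the sketch above):

import numpy as np

def lbp_plane_hist(plane_patches):
    """Normalized histogram of 8-bit LBP codes computed on one orthogonal plane."""
    hist = np.zeros(256)
    for patch in plane_patches:
        hist[lbp_code(patch)] += 1          # lbp_code() from the sketch above
    return hist / max(hist.sum(), 1)

def lbp_top(video):
    """video: (T, H, W) grayscale array -> concatenated XY/XT/YT histograms."""
    T, H, W = video.shape
    xy, xt, yt = [], [], []
    for t in range(1, T - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                xy.append(video[t, y-1:y+2, x-1:x+2])    # spatial (XY) plane
                xt.append(video[t-1:t+2, y, x-1:x+2])    # XT plane
                yt.append(video[t-1:t+2, y-1:y+2, x])    # YT plane
    # 3 planes x 2^8 bins, matching the 3 x 2^L dimension noted above (L = 8)
    return np.concatenate([lbp_plane_hist(p) for p in (xy, xt, yt)])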
Step 3, performing dimension-reduction processing according to the expression features of all sample videos.
According to the method, the information increment of each expression feature is used as a dimension screening basis, the expression features of which the information increment is not larger than a second preset threshold are not considered, and the expression features of which the information increment is larger than the second preset threshold are determined as the features after dimension reduction processing.
It should be noted that a preset threshold is also involved in the subsequent actual micro-expression recognition process based on the trained micro-expression recognition model, and the two preset thresholds are not related to each other. To distinguish them, the preset threshold used in the model training process is called the second preset threshold, and the preset threshold used in the actual recognition process is called the first preset threshold; "first" and "second" only distinguish the two stages and have no other meaning.
Specifically, the information increment of each expression feature is determined according to the following formula:
G_s = E(s) - Σ_{v=1}^{C} ( N_v / N_A ) · E(v)
And the expression features whose information increment is larger than the second preset threshold are determined as the features after the dimension-reduction processing.
Wherein s is an expression feature identifier, 1 ≤ s ≤ N_A, N_A is the total number of expression features of all sample videos (to improve the dimension-reduction effect, the expression features of all sample videos here may be taken as non-repetitive expression features), G_s is the information increment of the expression feature s, E() represents entropy, E(s) is the entropy of the expression feature s, v is the identifier of a sample video containing the expression feature s, 1 ≤ v ≤ C, C is the total number of sample videos containing the expression feature s, E(v) is the entropy of the sample video v containing the expression feature s, and N_v is the number of expression features corresponding to the sample video v containing the expression feature s.
The calculation formula of E() is: E() = -p+·log2(p+) - p-·log2(p-);
where p+ is the probability of correct classification and p- is the probability of classification error.
Based on the E () calculation formula, the entropy of the expressive features can be calculated, as well as the entropy of the sample video.
● for calculating the entropy of the expressive feature, taking the expressive feature s as an example, the calculation formula is:
E(s) = -p+(s)·log2 p+(s) - p-(s)·log2 p-(s);
wherein p+(s) is the probability that the expression feature s is classified correctly, and p-(s) is the probability that the expression feature s is classified incorrectly.
p+(s) and p-(s) are calculated as follows:
S1-1, setting q = 0;
S1-2, sequentially selecting q expression features as first auxiliary expression features from the expression features other than the expression feature s;
classifying each sample video based on the expression feature s and the first auxiliary expression features each time first auxiliary expression features are selected, to obtain a first classification result of each sample video for that selection;
obtaining a first standard classification result of each sample video;
comparing the first classification result of each sample video with the first standard classification result, and determining the number N+(s) of sample videos whose first classification result is consistent with the first standard classification result and the number N-(s) of sample videos whose first classification result is inconsistent with the first standard classification result;
determining the correct probability p+ = N+(s)/C for that selection;
determining the error probability p- = N-(s)/C for that selection;
wherein C is the total number of sample videos containing the expression feature s.
S1-3, after the q expression features have been sequentially selected, determining whether q+1 equals N_A; if q+1 is not N_A, executing S1-4; if q+1 is N_A, executing S1-5;
S1-4, setting q = q+1, and repeatedly executing S1-2 and S1-3;
S1-5, determining the average of all the p+ values obtained above as p+(s), and the average of all the p- values obtained above as p-(s).
It should be noted that auxiliary expression features, classification results and standard classification results are also involved in the subsequent calculation of E(v) (the entropy of the sample video v containing the expression feature s), and there is no association between the two kinds of auxiliary expression features, between the two kinds of classification results, or between the two kinds of standard classification results. To distinguish them, the auxiliary expression features, classification results and standard classification results used in the calculation of E(s) (the entropy of the expression feature s) are called the first auxiliary expression features, first classification results and first standard classification results, and those used in the calculation of E(v) are called the second auxiliary expression features, second classification results and second standard classification results; "first" and "second" are only used for distinction and have no other practical significance.
The implementation scheme is described below with an example in which there are 2 sample videos, namely sample video 1 and sample video 2, the total number of expression features of all the sample videos is 3, namely expression feature 1, expression feature 2 and expression feature 3, the expression feature s is expression feature 3, and both sample video 1 and sample video 2 contain expression feature 3.
(1) Taking q as 0, sequentially selecting 0 expression features from the expression features (expression features 1 and 2) except the expression feature s (expression feature 3) as first auxiliary expression features (namely, not selecting other expression features).
At this time, each sample video (sample video 1, sample video 2) is classified based on the expression feature s (expression feature 3) only, and the first classification result of the sample video 1 and the first classification result of the sample video 2 are obtained.
And acquiring a first standard classification result of the sample video 1 and a first standard classification result of the sample video 2.
The first classification result of sample video 1 is compared with the first standard classification result of sample video 1, and the first classification result of sample video 2 is compared with the first standard classification result of sample video 2. Suppose the first classification result of sample video 1 is inconsistent with the first standard classification result of sample video 1, and the first classification result of sample video 2 is inconsistent with the first standard classification result of sample video 2. Then the number of sample videos whose first classification result is consistent with the first standard classification result is N+(s) = 0, and the number of sample videos whose first classification result is inconsistent with the first standard classification result is N-(s) = 2.
Since both sample video 1 and sample video 2 contain expression feature 3, C = 2.
The correct probability is determined as p+ = 0/2 = 0.
The error probability is determined as p- = 2/2 = 1.
After the selection of 0 expression features is completed, it is determined whether 0+1 equals N_A (N_A, the total number of expression features of all sample videos, is 3 in this example). Since 0+1 = 1 is not 3, q is set to 0+1 = 1 and the process proceeds to (2).
(2) At this time, q is 1, and 1 of the expression features (expression feature 1, expression feature 2) other than the expression feature s (expression feature 3) is sequentially selected as the first auxiliary expression feature.
Selecting expression feature 1 as a first auxiliary expression feature
At this time, each sample video (sample video 1, sample video 2) is classified based on the expression feature s (expression feature 3) and the first auxiliary expression feature (expression feature 1), and a first classification result of the sample video 1 and a first classification result of the sample video 2 are obtained.
And acquiring a first standard classification result of the sample video 1 and a first standard classification result of the sample video 2.
The first classification result of sample video 1 is compared with the first standard classification result of sample video 1, and the first classification result of sample video 2 is compared with the first standard classification result of sample video 2. Suppose the first classification result of sample video 1 is consistent with the first standard classification result of sample video 1, and the first classification result of sample video 2 is consistent with the first standard classification result of sample video 2. Then N+(s) = 2 and N-(s) = 0.
The correct probability is determined as p+ = 2/2 = 1.
The error probability is determined as p- = 0/2 = 0.
Selecting expression feature 2 as a first auxiliary expression feature
At this time, each sample video (sample video 1, sample video 2) is classified based on the expression feature s (expression feature 3) and the first auxiliary expression feature (expression feature 2), and a first classification result of the sample video 1 and a first classification result of the sample video 2 are obtained.
And acquiring a first standard classification result of the sample video 1 and a first standard classification result of the sample video 2.
The first classification result of sample video 1 is compared with the first standard classification result of sample video 1, and the first classification result of sample video 2 is compared with the first standard classification result of sample video 2. Suppose the first classification result of sample video 1 is consistent with the first standard classification result of sample video 1, and the first classification result of sample video 2 is inconsistent with the first standard classification result of sample video 2. Then N+(s) = 1 and N-(s) = 1.
The correct probability is determined as p+ = 1/2 = 0.5.
The error probability is determined as p- = 1/2 = 0.5.
After the selections of 1 expression feature are completed, it is determined that 1+1 = 2 is not 3, so q is set to 1+1 = 2 and the process proceeds to (3).
(3) At this time, q is 2, of the expression features (expression feature 1, expression feature 2) other than the expression feature s (expression feature 3), 2 expression features (only 2 remaining expression features) are sequentially selected as the first auxiliary expression feature.
At this time, the sample videos (sample video 1 and sample video 2) are classified based on the expression feature s (expression feature 3) and the first auxiliary expression features (expression feature 1 and expression feature 2), so that a first classification result of the sample video 1 and a first classification result of the sample video 2 are obtained.
And acquiring a first standard classification result of the sample video 1 and a first standard classification result of the sample video 2.
The first classification result of sample video 1 is compared with the first standard classification result of sample video 1, and the first classification result of sample video 2 is compared with the first standard classification result of sample video 2. Suppose the first classification result of sample video 1 is consistent with the first standard classification result of sample video 1, and the first classification result of sample video 2 is consistent with the first standard classification result of sample video 2. Then N+(s) = 2 and N-(s) = 0.
The correct probability is determined as p+ = 2/2 = 1.
The error probability is determined as p- = 0/2 = 0.
After the selection of 2 expression features is completed, it is determined that 2+1 = 3 equals N_A, so the average of all the p+ values obtained above is determined as p+(s), and the average of all the p- values obtained above is determined as p-(s).
That is, the p+ value 0 from (1), the p+ values 1 and 0.5 from (2), and the p+ value 1 from (3) are averaged to give p+(s) = (0+1+0.5+1)/4 = 2.5/4 = 0.625.
The p- value 1 from (1), the p- values 0 and 0.5 from (2), and the p- value 0 from (3) are averaged to give p-(s) = (1+0+0.5+0)/4 = 1.5/4 = 0.375.
It should be noted that the above numbers are merely examples, and actual data is the standard in specific implementation.
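The S1-1 to S1-5 procedure above can be sketched in Python as follows (a sketch only, under the assumption that some classifier is available; the function names, the data layout, and `classify` are assumptions standing in for whatever classifier operates on the selected features):

from itertools import combinations

def feature_probabilities(s, all_features, videos, classify):
    """Estimate p+(s) and p-(s) following steps S1-1..S1-5.

    videos: list of (feature_values, standard_label) pairs for the C sample
    videos that contain feature s; classify(feature_values, selected) returns
    a predicted label using only the selected features.
    """
    others = [f for f in all_features if f != s]
    p_plus_list, p_minus_list = [], []
    C = len(videos)
    for q in range(len(others) + 1):              # q = 0 .. N_A - 1
        for aux in combinations(others, q):       # each selection of q auxiliary features
            selected = (s,) + aux
            n_plus = sum(1 for feats, label in videos
                         if classify(feats, selected) == label)
            p_plus_list.append(n_plus / C)
            p_minus_list.append((C - n_plus) / C)
    # S1-5: average over all selections
    return (sum(p_plus_list) / len(p_plus_list),
            sum(p_minus_list) / len(p_minus_list))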
● for calculating the entropy of the sample video, taking the sample video v as an example, the calculation formula is:
E(v) = -p+(v)·log2 p+(v) - p-(v)·log2 p-(v);
wherein p+(v) is the probability that the sample video v is classified correctly, and p-(v) is the probability that the sample video v is classified incorrectly;
p+(v) and p-(v) are calculated as follows:
S2-1, setting y = 1;
S2-2, sequentially selecting y expression features as second auxiliary expression features from the expression features corresponding to the sample video v;
each time second auxiliary expression features are selected, setting x1 = x1 + 1 and classifying the sample video v based on the second auxiliary expression features to obtain a second classification result of the sample video v, wherein x1 is a first counter with an initial value of 0;
obtaining a second standard classification result of the sample video v;
comparing the second classification result with the second standard classification result; if they are consistent, setting x2 = x2 + 1; if they are inconsistent, setting x3 = x3 + 1, wherein x2 is a second counter with an initial value of 0, and x3 is a third counter with an initial value of 0;
S2-3, after the y expression features have been sequentially selected, determining whether y+1 is larger than N_v; if y+1 is not larger than N_v, executing S2-4; if y+1 is larger than N_v, executing S2-5;
S2-4, setting y = y+1, and repeatedly executing S2-2 and S2-3;
S2-5, determining the value of x2/x1 as p+(v), and determining the value of x3/x1 as p-(v).
The implementation scheme is described by taking an example that the sample video v corresponds to 2 expression features, namely expression feature 1 and expression feature 2.
1) Setting y = 1, 1 expression feature is sequentially selected as the second auxiliary expression feature from the expression features corresponding to the sample video v.
Expression feature 1 is selected as the second auxiliary expression feature.
At this time x1 = x1 + 1 (since x1 is the first counter with an initial value of 0, x1 = 0 + 1 = 1), and the sample video v is classified based on the second auxiliary expression feature (expression feature 1) to obtain a second classification result of the sample video v;
a second standard classification result of the sample video v is acquired;
the second classification result is compared with the second standard classification result; suppose they are inconsistent, so x3 = x3 + 1 = 0 + 1 = 1 (x3 is the third counter with an initial value of 0).
Expression feature 2 is selected as the second auxiliary expression feature.
At this time x1 = 1 + 1 = 2, and the sample video v is classified based on the second auxiliary expression feature (expression feature 2) to obtain a second classification result of the sample video v;
a second standard classification result of the sample video v is acquired;
the second classification result is compared with the second standard classification result; suppose they are consistent, so x2 = x2 + 1 = 0 + 1 = 1 (x2 is the second counter with an initial value of 0).
After the selections of 1 expression feature are completed, it is determined whether y+1 (1+1 = 2) is larger than N_v (N_v, the number of expression features corresponding to the sample video v containing the expression feature s, is 2 in this example). Since 2 is not greater than 2, y is set to y+1 = 2 and the process proceeds to 2).
2) At this time y = 2, and 2 expression features (the only 2 expression features) are sequentially selected as the second auxiliary expression features from the expression features corresponding to the sample video v, i.e., expression features 1 and 2 are selected as the second auxiliary expression features.
At this time x1 = 2 + 1 = 3, and the sample video v is classified based on the second auxiliary expression features (expression features 1 and 2) to obtain a second classification result of the sample video v;
a second standard classification result of the sample video v is acquired;
the second classification result is compared with the second standard classification result; suppose they are consistent, so x2 = 1 + 1 = 2.
After the selection of 2 expression features is completed, it is determined that y+1 = 2+1 = 3 is larger than N_v = 2, so p+(v) is determined as x2/x1 = 2/3 and p-(v) is determined as x3/x1 = 1/3.
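Similarly, the S2-1 to S2-5 procedure for a single sample video can be sketched as follows (again an illustrative sketch; the function name, data layout and `classify` are assumptions):

from itertools import combinations

def video_probabilities(video_features, standard_label, classify):
    """Estimate p+(v) and p-(v) following steps S2-1..S2-5 for one sample video."""
    x1 = x2 = x3 = 0                               # counters, initial value 0
    n = len(video_features)                        # N_v
    for y in range(1, n + 1):                      # y = 1 .. N_v
        for aux in combinations(video_features, y):
            x1 += 1
            if classify(aux) == standard_label:    # second classification result
                x2 += 1                            # consistent with the standard result
            else:
                x3 += 1                            # inconsistent with the standard result
    return x2 / x1, x3 / x1                        # p+(v), p-(v)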
Step 4, based on the features after the dimension-reduction processing, carrying out recognition training on each sample video to form a micro-expression recognition model.
There are various possible training methods in this step. For example, any clustering algorithm may be used to cluster the sample videos based on the features after the dimension-reduction processing, forming the micro-expression class to which each sample video belongs. The parameters of the clustering algorithm are then adjusted according to the second standard classification result of each sample video, and the sample videos are clustered again based on the dimension-reduced features to form the micro-expression classes. When the parameters of the clustering algorithm have been adjusted according to the second standard classification results, training is complete and the micro-expression recognition model is formed.
The micro expression recognition model in the application is a classifier.
For example: a Support Vector Machine (SVM) method is employed. The key of the SVM is the kernel function, and different SVM classification results can be obtained by adopting different kernel functions.
For example, the following kernel functions may be employed: there are Linear kernels (Linear kernels), Chi-square kernels (Chi-square kernels), and Histogram Intersection kernels (Histogram Intersection kernels).
In addition, in order to improve the classification recognition rate of the finally trained model, Cross Validation may be used to verify the performance of the micro-expression recognition model. Specifically, all sample videos are divided into two subsets: one subset, called the training set, is used to train the classifier, and the other subset, called the test set, is used to verify the effectiveness of the classifier. The trained classifier is tested on the test set, and the result on the test set is used as the performance index of the classifier. Common methods are simple cross validation, K-fold cross validation and leave-one-out cross validation.
Here, the leave-one-subject-out cross validation method is used to perform micro-expression classification training on SVM classifiers with different kernel functions: all video sequences of one subject are selected as the test samples and all video sequences of the remaining I subjects as the training samples; the experiment is repeated I+1 times, and the average classification recognition rate over the I+1 runs is calculated.
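A sketch of the leave-one-subject-out evaluation with an SVM classifier is given below (scikit-learn, the linear kernel, and the variable layout are assumptions used for illustration, not part of the original disclosure):

import numpy as np
from sklearn.svm import SVC

def leave_one_subject_out(features, labels, subjects, kernel="linear"):
    """features: (n_videos, n_dims) array; labels, subjects: length-n_videos arrays.

    Each subject's videos are held out once as the test set; the average
    classification recognition rate over all runs is returned.
    """
    accuracies = []
    for subject in np.unique(subjects):
        test = subjects == subject
        train = ~test
        clf = SVC(kernel=kernel)
        clf.fit(features[train], labels[train])
        accuracies.append(clf.score(features[test], labels[test]))
    return float(np.mean(accuracies))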
Based on the above method, training of the micro-expression recognition model is completed. Because dimension-reduction processing is performed on the expression features, the model achieves a higher recognition rate and recognition accuracy.
The implementation method of the actual micro expression recognition process based on the trained micro expression recognition model is shown in fig. 3:
101, acquiring a face video.
Because micro-expressions are brief and their motion amplitude is very small, each frame of the face video in this step only needs to contain a face; a video that corresponds precisely to the micro-expression is not required.
102, extracting each frame of the face video.
103, sequentially comparing the difference between each frame except the first frame and the last frame and the next frame, and sequentially comparing the difference between each frame except the first frame and the last frame and the previous frame.
Since the first frame and the last frame generally do not contain valid information, the first frame and the last frame are not considered in the present application.
In particular, the method comprises the following steps of,
(1) comparing the difference between each frame except the first frame and the last frame and the next frame in turn, including:
for any frame i other than the first and last frames,
the red, green and blue channel values for each pixel in any frame i are obtained.
The red channel value, the green channel value and the blue channel value of each pixel in a frame i +1 subsequent to any frame i are obtained.
And determining the difference between any frame i and the subsequent frame i +1 according to the red channel value, the green channel value and the blue channel value of each pixel in any frame i and the red channel value, the green channel value and the blue channel value of each pixel in the subsequent frame i +1 of any frame i.
For example, the difference between any frame i and the subsequent frame i +1 is calculated by the following formula:
D_{i,i+1} = Σ_{j=1}^{M} ( W_R·|R_{i,j} - R_{i+1,j}| + W_G·|G_{i,j} - G_{i+1,j}| + W_B·|B_{i,j} - B_{i+1,j}| )
where D_{i,i+1} is the difference between any frame i and the subsequent frame i+1, j is a pixel identifier, 1 ≤ j ≤ M, M is the total number of pixels of any frame of the face video, W_R is the red channel weight, W_G is the green channel weight, W_B is the blue channel weight, R_{i,j}, G_{i,j} and B_{i,j} are the red, green and blue channel values of pixel j in any frame i, and R_{i+1,j}, G_{i+1,j} and B_{i+1,j} are the red, green and blue channel values of pixel j in the subsequent frame i+1.
Wherein,
① W_R is determined as follows:
in the face video, any frame including a face is acquired (one frame image including a face is sufficient). The face region in that frame is identified. The red channel value of the face skin color is determined according to the face region. W_R = red channel value of the face skin color / 255.
② W_G is determined as follows:
in the face video, any frame including a face is acquired; the face region in that frame is identified; the green channel value of the face skin color is determined according to the face region; W_G = green channel value of the face skin color / 255.
③ W_B is determined as follows:
in the face video, any frame including a face is acquired; the face region in that frame is identified; the blue channel value of the face skin color is determined according to the face region; W_B = blue channel value of the face skin color / 255.
In a particular implementation, W_R, W_G and W_B may be determined simultaneously. For example, in the face video, any frame including a face is acquired; the face region in that frame is identified; the red, green and blue channel values of the face skin color are determined according to the face region; and W_R = red channel value / 255, W_G = green channel value / 255, and W_B = blue channel value / 255 of the face skin color.
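One possible way to obtain W_R, W_G and W_B from a frame containing a face is sketched below. OpenCV's Haar cascade detector is used here only as an assumed example of "identifying the face region", and taking the mean channel value of the detected region as the "face skin color" is likewise an assumption:

import cv2
import numpy as np

def channel_weights(frame_rgb):
    """Return (W_R, W_G, W_B) = mean face-region channel values / 255.

    frame_rgb: (H, W, 3) uint8 array with channels ordered R, G, B.
    """
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2GRAY)
    x, y, w, h = detector.detectMultiScale(gray)[0]          # first detected face
    face = frame_rgb[y:y + h, x:x + w].astype(np.float64)
    r, g, b = face[..., 0].mean(), face[..., 1].mean(), face[..., 2].mean()
    return r / 255.0, g / 255.0, b / 255.0                   # W_R, W_G, W_B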
(2) Comparing the difference between each frame except the first frame and the last frame and the previous frame in turn, including:
for any frame i except the first and last frames.
Acquiring a red channel value, a green channel value and a blue channel value of each pixel in any frame i;
the red, green and blue channel values are obtained for each pixel in a frame i-1 prior to any frame i.
The difference between any frame i and its previous frame i-1 is determined according to the red channel value, the green channel value, and the blue channel value of each pixel in any frame i, and the red channel value, the green channel value, and the blue channel value of each pixel in the previous frame i-1 of any frame i.
For example, determining the difference between any frame i and its previous frame i-1 according to the red channel value, the green channel value, and the blue channel value of each pixel in any frame i, and the red channel value, the green channel value, and the blue channel value of each pixel in the previous frame i-1 of any frame i comprises:
the difference between any frame i and its previous frame i-1 is calculated by the following formula:
D_{i-1,i} = Σ_{j=1}^{M} ( W_R·|R_{i,j} - R_{i-1,j}| + W_G·|G_{i,j} - G_{i-1,j}| + W_B·|B_{i,j} - B_{i-1,j}| )
where D_{i-1,i} is the difference between any frame i and its previous frame i-1, and R_{i-1,j}, G_{i-1,j} and B_{i-1,j} are the red, green and blue channel values of pixel j in the previous frame i-1 of any frame i.
When the difference between two frames is calculated in this step, not only the difference of the actual pixel values is considered but also the person's skin color: the pixel difference is fine-tuned according to the skin color, so that the computed difference better matches the actual situation and reflects the micro-expression more accurately.
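A sketch of the weighted frame-difference computation described above follows (the use of absolute per-pixel channel differences is the assumed reading of the formula, and the helper names and NumPy are assumptions):

import numpy as np

def frame_difference(frame_a, frame_b, w_r, w_g, w_b):
    """Weighted difference between two RGB frames of identical size.

    frame_a, frame_b: (H, W, 3) arrays with channels ordered R, G, B.
    """
    a = frame_a.astype(np.float64)
    b = frame_b.astype(np.float64)
    diff = (w_r * np.abs(a[..., 0] - b[..., 0]) +
            w_g * np.abs(a[..., 1] - b[..., 1]) +
            w_b * np.abs(a[..., 2] - b[..., 2]))
    return diff.sum()          # sum over all M pixels

def difference_values(frames, w_r, w_g, w_b):
    """Difference value D_i = D_{i,i+1} - D_{i-1,i} for every frame except the first and last."""
    return [frame_difference(frames[i], frames[i + 1], w_r, w_g, w_b) -
            frame_difference(frames[i - 1], frames[i], w_r, w_g, w_b)
            for i in range(1, len(frames) - 1)]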
104, sequentially determining, for each frame except the first frame and the last frame, the difference between its difference from the next frame and its difference from the previous frame as the difference value of that frame.
For example:
D_i = D_{i,i+1} - D_{i-1,i}
where D_i is the difference value of frame i.
and 105, determining the micro expression frame according to the difference value of the frames except the first frame and the last frame.
Since the present application does not restrict the content of the face video obtained in 101, that video is likely to include non-micro-expression segments, such as segments with an expressionless face, segments with ordinary expressions (e.g., laughing or crying) that are expressions but not micro-expressions, or both.
In this step, the start frame and the end frame of the micro-expression can be accurately located in the face video obtained in 101, so that only the frames containing the micro-expression are extracted and subsequently recognized, which improves the recognition efficiency.
Specifically, in each frame except for the first frame and the last frame, selecting a frame of which the absolute value of the difference value is not 0 and is smaller than a first preset threshold value; determining the mark of each selected frame; and in each selected frame, determining the frames with continuous marks as the micro expression frames.
Taking 10 frames of the face video obtained in step 101, which are respectively recorded as frame 1, frame 2, frame 3, frame 4, frame 5, frame 6, frame 7, frame 8, frame 9, and frame 10 in time sequence as an example, except for the first frame (frame 1) and the last frame (frame 10), the remaining 8 frames are respectively frame 2, frame 3, frame 4, frame 5, frame 6, frame 7, frame 8, and frame 9. If the difference value of frame 2 obtained in step 104 is 0.9, the difference value of frame 3 is 0.01, the difference value of frame 4 is 0, the difference value of frame 5 is 0.02, the difference value of frame 6 is 0.01, the difference value of frame 7 is 0, the difference value of frame 8 is 0.01, and the difference value of frame 9 is 0.01. When the first preset threshold is 0.03, selecting frames (frame 3, frame 5, frame 6, frame 8 and frame 9) with the absolute value of the difference value being not 0 and being less than 0.03, and determining the frames (frame 5, frame 6, frame 8 and frame 9) with continuous marks as the microexpression frames.
It is likely that frame 5, frame 6, and frame 8, frame 9, represent a micro expression at this time.
The foregoing is merely exemplary and does not represent actual situations.
The first preset threshold constrains the magnitude of the difference. Because different micro-expressions have different difference amplitudes, the first preset threshold can be chosen adaptively according to the application field of the method, which ensures the universality of the expression recognition method provided by the application.
In addition, the present application does not limit how many frames count as "continuous", as long as they are not isolated single frames. For example, if there are 2 frames with consecutive marks, both frames are determined as micro-expression frames; if there are 3 frames with consecutive marks, all 3 frames are determined as micro-expression frames.
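The frame-selection rule of step 105 can be sketched as follows (the helper name is an assumption; the example difference values and the threshold 0.03 are taken from the description above):

def select_micro_expression_frames(difference_values, threshold):
    """difference_values[i] is the difference value of frame i+2 (frames are
    numbered from 1, and the first and last frames are excluded)."""
    # marks of frames whose |difference value| is non-zero and below the threshold
    marks = [i + 2 for i, d in enumerate(difference_values)
             if d != 0 and abs(d) < threshold]
    # keep only frames whose marks are consecutive (drop isolated frames)
    keep = set()
    for a, b in zip(marks, marks[1:]):
        if b == a + 1:
            keep.update((a, b))
    return sorted(keep)

# Example from the description: difference values of frames 2..9, threshold 0.03
diffs = [0.9, 0.01, 0, 0.02, 0.01, 0, 0.01, 0.01]
print(select_micro_expression_frames(diffs, 0.03))   # -> [5, 6, 8, 9]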
106, extracting the expression features of the micro-expression frames through the pre-trained micro-expression recognition model, reducing the dimension of the expression features, and recognizing the reduced-dimension features to obtain a recognition result.
The micro expression recognition model in this step is the micro expression recognition model obtained before 101. In this step, the micro-expression frame obtained in 105 is input into the micro-expression recognition model, and the micro-expression recognition model can classify the micro-expressions to obtain the micro-expression result corresponding to the face video obtained in 101.
In the actual micro-expression recognition process based on the trained micro-expression recognition model, the micro-expression frames are extracted intelligently and recognition is performed only on those frames, which improves both the efficiency and the quality of micro-expression recognition.
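Putting the recognition stage together, a rough end-to-end sketch might look like the following (all helper names come from the sketches above and are assumptions; `model` stands for the trained micro-expression recognition model/classifier, and averaging the RGB channels is only a crude stand-in for grayscale conversion):

import numpy as np

def recognise(frames, model, threshold):
    """frames: list of (H, W, 3) uint8 RGB arrays for one face video."""
    w_r, w_g, w_b = channel_weights(frames[0])                    # channel weights (step 103)
    diffs = difference_values(frames, w_r, w_g, w_b)              # steps 103-104
    micro_idx = select_micro_expression_frames(diffs, threshold)  # step 105
    micro_frames = [frames[i - 1] for i in micro_idx]             # frame numbers are 1-based
    features = lbp_top(np.stack([f.mean(axis=-1) for f in micro_frames]))
    return model.predict([features])[0]                           # step 106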
It should be noted that "first", "second", and the like in this embodiment and the subsequent embodiments are only used for distinguishing preset thresholds, classification results, standard classification results, and the like in different steps, and do not have any other special meanings.
Beneficial effects:
The difference between each frame and its next frame and the difference between that frame and its previous frame are compared to obtain the difference value of the frame, and the micro-expression frames are determined according to the difference values of the frames, so that the frames containing micro-expressions can be located accurately.

Claims (6)

1. A micro-expression recognition method, the method comprising:
acquiring a face video;
extracting each frame of the face video;
sequentially comparing, for each frame except the first frame and the last frame, the difference between the frame and its subsequent frame with the difference between the frame and its previous frame;
sequentially determining, for each frame except the first frame and the last frame, the difference between those two differences as the difference value of the frame;
selecting frames with difference values of which the absolute values are not 0 and are smaller than a first preset threshold value from the frames except the first frame and the last frame;
determining the mark of each selected frame;
determining frames with continuous marks as micro-expression frames in the selected frames;
training a micro expression recognition model;
extracting expression characteristics of the micro expression frame through a trained micro expression recognition model, reducing the dimension of the expression characteristics, and recognizing the reduced dimension characteristics to obtain a recognition result;
wherein, the comparing the difference between each frame except the first frame and the last frame and the next frame in turn comprises:
for any frame i other than the first and last frames,
acquiring a red channel value, a green channel value and a blue channel value of each pixel in any frame i;
acquiring a red channel value, a green channel value and a blue channel value of each pixel in a frame i +1 subsequent to the frame i;
determining the difference between any frame i and the subsequent frame i +1 according to the red channel value, the green channel value and the blue channel value of each pixel in any frame i, and the red channel value, the green channel value and the blue channel value of each pixel in the subsequent frame i +1 of any frame i;
the sequentially comparing the difference between each frame except the first frame and the last frame and the previous frame comprises:
for any frame i except the first frame and the last frame;
acquiring a red channel value, a green channel value and a blue channel value of each pixel in any frame i;
acquiring a red channel value, a green channel value and a blue channel value of each pixel in a previous frame i-1 of any frame i;
determining the difference between any frame i and the previous frame i-1 thereof according to the red channel value, the green channel value and the blue channel value of each pixel in any frame i, and the red channel value, the green channel value and the blue channel value of each pixel in the previous frame i-1 of any frame i;
the training micro-expression recognition model comprises:
acquiring a plurality of sample videos;
for each sample video, extracting corresponding expression features by adopting a local binary pattern;
performing dimensionality reduction processing according to the expression features of all sample videos;
based on the features after the dimension reduction processing, carrying out recognition training on each sample video to form a micro-expression recognition model;
the dimension reduction processing according to the expression features of all sample videos comprises the following steps:
determining the information increment of each expression characteristic according to the following formula:
$$G_s = E(s) - \sum_{v=1}^{C} \frac{N_v}{N_A}\, E(v)$$
determining the expression features with the information increment larger than a second preset threshold value as the features after dimension reduction processing;
wherein s is an expression feature identifier, 1 ≤ s ≤ N_A, N_A is the total number of expression features of all sample videos, G_s is the information increment of the expression feature s, E(·) denotes entropy, E(s) is the entropy of the expression feature s, v is the identifier of a sample video containing the expression feature s, 1 ≤ v ≤ C, C is the total number of sample videos containing the expression feature s, E(v) is the entropy of the sample video v containing the expression feature s, and N_v is the number of expression features corresponding to the sample video v containing the expression feature s.
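For illustration only, the following Python sketch shows one way to implement the dimension-reduction step of claim 1, under the assumption that the information increment takes the information-gain form reconstructed above; the entropy values E(s) and E(v) are assumed to be precomputed, and all function and variable names are illustrative rather than part of the claim.

```python
def information_increment(entropy_s, video_entropies, feature_counts, total_features):
    """entropy_s: E(s); video_entropies[v]: E(v) for each sample video v containing s;
    feature_counts[v]: N_v; total_features: N_A. Returns G_s under the assumed form."""
    weighted = sum(feature_counts[v] / total_features * video_entropies[v]
                   for v in video_entropies)
    return entropy_s - weighted

def reduce_dimensions(features, second_threshold):
    """features: dict mapping feature id s -> (E(s), {v: E(v)}, {v: N_v}, N_A).
    Keeps the features whose information increment exceeds the second preset threshold."""
    return [s for s, (e_s, e_v, n_v, n_a) in features.items()
            if information_increment(e_s, e_v, n_v, n_a) > second_threshold]
```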
2. The method according to claim 1, wherein the determining the difference between the any frame i and the subsequent frame i +1 according to the red channel value, the green channel value and the blue channel value of each pixel in the any frame i and the red channel value, the green channel value and the blue channel value of each pixel in the subsequent frame i +1 of the any frame i comprises:
calculating the difference between any frame i and the subsequent frame i +1 by the following formula:
$$D_{i,i+1} = \frac{1}{M}\sum_{j=1}^{M}\left( W_R\left|R_{i,j}-R_{i+1,j}\right| + W_G\left|G_{i,j}-G_{i+1,j}\right| + W_B\left|B_{i,j}-B_{i+1,j}\right| \right)$$

wherein D_{i,i+1} is the difference between any frame i and its subsequent frame i+1, j is a pixel identifier, 1 ≤ j ≤ M, M is the total number of pixels in any frame of the face video, W_R is the red channel weight, W_G is the green channel weight, W_B is the blue channel weight, R_{i,j}, G_{i,j} and B_{i,j} are respectively the red, green and blue channel values of the pixel j in the frame i, and R_{i+1,j}, G_{i+1,j} and B_{i+1,j} are respectively the red, green and blue channel values of the pixel j in the subsequent frame i+1;
determining a difference between any frame i and a previous frame i-1 thereof according to the red channel value, the green channel value, and the blue channel value of each pixel in any frame i, and the red channel value, the green channel value, and the blue channel value of each pixel in a previous frame i-1 of any frame i, includes:
calculating the difference between any frame i and the previous frame i-1 by the following formula:
$$D_{i,i-1} = \frac{1}{M}\sum_{j=1}^{M}\left( W_R\left|R_{i,j}-R_{i-1,j}\right| + W_G\left|G_{i,j}-G_{i-1,j}\right| + W_B\left|B_{i,j}-B_{i-1,j}\right| \right)$$

wherein D_{i,i-1} is the difference between any frame i and its previous frame i-1, and R_{i-1,j}, G_{i-1,j} and B_{i-1,j} are respectively the red, green and blue channel values of the pixel j in the previous frame i-1 of the frame i.
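A minimal Python sketch of the per-frame difference computation of claim 2, assuming the reconstructed form above (a pixel-averaged, channel-weighted absolute difference) and frames stored as H×W×3 RGB arrays; the function names and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def frame_difference(frame_a, frame_b, w_r, w_g, w_b):
    """frame_a, frame_b: uint8 RGB images of identical shape (H, W, 3).
    Returns the channel-weighted, pixel-averaged absolute difference."""
    a = frame_a.astype(np.float64)
    b = frame_b.astype(np.float64)
    per_channel = np.abs(a - b)              # per-pixel |R-R'|, |G-G'|, |B-B'|
    weights = np.array([w_r, w_g, w_b])      # W_R, W_G, W_B
    return float((per_channel * weights).sum(axis=2).mean())

def difference_value(prev_frame, frame, next_frame, weights):
    """Difference value of a frame: the gap between its difference from the next
    frame and its difference from the previous frame; the selection step later
    uses the absolute value of this quantity."""
    w_r, w_g, w_b = weights
    d_next = frame_difference(frame, next_frame, w_r, w_g, w_b)
    d_prev = frame_difference(frame, prev_frame, w_r, w_g, w_b)
    return d_next - d_prev
```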
3. The method of claim 2, wherein the determination method of W_R comprises:
in the face video, any frame comprising a face is obtained;
identifying a face region in any frame of the face;
determining a red channel value of the skin color of the human face according to the human face area;
W_R = (red channel value of the face skin color) / 255;
the determination method of W_G comprises:
in the face video, any frame comprising a face is obtained;
identifying a face region in any frame of the face;
determining a green channel value of the skin color of the human face according to the human face area;
W_G = (green channel value of the face skin color) / 255;
the determination method of W_B comprises:
in the face video, any frame comprising a face is obtained;
identifying a face region in any frame of the face;
determining a blue channel value of the skin color of the human face according to the human face area;
W_B = (blue channel value of the face skin color) / 255.
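For illustration, a short Python sketch of the channel-weight determination of claim 3; the use of OpenCV's Haar-cascade face detector and of the mean channel value over the detected face region as the face skin-colour value are assumptions made for the sketch, not requirements of the claim.

```python
import cv2
import numpy as np

def channel_weights_from_face(frame_bgr):
    """Estimate (W_R, W_G, W_B) from a frame that contains a face.
    The mean channel values over the detected face region stand in for the
    face skin-colour channel values, each divided by 255."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        raise ValueError("no face region found in this frame")
    x, y, w, h = faces[0]
    face = frame_bgr[y:y + h, x:x + w].astype(np.float64)
    # OpenCV stores images in BGR order.
    b, g, r = face[..., 0].mean(), face[..., 1].mean(), face[..., 2].mean()
    return r / 255.0, g / 255.0, b / 255.0   # W_R, W_G, W_B
```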
4. The method of claim 1, wherein the formula for E () is:
$$E(\cdot) = -p_+ \log_2 p_+ - p_- \log_2 p_-$$

wherein p_+ is the probability of correct classification and p_- is the probability of classification error.
5. The method of claim 4, wherein the formula for E(s) is:
$$E(s) = -p_+(s) \log_2 p_+(s) - p_-(s) \log_2 p_-(s)$$

wherein p_+(s) is the probability that the expression feature s is classified correctly and p_-(s) is the probability that the expression feature s is classified incorrectly;

the calculation method of said p_+(s) and p_-(s) is as follows:
S1-1, setting q = 0;
S1-2, sequentially selecting q expression features, from among the expression features other than the expression feature s, as first auxiliary expression features;
classifying each sample video based on the expression feature s and the first auxiliary expression feature to obtain a first classification result of each sample video after the first auxiliary expression feature is selected;
obtaining a first standard classification result of each sample video;
comparing the first classification result of each sample video with the first standard classification result, and determining the number N_+(s) of sample videos whose first classification result is consistent with the first standard classification result and the number N_-(s) of sample videos whose first classification result is inconsistent with the first standard classification result;
determining the correct probability for the current q as p_+^(q)(s) = N_+(s) / (N_+(s) + N_-(s)), and the error probability for the current q as p_-^(q)(s) = N_-(s) / (N_+(s) + N_-(s));
S1-3, after the q expression features have been sequentially selected, determining whether q+1 equals N_A; if q+1 is not N_A, executing S1-4; if q+1 is N_A, executing S1-5;
S1-4, setting q = q+1 and repeating S1-2 and S1-3;
S1-5, determining the mean of all the p_+^(q)(s) values obtained in this way as p_+(s), and the mean of all the p_-^(q)(s) values as p_-(s).
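Since the S1 procedure above is intricate, the following Python sketch shows one possible reading of it, in which for each q the q auxiliary features are taken in a fixed order and the per-q correct and error rates are averaged at the end; classify and standard_labels are illustrative stand-ins for the classifier and the first standard classification results, and this interpretation is an assumption rather than the claimed procedure itself.

```python
def estimate_feature_probabilities(s, features, classify, standard_labels):
    """Sketch of S1-1 .. S1-5: estimate p_+(s) and p_-(s) for expression feature s.
    classify(selected_features) is assumed to return one predicted label per sample
    video; standard_labels holds the first standard classification results."""
    others = [f for f in features if f != s]
    correct_rates, error_rates = [], []
    for q in range(len(features)):            # q = 0, 1, ..., N_A - 1
        aux = others[:q]                      # q auxiliary features, taken in sequence
        predictions = classify([s] + aux)     # first classification result per sample video
        n_plus = sum(p == t for p, t in zip(predictions, standard_labels))
        n_minus = len(standard_labels) - n_plus
        correct_rates.append(n_plus / (n_plus + n_minus))
        error_rates.append(n_minus / (n_plus + n_minus))
    # S1-5: the mean of the per-q rates is taken as p_+(s) and p_-(s).
    return (sum(correct_rates) / len(correct_rates),
            sum(error_rates) / len(error_rates))
```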
6. The method of claim 4, wherein the formula for E (v) is:
$$E(v) = -p_+(v) \log_2 p_+(v) - p_-(v) \log_2 p_-(v)$$

wherein p_+(v) is the probability that the sample video v is classified correctly and p_-(v) is the probability that the sample video v is classified incorrectly;

the calculation method of said p_+(v) and p_-(v) comprises the following steps:
S2-1, setting y = 1;
S2-2, sequentially selecting y expression features, from among the expression features corresponding to the sample video v, as second auxiliary expression features;
each time a second auxiliary expression feature is selected, setting x1 = x1 + 1 and classifying the sample video v based on the second auxiliary expression feature to obtain a second classification result of the sample video v, wherein x1 is a first count with an initial value of 0;
acquiring a second standard classification result of the sample video v;
comparing the second classification result with the second standard classification result; if they are consistent, setting x2 = x2 + 1; if they are inconsistent, setting x3 = x3 + 1; wherein x2 is a second count with an initial value of 0 and x3 is a third count with an initial value of 0;
S2-3, after the y expression features have been sequentially selected, determining whether y+1 is greater than N_v; if y+1 is not greater than N_v, executing S2-4; if y+1 is greater than N_v, executing S2-5;
S2-4, setting y = y + 1 and repeating S2-2 and S2-3;
S2-5, determining the value of x2/x1 as p_+(v) and the value of x3/x1 as p_-(v).
CN201811085510.8A 2018-09-18 2018-09-18 Novel micro-expression recognition method Expired - Fee Related CN109190582B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811085510.8A CN109190582B (en) 2018-09-18 2018-09-18 Novel micro-expression recognition method


Publications (2)

Publication Number Publication Date
CN109190582A CN109190582A (en) 2019-01-11
CN109190582B true CN109190582B (en) 2022-02-08

Family

ID=64911648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811085510.8A Expired - Fee Related CN109190582B (en) 2018-09-18 2018-09-18 Novel micro-expression recognition method

Country Status (1)

Country Link
CN (1) CN109190582B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800771B (en) * 2019-01-30 2021-03-05 杭州电子科技大学 Spontaneous micro-expression positioning method of local binary pattern of mixed space-time plane
US11954905B2 (en) 2019-06-28 2024-04-09 Hewlett-Packard Development Company, L.P. Landmark temporal smoothing
CN113076813B (en) * 2021-03-12 2024-04-12 首都医科大学宣武医院 Training method and device for mask face feature recognition model


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6248866A (en) * 1985-08-28 1987-03-03 Fuji Photo Film Co Ltd Read condition deciding method for radiographic information
CN101567043A (en) * 2009-05-31 2009-10-28 中山大学 Face tracking method based on classification and identification
WO2011065952A1 (en) * 2009-11-30 2011-06-03 Hewlett-Packard Development Company, L.P. Face recognition apparatus and methods
CN104123543A (en) * 2014-07-23 2014-10-29 泰亿格电子(上海)有限公司 Eyeball movement identification method based on face identification
CN105913038A (en) * 2016-04-26 2016-08-31 哈尔滨工业大学深圳研究生院 Video based dynamic microexpression identification method
KR20180043937A (en) * 2016-10-21 2018-05-02 삼성전자주식회사 Method and apparatus for recognizing facial expression
CN107346417A (en) * 2017-06-13 2017-11-14 浪潮金融信息技术有限公司 Method for detecting human face and device
CN107403142A (en) * 2017-07-05 2017-11-28 山东中磁视讯股份有限公司 A kind of detection method of micro- expression
CN107358206A (en) * 2017-07-13 2017-11-17 山东大学 Micro- expression detection method that a kind of Optical-flow Feature vector modulus value and angle based on area-of-interest combine
CN107808113A (en) * 2017-09-13 2018-03-16 华中师范大学 A kind of facial expression recognizing method and system based on difference depth characteristic

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Recognising spontaneous facial micro-expressions; Tomas Pfister; 2011 International Conference on Computer Vision; 2012-02-12; pages 1449-1457 *
Micro-expression recognition based on global optical flow features; Zhang Xuange; Pattern Recognition and Artificial Intelligence; 2016-08-30; Vol. 29, No. 8, pages 760-769 *

Also Published As

Publication number Publication date
CN109190582A (en) 2019-01-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220208