CN109614927B - Micro expression recognition based on difference of front and rear frames and characteristic dimension reduction - Google Patents


Info

Publication number
CN109614927B
Authority
CN
China
Prior art keywords
frame
difference
value
face
color
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811499959.9A
Other languages
Chinese (zh)
Other versions
CN109614927A (en)
Inventor
张延良
郭辉
李赓
桂伟峰
王俊峰
蒋涵笑
卢冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Technology
Original Assignee
Henan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University of Technology filed Critical Henan University of Technology
Priority to CN201811499959.9A priority Critical patent/CN109614927B/en
Publication of CN109614927A publication Critical patent/CN109614927A/en
Application granted granted Critical
Publication of CN109614927B publication Critical patent/CN109614927B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

The application provides a micro-expression recognition method, which performs face recognition on each frame of a video and extracts the face region; extracts the pixel number, the background color and the face brightness of each frame in the video; sequentially selects each non-first frame and calculates the face region area difference, pixel number difference, background color difference and face brightness difference between the selected frame and the preceding frame; calculates a difference value for each non-first frame; determines the frames whose difference values are larger than a preset threshold, together with the first frame of the video, as candidate frames; among the candidate frames, determines the consecutively numbered frames as micro-expression frames; and extracts the expression features of the micro-expression frames, reduces their dimensionality through a pre-trained dimension reduction model, and recognizes the reduced features to obtain a recognition result. Because the micro-expression frames are selected for recognition according to the face region area difference, pixel number difference, background color difference and face brightness difference, the frames related to the micro-expression in the face video can be extracted accurately, and the efficiency and accuracy of micro-expression frame recognition are improved.

Description

Micro expression recognition based on difference of front and rear frames and characteristic dimension reduction
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a micro-expression identification method.
Background
Micro-expressions are non-verbal behaviors that can reveal a person's own emotions.
Current research mainly focuses on ordinary expressions. Besides ordinary facial expressions, however, there are micro expressions, which are produced by uncontrolled contractions of the facial muscles while a person psychologically suppresses an emotion.
Micro expressions last only a short time and involve very small motions, which makes them considerably difficult to observe and identify correctly. The success rate of capturing and identifying micro expressions with the naked eye is low; even after professional training, the recognition rate only reaches about 47%.
Therefore, recognition methods of micro-expressions are receiving increasing attention from researchers.
Disclosure of Invention
In order to solve the above problems, the embodiments of the present application provide a micro expression recognition method.
Acquiring a face video;
carrying out face recognition on each frame in the video, and extracting a face area;
extracting the pixel number, the background color and the face brightness of each frame in the video;
sequentially selecting a non-first frame, and calculating the area difference of the face area, the pixel number difference, the background color difference and the face brightness difference of the selected frame and the previous frame;
calculating a difference value of each non-first frame, wherein the difference value = (face region area difference ^ face brightness difference + background color difference) ^ pixel number difference, ^ being a power operator;
determining the frames whose difference values are larger than a preset threshold, together with the first frame of the video, as candidate frames;
among the candidate frames, determining the consecutively numbered frames as micro-expression frames;
and extracting the expression features of the micro-expression frames, performing dimensionality reduction on the expression features through a pre-trained dimensionality reduction model, and identifying the dimensionality-reduced features to obtain an identification result.
Optionally, extracting a background color of each frame in the video includes:
for any one of the frames in the video,
determining a non-face area in any frame as a background area;
determining RGB color values of all pixel points in the background area of any frame, wherein the RGB color values comprise red color values, green color values and blue color values;
calculating the RGB color mean of the background region of any frame by the following formula, wherein the RGB color mean comprises a red color mean, a green color mean and a blue color mean:
c̄_l = (1/n_1) · Σ_{j=1}^{n_1} c_lj, l = 1, 2, 3
wherein j is the pixel point identification of the background region of any frame, c̄_1, c̄_2 and c̄_3 are respectively the red, green and blue color means of the background region of any frame, c_1j, c_2j and c_3j are respectively the red, green and blue color values of the j-th pixel point in the background region of any frame, and n_1 is the total number of pixel points in the background region of any frame;
calculating the RGB color mean square error of the background region of any frame, wherein the RGB color mean square error comprises a red color mean square error, a green color mean square error and a blue color mean square error:
σ_l1 = sqrt( (1/n_1) · Σ_{j=1}^{n_1} (c_lj − c̄_l)^2 ), l = 1, 2, 3
wherein σ_11 is the red color mean square error, σ_21 is the green color mean square error, and σ_31 is the blue color mean square error;
determining the RGB color intervals of the background region of any frame, wherein the RGB color intervals comprise the red color interval [c̄_1 − σ_11, c̄_1 + σ_11], the green color interval [c̄_2 − σ_21, c̄_2 + σ_21] and the blue color interval [c̄_3 − σ_31, c̄_3 + σ_31];
Determining the number n2 of pixel points of which the red color values are located in a red color interval of an RGB color interval, the green color values are located in a green color interval, and the blue color values are located in a blue color interval, in all the pixel points of the background area of any frame;
and determining the background color of any frame according to the n 2.
Optionally, the background color is represented by RGB color values;
the determining the background color of any frame according to n2 comprises:
calculating the pixel number ratio n_3 = n_2 / n_1 of any frame;
the red, green and blue color values of the background color of any frame are then obtained by adjusting the color means c̄_1, c̄_2 and c̄_3, respectively, according to the ratio n_3.
Optionally, extracting the face brightness of each frame in the video includes:
for any frame in the video,
determining the brightness value of each pixel point in the face region of any frame, wherein k is the pixel point identification of the face region of any frame, h_k is the brightness value of the k-th pixel point of the face region of any frame, and R_k, G_k and B_k are respectively the red, green and blue color values among the RGB color values of the k-th pixel point;
determining the maximum brightness value and the minimum brightness value among the brightness values of all pixel points of the face region of any frame;
calculating the brightness mean h̄ = (1/n_4) · Σ_{k=1}^{n_4} h_k of the face region of any frame, wherein n_4 is the total number of pixel points in the face region of any frame;
and determining the face brightness of any frame in the video according to the maximum brightness value, the minimum brightness value and h̄.
Optionally, determining the face brightness of any frame in the video according to the maximum brightness value, the minimum brightness value and h̄ includes:
calculating a first difference d1 = maximum brightness value − minimum brightness value;
calculating a second difference d2 and a third difference d3 from h̄, the maximum brightness value and the minimum brightness value;
calculating a brightness ratio d4 = |d1 − d2| / |d1 − d3|;
calculating the brightness mean square error σ_h = sqrt( (1/n_4) · Σ_{k=1}^{n_4} (h_k − h̄)^2 ) of the face region of any frame;
the face brightness of any frame in the video is then obtained by adjusting h̄ according to d4 and σ_h.
Optionally, before calculating the difference value of each non-first frame, the method further includes:
performing primary screening on the non-first frame according to the area difference of the face area, the pixel number difference, the background color difference and the face brightness difference of each non-first frame;
the calculating a difference value of each non-first frame includes:
and calculating the difference value of each frame after the initial screening.
Optionally, the preliminary screening of the non-first frames according to the face region area difference, the pixel number difference, the background color difference and the face brightness difference of each non-first frame includes:
for any non-first frame,
if the face region area difference of the non-first frame is not larger than a first value, the pixel number difference is not larger than a second value, the background color difference is not larger than a third value, and the face brightness difference is not larger than a fourth value, the non-first frame passes the preliminary screening; or,
if the face region area difference of the non-first frame is not larger than the first value and the pixel number difference, the background color difference and the face brightness difference are all 0, the non-first frame passes the preliminary screening; or,
if the face brightness difference of the non-first frame is not larger than the fourth value and the face region area difference, the pixel number difference and the background color difference are all 0, the non-first frame passes the preliminary screening;
the first value is (the sum of the face region area differences of all non-first frames + the face region area of the first frame − avg1) / the total number of frames of the face video, the second value is (the sum of the pixel number differences of all non-first frames + the pixel number of the first frame − avg2) / the total number of frames of the face video, the third value is (the sum of the background color differences of all non-first frames + the background color of the first frame − avg3) / the total number of frames of the face video, and the fourth value is (the sum of the face brightness differences of all non-first frames + the face brightness of the first frame − avg4) / the total number of frames of the face video, where avg1 = the sum of the face region areas of all frames / the total number of frames of the face video, avg2 = the sum of the pixel numbers of all frames / the total number of frames of the face video, avg3 = the sum of the background colors of all frames / the total number of frames of the face video, and avg4 = the sum of the face brightness values of all frames / the total number of frames of the face video.
Optionally, before the performing the dimension reduction processing on the expression features through the pre-trained dimension reduction model, the method further includes:
obtaining a sample set X, wherein the total number of samples in the X is m, each sample comprises a plurality of expression features, and each sample belongs to one category;
classifying all samples according to categories;
calculating the mean vector of each class: μ_i = (1/b_i) · Σ_{j=1}^{b_i} x_ij, where i is the class identification, μ_i is the mean vector of class i, b_i is the number of samples of the i-th class, j is the sample identification, and x_ij is the vector formed by the expression features of the j-th sample of the i-th class;
determining the total mean vector μ_0 = (1/E) · Σ_{i=1}^{E} μ_i according to the mean vectors of all classes, where μ_0 is the total mean vector and E is the total number of different classes to which the samples in X belong;
calculating an inter-class variance vector and an intra-class variance vector according to the total mean vector;
and determining the expression features after dimensionality reduction according to the inter-class variance vector and the intra-class variance vector to form a dimensionality reduction model.
Optionally, calculating the inter-class variance vector and the intra-class variance vector according to the total mean vector includes:
S_w = Σ_{i=1}^{E} Σ_{x_ij ∈ X_i} (x_ij − μ_i)(x_ij − μ_i)^T
S_b = Σ_{i=1}^{E} b_i (μ_i − μ_0)(μ_i − μ_0)^T
where S_w is the intra-class variance vector, S_b is the inter-class variance vector, and X_i is the set composed of the samples of the i-th class.
Optionally, the determining the expression features after the dimensionality reduction according to the between-class variance vector and the within-class variance vector includes:
calculating a weight vector W = diag (S) composed of the weights of the expression features b ·/S w ) Wherein diag () is a function for taking the elements on the diagonal of the matrix, ·/is an operator for applying S w And S b Is divided by the corresponding element;
sorting the expression features according to the sequence of the weights of the expression features from big to small;
and determining a preset number of expression features which are ranked in the front as the expression features after dimension reduction.
The beneficial effects are as follows:
the micro expression frames are selected for recognition according to the area difference of the face area, the pixel number difference, the background color difference and the face brightness difference, the frames related to the micro expression in the face video can be accurately extracted, and the recognition efficiency and accuracy of the micro expression frames are improved.
Drawings
Specific embodiments of the present application will be described below with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram illustrating a principle of a dimension reduction model classified into 2 classes according to an embodiment of the present application;
FIG. 2 is a flow chart of a micro expression recognition method according to an embodiment of the present application;
fig. 3 is a schematic diagram illustrating LBP descriptor calculation according to an embodiment of the present application;
fig. 4 shows a schematic diagram of feature extraction provided in an embodiment of the present application.
Detailed Description
To make the technical solutions and advantages of the present application clearer, exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application, not an exhaustive list of all embodiments, and the embodiments and the features of the embodiments in the present description may be combined with each other when no conflict arises.
Because micro expressions last only a short time and involve very small motions, they are considerably difficult to observe and identify correctly. The application therefore provides a micro-expression frame identification method: each frame is compared with its neighbouring frame to obtain a difference value for that frame, and the micro-expression frames are determined according to the difference values of the frames.
The expression recognition method provided by the application comprises 2 major processes, wherein the first major process is a dimension reduction model training process, and the other major process is an actual micro expression recognition process based on a trained dimension reduction model.
The dimension-reduction model training process is not executed every time the expression recognition method provided by the application is run. It is executed only when the method is run for the first time, when the expression recognition scene changes, when the dimension-reduction effect on the expression features during actual micro-expression recognition is unsatisfactory, or for other reasons, so as to improve the dimension-reduction effect on the expression features and thereby the accuracy of the actual micro-expression recognition result.
The method and the device do not limit the execution triggering conditions of the process of training the dimension reduction model.
The specific implementation method of the process of training the dimension reduction model is as follows:
step 1, a sample set X is obtained.
Wherein the total number of samples in X is m, each sample comprises a plurality of expression features, and each sample belongs to one category.
For example, suppose the samples in X belong to E different classes: class 1, class 2, ……, class i, ……, class E. Class 1 has b_1 samples, which form the set X_1; class 2 has b_2 samples, which form the set X_2; and so on.
And 2, classifying all samples according to categories.
Taking the example in step 1, this step sorts all samples into the E classes: samples belonging to class 1 are grouped into class 1, samples belonging to class 2 are grouped into class 2, and so on.
And 3, calculating the mean vectors of all types.
Specifically, for any class (e.g., class i), the mean vector is calculated by the following formula:
μ_i = (1/b_i) · Σ_{j=1}^{b_i} x_ij
where i is the class identification, μ_i is the mean vector of class i, b_i is the number of samples of the i-th class, j is the sample identification, and x_ij is the vector formed by the expression features of the j-th sample of the i-th class.
And 4, determining a total mean vector according to the various mean vectors.
Specifically, the total mean vector is determined by the following formula:
μ_0 = (1/E) · Σ_{i=1}^{E} μ_i
where μ_0 is the total mean vector and E is the total number of different classes to which the samples in X belong.
And 5, calculating an inter-class variance vector and an intra-class variance vector according to the total mean vector.
The specific calculation formulas are as follows:
S_w = Σ_{i=1}^{E} Σ_{x_ij ∈ X_i} (x_ij − μ_i)(x_ij − μ_i)^T
S_b = Σ_{i=1}^{E} b_i (μ_i − μ_0)(μ_i − μ_0)^T
where S_w is the intra-class variance vector, S_b is the inter-class variance vector, and X_i is the set composed of the samples of the i-th class.
And 6, determining the expression characteristics after dimensionality reduction according to the between-class variance vector and the within-class variance vector to form a dimensionality reduction model.
The specific calculation method is as follows:
1) Calculating a weight vector W = diag(S_b ./ S_w) composed of the weights of the expression features.
Here diag() is a function that takes the elements on the diagonal of a matrix, and ./ is an operator that divides S_b by S_w element by element.
2) And sorting the expression features according to the sequence of the weights of the expression features from large to small.
3) And determining a preset number of expression features which are ranked in the front as the expression features after dimension reduction.
The reduced expressive features may form a feature subset F. The larger the weight is, the more suitable the feature component corresponding to the weight is for the micro expression classification.
The feature subset is output to form the dimension reduction model.
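By way of illustration only, the following Python sketch (using NumPy) shows one possible realisation of steps 1 to 6; the function name train_dim_reduction, the integer-label convention and the use of per-feature (diagonal) variance terms are assumptions of this example, not a definitive implementation of the application.

import numpy as np

def train_dim_reduction(X, y, keep):
    # X: (m, D) matrix of expression features; y: (m,) integer class labels;
    # keep: preset number of features to retain. Returns the indices of the
    # retained (dimension-reduced) expression features.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    mu = np.array([X[y == c].mean(axis=0) for c in classes])  # class mean vectors mu_i (step 3)
    mu0 = mu.mean(axis=0)                                     # total mean vector mu_0 (step 4)
    s_w = np.zeros(X.shape[1])                                # per-feature intra-class variance term
    s_b = np.zeros(X.shape[1])                                # per-feature inter-class variance term
    for idx, c in enumerate(classes):                         # step 5
        Xi = X[y == c]
        s_w += ((Xi - mu[idx]) ** 2).sum(axis=0)
        s_b += len(Xi) * (mu[idx] - mu0) ** 2
    w = s_b / (s_w + 1e-12)                                   # weight vector W (step 6)
    order = np.argsort(w)[::-1]                               # sort weights from largest to smallest
    return order[:keep]                                       # indices of the feature subset F

A larger weight means the corresponding feature component separates the classes more strongly, which is exactly the ordering used to pick the preset number of features.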
FIG. 1 shows a schematic diagram of the dimension reduction model for the case of 2 classes.
The method for implementing the actual micro expression recognition process based on the trained dimension reduction model is shown in fig. 2:
s101, acquiring a face video.
Because the duration of a micro expression is short and its motion amplitude is very small, the face video acquired in this step only needs to contain a face in every frame; it does not need to correspond precisely to the micro expression.
And S102, carrying out face recognition on each frame in the video and extracting a face area.
The present embodiment does not limit the extraction method of the face region, and any existing extraction method may be used.
And S103, extracting the pixel number, the background color and the face brightness of each frame in the video.
The pixel counts of video files produced by differently configured video acquisition equipment differ, and a difference in the number of pixels between consecutive frames affects micro expression recognition; therefore the number of pixels of each frame in the video is extracted.
The number of pixels can be expressed as a single number, such as a "0.3 megapixel" digital camera, which has a nominal 300 thousand pixels; it may also be expressed as a pair of numbers, for example "640 x 480 display", which means 640 pixels horizontally and 480 pixels vertically (e.g. a VGA display). A pair of numbers can also be converted into a single number, for example a 640 x 480 display has 640 x 480 = 307200 pixels.
The number of pixels of each frame in this step is the total number of pixels in the frame, and can be calculated by the resolution of the image. If the image resolution of one frame is 1280 × 960, the number of pixels of the frame =1280 × 960=1228800.
This embodiment does not limit how the number of pixels is extracted; any existing extraction method may be used.
For the implementation method of extracting the background color of each frame in the video, the method includes but is not limited to:
for any frame in the video,
step 1.1, determining a non-face area in any frame as a background area.
Step 1.2, determining the RGB color value of each pixel point in any frame background area.
Wherein, the RGB color value comprises a red color value, a green color value and a blue color value.
And 1.3, calculating the RGB color mean of the background region of any frame by the following formula.
The RGB color mean comprises a red color mean, a green color mean and a blue color mean:
c̄_l = (1/n_1) · Σ_{j=1}^{n_1} c_lj, l = 1, 2, 3
Here j is the pixel point identification of the background region of any frame, c̄_1, c̄_2 and c̄_3 are the red, green and blue color means of the background region of any frame, c_1j, c_2j and c_3j are the red, green and blue color values of the j-th pixel point of the background region of any frame, and n_1 is the total number of pixel points in the background region of any frame.
And 1.4, calculating the RGB color mean square error of the background region of any frame.
The RGB color mean square error comprises a red color mean square error, a green color mean square error and a blue color mean square error:
σ_l1 = sqrt( (1/n_1) · Σ_{j=1}^{n_1} (c_lj − c̄_l)^2 ), l = 1, 2, 3
Here σ_11 is the red color mean square error, σ_21 is the green color mean square error, and σ_31 is the blue color mean square error.
Step 1.5, determining the RGB color intervals of the background region of any frame.
The RGB color intervals comprise the red color interval [c̄_1 − σ_11, c̄_1 + σ_11], the green color interval [c̄_2 − σ_21, c̄_2 + σ_21] and the blue color interval [c̄_3 − σ_31, c̄_3 + σ_31].
Step 1.6, determining, among all pixel points of the background region of any frame, the number n_2 of pixel points whose red color value lies in the red color interval, whose green color value lies in the green color interval, and whose blue color value lies in the blue color interval.
Step 1.7, determining the background color of any frame according to n_2.
The background color is represented by RGB color values, which comprise a red color value, a green color value and a blue color value.
Specifically, the pixel number ratio n_3 = n_2 / n_1 of any frame is calculated; the red, green and blue color values of the background color of any frame are then obtained by adjusting the color means c̄_1, c̄_2 and c̄_3, respectively, according to the ratio n_3.
The background color extraction method provided in this embodiment does not simply take the per-channel mean of the RGB color values of the background pixels as the background color; instead, it dynamically adjusts the mean according to the distribution of the values of each color channel over the background pixels and uses the adjusted value as the background color, so that the determined background color better matches the actual situation.
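For illustration, a possible NumPy sketch of steps 1.1 to 1.7 follows; the function name background_color, the boolean face_mask input, the mean plus/minus mean-square-error form of the colour intervals and the final scaling of the colour mean by the ratio n_3 are assumptions of this example, since the application does not spell out those formulas in the text here.

import numpy as np

def background_color(frame, face_mask):
    # frame: (H, W, 3) RGB image; face_mask: (H, W) boolean array, True inside the face region.
    bg = frame[~face_mask].astype(np.float64)      # background pixels (steps 1.1-1.2)
    n1 = bg.shape[0]                               # total number of background pixels
    mean = bg.mean(axis=0)                         # RGB colour means (step 1.3)
    sigma = bg.std(axis=0)                         # RGB colour mean square errors (step 1.4)
    low, high = mean - sigma, mean + sigma         # assumed colour intervals (step 1.5)
    inside = np.all((bg >= low) & (bg <= high), axis=1)
    n2 = int(inside.sum())                         # pixels inside all three intervals (step 1.6)
    n3 = n2 / n1                                   # pixel number ratio (step 1.7)
    return mean * n3                               # assumed adjustment of the mean by n3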
For the implementation scheme of extracting the face brightness of each frame in the video, the implementation schemes include but are not limited to:
for any frame in the video,
and 2.1, determining the brightness value of each pixel point in the face area of any frame through the following formula.
Figure BDA0001897945250000098
Wherein k is the pixel point identification of any frame of face region, h k The brightness value, R, of the kth pixel point of any frame of face region k Is a red color value, G, of the RGB color values of the k-th pixel point k Is a green color value, B, of the RGB color values of the k-th pixel point k RGB color value of k-th pixel pointBlue color value of (1).
And 2.2, determining the maximum brightness value and the minimum brightness value in the brightness values of all pixel points in any frame of human face area.
Step 2.3, calculating the brightness mean h̄ = (1/n_4) · Σ_{k=1}^{n_4} h_k of the face region of any frame.
Here n_4 is the total number of pixel points in the face region of any frame.
Step 2.4, determining the face brightness of any frame in the video according to the maximum brightness value, the minimum brightness value and h̄.
In particular, the method comprises the following steps:
1) Calculating the first difference d1 = maximum brightness value − minimum brightness value.
2) Calculating a second difference d2 and a third difference d3 from h̄, the maximum brightness value and the minimum brightness value.
3) Calculating the brightness ratio d4 = |d1 − d2| / |d1 − d3|.
4) Calculating the brightness mean square error σ_h = sqrt( (1/n_4) · Σ_{k=1}^{n_4} (h_k − h̄)^2 ) of the face region of any frame.
5) The face brightness of any frame in the video is then obtained by adjusting h̄ according to d4 and σ_h.
The face brightness extraction method provided in this embodiment does not simply take the mean brightness of the pixels in the face region as the face brightness; instead, it dynamically adjusts the mean according to the differences between the pixel brightness values and the maximum and minimum brightness, and uses the adjusted value as the face brightness, so that the determined face brightness better matches the actual situation.
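A comparable NumPy sketch of steps 2.1 to 2.4 is given below; the Rec.601 luma weights used for h_k and the particular second and third differences are assumptions of this example, and since the final combination of the quantities is not fixed in the text here, the sketch simply returns the brightness mean, the brightness ratio and the mean square error.

import numpy as np

def face_brightness_stats(frame, face_mask):
    # frame: (H, W, 3) RGB image; face_mask: (H, W) boolean array marking the face region.
    face = frame[face_mask].astype(np.float64)
    # Assumed per-pixel brightness h_k (Rec.601 luma weights).
    h = 0.299 * face[:, 0] + 0.587 * face[:, 1] + 0.114 * face[:, 2]   # step 2.1
    h_max, h_min, h_mean = h.max(), h.min(), h.mean()                  # steps 2.2-2.3
    d1 = h_max - h_min                          # first difference
    d2 = h_max - h_mean                         # assumed second difference
    d3 = h_mean - h_min                         # assumed third difference
    d4 = abs(d1 - d2) / (abs(d1 - d3) + 1e-12)  # brightness ratio
    sigma = h.std()                             # brightness mean square error
    # The face brightness of the frame is an adjustment of h_mean by d4 and sigma;
    # the exact combination is not given here, so the three quantities are returned.
    return h_mean, d4, sigma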
S104, sequentially selecting a non-first frame, and calculating the area difference, the pixel number difference, the background color difference and the face brightness difference of the face area of the selected frame and the previous frame.
Starting from the second frame and ending with the last frame, one frame is selected at a time; the difference between the face region area of the selected frame and that of the preceding frame is taken as the face region area difference, the difference in the number of pixels as the pixel number difference, the difference in background color as the background color difference, and the difference in face brightness as the face brightness difference.
For example, face region area difference = face region area of the selected frame-face region area of the frame immediately preceding it. Pixel number difference = the number of pixels of the selected frame-the number of pixels of the frame preceding it. Background color difference = background color of selected frame-background color of its previous frame. Face luminance difference = face luminance of the selected frame-face luminance of the frame immediately preceding it.
And S105, calculating the difference value of each non-first frame.
Wherein, the difference value = (human face area difference ^ human face brightness difference + background color difference) ^ pixel number difference.
And ^ is a power operator.
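The per-frame differences and difference values of S104-S105 can be sketched as follows; taking absolute differences and a Euclidean distance between the RGB background colours are assumptions of this example, made only so that the power operations stay real-valued.

import numpy as np

def difference_values(areas, pixel_counts, bg_colors, brightnesses):
    # areas, pixel_counts, brightnesses: per-frame scalars; bg_colors: (N, 3) per-frame RGB values.
    # Returns the difference value of every non-first frame.
    areas = np.asarray(areas, dtype=float)
    pixel_counts = np.asarray(pixel_counts, dtype=float)
    bg_colors = np.asarray(bg_colors, dtype=float)
    brightnesses = np.asarray(brightnesses, dtype=float)
    d_area = np.abs(np.diff(areas))                             # face region area difference
    d_pix = np.abs(np.diff(pixel_counts))                       # pixel number difference
    d_col = np.linalg.norm(np.diff(bg_colors, axis=0), axis=1)  # background colour difference
    d_bri = np.abs(np.diff(brightnesses))                       # face brightness difference
    # difference value = (area difference ^ brightness difference + colour difference) ^ pixel difference
    return (d_area ** d_bri + d_col) ** d_pix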
In addition, in order to speed up the scheme provided by this embodiment, the non-first frames may be preliminarily screened before their difference values are calculated, so as to remove frames that obviously do not show the same person or obviously do not belong to a micro expression.
Namely, the specific execution process of S105 is: and performing primary screening on the non-first frame according to the face area difference, the pixel number difference, the background color difference and the face brightness difference of each non-first frame, and calculating the difference value of each frame after primary screening.
The scheme for primarily screening the non-first frame according to the face area difference, the pixel number difference, the background color difference and the face brightness difference of each non-first frame includes but is not limited to:
for any non-first frame, if the area difference of the face region of any non-first frame is not greater than a first value, the pixel number difference is not greater than a second value, the background color difference is not greater than a third value, and the face brightness difference is not greater than a fourth value, passing the primary screening for any non-first frame; or if the area difference of the face region of any non-first frame is not larger than a first value, but the pixel number difference, the background color difference and the face brightness difference are all 0, then any non-first frame passes through the primary screening; or if the face brightness difference of any non-first frame is not larger than the fourth value, but the face area difference, the pixel number difference and the background color difference are all 0, then any non-first frame passes through the preliminary screening.
The first value is (the sum of the face region area differences of all non-first frames + the face region area of the first frame − avg1) / the total number of frames of the face video; the second value is (the sum of the pixel number differences of all non-first frames + the pixel number of the first frame − avg2) / the total number of frames of the face video; the third value is (the sum of the background color differences of all non-first frames + the background color of the first frame − avg3) / the total number of frames of the face video; the fourth value is (the sum of the face brightness differences of all non-first frames + the face brightness of the first frame − avg4) / the total number of frames of the face video; where avg1 = the sum of the face region areas of all frames / the total number of frames of the face video, avg2 = the sum of the pixel numbers of all frames / the total number of frames of the face video, avg3 = the sum of the background colors of all frames / the total number of frames of the face video, and avg4 = the sum of the face brightness values of all frames / the total number of frames of the face video.
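One way to sketch the preliminary screening in Python is shown below; representing the background colour by a single scalar per frame and using absolute differences are assumptions of this example.

import numpy as np

def preliminary_screen(per_frame):
    # per_frame: dict with equal-length arrays 'area', 'pixels', 'color', 'brightness',
    # one value per frame. Returns the indices (>= 1) of the non-first frames that pass.
    diffs, limits = {}, {}
    n = len(per_frame['area'])                      # total number of frames of the face video
    for key, v in per_frame.items():
        v = np.asarray(v, dtype=float)
        d = np.abs(np.diff(v))                      # differences of all non-first frames
        avg = v.sum() / n                           # avg1 .. avg4
        diffs[key] = d
        limits[key] = (d.sum() + v[0] - avg) / n    # first .. fourth values
    passed = []
    for i in range(n - 1):                          # i indexes non-first frame i + 1
        a, p, c, b = (diffs[k][i] for k in ('area', 'pixels', 'color', 'brightness'))
        ta, tp, tc, tb = (limits[k] for k in ('area', 'pixels', 'color', 'brightness'))
        if (a <= ta and p <= tp and c <= tc and b <= tb) \
                or (a <= ta and p == 0 and c == 0 and b == 0) \
                or (b <= tb and a == 0 and p == 0 and c == 0):
            passed.append(i + 1)
    return passed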
And S106, determining the frames with the difference values larger than a preset threshold value and the first frame of the video as candidate frames.
The preset threshold guarantees a sufficient difference. Since different micro expressions differ in amplitude, the preset threshold is chosen adaptively according to the application field of the method, which ensures the generality of the expression recognition method provided by the application.
And S107, in the candidate frames, determining the consecutively numbered frames as the micro expression frames.
For example, if the candidate frames are frame 3, frame 5, frame 6, frame 8 and frame 9, then the consecutively numbered frames (frame 5, frame 6, frame 8 and frame 9) are all determined as micro expression frames.
It is likely that frames 5 and 6, and frames 8 and 9, each capture a micro expression at that moment.
The foregoing is merely exemplary and does not represent an actual situation.
The present application does not restrict what counts as "consecutive", as long as more than a single isolated frame is involved. For example, if 2 consecutively numbered candidate frames exist, both are determined as micro expression frames; if 3 consecutively numbered candidate frames exist, all 3 are determined as micro expression frames.
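S106 and S107 amount to keeping the first frame plus every frame whose difference value exceeds the threshold, and then retaining only candidates that sit in a run of consecutive frame numbers; a small sketch (with assumed function and variable names) is:

def micro_expression_frames(diff_values, threshold):
    # diff_values[t] is the difference value of frame t + 1 (the non-first frames).
    # Candidate frames: the first frame plus frames whose difference value exceeds the threshold.
    candidates = [0] + [t + 1 for t, d in enumerate(diff_values) if d > threshold]
    kept = []
    for i, f in enumerate(candidates):
        prev_adjacent = i > 0 and candidates[i - 1] == f - 1
        next_adjacent = i + 1 < len(candidates) and candidates[i + 1] == f + 1
        if prev_adjacent or next_adjacent:          # frame lies in a run of consecutive candidates
            kept.append(f)
    return kept

With candidate frames 3, 5, 6, 8 and 9, the function returns frames 5, 6, 8 and 9, matching the example above.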
And S108, extracting the expression features of the micro-expression frames, carrying out dimension reduction processing on the expression features through a dimension reduction model trained in advance, and identifying the features subjected to dimension reduction to obtain an identification result.
In the step, a micro expression recognition model can be trained in advance, the expression characteristics of the micro expression frame are extracted, the dimension reduction processing is carried out on the expression characteristics through the dimension reduction model trained in advance, and then the feature after dimension reduction is recognized by the micro expression recognition model to obtain a recognition result.
The training process of the micro expression recognition model comprises but is not limited to:
and 3.1, acquiring a plurality of sample videos.
The sample video may be obtained from an existing microexpression dataset.
Micro expressions are tiny facial movements that a person makes while trying to mask an emotion. Strictly speaking, a micro expression that a person deliberately simulates cannot be called a micro expression, so the way micro expressions are induced determines how reliable the data are.
This step may obtain multiple sample videos from one or 2 of the following 2 existing micro-expression datasets:
the micro-expression dataset SMIC, established by the university of orlu, finland, requires the subject to watch a video with large mood swings and to try to mask his mood from being exposed, and the recorder observes the subject's expression without watching the video. If the recorder observes the facial expression of the subject, the subject gets a penalty. Under the induction mechanism, 164 video sequences of 16 persons are formed, the micro-expression categories are 3, namely positive (positive), surprise (negative) and negative (negative), and the number of the video sequences is 70, 51 and 43 respectively.
The micro-expression dataset CASME2, established by the Institute of Psychology of the Chinese Academy of Sciences, adopts a similar induction mechanism to ensure the reliability of the data: a subject is rewarded only if the facial expression is successfully suppressed and not discovered by the recorder. The dataset consists of 247 video sequences of 26 subjects in 5 micro-expression categories (happiness, disgust, surprise, suppression and others), with 32, 64, 25, 27 and 99 video sequences respectively.
And 3.2, for each sample video, extracting corresponding expression features by adopting a local binary pattern.
A Local Binary Pattern (LBP) descriptor is defined on a central pixel and its surrounding rectangular neighborhood, as shown in fig. 3. With the gray value of the central pixel as a threshold, the neighboring pixels around it are binary-quantized: a neighbor greater than or equal to the central pixel value is coded as 1, and a neighbor smaller than the central pixel value is coded as 0, forming a local binary pattern.
The binary codes are concatenated clockwise, starting from the upper left corner, to obtain a string of binary digits; the corresponding decimal number uniquely identifies the central pixel point. In this way a local binary pattern can be computed for every pixel in the image.
As shown in fig. 3, the central pixel value in the left table is 178. The upper left corner is 65, and 65 < 178, so the corresponding code is 0; 188 > 178, so the corresponding code is 1. Proceeding in the same way gives the table on the right of fig. 3, and the resulting binary pattern value is 01000100.
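A minimal sketch of this 3 x 3 LBP computation, with the neighbours read clockwise from the upper left corner as described above (the function name lbp_code is assumed), is:

import numpy as np

def lbp_code(patch):
    # patch: 3 x 3 grayscale neighbourhood; returns the binary string and its decimal value.
    c = patch[1, 1]
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],   # clockwise from upper left
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    bits = ''.join('1' if p >= c else '0' for p in neighbours)          # threshold at the centre value
    return bits, int(bits, 2)

# With a centre value of 178, an upper-left neighbour of 65 contributes a 0 (65 < 178)
# and a neighbour of 188 contributes a 1 (188 > 178), as in fig. 3.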
In addition, extending the static LBP texture descriptor to the time-space domain forms 2-dimensional local binary patterns on 3 orthogonal planes. As shown in fig. 4, the LBP features of the video sequence are extracted on the three orthogonal planes XY, XT and YT, and the feature vectors of the planes are concatenated to form the LBP-TOP feature vector. This describes not only the local texture information of the images but also how the video changes over time.
However, the vector dimension of LBP-TOP is 3 x 2^L, where L is the number of neighborhood points. If the expression features extracted in step 3.2 were used for modeling directly, the large feature dimension would make model training slow and degrade its effect. Therefore, after the expression features are extracted in step 3.2, their dimensionality is reduced before they are actually used to train the model in step 3.3, which improves model training efficiency.
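To make the LBP-TOP construction concrete, the sketch below computes 8-neighbour LBP histograms on one XY, one XT and one YT slice of a grayscale video block and concatenates them into a 3 x 2^8 = 768 dimensional vector; using only the central slice of each plane is a simplification of this example, whereas full LBP-TOP accumulates histograms over all slices of each plane.

import numpy as np

def lbp_image(img):
    # 8-neighbour LBP code of every interior pixel of a 2-D array.
    c = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(c.shape, dtype=np.int32)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        codes |= (nb >= c).astype(np.int32) << bit
    return codes

def lbp_top(volume):
    # volume: (T, H, W) grayscale video block; returns a 768-dimensional LBP-TOP style vector.
    T, H, W = volume.shape
    planes = [volume[T // 2].astype(float),        # XY plane (middle frame)
              volume[:, H // 2, :].astype(float),  # XT plane (middle row)
              volume[:, :, W // 2].astype(float)]  # YT plane (middle column)
    hists = [np.bincount(lbp_image(p).ravel(), minlength=256) for p in planes]
    return np.concatenate(hists).astype(float)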
And 3.3, carrying out recognition training on each sample video to form a micro expression recognition model.
There may be various training methods in this step, and this embodiment provides that the following training methods are adopted:
3.3.1, clustering each sample video based on the expression characteristics by adopting any clustering algorithm (such as a k-means algorithm) to form micro expression classes to which each sample video belongs.
3.3.2, adjusting the parameters in the clustering algorithm according to the second standard classification result of each sample video.
Since each sample video has a label for identifying the micro-expression category, the label is obtained in the step and is used as a second standard classification result of each sample video.
3.3.3, repeating the steps of 3.3.1 and 3.3.2, finishing training and forming the micro expression recognition model.
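Read one way, steps 3.3.1 to 3.3.3 can be sketched with scikit-learn's k-means as follows; mapping each cluster to the majority standard class and looping over n_init as the "parameter adjustment" are assumptions of this example, since the application does not fix how the clustering parameters are adjusted.

import numpy as np
from sklearn.cluster import KMeans

def cluster_train(features, labels, n_init_values=(10, 20, 50)):
    # features: (m, D) reduced expression features; labels: (m,) non-negative integer
    # standard (second standard) classification results. Returns the best model found.
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    n_classes = len(np.unique(labels))
    best_acc, best_model = -1.0, None
    for n_init in n_init_values:                                  # 3.3.2: adjust a clustering parameter
        km = KMeans(n_clusters=n_classes, n_init=n_init, random_state=0)
        assign = km.fit_predict(features)                         # 3.3.1: cluster the sample videos
        mapped = np.empty_like(labels)
        for c in range(n_classes):                                # map clusters to majority classes
            members = labels[assign == c]
            mapped[assign == c] = np.bincount(members).argmax() if len(members) else labels[0]
        acc = float((mapped == labels).mean())
        if acc > best_acc:                                        # 3.3.3: keep the best run
            best_acc, best_model = acc, km
    return best_model, best_acc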
The micro expression recognition model in the application is a classifier.
For example: a Support Vector Machine (SVM) method is employed. The key of the SVM is the kernel function, and different SVM classification effects can be achieved by adopting different kernel functions.
For example, the following kernel functions may be employed: linear Kernel, chi-square Kernel, histogram Intersection Kernel.
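As an illustration, a linear-kernel SVM and a chi-square-kernel SVM can be set up with scikit-learn as sketched below; sklearn's chi2_kernel (an exponential chi-square kernel) is used here as a stand-in for the chi-square kernel mentioned above, and a histogram intersection kernel would have to be supplied as a custom precomputed kernel in the same way.

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

def train_svms(train_feats, train_labels):
    # train_feats must be non-negative (e.g. LBP-TOP histograms) for the chi-square kernel.
    linear_svm = SVC(kernel='linear').fit(train_feats, train_labels)
    gram = chi2_kernel(train_feats, train_feats)              # chi-square kernel matrix
    chi2_svm = SVC(kernel='precomputed').fit(gram, train_labels)
    return linear_svm, chi2_svm

# To classify test samples with chi2_svm, pass chi2_kernel(test_feats, train_feats) to its predict().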
In addition, in order to improve the classification recognition rate of the finally trained classification model, cross validation can be used to test the performance of the micro expression recognition model. Specifically, all sample videos are divided into two subsets: one subset, called the training set, is used to train the classifier, and the other subset, called the test set, is used to verify the effectiveness of the classifier. The trained classifier is tested with the test set, and the result serves as the performance index of the classifier. Common methods are simple cross validation, K-fold cross validation and leave-one-out cross validation.
The leave-one-subject-out cross validation method is used to perform micro expression classification training on SVM classifiers with different kernel functions: all video sequences of one subject are selected as the test samples and all video sequences of the remaining I subjects as the training samples; the experiment is repeated I + 1 times (once per subject), and the average classification recognition rate over the I + 1 runs is computed.
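Leave-one-subject-out evaluation can be sketched with scikit-learn's LeaveOneGroupOut splitter, using the subject identifier of each video sequence as the group; the linear kernel below is only an example choice.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def loso_recognition_rate(features, labels, subjects):
    # features: (m, D) reduced features; labels: (m,) classes; subjects: (m,) subject ids.
    # Each fold holds out every video sequence of one subject and trains on the rest.
    scores = cross_val_score(SVC(kernel='linear'), features, labels,
                             groups=subjects, cv=LeaveOneGroupOut())
    return scores.mean()                                      # average classification recognition rate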
Based on the above, the training of the micro expression recognition model is completed.
After the micro expression recognition model is trained, the expression features are subjected to dimensionality reduction through a pre-trained dimensionality reduction model, and the characteristics subjected to dimensionality reduction are recognized by the micro expression recognition model to obtain a recognition result.
Because the dimension reduction processing is carried out on the expression characteristics before the micro expression recognition is carried out through the micro expression recognition model, the recognition rate and the recognition accuracy rate of the micro expression recognition model can be improved.
It should be noted that "first", "second", "third", and the like in this embodiment and the subsequent embodiments are only used for distinguishing preset thresholds, classification results, standard classification results, and the like in different steps, and do not have any other special meaning.
Beneficial effects:
the micro expression frame is selected for recognition according to the area difference of the face area, the pixel number difference, the background color difference and the face brightness difference, the frame related to the micro expression in the face video can be accurately extracted, and the recognition efficiency and accuracy of the micro expression frame are improved.

Claims (10)

1. A micro-expression recognition method, the method comprising:
acquiring a face video;
carrying out face recognition on each frame in the video, and extracting a face area;
extracting the pixel number, the background color and the face brightness of each frame in the video;
sequentially selecting a non-first frame, and calculating the area difference of the face area, the pixel number difference, the background color difference and the face brightness difference of the selected frame and the previous frame;
calculating a difference value of each non-first frame, wherein the difference value = (face region area difference ^ face brightness difference + background color difference) ^ pixel number difference, ^ being a power operator;
determining the frames whose difference values are larger than a preset threshold, together with the first frame of the video, as candidate frames;
among the candidate frames, determining the consecutively numbered frames as micro-expression frames;
and extracting the expression features of the micro-expression frames, performing dimensionality reduction on the expression features through a pre-trained dimensionality reduction model, and identifying the dimensionality-reduced features to obtain an identification result.
2. The method of claim 1, wherein extracting the background color of each frame in the video comprises:
for any one frame in the video, the video is,
determining a non-face area in any frame as a background area;
determining RGB color values of all pixel points in the background area of any frame, wherein the RGB color values comprise red color values, green color values and blue color values;
calculating the RGB color mean of the background region of any frame by the following formula, wherein the RGB color mean comprises a red color mean, a green color mean and a blue color mean:
c̄_l = (1/n_1) · Σ_{j=1}^{n_1} c_lj, l = 1, 2, 3
wherein j is the pixel point identification of the background region of any frame, c̄_1, c̄_2 and c̄_3 are respectively the red, green and blue color means of the background region of any frame, c_1j, c_2j and c_3j are respectively the red, green and blue color values of the j-th pixel point in the background region of any frame, and n_1 is the total number of pixel points in the background region of any frame;
calculating the RGB color mean square error of the background region of any frame, wherein the RGB color mean square error comprises a red color mean square error, a green color mean square error and a blue color mean square error:
σ_l1 = sqrt( (1/n_1) · Σ_{j=1}^{n_1} (c_lj − c̄_l)^2 ), l = 1, 2, 3
wherein σ_11 is the red color mean square error, σ_21 is the green color mean square error, and σ_31 is the blue color mean square error;
determining the RGB color intervals of the background region of any frame, wherein the RGB color intervals comprise the red color interval [c̄_1 − σ_11, c̄_1 + σ_11], the green color interval [c̄_2 − σ_21, c̄_2 + σ_21], and the blue color interval [c̄_3 − σ_31, c̄_3 + σ_31];
Determining the number n2 of pixel points of which the red color values are located in a red color interval of an RGB color interval, the green color values are located in a green color interval, and the blue color values are located in a blue color interval, in all the pixel points of the background area of any frame;
and determining the background color of any frame according to the n 2.
3. The method of claim 2, wherein the background color is represented by RGB color values;
the determining the background color of any frame according to n2 comprises:
calculating the pixel number ratio n_3 = n_2 / n_1 of any frame;
the red, green and blue color values of the background color of any frame being obtained by adjusting the color means c̄_1, c̄_2 and c̄_3, respectively, according to the ratio n_3.
4. The method of claim 1, wherein extracting the face luminance of each frame in the video comprises:
for any one frame in the video, the video is,
determining the brightness value of each pixel point in the face region of any frame, wherein k is the pixel point identification of the face region of any frame, h_k is the brightness value of the k-th pixel point of the face region of any frame, and R_k, G_k and B_k are respectively the red, green and blue color values among the RGB color values of the k-th pixel point;
determining a maximum brightness value and a minimum brightness value in the brightness values of all pixel points of any frame of human face region;
calculating the brightness mean h̄ = (1/n_4) · Σ_{k=1}^{n_4} h_k of the face region of any frame, wherein n_4 is the total number of pixel points in the face region of any frame;
and determining the face brightness of any frame in the video according to the maximum brightness value, the minimum brightness value and h̄.
5. The method of claim 4, wherein determining the face brightness of any frame in the video according to the maximum brightness value, the minimum brightness value and h̄ comprises:
calculating a first difference d1 = maximum brightness value − minimum brightness value;
calculating a second difference d2 and a third difference d3 from h̄, the maximum brightness value and the minimum brightness value;
calculating a brightness ratio d4 = |d1 − d2| / |d1 − d3|;
calculating the brightness mean square error σ_h = sqrt( (1/n_4) · Σ_{k=1}^{n_4} (h_k − h̄)^2 ) of the face region of any frame;
the face brightness of any frame in the video being obtained by adjusting h̄ according to d4 and σ_h.
6. The method of claim 1, wherein before calculating the difference value of each non-first frame, the method further comprises:
performing primary screening on the non-first frame according to the area difference, the pixel number difference, the background color difference and the face brightness difference of the face area of each non-first frame;
the calculating a difference value of each non-first frame includes:
and calculating the difference value of each frame after the initial screening.
7. The method of claim 6, wherein the preliminary screening of the non-first frame according to the face area difference, the pixel number difference, the background color difference, and the face brightness difference of each non-first frame comprises:
for any non-first frame,
if the face region area difference of the non-first frame is not larger than a first value, the pixel number difference is not larger than a second value, the background color difference is not larger than a third value, and the face brightness difference is not larger than a fourth value, the non-first frame passes the preliminary screening; or,
if the face region area difference of the non-first frame is not larger than the first value and the pixel number difference, the background color difference and the face brightness difference are all 0, the non-first frame passes the preliminary screening; or,
if the face brightness difference of the non-first frame is not larger than the fourth value and the face region area difference, the pixel number difference and the background color difference are all 0, the non-first frame passes the preliminary screening;
wherein the first value is (the sum of the face region area differences of all non-first frames + the face region area of the first frame − avg1) / the total number of frames of the face video, the second value is (the sum of the pixel number differences of all non-first frames + the pixel number of the first frame − avg2) / the total number of frames of the face video, the third value is (the sum of the background color differences of all non-first frames + the background color of the first frame − avg3) / the total number of frames of the face video, the fourth value is (the sum of the face brightness differences of all non-first frames + the face brightness of the first frame − avg4) / the total number of frames of the face video, avg1 = the sum of the face region areas of all frames / the total number of frames of the face video, avg2 = the sum of the pixel numbers of all frames / the total number of frames of the face video, avg3 = the sum of the background colors of all frames / the total number of frames of the face video, and avg4 = the sum of the face brightness values of all frames / the total number of frames of the face video.
8. The method according to any one of claims 1 to 7, wherein before the dimensionality reduction processing is performed on the expression features through the pre-trained dimensionality reduction model, the method further comprises:
obtaining a sample set X, wherein the total number of samples in the X is m, each sample comprises a plurality of expression features, and each sample belongs to one category;
classifying all samples according to categories;
calculating the mean vector of each class: μ_i = (1/b_i) · Σ_{j=1}^{b_i} x_ij, wherein i is the class identification, μ_i is the mean vector of class i, b_i is the number of samples of the i-th class, j is the sample identification, and x_ij is the vector formed by the expression features of the j-th sample of the i-th class;
determining the total mean vector μ_0 = (1/E) · Σ_{i=1}^{E} μ_i according to the mean vectors of all classes, wherein μ_0 is the total mean vector and E is the total number of different classes to which the samples in X belong;
calculating an inter-class variance vector and an intra-class variance vector according to the total mean vector;
and determining the expression characteristics after dimensionality reduction according to the between-class variance vector and the within-class variance vector to form a dimensionality reduction model.
9. The method of claim 8, wherein calculating the inter-class variance vector and the intra-class variance vector according to the total mean vector comprises:
S_w = Σ_{i=1}^{E} Σ_{x_ij ∈ X_i} (x_ij − μ_i)(x_ij − μ_i)^T
S_b = Σ_{i=1}^{E} b_i (μ_i − μ_0)(μ_i − μ_0)^T
wherein S_w is the intra-class variance vector, S_b is the inter-class variance vector, and X_i is the set composed of the samples of the i-th class.
10. The method of claim 9, wherein determining the reduced-dimension expression features according to the inter-class variance vector and the intra-class variance vector comprises:
calculating a weight vector W = diag(S_b ./ S_w) composed of the weights of the expression features, wherein diag() is a function that takes the elements on the diagonal of a matrix and ./ is an operator that divides S_b by S_w element by element;
sorting the expression features according to the sequence of the weights of the expression features from big to small;
and determining a preset number of expression features which are ranked in the front as the expression features after dimension reduction.
CN201811499959.9A 2018-12-10 2018-12-10 Micro expression recognition based on difference of front and rear frames and characteristic dimension reduction Active CN109614927B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811499959.9A CN109614927B (en) 2018-12-10 2018-12-10 Micro expression recognition based on difference of front and rear frames and characteristic dimension reduction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811499959.9A CN109614927B (en) 2018-12-10 2018-12-10 Micro expression recognition based on difference of front and rear frames and characteristic dimension reduction

Publications (2)

Publication Number Publication Date
CN109614927A CN109614927A (en) 2019-04-12
CN109614927B true CN109614927B (en) 2022-11-08

Family

ID=66007965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811499959.9A Active CN109614927B (en) 2018-12-10 2018-12-10 Micro expression recognition based on difference of front and rear frames and characteristic dimension reduction

Country Status (1)

Country Link
CN (1) CN109614927B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991458B (en) * 2019-11-25 2023-05-23 创新奇智(北京)科技有限公司 Image feature-based artificial intelligent recognition result sampling system and sampling method
CN112528945B (en) * 2020-12-24 2024-04-26 上海寒武纪信息科技有限公司 Method and device for processing data stream
CN113076813B (en) * 2021-03-12 2024-04-12 首都医科大学宣武医院 Training method and device for mask face feature recognition model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104796683B (en) * 2014-01-22 2018-08-14 南京中兴软件有限责任公司 A kind of method and system of calibration image color
CN105139039B (en) * 2015-09-29 2018-05-29 河北工业大学 The recognition methods of the micro- expression of human face in video frequency sequence
CN108268859A (en) * 2018-02-08 2018-07-10 南京邮电大学 A kind of facial expression recognizing method based on deep learning

Also Published As

Publication number Publication date
CN109614927A (en) 2019-04-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant