CN112200065B - Micro-expression classification method based on action amplification and self-adaptive attention area selection - Google Patents
- Publication number
- CN112200065B (application CN202011070118.3A / CN202011070118A)
- Authority
- CN
- China
- Prior art keywords
- frame
- apex
- micro
- attention area
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
Abstract
The invention relates to a micro-expression classification method based on action amplification and self-adaptive attention area selection. First, a micro-expression data set is acquired and the start frame and peak frame are extracted; the extracted start frame and peak frame are then fed into an action amplification network to generate an action-amplified image; the amplified image is then preprocessed; finally, the preprocessed image is recognized with a self-adaptive attention area selection method to obtain the final classification result.
Description
Technical Field
The invention relates to the field of pattern recognition and computer vision, in particular to a micro-expression classification method based on action amplification and self-adaptive attention area selection.
Background
Humans sometimes disguise or hide their emotions; in such cases, no useful information can be obtained from the macroscopic expression of the face. To mine useful information from disguised facial expressions, Ekman identified a transient, involuntary, rapid facial emotion, the micro-expression, which appears involuntarily in the face when a person tries to hide some real emotion. A typical micro-expression lasts 1/25 to 1/5 of a second and usually appears only in a specific part of the face.
Micro-expressions hold great promise for national security, criminal interrogation, and medical applications, but their subtlety and brevity pose a great challenge to the human eye, so in recent years much work has been proposed to recognize micro-expressions automatically with computer vision and machine learning algorithms.
Disclosure of Invention
The invention aims to provide a micro-expression classification method based on action amplification and self-adaptive attention area selection, which can effectively classify micro-expression images.
In order to achieve the purpose, the technical scheme of the invention is as follows: a micro-expression classification method based on action amplification and self-adaptive attention area selection comprises the following steps:
step S1, acquiring a micro expression data set, and extracting a start frame and a peak frame;
step S2, inputting the extracted initial frame and peak frame into an action amplification network to generate an action amplified image;
step S3, preprocessing the amplified image, and dividing a training set and a test set according to an LOSO principle;
step S4, recognizing the preprocessed image by using a self-adaptive attention area selection method to obtain a final classification result.
In an embodiment of the present invention, the step S1 specifically includes the following steps:
step S11, acquiring a micro expression data set, and cutting the image into 224 × 224 images after face alignment;
step S12, extracting the initial frame and the peak value frame directly according to the marked content for the micro expression data set with the initial frame and the peak value frame;
step S13, for a micro-expression data set without start-frame and peak-frame labels, the start frame and the peak frame of the video sequence are extracted with a frame difference method; the frame difference method is as follows: let P = {p_i}, i = 1, 2, …, n, denote the input image sequence, where p_i denotes the i-th input frame; the first frame of the sequence is taken as the start frame, i.e. p_start = p_1; the gray values of the pixels at (x, y) in the first frame and the n-th frame are denoted f1(x, y) and fn(x, y); the gray values of corresponding pixels in the two frames are subtracted and the absolute value is taken to obtain the difference image Dn, Dn(x, y) = |fn(x, y) − f1(x, y)|, and the average inter-frame difference Dn_avg of the difference image is calculated as:

Dn_avg = (Σ_x Σ_y Dn(x, y)) / (Dn.shape[0] · Dn.shape[1])

where Dn.shape[0] denotes the height of the difference image Dn and Dn.shape[1] denotes its width; the average inter-frame difference between the start frame and every other frame is calculated and sorted, and the frame with the largest average inter-frame difference is the peak frame p_apex of the image sequence.
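The frame difference method of steps S13–S14 can be sketched as follows. This is a minimal pure-Python illustration assuming grayscale frames stored as nested lists; the function names are illustrative, not from the patent:

```python
def mean_frame_diff(frame_a, frame_b):
    """Average absolute gray-value difference between two equal-size frames."""
    h, w = len(frame_a), len(frame_a[0])
    total = sum(abs(frame_b[y][x] - frame_a[y][x])
                for y in range(h) for x in range(w))
    return total / (h * w)  # divide by Dn.shape[0] * Dn.shape[1]

def find_apex_index(sequence):
    """Index of the frame with the largest mean difference from the start frame."""
    start = sequence[0]  # p_start = p_1
    return max(range(1, len(sequence)),
               key=lambda i: mean_frame_diff(start, sequence[i]))
```

Sorting all per-frame averages, as the patent describes, gives the same apex frame as taking the maximum directly.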
In an embodiment of the present invention, the step S2 specifically includes the following steps:
step S21, an encoder is designed to extract shape and texture features from the start frame p_start and the peak frame p_apex; the encoder consists of convolution layers and ResBlocks; let T(·) denote the texture-feature extraction function of the encoder, then t = T(p), where t = t_start|apex denotes the texture features of the input frame; let S(·) denote the shape-feature extraction function of the encoder, then s = S(p), where s = s_start|apex denotes the shape features of the input frame;
step S22, an amplifier is designed to amplify the shape features of the start frame p_start and the peak frame p_apex; the convolution layers and activation functions of the neural network simulate the action-amplification effect of a band-pass filter, strengthening the signal at frequencies with large variation intensity and filtering out the noise at frequencies with small variation intensity; let G(·) denote the mapping formed in the amplifier by a k3s1 convolution and the ReLU activation function, and H(·) the mapping formed by a k3s1 convolution and a ResBlock; the final amplified result is:

M(s_start, s_apex, α) = s_start + H(α · G(s_apex − s_start))

where M(·) denotes the mapping function of the amplifier, α denotes the amplification factor, s_start denotes the shape features of the start frame, and s_apex denotes the shape features of the peak frame;

step S23, a decoder is designed to imitate the pyramid reconstruction and fusion process of Lagrangian motion amplification; the decoder is also a small convolutional neural network whose inputs are the texture features t_start|apex and the amplified shape features M(s_start, s_apex, α) produced by the amplifier; the texture features t_start|apex are first upsampled, and the amplified shape features M(s_start, s_apex, α) are then concatenated with them; this is equivalent to amplifying the shape features s_start|apex that need strengthening by α and then superimposing the unamplified texture features t_start|apex back on; afterwards, 9 ResBlocks, one more upsampling, and two k3s1 convolution layers produce the final output.
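The amplification rule M(s_start, s_apex, α) = s_start + H(α · G(s_apex − s_start)) of step S22 can be sketched as follows. In the patent, G(·) and H(·) are small learned convolutional mappings (k3s1 + ReLU, k3s1 + ResBlock); here they are injectable stand-ins defaulting to the identity, so this sketch shows only the data flow, not the learned filters:

```python
def amplify_shape(s_start, s_apex, alpha, G=None, H=None):
    """M(s_start, s_apex, alpha) = s_start + H(alpha * G(s_apex - s_start)).

    s_start / s_apex are flat feature vectors; G and H stand in for the
    amplifier's learned k3s1-conv mappings (identity by default).
    """
    G = G or (lambda v: v)
    H = H or (lambda v: v)
    diff = [a - b for a, b in zip(s_apex, s_start)]   # s_apex - s_start
    boosted = H([alpha * g for g in G(diff)])          # H(alpha * G(.))
    return [s + b for s, b in zip(s_start, boosted)]   # s_start + ...
```

With identity G and H, α = 1 reproduces the peak-frame features exactly, and α > 1 pushes the features beyond them, which is the intuition behind the magnification.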
In an embodiment of the present invention, the step S3 specifically includes the following steps:
step S31, the amplified micro-expression image is sharpened to counter the pixel blurring that amplification may introduce; the calculation is:

a(i, j) = p(i, j) − k_τ · ∇²p(i, j)

where k_τ is a coefficient related to the diffusion effect; here k_τ = 1;
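The sharpening rule a(i, j) = p(i, j) − k_τ · ∇²p(i, j) can be sketched with a 4-neighbour discrete Laplacian; borders are left untouched, and the function name is illustrative:

```python
def laplacian_sharpen(img, k_tau=1.0):
    """a(i,j) = p(i,j) - k_tau * laplacian(p)(i,j), 4-neighbour stencil."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]              # borders copied unchanged
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            lap = (img[i - 1][j] + img[i + 1][j]
                   + img[i][j - 1] + img[i][j + 1]
                   - 4 * img[i][j])
            out[i][j] = img[i][j] - k_tau * lap
    return out
```

Subtracting the Laplacian boosts pixels that differ from their neighbourhood, which is why too large a k_τ makes contours overshoot.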
Step S32, each data set contains several subjects, each subject representing one participant, and each subject contains several micro-expression sequences produced by that participant; following the leave-one-subject-out principle, when the data set is divided, one subject of the data set is taken as the test set at a time and all remaining subjects are combined as the training set, so each data set finally yields Sub_i training/test splits, where Sub_i denotes the number of subjects in the data set.
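The leave-one-subject-out split of step S32 can be sketched as follows (Sub_i splits for Sub_i subjects; the names are illustrative):

```python
def loso_splits(subjects):
    """Yield (train, test) pairs: each subject is held out exactly once."""
    for i, held_out in enumerate(subjects):
        train = subjects[:i] + subjects[i + 1:]
        yield train, [held_out]
```

Because splits are made per subject rather than per sequence, no participant ever appears in both the training and the test set.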
In an embodiment of the present invention, the step S4 specifically includes the following steps:
step S41, designing a self-adaptive attention area selection network to classify the input amplified and preprocessed micro-expression images, wherein the self-adaptive attention area selection network comprises three scales of sub-networks, the three scales of sub-networks have the same structure but different parameters, and each scale of sub-network comprises two modules which are respectively a classification module and an attention area selection module;
the classification module is composed of a convolution layer, an activation layer and a pooling layer and is used for extracting features of the input micro-expression image, and the calculation process is as follows:
c(X) = u(w_i * X)
where X denotes the vector representation of the input image, w_i denotes the parameters of the network layers, and w_i * X is the extracted feature; the function u(·) denotes the final fully connected layer and softmax layer, which yield the probability of each category for the feature;
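The classification rule c(X) = u(w_i * X) can be sketched as a linear map followed by softmax. In the patent the features come from stacked convolution/activation/pooling layers and u(·) is a fully connected layer plus softmax; this pure-Python stand-in collapses the feature extraction into a single linear map, and all names are illustrative:

```python
import math

def softmax(z):
    m = max(z)  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def classify(x, weights):
    """c(X) = u(w * X): one weight vector per class, then softmax."""
    logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
    return softmax(logits)
```

The output is a probability distribution over micro-expression categories, which is what the final softmax layer of each sub-network produces.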
the attention area selection module is composed of two stacked fully-connected layers, and let e (-) denote the mapping function of the attention area selection module, and the calculation process is as follows:
[l_x, l_y, l_half] = e(w_i * X)
where l_x and l_y denote the coordinates of the center of the region selected by the attention area selection module, and l_half denotes half the side length of the selected attention region;
step S42, each sub-network passes through its classification module and attention area selection module in turn; the input of the next sub-network is the region cropped at the position located by the previous sub-network; the cropping operation is implemented with a rectangular function: first, the top-left and bottom-right corners of the attention region are determined:
l_x(tl) = l_x − l_half
l_y(tl) = l_y − l_half
l_x(br) = l_x + l_half
l_y(br) = l_y + l_half
where l_x(tl) and l_y(tl) denote the horizontal and vertical coordinates of the top-left corner, and l_x(br) and l_y(br) those of the bottom-right corner;
then, the mask N(·) of the attention area is computed as follows:
N(·) = [v(x − l_x(tl)) − v(x − l_x(br))] · [v(y − l_y(tl)) − v(y − l_y(br))]
where N(·) is a two-dimensional square-pulse function built from v(x) = 1/(1 + exp(−k·x)), with k a very large positive number, so that the value of v(x) is determined only by the sign of x; x and y are the horizontal and vertical coordinates in the current image; for x > 0, v(x) ≈ 1, and for x < 0, v(x) ≈ 0; therefore N = 1 if and only if l_x(tl) < x < l_x(br) and l_y(tl) < y < l_y(br), and N = 0 otherwise, so the image can be cropped with the mask matrix N; finally, the cropped result is calculated:
X_att = X ⊙ N(l_x, l_y, l_half)
where ⊙ denotes element-wise multiplication and X_att denotes the cropped result;
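The square-pulse mask and crop of step S42 can be sketched as follows, using the smooth step v(x) = 1/(1 + exp(−k·x)) so the crop stays differentiable. This is a reconstruction consistent with the text; k, the image size, and the function names are illustrative:

```python
import math

def v(x, k=50.0):
    """Smooth step: ~0 for x < 0, ~1 for x > 0; large k approaches a hard step."""
    return 1.0 / (1.0 + math.exp(-k * x))

def attention_mask(h, w, lx, ly, lhalf, k=50.0):
    """N(x,y) ~ 1 inside the square centred at (lx, ly) with half-side lhalf."""
    x_tl, y_tl = lx - lhalf, ly - lhalf   # top-left corner
    x_br, y_br = lx + lhalf, ly + lhalf   # bottom-right corner
    return [[(v(x - x_tl, k) - v(x - x_br, k))
             * (v(y - y_tl, k) - v(y - y_br, k))
             for x in range(w)] for y in range(h)]

def soft_crop(image, lx, ly, lhalf):
    """X_att = X (*) N(lx, ly, lhalf): element-wise product with the mask."""
    mask = attention_mask(len(image), len(image[0]), lx, ly, lhalf)
    return [[p * m for p, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]
```

Unlike a hard crop, this mask has nonzero gradients with respect to l_x, l_y, and l_half, which is what lets the rectangle's parameters be optimized by back-propagation.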
step S43, during training, the classification module is initialized with the parameters of a pre-trained VGGNet, the attention area selection module is initialized with the region of highest response in the last convolution layer of the classification module, and the two modules are then trained iteratively until convergence to obtain the final result.
Compared with the prior art, the invention has the following beneficial effects:
1. The micro-expression classification method based on action amplification and self-adaptive attention area selection constructed by the invention can effectively classify micro-expression images and improves the classification performance on micro-expression images.
2. The method generates the action-amplified result between two frames with a convolutional neural network; compared with traditional action amplification methods it produces less noise and edge blurring, and is more robust with better performance.
3. For the problem that traditional micro-expression recognition requires local-area attention through strictly aligned face segmentation, the invention provides a self-adaptive attention area discovery method.
Drawings
Fig. 1 is a schematic diagram of the principle of the present invention.
Detailed Description
The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the present embodiment provides a micro-expression classification method based on motion amplification and adaptive attention area selection, which specifically includes the following steps:
step S1: acquiring a micro expression data set, and extracting a start frame and a peak frame;
step S2: inputting the extracted initial frame and the extracted peak frame into an action amplification network to generate an action amplified image;
step S3: preprocessing the amplified image, and dividing a training set and a testing set according to an LOSO principle;
step S4: and identifying the preprocessed image by using a self-adaptive attention area selection method to obtain a final classification result.
In this embodiment, the step S1 includes the following steps:
step S11: acquiring a micro expression data set, aligning the face, and uniformly cutting the face into 224 × 224 sizes;
step S12: for a micro expression data set with initial frame and peak frame labels, extracting the initial frame and the peak frame directly according to the labeled contents;
step S13: extracting the initial frame and the peak frame of the video sequence by using a frame difference method for the micro expression data set which is not marked by the initial frame and the peak frame;
step S14: the frame difference method is as follows: let P = {p_i}, i = 1, 2, …, n, denote the input image sequence, where p_i denotes the i-th input frame; the first frame of the sequence is taken as the start frame, i.e. p_start = p_1; the gray values of the pixels at (x, y) in the first frame and the n-th frame are denoted f1(x, y) and fn(x, y); the gray values of corresponding pixels in the two frames are subtracted and the absolute value is taken to obtain the difference image Dn, Dn(x, y) = |fn(x, y) − f1(x, y)|, and the average inter-frame difference Dn_avg of the difference image is calculated as:

Dn_avg = (Σ_x Σ_y Dn(x, y)) / (Dn.shape[0] · Dn.shape[1])

where Dn.shape[0] denotes the height of the difference image Dn and Dn.shape[1] denotes its width. The average inter-frame difference between the start frame and every other frame is calculated and sorted, and the frame with the largest average inter-frame difference is the peak frame p_apex of the image sequence;
In this embodiment, step S2 specifically includes the following steps:
step S21: an encoder is designed to extract shape and texture features from the input start frame p_start and peak frame p_apex; the encoder mainly comprises convolution layers and ResBlocks; let T(·) denote the texture-feature extraction function of the encoder, then t = T(p), where t = t_start|apex denotes the texture features of the input frame; let S(·) denote the shape-feature extraction function of the encoder, then s = S(p), where s = s_start|apex denotes the shape features of the input frame;
step S22: an amplifier is designed to amplify the shape features of the start frame p_start and the peak frame p_apex, mainly simulating the action-amplification effect of a band-pass filter through the convolution layers and activation functions of the neural network, strengthening the signal at frequencies with large variation intensity and filtering out the noise at frequencies with small variation intensity; let G(·) denote the mapping formed in the amplifier by a k3s1 convolution and the ReLU activation function, and H(·) the mapping formed by a k3s1 convolution and a ResBlock; the final amplified result is

M(s_start, s_apex, α) = s_start + H(α · G(s_apex − s_start))

where M(·) denotes the mapping function of the amplifier and α denotes the amplification factor;

step S23: a decoder is designed to imitate the pyramid reconstruction and fusion process of Lagrangian motion amplification; the decoder is also a small convolutional neural network whose inputs are the texture features t_start|apex and the coarsely amplified shape features M(s_start, s_apex, α) produced by the amplifier; the texture features t_start|apex are first upsampled, and the amplified shape features M(s_start, s_apex, α) are then concatenated with them; this is equivalent to amplifying the shape features s_start|apex that need strengthening by α and superimposing the unamplified texture features t_start|apex back on, thereby suppressing noise that might be introduced; then 9 ResBlocks, one more upsampling, and two k3s1 convolution layers produce the final output; ResBlock alleviates the vanishing-gradient problem so that the network back-propagates well; moreover, generating the action-amplified result between two frames with a neural network gives a more robust amplification effect and better performance than traditional action amplification methods, which introduce noise and edge blurring;
in this embodiment, step S3 specifically includes the following steps:
step S31: the amplified micro-expression image is sharpened to counter the pixel blurring that amplification may introduce; the calculation is:

a(i, j) = p(i, j) − k_τ · ∇²p(i, j)

where k_τ is a coefficient related to the diffusion effect. The coefficient should be chosen reasonably: if k_τ is too large, the image contours overshoot; if k_τ is too small, the sharpening effect is not obvious. In this algorithm, k_τ = 1.
Step S32: according to the principle of leave-one-leave-out, when dividing the data set, one leave of one data set is taken as a test set at a time, and all the other leave are combined together to be taken as a training set, so that finally, the data set is subjected to one data setThe Sub can be obtained by the collection i A training set and a test set, Sub i Representing the number of subjects in a data set.
In this embodiment, step S4 specifically includes the following steps:
step S41: the self-adaptive attention area selection network is designed to classify the input amplified and preprocessed micro expression images, and mainly comprises three-scale sub-networks, wherein the three-scale sub-networks have the same structure but different parameters, and each scale sub-network comprises two modules which are respectively a classification module and an attention area selection module;
step S42: the classification module mainly comprises a plurality of convolution layers, an activation layer and a pooling layer and is used for extracting the characteristics of the input micro-expression image, and the calculation process is as follows
c(X) = u(w_i * X)
Where X denotes the vector representation of the input image, w_i denotes the parameters of some network layers, and w_i * X is the extracted feature; the function u(·) denotes the final fully connected layer and softmax layer, which yield the probability of each category for the feature;
step S43: the attention area selection module mainly comprises two stacked fully-connected layers; let e(·) denote the mapping function of the attention area selection module; the calculation process is as follows:
[l_x, l_y, l_half] = e(w_i * X)
where l_x and l_y denote the coordinates of the center of the region selected by the attention area selection module, and l_half denotes half the side length of the selected attention region;
step S44: each sub-network passes through its classification module and attention area selection module in turn; the input of the next sub-network is the region cropped at the position located by the previous sub-network; the cropping operation is implemented with a rectangular function: first, the top-left and bottom-right corners of the attention region are determined:
l_x(tl) = l_x − l_half
l_y(tl) = l_y − l_half
l_x(br) = l_x + l_half
l_y(br) = l_y + l_half
where l_x(tl) and l_y(tl) denote the horizontal and vertical coordinates of the top-left corner, and l_x(br) and l_y(br) those of the bottom-right corner;
then, the mask N(·) of the attention area is computed as follows:
N(·) = [v(x − l_x(tl)) − v(x − l_x(br))] · [v(y − l_y(tl)) − v(y − l_y(br))]
where N(·) is a two-dimensional square-pulse function built from v(x) = 1/(1 + exp(−k·x)), with k a very large positive number, so that the value of v(x) is determined only by the sign of x; x and y are the horizontal and vertical coordinates in the current image; for x > 0, v(x) ≈ 1, and for x < 0, v(x) ≈ 0; therefore N = 1 if and only if l_x(tl) < x < l_x(br) and l_y(tl) < y < l_y(br), and N = 0 otherwise, so the image can be cropped with the mask matrix N. Finally, the cropped result is calculated:
X_att = X ⊙ N(l_x, l_y, l_half)
where ⊙ denotes element-wise multiplication and X_att denotes the cropped result. The advantage of the rectangular function is that it behaves like a direct crop yet remains differentiable, so during optimization gradients can back-propagate through it to optimize the parameters of the rectangular box;
step S45: during training, the classification module is initialized with the parameters of a pre-trained VGGNet, the attention area selection module is initialized with the region of highest response in the last convolution layer of the classification module, and the two modules are trained iteratively until convergence to obtain the final result. With this self-adaptive attention-area discovery method, the same expression is analyzed at different scales from coarse to fine to locate the region that finally discriminates between micro-expressions, which alleviates the difficulty of identifying and localizing key facial regions and improves the classification performance.
The above are preferred embodiments of the present invention; all changes made according to the technical scheme of the present invention that produce equivalent functional effects without exceeding the scope of the technical scheme of the present invention fall within the protection scope of the present invention.
Claims (3)
1. A micro-expression classification method based on action amplification and self-adaptive attention area selection is characterized by comprising the following steps:
step S1, acquiring a micro expression data set, and extracting a start frame and a peak frame;
step S2, inputting the extracted initial frame and peak frame into an action amplification network to generate an action amplified image;
step S3, preprocessing the amplified image, and dividing a training set and a test set according to the principle of leave-one-subject-out;
step S4, recognizing the preprocessed image by using a self-adaptive attention area selection method to obtain a final classification result;
the step S2 specifically includes the following steps:
step S21, an encoder is designed to extract shape and texture features from the start frame p_start and the peak frame p_apex; the encoder consists of convolution layers and ResBlocks; let T(·) denote the texture-feature extraction function of the encoder, then t = T(p), where t = t_start|apex denotes the texture features of the input frame; let S(·) denote the shape-feature extraction function of the encoder, then s = S(p), where s = s_start|apex denotes the shape features of the input frame;
step S22, an amplifier is designed to amplify the shape features of the start frame p_start and the peak frame p_apex; the convolution layers and activation functions of the neural network simulate the action-amplification effect of a band-pass filter, strengthening the signal at frequencies with large variation intensity and filtering out the noise at frequencies with small variation intensity; let G(·) denote the mapping formed in the amplifier by a k3s1 convolution and the ReLU activation function, and H(·) the mapping formed by a k3s1 convolution and a ResBlock; the final amplified result is:
M(s_start, s_apex, α) = s_start + H(α · G(s_apex − s_start))
where M(·) denotes the mapping function of the amplifier, α denotes the amplification factor, s_start denotes the shape features of the start frame, and s_apex denotes the shape features of the peak frame;
step S23, a decoder is designed to imitate the pyramid reconstruction and fusion process of Lagrangian motion amplification; the decoder is also a small convolutional neural network whose inputs are the texture features t_start|apex and the amplified shape features M(s_start, s_apex, α) produced by the amplifier; the texture features t_start|apex are first upsampled, and the amplified shape features M(s_start, s_apex, α) are then concatenated with them; this is equivalent to amplifying the shape features s_start|apex that need strengthening by α and superimposing the unamplified texture features t_start|apex back on, thereby suppressing noise that might be introduced; then 9 ResBlocks, one more upsampling, and two k3s1 convolution layers produce the final output;
the step S4 specifically includes the following steps:
step S41, designing a self-adaptive attention area selection network to classify the input amplified and preprocessed micro-expression images, wherein the self-adaptive attention area selection network comprises three scales of sub-networks, the three scales of sub-networks have the same structure but different parameters, and each scale of sub-network comprises two modules which are a classification module and an attention area selection module respectively;
the classification module is composed of a convolution layer, an activation layer and a pooling layer and is used for extracting features of the input micro-expression image, and the calculation process is as follows:
c(X) = u(w_i * X)
where X denotes the vector representation of the input image, w_i denotes the parameters of the network layers, and w_i * X is the extracted feature; the function u(·) denotes the final fully connected layer and softmax layer, which yield the probability of each category for the feature;
the attention area selection module is composed of two stacked fully-connected layers, and let e (-) denote the mapping function of the attention area selection module, and the calculation process is as follows:
[l_x, l_y, l_half] = e(w_i * X)
where l_x and l_y denote the coordinates of the center of the region selected by the attention area selection module, and l_half denotes half the side length of the selected attention region;
step S42, for each sub-network, after passing through the classification module and the attention area selection module, the input of the next sub-network is the crop of the region located by the previous sub-network; the cropping operation is implemented with a rectangular function, first determining the top-left and bottom-right corners of the attention region:

l_x(tl) = l_x - l_half
l_y(tl) = l_y - l_half
l_x(br) = l_x + l_half
l_y(br) = l_y + l_half

wherein l_x(tl) and l_y(tl) are the horizontal and vertical coordinates of the top-left corner, and l_x(br) and l_y(br) are the horizontal and vertical coordinates of the bottom-right corner;
then the mask N(·) of the attention region is computed; the calculation process is as follows:

N(·) = [v(x - l_x(tl)) - v(x - l_x(br))] · [v(y - l_y(tl)) - v(y - l_y(br))]

where N(·) is a two-dimensional square-pulse function, x and y are the horizontal and vertical coordinates in the current image, and v is a step-like function (e.g. a logistic function with slope k) in which k is a very large positive number, so that the value of v(x) is determined only by the sign of x: v(x) ≈ 1 for x > 0 and v(x) ≈ 0 for x < 0; thus N = 1 if and only if l_x(tl) < x < l_x(br) and l_y(tl) < y < l_y(br), and N = 0 otherwise, so the picture can be cropped with the mask matrix N; finally the cropped result is computed:

X_att = X ⊙ N(l_x, l_y, l_half)

where ⊙ denotes element-by-element multiplication and X_att is the cropped result;
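The mask-based cropping of step S42 can be reproduced directly in numpy. A logistic function is used for the step-like v (an assumption consistent with the claim's description of a large k); the clipping of the exponent only avoids floating-point overflow:

```python
import numpy as np

def v(x, k=100.0):
    """Step-like logistic: ~1 for x > 0, ~0 for x < 0 when k is large."""
    z = np.clip(k * x, -500.0, 500.0)  # avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-z))

def attention_mask(h, w, lx, ly, lhalf):
    """Two-dimensional square-pulse mask N over an h x w image."""
    ys, xs = np.mgrid[0:h, 0:w]
    nx = v(xs - (lx - lhalf)) - v(xs - (lx + lhalf))
    ny = v(ys - (ly - lhalf)) - v(ys - (ly + lhalf))
    return nx * ny

X = np.ones((20, 20))                               # toy image
N = attention_mask(20, 20, lx=10, ly=10, lhalf=4)
X_att = X * N  # element-wise product keeps only the attention region
```

Because v is smooth, the crop stays differentiable, which is what allows the attention region selection module to be trained by back-propagation.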
step S43, during training, the classification module is initialised with the parameters of a pre-trained VGGNet, the attention area selection module is initialised from the region of highest response in the last convolutional layer of the classification module, and the two modules are then trained iteratively until convergence to obtain the final result.
2. The micro-expression classification method based on motion amplification and adaptive attention area selection as claimed in claim 1, wherein the step S1 specifically comprises the following steps:
step S11, acquiring a micro-expression data set and, after face alignment, cropping the images to 224 × 224;
step S12, for micro-expression data sets annotated with initial and peak frames, extracting the initial frame and the peak frame directly from the annotations;
step S13, for micro-expression data sets without annotated initial and peak frames, extracting the initial frame and the peak frame of the video sequence with a frame difference method, which proceeds as follows: let P = {p_i}, i = 1, 2, ..., n, denote the input image sequence, where p_i is the i-th input picture; the first frame of the sequence is taken as the initial frame, i.e. p_start = p_1; denote the grey values of corresponding pixels in the first frame and the n-th frame of the video sequence by f1(x, y) and fn(x, y); subtracting the grey values of corresponding pixels in the two frames and taking the absolute value gives the difference image Dn, with Dn(x, y) = |fn(x, y) - f1(x, y)|; the average inter-frame difference Dn_avg of the difference image is then calculated as

Dn_avg = (Σ_x Σ_y Dn(x, y)) / (Dn.shape[0] × Dn.shape[1])

wherein Dn.shape[0] is the height of the difference image Dn and Dn.shape[1] is its width; the average inter-frame difference between the initial frame and every other frame is computed and sorted, and the frame with the largest average inter-frame difference is the peak frame p_apex of the image sequence.
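The frame difference method of step S13 reduces to a few lines of numpy. This sketch assumes grey-scale frames as 2-D arrays; the function name `find_apex` is illustrative:

```python
import numpy as np

def find_apex(frames):
    """Locate the apex (peak) frame of a grey-scale sequence by the
    frame difference method of step S13: the frame whose mean absolute
    difference from the first (initial) frame is largest."""
    f1 = frames[0].astype(np.float64)
    diffs = [np.abs(f.astype(np.float64) - f1).mean() for f in frames[1:]]
    return 1 + int(np.argmax(diffs))  # index of the apex frame in the sequence

# toy sequence: the second frame deviates most from the initial frame
seq = [np.zeros((4, 4)), np.full((4, 4), 10.0), np.full((4, 4), 3.0)]
print(find_apex(seq))  # 1
```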
3. The micro-expression classification method based on action amplification and adaptive attention area selection according to claim 1, wherein the step S3 specifically comprises the following steps:
step S31, sharpening the amplified micro-expression image; in the sharpening computation, k_τ is the coefficient associated with the diffusion effect, with k_τ = 1;
step S32, each data set contains multiple subjects, each subject representing one participant, and each subject contains multiple micro-expression sequences produced by that participant; following the leave-one-subject-out principle, when dividing a data set, one subject is taken as the test set and all remaining subjects together form the training set, so that each data set ultimately yields Sub_i training-set/test-set partitions, wherein Sub_i is the number of subjects in that data set.
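The leave-one-subject-out protocol of step S32 can be sketched as below. The dict-of-lists representation of the data set and the function name `loso_splits` are hypothetical conveniences, not part of the claim:

```python
def loso_splits(data):
    """Leave-one-subject-out: each subject in turn becomes the test set,
    and all remaining subjects together form the training set.
    `data` maps subject id -> list of micro-expression sequences."""
    subjects = sorted(data)
    for test_sub in subjects:
        train = [seq for s in subjects if s != test_sub for seq in data[s]]
        test = list(data[test_sub])
        yield test_sub, train, test

data = {"sub1": ["a", "b"], "sub2": ["c"], "sub3": ["d", "e"]}
splits = list(loso_splits(data))
print(len(splits))  # 3 partitions, one per subject (Sub_i = 3)
```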
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011070118.3A CN112200065B (en) | 2020-10-09 | 2020-10-09 | Micro-expression classification method based on action amplification and self-adaptive attention area selection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112200065A CN112200065A (en) | 2021-01-08 |
CN112200065B true CN112200065B (en) | 2022-08-09 |
Family
ID=74013087
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011070118.3A Active CN112200065B (en) | 2020-10-09 | 2020-10-09 | Micro-expression classification method based on action amplification and self-adaptive attention area selection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112200065B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112950922B (en) * | 2021-01-26 | 2022-06-10 | 浙江得图网络有限公司 | Fixed-point returning method for sharing electric vehicle |
CN115049957A (en) * | 2022-05-31 | 2022-09-13 | 东南大学 | Micro-expression identification method and device based on contrast amplification network |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109409222A (en) * | 2018-09-20 | 2019-03-01 | 中国地质大学(武汉) | A kind of multi-angle of view facial expression recognizing method based on mobile terminal |
CN110175596A (en) * | 2019-06-04 | 2019-08-27 | 重庆邮电大学 | The micro- Expression Recognition of collaborative virtual learning environment and exchange method based on double-current convolutional neural networks |
CN110287805A (en) * | 2019-05-31 | 2019-09-27 | 东南大学 | Micro- expression recognition method and system based on three stream convolutional neural networks |
CN110516571A (en) * | 2019-08-16 | 2019-11-29 | 东南大学 | Inter-library micro- expression recognition method and device based on light stream attention neural network |
CN110580461A (en) * | 2019-08-29 | 2019-12-17 | 桂林电子科技大学 | Facial expression recognition algorithm combined with multilevel convolution characteristic pyramid |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11361225B2 (en) * | 2018-12-18 | 2022-06-14 | Microsoft Technology Licensing, Llc | Neural network architecture for attention based efficient model adaptation |
Non-Patent Citations (4)
Title |
---|
Facial Micro-expression Recognition with Adaptive Video Motion Magnification; Zhilin Lei et al.; International Conference in Communications, Signal Processing, and Systems; 20200404; pp. 2107-2116 *
Real-time facial expression and gender classification based on depthwise separable convolutional neural networks; Liu Shangwang et al.; 《计算机应用》 (Journal of Computer Applications); 20200410 (No. 04); pp. 990-995 *
Research on facial expression recognition algorithms based on deep learning; Xia Tian; China Masters' Theses Full-text Database, Information Science and Technology (monthly); 20200615 (No. 06); pp. 1-82 *
Fine-grained expression recognition with attention bilinear pooling based on feature fusion; Liu Liyuan et al.; Journal of Ludong University (Natural Science Edition); 20200430; Vol. 36 (No. 02); pp. 130-136 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103116763B (en) | A kind of living body faces detection method based on hsv color Spatial Statistical Character | |
CN111445410B (en) | Texture enhancement method, device and equipment based on texture image and storage medium | |
CN107729820B (en) | Finger vein identification method based on multi-scale HOG | |
CN111967427A (en) | Fake face video identification method, system and readable storage medium | |
CN110728209A (en) | Gesture recognition method and device, electronic equipment and storage medium | |
CN111667400B (en) | Human face contour feature stylization generation method based on unsupervised learning | |
CN112200065B (en) | Micro-expression classification method based on action amplification and self-adaptive attention area selection | |
CN111967363B (en) | Emotion prediction method based on micro-expression recognition and eye movement tracking | |
CN113537008B (en) | Micro expression recognition method based on self-adaptive motion amplification and convolutional neural network | |
CN111476178A (en) | Micro-expression recognition method based on 2D-3D CNN | |
CN111476727B (en) | Video motion enhancement method for face-changing video detection | |
CN111178130A (en) | Face recognition method, system and readable storage medium based on deep learning | |
CN113822157A (en) | Mask wearing face recognition method based on multi-branch network and image restoration | |
CN107506713A (en) | Living body faces detection method and storage device | |
CN112396036A (en) | Method for re-identifying blocked pedestrians by combining space transformation network and multi-scale feature extraction | |
CN112183419A (en) | Micro-expression classification method based on optical flow generation network and reordering | |
CN109522865A (en) | A kind of characteristic weighing fusion face identification method based on deep neural network | |
CN112861588B (en) | Living body detection method and device | |
CN116311403A (en) | Finger vein recognition method of lightweight convolutional neural network based on FECAGhostNet | |
CN109165551B (en) | Expression recognition method for adaptively weighting and fusing significance structure tensor and LBP characteristics | |
Zabihi et al. | Vessel extraction of conjunctival images using LBPs and ANFIS | |
CN115984919A (en) | Micro-expression recognition method and system | |
CN116030516A (en) | Micro-expression recognition method and device based on multi-task learning and global circular convolution | |
CN104850861A (en) | Fungal keratitis image recognition method based on RX anomaly detection and texture analysis | |
CN115188039A (en) | Depth forgery video technology tracing method based on image frequency domain information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||