CN112200065B - Micro-expression classification method based on action amplification and self-adaptive attention area selection - Google Patents


Info

Publication number
CN112200065B
CN112200065B (application CN202011070118.3A)
Authority
CN
China
Prior art keywords
frame
apex
micro
attention area
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011070118.3A
Other languages
Chinese (zh)
Other versions
CN112200065A (en)
Inventor
柯逍
林艳
王俊强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN202011070118.3A
Publication of CN112200065A
Application granted
Publication of CN112200065B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Abstract

The invention relates to a micro-expression classification method based on action amplification and self-adaptive attention area selection. First, a micro-expression data set is acquired and the start frame and the peak frame are extracted; the extracted start frame and peak frame are then fed into an action amplification network to generate action-amplified images; the amplified images are preprocessed; finally, the preprocessed images are recognized with a self-adaptive attention area selection method to obtain the final classification result.

Description

Micro-expression classification method based on action amplification and self-adaptive attention area selection
Technical Field
The invention relates to the field of pattern recognition and computer vision, in particular to a micro-expression classification method based on action amplification and self-adaptive attention area selection.
Background
Humans, as highly evolved animals, sometimes disguise or hide their emotions; in such cases, useful information cannot be obtained from the macroscopic expression of the face. To tap useful information from camouflaged facial expressions, Ekman identified a transient, involuntary, rapid facial emotion, the micro-expression, which appears involuntarily on the face when a person tries to hide a genuine emotion. A typical micro-expression lasts 1/25 to 1/5 of a second and usually appears only in a specific part of the face.
Micro-expressions have great prospects in national security, criminal interrogation and medical applications, but their subtlety and brevity pose great challenges to the human eye; in recent years, therefore, much work has been proposed to achieve automatic micro-expression recognition using computer vision and machine learning algorithms.
Disclosure of Invention
The invention aims to provide a micro-expression classification method based on action amplification and self-adaptive attention area selection, which can effectively classify micro-expression images.
In order to achieve the purpose, the technical scheme of the invention is as follows: a micro-expression classification method based on action amplification and self-adaptive attention area selection comprises the following steps:
step S1, acquiring a micro expression data set, and extracting a start frame and a peak frame;
step S2, inputting the extracted initial frame and peak frame into an action amplification network to generate an action amplified image;
step S3, preprocessing the amplified image, and dividing a training set and a test set according to an LOSO principle;
and step S4, recognizing the preprocessed image by using a self-adaptive attention area selection method to obtain a final classification result.
In an embodiment of the present invention, the step S1 specifically includes the following steps:
step S11, acquiring a micro expression data set, and cutting the image into 224 × 224 images after face alignment;
step S12, for micro expression data sets labeled with the start frame and the peak frame, extracting the start frame and the peak frame directly according to the annotations;
step S13, for micro expression data sets not labeled with the start frame and the peak frame, extracting the start frame and the peak frame of the video sequence by a frame difference method; the frame difference method is as follows: let P = {p_i}, i = 1, 2, ..., n denote the input image sequence, where p_i denotes the i-th input image; the first frame of the sequence is taken as the start frame, i.e. p_start = p_1; the gray values of corresponding pixels in the first frame and the n-th frame of the video sequence are denoted f1(x, y) and fn(x, y); the gray values of corresponding pixels of the two frames are subtracted and the absolute value is taken to obtain the difference image Dn, Dn(x, y) = |fn(x, y) − f1(x, y)|; the average inter-frame difference Dnavg of the difference image is then calculated as:
Dnavg = ( Σ_x Σ_y Dn(x, y) ) / ( Dn.shape[0] × Dn.shape[1] )
where Dn.shape[0] denotes the height of the difference image Dn and Dn.shape[1] denotes its width; the average inter-frame difference between the start frame and every other frame is calculated and sorted, and the frame with the largest average inter-frame difference is the peak frame p_apex of the image sequence.
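For illustration only, a minimal sketch of this frame-difference peak-frame search is given below, assuming the sequence is available as a list of grayscale NumPy arrays; the function name and data layout are assumptions and not part of the claimed method.

```python
import numpy as np

def find_apex_frame(frames):
    """Return the index of the peak frame: the frame whose average absolute
    grayscale difference against the start frame p_start = p_1 is largest."""
    start = frames[0].astype(np.float32)                  # p_start = p_1
    best_idx, best_avg = 1, -1.0
    for i, frame in enumerate(frames[1:], start=1):
        dn = np.abs(frame.astype(np.float32) - start)     # Dn(x, y) = |fn(x, y) - f1(x, y)|
        dn_avg = dn.sum() / (dn.shape[0] * dn.shape[1])   # average inter-frame difference Dnavg
        if dn_avg > best_avg:
            best_idx, best_avg = i, dn_avg
    return best_idx                                       # index of the peak frame p_apex
```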
In an embodiment of the present invention, the step S2 specifically includes the following steps:
step S21, designing an encoder to extract shape and texture features from the start frame p_start and the peak frame p_apex, wherein the encoder consists of convolution layers and ResBlocks; let T(·) denote the texture feature extraction module of the encoder, then t = T(p), where t = t_start|apex denotes the texture features of the input frame; let S(·) denote the shape feature extraction module of the encoder, then s = S(p), where s = s_start|apex denotes the shape features of the input frame;
step S22, designing an amplifier to amplify the shape features of the start frame p_start and the peak frame p_apex, wherein the convolution layers and activation functions of the neural network simulate the motion amplification effect of a band-pass filter, strengthening signals at frequencies with large variation intensity and filtering out noise at frequencies with small variation intensity; let G(·) denote the function mapping formed by a k3s1 convolution and the ReLU activation function in the amplifier, and H(·) denote the function mapping formed by a k3s1 convolution and a ResBlock in the amplifier, then the final amplification result is:
M(s_start, s_apex, α) = s_start + H(α · G(s_apex − s_start))
where M(·) denotes the mapping function of the amplifier, α denotes the magnification factor, s_start denotes the shape features of the start frame, and s_apex denotes the shape features of the peak frame;
step S23, designing a decoder that simulates the pyramid reconstruction and fusion process of Lagrangian motion amplification, wherein the decoder is also a small convolutional neural network whose inputs are the texture features t_start|apex and the amplified shape features M(s_start, s_apex, α) produced by the amplifier; the texture features t_start|apex are first upsampled and then concatenated with the amplified shape features M(s_start, s_apex, α), which is equivalent to amplifying the shape features s_start|apex that need to be strengthened by a factor of α and then superimposing the unamplified texture features t_start|apex back; the result then passes through 9 ResBlocks, one upsampling step and two k3s1 convolution layers to obtain the final output.
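As an illustrative sketch only, the amplifier of step S22 could be written as a small PyTorch module as follows; the channel width and the exact ResBlock layout are assumptions made for this sketch and are not fixed by the description above.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block used inside H(.); the two-convolution layout is an assumption."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, 1, 1), nn.ReLU(inplace=True), nn.Conv2d(c, c, 3, 1, 1))

    def forward(self, x):
        return x + self.body(x)

class Amplifier(nn.Module):
    def __init__(self, c=32):
        super().__init__()
        self.g = nn.Sequential(nn.Conv2d(c, c, 3, 1, 1), nn.ReLU(inplace=True))  # G: k3s1 conv + ReLU
        self.h = nn.Sequential(nn.Conv2d(c, c, 3, 1, 1), ResBlock(c))            # H: k3s1 conv + ResBlock

    def forward(self, s_start, s_apex, alpha):
        # M(s_start, s_apex, alpha) = s_start + H(alpha * G(s_apex - s_start))
        return s_start + self.h(alpha * self.g(s_apex - s_start))
```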
In an embodiment of the present invention, the step S3 specifically includes the following steps:
step S31, sharpening the amplified micro-expression image to alleviate the pixel blurring that may appear after amplification; the computation is:
a(i, j) = p(i, j) − k_τ · ∇²p(i, j)
where k_τ is a coefficient related to the diffusion effect, and k_τ = 1;
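A minimal sketch of this sharpening step, assuming the amplified frame is a grayscale NumPy array and using OpenCV's discrete Laplacian as the ∇² operator (the particular Laplacian stencil is an implementation assumption):

```python
import cv2
import numpy as np

def sharpen(img, k_tau=1.0):
    """Laplacian sharpening: a(i, j) = p(i, j) - k_tau * Laplacian(p)(i, j)."""
    p = img.astype(np.float32)
    lap = cv2.Laplacian(p, cv2.CV_32F)           # discrete approximation of the Laplacian
    a = p - k_tau * lap
    return np.clip(a, 0, 255).astype(np.uint8)
```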
Step S32, each timeThe method comprises the steps that a plurality of subjects are arranged under a data set, each subject represents a testee, each subject contains a plurality of micro-expression sequences which represent the micro-expression sequences generated by the testee, one subject of one data set is taken as a test set when the data set is divided according to the principle of leave-one-subject-out, all other subjects are combined together to be used as a training set, and the last data set obtains the subject i A training set and a test set, wherein Sub i Representing the number of subjects in a data set.
In an embodiment of the present invention, the step S4 specifically includes the following steps:
step S41, designing a self-adaptive attention area selection network to classify the input amplified and preprocessed micro-expression images, wherein the self-adaptive attention area selection network comprises three scales of sub-networks, the three scales of sub-networks have the same structure but different parameters, and each scale of sub-network comprises two modules which are respectively a classification module and an attention area selection module;
the classification module consists of convolution layers, activation layers and pooling layers and is used to extract features from the input micro-expression image; the computation is:
c(X) = u(w_i * X)
where X is the vector representation of the input image, w_i denotes the parameters of the network layers, w_i * X is the finally extracted feature, and the function u(·) denotes the final fully connected layer and softmax layer, which produce the class probabilities corresponding to the final feature;
the attention area selection module consists of two stacked fully connected layers; let e(·) denote the mapping function of the attention area selection module, then the computation is:
[l_x, l_y, l_half] = e(w_i * X)
where l_x and l_y are the coordinates of the center point of the area selected by the attention area selection module, and l_half is half the side length of the selected attention area;
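For illustration, one scale of the sub-network (classification module plus attention area selection module) might be sketched in PyTorch as below; the backbone depth, channel sizes and the hidden width of the fully connected layers are assumptions, and only the two-branch structure with the [l_x, l_y, l_half] output follows the description above.

```python
import torch
import torch.nn as nn

class ScaleSubNetwork(nn.Module):
    def __init__(self, num_classes, feat_dim=512):
        super().__init__()
        # classification module: convolution, activation and pooling layers
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, 1, 1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(64, feat_dim, 3, 1, 1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(feat_dim, num_classes)
        # attention area selection module: two stacked fully connected layers -> [l_x, l_y, l_half]
        self.region = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(inplace=True), nn.Linear(128, 3))

    def forward(self, x):
        feat = self.backbone(x).flatten(1)                   # w_i * X, the extracted feature
        probs = torch.softmax(self.classifier(feat), dim=1)  # u(.): fully connected layer + softmax
        l_x, l_y, l_half = self.region(feat).unbind(dim=1)   # attention area center and half side length
        return probs, (l_x, l_y, l_half)
```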
step S42, each sub-network passes through its classification module and attention area selection module in turn; the input of the next sub-network is the region cropped at the position of the attention area located by the previous sub-network, and the cropping operation is implemented by a rectangular function; first, the top-left and bottom-right corners of the attention area are determined:
l_x(tl) = l_x − l_half
l_y(tl) = l_y − l_half
l_x(br) = l_x + l_half
l_y(br) = l_y + l_half
where l_x(tl) and l_y(tl) are the horizontal and vertical coordinates of the top-left corner, and l_x(br) and l_y(br) are the horizontal and vertical coordinates of the bottom-right corner;
then the mask N(·) of the attention area is computed as follows:
N(·) = [v(x − l_x(tl)) − v(x − l_x(br))] · [v(y − l_y(tl)) − v(y − l_y(br))]
v(x) = 1 / (1 + exp(−k·x))
where N(·) is a two-dimensional rectangular pulse function and k is a very large positive number, so that the value of v(x) is determined only by the sign of its argument; x and y are the horizontal and vertical coordinates in the current image: for x > 0, v(x) ≈ 1, and for x < 0, v(x) ≈ 0; therefore N ≈ 1 if and only if l_x(tl) < x < l_x(br) and l_y(tl) < y < l_y(br), and N ≈ 0 otherwise, so the image can be cropped with the mask matrix N; finally the cropped result is computed as:
X_att = X ⊙ N(l_x, l_y, l_half)
where ⊙ denotes element-wise multiplication and X_att denotes the cropped result;
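A minimal sketch of the differentiable rectangular mask and crop described above, assuming a single square image of side length size; the concrete value of k is an assumption (any sufficiently large positive constant behaves like the step function).

```python
import torch

def boxcar_mask(l_x, l_y, l_half, size, k=10.0):
    """Build N(x, y) so that X_att = X * N keeps only the selected attention square."""
    xs = torch.arange(size, dtype=torch.float32).view(1, -1)   # horizontal coordinate x
    ys = torch.arange(size, dtype=torch.float32).view(-1, 1)   # vertical coordinate y
    v = lambda t: torch.sigmoid(k * t)                         # v(t) = 1 / (1 + exp(-k t))
    n_x = v(xs - (l_x - l_half)) - v(xs - (l_x + l_half))      # left and right edges
    n_y = v(ys - (l_y - l_half)) - v(ys - (l_y + l_half))      # top and bottom edges
    return n_x * n_y                                           # close to 1 inside the box, 0 outside

# usage: X_att = X * boxcar_mask(l_x, l_y, l_half, X.shape[-1])
```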
step S43, during training, the classification module is initialized with the parameters of a pre-trained VGGNet, the attention area selection module is initialized with the highest-response region of the last convolution layer of the classification module, and the two modules are then trained iteratively until convergence to obtain the final result.
Compared with the prior art, the invention has the following beneficial effects:
1. the micro-expression classification method based on action amplification and self-adaptive attention area selection constructed by the invention can effectively classify micro-expression images and improve the classification effect of the micro-expression images.
2. The method generates the action amplification result between two frames with a convolutional neural network; compared with traditional action amplification methods it introduces less noise and edge blurring, and is more robust with better performance.
3. To address the problem that traditional micro-expression recognition must attend to local areas through strictly aligned face segmentation, the invention provides a self-adaptive attention area discovery method.
Drawings
Fig. 1 is a schematic diagram of the principle of the present invention.
Detailed Description
The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the present embodiment provides a micro-expression classification method based on motion amplification and adaptive attention area selection, which specifically includes the following steps:
step S1: acquiring a micro expression data set, and extracting a start frame and a peak frame;
step S2: inputting the extracted initial frame and the extracted peak frame into an action amplification network to generate an action amplified image;
step S3: preprocessing the amplified image, and dividing a training set and a testing set according to an LOSO principle;
step S4: and identifying the preprocessed image by using a self-adaptive attention area selection method to obtain a final classification result.
In this embodiment, the step S1 includes the following steps:
step S11: acquiring a micro expression data set, aligning the face, and uniformly cutting the face into 224 × 224 sizes;
step S12: for a micro expression data set with initial frame and peak frame labels, extracting the initial frame and the peak frame directly according to the labeled contents;
step S13: extracting the initial frame and the peak frame of the video sequence by using a frame difference method for the micro expression data set which is not marked by the initial frame and the peak frame;
step S14: the frame difference method is as follows: let P = {p_i}, i = 1, 2, ..., n denote the input image sequence, where p_i denotes the i-th input image; the first frame of the sequence is taken as the start frame, i.e. p_start = p_1; the gray values of corresponding pixels in the first frame and the n-th frame of the video sequence are denoted f1(x, y) and fn(x, y); the gray values of corresponding pixels of the two frames are subtracted and the absolute value is taken to obtain the difference image Dn, Dn(x, y) = |fn(x, y) − f1(x, y)|; the average inter-frame difference Dnavg of the difference image is then calculated as follows:
Dnavg = ( Σ_x Σ_y Dn(x, y) ) / ( Dn.shape[0] × Dn.shape[1] )
where Dn.shape[0] denotes the height of the difference image Dn and Dn.shape[1] denotes its width. The average inter-frame difference between the start frame and every other frame is calculated and sorted; the frame with the largest average inter-frame difference is the peak frame p_apex of the image sequence.
In this embodiment, step S2 specifically includes the following steps:
step S21: design an encoder to extract shape and texture features from the input start frame p_start and peak frame p_apex; the encoder mainly consists of convolution layers and ResBlocks; let T(·) denote the texture feature extraction module of the encoder, then t = T(p), where t = t_start|apex denotes the texture features of the input frame; let S(·) denote the shape feature extraction module of the encoder, then s = S(p), where s = s_start|apex denotes the shape features of the input frame;
step S22: design an amplifier to amplify the shape features of the start frame p_start and the peak frame p_apex; the convolution layers and activation functions of the neural network mainly simulate the motion amplification effect of a band-pass filter, strengthening signals at frequencies with large variation intensity and filtering out noise at frequencies with small variation intensity; let G(·) denote the function mapping formed by a k3s1 convolution and the ReLU activation function in the amplifier, and H(·) denote the function mapping formed by a k3s1 convolution and a ResBlock; the final amplification result is
M(s_start, s_apex, α) = s_start + H(α · G(s_apex − s_start))
where M(·) denotes the mapping function of the amplifier and α denotes the magnification factor;
step S23: design a decoder that simulates the pyramid reconstruction and fusion process of Lagrangian motion amplification; the decoder is also a small convolutional neural network whose inputs are the texture features t_start|apex and the roughly amplified shape features M(s_start, s_apex, α) produced by the amplifier; the texture features t_start|apex are first upsampled and then concatenated with the amplified shape features M(s_start, s_apex, α), which is equivalent to amplifying the shape features s_start|apex that need to be strengthened by a factor of α and then superimposing the unamplified texture features t_start|apex back, thereby suppressing noise that might be introduced; the result then passes through 9 ResBlocks, one upsampling step and two k3s1 convolution layers to obtain the final output; the ResBlocks alleviate the gradient dispersion problem so that gradients propagate well through the network; in addition, generating the action amplification result between two frames with a neural network yields an amplification that is more robust and performs better than traditional action amplification methods, which suffer from noise and edge blurring;
in this embodiment, step S3 specifically includes the following steps:
step S31: sharpen the amplified micro-expression image to alleviate the pixel blurring that may appear after amplification; the computation is:
a(i, j) = p(i, j) − k_τ · ∇²p(i, j)
where k_τ is a coefficient related to the diffusion effect. The coefficient should be chosen reasonably: if k_τ is too large, the image contours will overshoot; if k_τ is too small, the sharpening effect is not obvious; in this algorithm k_τ = 1.
step S32: according to the leave-one-subject-out principle, when dividing a data set, one subject is taken as the test set each time and all remaining subjects are combined as the training set, so that a data set finally yields Sub_i training/test splits, where Sub_i denotes the number of subjects in the data set.
In this embodiment, step S4 specifically includes the following steps:
step S41: the self-adaptive attention area selection network is designed to classify the input amplified and preprocessed micro expression images, and mainly comprises three-scale sub-networks, wherein the three-scale sub-networks have the same structure but different parameters, and each scale sub-network comprises two modules which are respectively a classification module and an attention area selection module;
step S42: the classification module mainly consists of several convolution layers, activation layers and pooling layers and is used to extract features from the input micro-expression image; the computation is
c(X) = u(w_i * X)
where X is the vector representation of the input image, w_i denotes the parameters of the network layers, w_i * X is the finally extracted feature, and the function u(·) denotes the final fully connected layer and softmax layer, which produce the class probabilities corresponding to the final feature;
step S43: the attention area selection module mainly consists of two stacked fully connected layers; let e(·) denote the mapping function of the attention area selection module, then the computation is:
[l_x, l_y, l_half] = e(w_i * X)
where l_x and l_y are the coordinates of the center point of the area selected by the attention area selection module, and l_half is half the side length of the selected attention area;
step S44: each sub-network passes through its classification module and attention area selection module in turn; the input of the next sub-network is the region cropped at the position of the attention area located by the previous sub-network, and the cropping operation is implemented by a rectangular function; first, the top-left and bottom-right corners of the attention area are determined:
l_x(tl) = l_x − l_half
l_y(tl) = l_y − l_half
l_x(br) = l_x + l_half
l_y(br) = l_y + l_half
where l_x(tl) and l_y(tl) are the horizontal and vertical coordinates of the top-left corner, and l_x(br) and l_y(br) are the horizontal and vertical coordinates of the bottom-right corner;
then the mask N(·) of the attention area is computed as follows:
N(·) = [v(x − l_x(tl)) − v(x − l_x(br))] · [v(y − l_y(tl)) − v(y − l_y(br))]
v(x) = 1 / (1 + exp(−k·x))
where N(·) is a two-dimensional rectangular pulse function and k is a very large positive number, so that the value of v(x) is determined only by the sign of its argument; x and y are the horizontal and vertical coordinates in the current image: for x > 0, v(x) ≈ 1, and for x < 0, v(x) ≈ 0; therefore N ≈ 1 if and only if l_x(tl) < x < l_x(br) and l_y(tl) < y < l_y(br), and N ≈ 0 otherwise, so the image can be cropped with the mask matrix N. Finally, the cropped result is computed:
X_att = X ⊙ N(l_x, l_y, l_half)
where ⊙ denotes element-wise multiplication and X_att denotes the cropped result. The advantage of using a rectangular function is that it behaves equivalently to direct cropping while remaining differentiable, so that during optimization gradients can be back-propagated through it to refine the parameters of the rectangular box;
step S45: during training, the classification module is initialized with the parameters of a pre-trained VGGNet, the attention area selection module is initialized with the highest-response region of the last convolution layer of the classification module, and the two modules are then trained iteratively until convergence to obtain the final result. With this self-adaptive attention area discovery method, the same expression is analyzed at different scales and the area that finally determines the differences among micro-expressions is located scale by scale, which alleviates the difficulty of identifying and locating the key regions and improves the classification effect.
The above are preferred embodiments of the present invention, and all changes made according to the technical scheme of the present invention that produce functional effects do not exceed the scope of the technical scheme of the present invention belong to the protection scope of the present invention.

Claims (3)

1. A micro-expression classification method based on action amplification and self-adaptive attention area selection is characterized by comprising the following steps:
step S1, acquiring a micro expression data set, and extracting a start frame and a peak frame;
step S2, inputting the extracted initial frame and peak frame into an action amplification network to generate an action amplified image;
step S3, preprocessing the amplified image, and dividing a training set and a test set according to the principle of leave-one-subject-out;
step S4, recognizing the preprocessed image by using a self-adaptive attention area selection method to obtain a final classification result;
the step S2 specifically includes the following steps:
step S21, designing an encoder to extract shape and texture features from the start frame p_start and the peak frame p_apex, wherein the encoder consists of convolution layers and ResBlocks; letting T(·) denote the texture feature extraction module of the encoder, then t = T(p), where t = t_start|apex denotes the texture features of the input frame; letting S(·) denote the shape feature extraction module of the encoder, then s = S(p), where s = s_start|apex denotes the shape features of the input frame;
step S22, designing an amplifier to amplify the shape features of the start frame p_start and the peak frame p_apex, wherein the convolution layers and activation functions of the neural network simulate the motion amplification effect of a band-pass filter, strengthening signals at frequencies with large variation intensity and filtering out noise at frequencies with small variation intensity; letting G(·) denote the function mapping formed by a k3s1 convolution and the ReLU activation function in the amplifier, and H(·) denote the function mapping formed by a k3s1 convolution and a ResBlock in the amplifier, the final amplification result is:
M(s_start, s_apex, α) = s_start + H(α · G(s_apex − s_start))
where M(·) denotes the mapping function of the amplifier, α denotes the magnification factor, s_start denotes the shape features of the start frame, and s_apex denotes the shape features of the peak frame;
step S23, designing a decoder that simulates the pyramid reconstruction and fusion process of Lagrangian motion amplification, wherein the decoder is also a small convolutional neural network whose inputs are the texture features t_start|apex and the amplified shape features M(s_start, s_apex, α) produced by the amplifier; the texture features t_start|apex are first upsampled and then concatenated with the amplified shape features M(s_start, s_apex, α), which is equivalent to amplifying the shape features s_start|apex that need to be strengthened by a factor of α and then superimposing the unamplified texture features t_start|apex back, thereby suppressing noise that might be introduced; the result then passes through 9 ResBlocks, one upsampling step and two k3s1 convolution layers to obtain the final output;
the step S4 specifically includes the following steps:
step S41, designing a self-adaptive attention area selection network to classify the input amplified and preprocessed micro-expression images, wherein the self-adaptive attention area selection network comprises three scales of sub-networks, the three scales of sub-networks have the same structure but different parameters, and each scale of sub-network comprises two modules which are a classification module and an attention area selection module respectively;
the classification module consists of convolution layers, activation layers and pooling layers and is used to extract features from the input micro-expression image; the computation is:
c(X) = u(w_i * X)
where X is the vector representation of the input image, w_i denotes the parameters of the network layers, w_i * X is the finally extracted feature, and the function u(·) denotes the final fully connected layer and softmax layer, which produce the class probabilities corresponding to the final feature;
the attention area selection module consists of two stacked fully connected layers; letting e(·) denote the mapping function of the attention area selection module, the computation is:
[l_x, l_y, l_half] = e(w_i * X)
where l_x and l_y are the coordinates of the center point of the area selected by the attention area selection module, and l_half is half the side length of the selected attention area;
step S42, each sub-network passes through its classification module and attention area selection module in turn; the input of the next sub-network is the region cropped at the position of the attention area located by the previous sub-network, and the cropping operation is implemented by a rectangular function; first, the top-left and bottom-right corners of the attention area are determined:
l_x(tl) = l_x − l_half
l_y(tl) = l_y − l_half
l_x(br) = l_x + l_half
l_y(br) = l_y + l_half
where l_x(tl) and l_y(tl) are the horizontal and vertical coordinates of the top-left corner, and l_x(br) and l_y(br) are the horizontal and vertical coordinates of the bottom-right corner;
then the mask N(·) of the attention area is computed as follows:
N(·) = [v(x − l_x(tl)) − v(x − l_x(br))] · [v(y − l_y(tl)) − v(y − l_y(br))]
v(x) = 1 / (1 + exp(−k·x))
where N(·) is a two-dimensional rectangular pulse function and k is a very large positive number, so that the value of v(x) is determined only by the sign of its argument; x and y are the horizontal and vertical coordinates in the current image: for x > 0, v(x) ≈ 1, and for x < 0, v(x) ≈ 0; therefore N ≈ 1 if and only if l_x(tl) < x < l_x(br) and l_y(tl) < y < l_y(br), and N ≈ 0 otherwise, so the image can be cropped with the mask matrix N; finally the cropped result is computed as:
X_att = X ⊙ N(l_x, l_y, l_half)
where ⊙ denotes element-wise multiplication and X_att denotes the cropped result;
step S43, during training, the classification module is initialized with the parameters of a pre-trained VGGNet, the attention area selection module is initialized with the highest-response region of the last convolution layer of the classification module, and the two modules are then trained iteratively until convergence to obtain the final result.
2. The micro-expression classification method based on motion amplification and adaptive attention area selection as claimed in claim 1, wherein the step S1 specifically comprises the following steps:
step S11, acquiring a micro expression data set, and cutting the image into 224 × 224 images after face alignment;
step S12, for micro expression data sets labeled with the start frame and the peak frame, extracting the start frame and the peak frame directly according to the annotations;
step S13, for micro expression data sets not labeled with the start frame and the peak frame, extracting the start frame and the peak frame of the video sequence by a frame difference method; the frame difference method is as follows: let P = {p_i}, i = 1, 2, ..., n denote the input image sequence, where p_i denotes the i-th input image; the first frame of the sequence is taken as the start frame, i.e. p_start = p_1; the gray values of corresponding pixels in the first frame and the n-th frame of the video sequence are denoted f1(x, y) and fn(x, y); the gray values of corresponding pixels of the two frames are subtracted and the absolute value is taken to obtain the difference image Dn, Dn(x, y) = |fn(x, y) − f1(x, y)|; the average inter-frame difference Dnavg of the difference image is calculated as:
Dnavg = ( Σ_x Σ_y Dn(x, y) ) / ( Dn.shape[0] × Dn.shape[1] )
where Dn.shape[0] denotes the height of the difference image Dn and Dn.shape[1] denotes its width; the average inter-frame difference between the start frame and every other frame is calculated and sorted, and the frame with the largest average inter-frame difference is the peak frame p_apex of the image sequence.
3. The micro-expression classification method based on action amplification and adaptive attention area selection according to claim 1, wherein the step S3 specifically comprises the following steps:
step S31, sharpening the amplified micro-expression image, computed as:
a(i, j) = p(i, j) − k_τ · ∇²p(i, j)
where k_τ is a coefficient related to the diffusion effect, and k_τ = 1;
step S32, each data set contains multiple subjects, each subject corresponding to one participant, and each subject contains multiple micro-expression sequences produced by that participant; according to the leave-one-subject-out principle, when dividing a data set, one subject is taken as the test set and all remaining subjects are combined as the training set, so that a data set finally yields Sub_i training/test splits, where Sub_i denotes the number of subjects in the data set.
CN202011070118.3A 2020-10-09 2020-10-09 Micro-expression classification method based on action amplification and self-adaptive attention area selection Active CN112200065B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011070118.3A CN112200065B (en) 2020-10-09 2020-10-09 Micro-expression classification method based on action amplification and self-adaptive attention area selection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011070118.3A CN112200065B (en) 2020-10-09 2020-10-09 Micro-expression classification method based on action amplification and self-adaptive attention area selection

Publications (2)

Publication Number Publication Date
CN112200065A CN112200065A (en) 2021-01-08
CN112200065B true CN112200065B (en) 2022-08-09

Family

ID=74013087

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011070118.3A Active CN112200065B (en) 2020-10-09 2020-10-09 Micro-expression classification method based on action amplification and self-adaptive attention area selection

Country Status (1)

Country Link
CN (1) CN112200065B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112950922B (en) * 2021-01-26 2022-06-10 浙江得图网络有限公司 Fixed-point returning method for sharing electric vehicle
CN115049957A (en) * 2022-05-31 2022-09-13 东南大学 Micro-expression identification method and device based on contrast amplification network


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11361225B2 (en) * 2018-12-18 2022-06-14 Microsoft Technology Licensing, Llc Neural network architecture for attention based efficient model adaptation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409222A (en) * 2018-09-20 2019-03-01 中国地质大学(武汉) A kind of multi-angle of view facial expression recognizing method based on mobile terminal
CN110287805A (en) * 2019-05-31 2019-09-27 东南大学 Micro- expression recognition method and system based on three stream convolutional neural networks
CN110175596A (en) * 2019-06-04 2019-08-27 重庆邮电大学 The micro- Expression Recognition of collaborative virtual learning environment and exchange method based on double-current convolutional neural networks
CN110516571A (en) * 2019-08-16 2019-11-29 东南大学 Inter-library micro- expression recognition method and device based on light stream attention neural network
CN110580461A (en) * 2019-08-29 2019-12-17 桂林电子科技大学 Facial expression recognition algorithm combined with multilevel convolution characteristic pyramid

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Facial Micro-expression Recognition with Adaptive Video Motion Magnification; Zhilin Lei et al.; International Conference in Communications, Signal Processing, and Systems; 2020-04-04; pp. 2107-2116 *
Real-time facial expression and gender classification based on depthwise separable convolutional neural networks; 刘尚旺 et al.; Journal of Computer Applications (计算机应用); 2020-04-10 (No. 04); pp. 990-995 *
Research on facial expression recognition algorithms based on deep learning; 夏添; China Master's Theses Full-text Database, Information Science and Technology Series (monthly); 2020-06-15 (No. 06); pp. 1-82 *
Fine-grained expression recognition with attention bilinear pooling based on feature fusion; 刘力源 et al.; Journal of Ludong University (Natural Science Edition) (鲁东大学学报(自然科学版)); 2020-04-30; Vol. 36 (No. 02); pp. 130-136 *

Also Published As

Publication number Publication date
CN112200065A (en) 2021-01-08

Similar Documents

Publication Publication Date Title
CN103116763B (en) A kind of living body faces detection method based on hsv color Spatial Statistical Character
CN111445410B (en) Texture enhancement method, device and equipment based on texture image and storage medium
CN107729820B (en) Finger vein identification method based on multi-scale HOG
CN111967427A (en) Fake face video identification method, system and readable storage medium
CN110728209A (en) Gesture recognition method and device, electronic equipment and storage medium
CN111667400B (en) Human face contour feature stylization generation method based on unsupervised learning
CN112200065B (en) Micro-expression classification method based on action amplification and self-adaptive attention area selection
CN111967363B (en) Emotion prediction method based on micro-expression recognition and eye movement tracking
CN113537008B (en) Micro expression recognition method based on self-adaptive motion amplification and convolutional neural network
CN111476178A (en) Micro-expression recognition method based on 2D-3D CNN
CN111476727B (en) Video motion enhancement method for face-changing video detection
CN111178130A (en) Face recognition method, system and readable storage medium based on deep learning
CN113822157A (en) Mask wearing face recognition method based on multi-branch network and image restoration
CN107506713A (en) Living body faces detection method and storage device
CN112396036A (en) Method for re-identifying blocked pedestrians by combining space transformation network and multi-scale feature extraction
CN112183419A (en) Micro-expression classification method based on optical flow generation network and reordering
CN109522865A (en) A kind of characteristic weighing fusion face identification method based on deep neural network
CN112861588B (en) Living body detection method and device
CN116311403A (en) Finger vein recognition method of lightweight convolutional neural network based on FECAGhostNet
CN109165551B (en) Expression recognition method for adaptively weighting and fusing significance structure tensor and LBP characteristics
Zabihi et al. Vessel extraction of conjunctival images using LBPs and ANFIS
CN115984919A (en) Micro-expression recognition method and system
CN116030516A (en) Micro-expression recognition method and device based on multi-task learning and global circular convolution
CN104850861A (en) Fungal keratitis image recognition method based on RX anomaly detection and texture analysis
CN115188039A (en) Depth forgery video technology tracing method based on image frequency domain information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant