CN112183213B - Facial expression recognition method based on Intra-Class Gap GAN - Google Patents

Facial expression recognition method based on Intra-Class Gap GAN Download PDF

Info

Publication number
CN112183213B
CN112183213B (application CN202010905875.1A)
Authority
CN
China
Prior art keywords
output
image
facial expression
convolution
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010905875.1A
Other languages
Chinese (zh)
Other versions
CN112183213A (en)
Inventor
刘韵婷
陈亮
吴攀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Ligong University
Original Assignee
Shenyang Ligong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Ligong University filed Critical Shenyang Ligong University
Publication of CN112183213A publication Critical patent/CN112183213A/en
Application granted granted Critical
Publication of CN112183213B publication Critical patent/CN112183213B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Abstract

A facial expression recognition method based on an Intra-Class Gap GAN. Construction of the recognition model comprises the following steps: (1) collecting real-time images of faces from different sources and with different expressions; (2) inputting the images into the Intra-Class Gap GAN neural network model for recognition; (3) outputting the recognition result. In this facial expression recognition method based on a generative adversarial network, the facial expression features are extracted automatically, in contrast to traditional methods that extract expression features manually; and compared with earlier neural-network-based facial expression recognition, the recognition rate is improved, so that expressions are recognized accurately.

Description

Facial expression recognition method based on Intra-Class Gap GAN
Technical Field
The invention relates to the fields of image processing and deep learning for facial expression recognition, in particular to a facial expression recognition method based on a generative adversarial network.
Background
China's huge floating population places great pressure on urban infrastructure and public services. Serious injury accidents have occurred frequently in recent years, the security situation draws close attention, and urban management and service systems lag seriously and urgently need improvement; strengthening urban monitoring and recognizing the facial expressions of lawbreakers has therefore become important. An expression is an emotional state conveyed by facial muscle changes. By recognizing a person's facial emotional expression, abnormal psychological states can be judged and extreme emotions inferred; observing facial expressions in complex environments provides technical support for further judging a person's psychology and for roughly identifying suspicious individuals, so that certain criminal activities can be prevented in time. Traditional facial expression recognition is mainly based on template matching and neural networks. Moreover, traditional methods require human intervention in feature selection, with feature extraction algorithms designed by hand; they lack sufficient computing power, are difficult to train, have low accuracy, and easily lose the original expression information.
Disclosure of Invention
The invention aims to:
according to the proposed intra-class gap existing in facial expression recognition under the real condition, aiming at the technical problems that the difficulty is high in complex environment security check and the requirement of the facial expression recognition rate cannot be met due to the intra-class gap, the facial expression recognition method based on the generated countermeasure is provided.
The technical scheme is as follows:
A facial expression recognition method based on an Intra-Class Gap GAN.
Construction of the recognition model comprises the following steps:
(1) Collecting real-time images of faces from different sources and with different expressions;
(2) Inputting the images into the Intra-Class Gap GAN neural network model for recognition;
(3) Outputting the recognition result;
The method for constructing the Intra-Class Gap GAN neural network model in step (2) comprises the following steps:
(2.1) collecting historical images of faces from different sources and with different expressions;
(2.2) preprocessing the collected face images to construct a facial expression data set;
(2.3) constructing the Intra-Class Gap GAN neural network model for the facial expression recognition problem in which the data set of step (2.2) has intra-class gaps;
(2.4) training the generator and the discriminator of the network simultaneously, combining the pixel differences and the latent-vector differences between the input image and the reconstructed image, to ensure that the difference between the reconstructed image and the input image is minimal.
The facial expression data set construction method in step (2.2) comprises the following steps:
S11: based on the Multi-PIE and JAFFE expression data sets, facial expression pictures downloaded from the network in step (2.1) are used to build the self-made facial expression data set required here; facial expressions of disgust, happiness, neutrality, anxiety, and surprise-and-fear from people of different countries, different age groups, different professions, and the like are selected for the experiments, and a large number of facial expression samples with intra-class gaps increase the complexity of the data set and serve as the input images x for network training;
S12: geometrically normalizing the input image, and performing face detection on the normalized image;
S13: scale-normalizing the image processed in step S12 to unify the image sizes.
The specific steps in step (2.4) are as follows:
S14: using the images processed in step S13, training the facial expression recognition network model based on the generative adversarial IC-GAN (Intra-Class Gap GAN) neural network;
S15: carrying out data enhancement and data expansion processing on the images;
S16: training the network model and storing the trained network model.
The step S12 includes the following steps:
S121: for the collected images, calibrating the feature points [x, y] of the two eyes and the nose to obtain the coordinate values of the feature points;
S122: rotating the image according to the coordinates of the eyes on the face to ensure a consistent face orientation, where the distance between the person's eyes is d and the midpoint between the two eyes is O;
S123: determining a square box containing the face according to the calibrated feature points and the geometric model, cropping a distance d to the left and right of O, and 0.5d upward and 1.5d downward respectively.
The step S13 includes the following steps:
S131: scale-normalizing the picture cropped in step S123, unifying the images to 256×256 pixels and completing the geometric normalization of the images.
The step S14 includes the following steps:
S141: constructing the proposed IC-GAN (Intra-Class Gap GAN) neural network using the PyTorch deep learning framework, first inputting the picture processed in step S13 into the first convolution layer and convolving the input image with a 4×4 convolution kernel; the output is 128×128×64; a LeakyReLU activation function then applies a nonlinear operation to the convolution output, which remains 128×128×64; the LeakyReLU activation function is:

$$f(x_i) = \begin{cases} x_i, & x_i \geq 0 \\ x_i / a_i, & x_i < 0 \end{cases}$$

where $a_i$ is a fixed parameter in the interval $(1, +\infty)$;
S142: continuing to convolve the output of the previous layer with a 4×4 convolution kernel, the output being 64×64×128; a BatchNorm layer then normalizes the output, and a LeakyReLU activation function applies a nonlinear operation, the output being 64×64×128;
S143: continuing the convolution, BatchNorm, and LeakyReLU operations of step S142 on the output of the previous layer, the output being 4×4×100;
S144: performing a 4×4-kernel deconvolution on the output of S143, giving an output of 29×1; applying a BatchNorm batch normalization, then a ReLU activation function, the output being 32×32×128; the ReLU activation function is:

$$f(x) = \max(0, x)$$

S145: performing the deconvolution, BatchNorm, and ReLU operations of step S144 again on the output of the previous layer, the output being 64×64×64;
S146: applying a ReLU activation to the output of the previous layer, then a 4×4-kernel deconvolution, then a Tanh activation, the output being 128×128; the Tanh activation function is:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

S147: applying the operations of S141-S143 again to the output of the previous layer, the output being 1×1×5;
S148: inputting the scale-normalized image from step S13 together with the output of step S147 into a 4×4 convolution layer, then applying the nonlinear activation function LeakyReLU, the output being 128×128×64;
S149: convolving the output of the previous layer with a 4×4 convolution kernel, applying a BatchNorm batch normalization, then a LeakyReLU nonlinear activation;
S1491: continuing to convolve the output of the previous layer following the process of S142, the output of the nonlinear operation being 4×4×1;
S1492: finally applying Softmax to the output of the previous layer and outputting the probability that the input is judged real;
S1493: applying a fully connected operation to the output of the S147 process and finally training the 5 expressions through a Softmax classifier, the 5 expressions being 1 = happy, 2 = disgust, 3 = neutral, 4 = anxious, 5 = surprise-and-fear, thereby realizing facial expression recognition.
Step S15 includes:
S151: the network loss function is divided into four parts; for the first part, the generator network, the difference between the original image and the reconstructed image is reduced at the pixel level, and the reconstruction error loss is:

$$L_{con} = \mathbb{E}_{x \sim pX} \lVert x - G(x) \rVert_1$$

where pX denotes the data distribution, x is the input image, and G(x) is the image generated by the generator in the network;
to reduce the instability of training, the feature matching method proposed by Salimans et al. is used to optimize at the image feature level; the feature matching error of the second part, the discriminator network, is:

$$L_{adv} = \mathbb{E}_{x \sim pX} \lVert f(x) - f(G(x)) \rVert_2$$

where f(·) denotes the discriminator model transformation;
the third part is the encoding loss between the latent vector z and the reconstructed latent vector ẑ, which preserves the facial expression information of the picture and prevents interference from picture-independent information during network decoding:

$$L_p = \mathbb{E}_{x \sim pX} \lVert h(x) - h(G(x)) \rVert_2$$

where h(·) denotes the encoder transformation;
the network loss of the fourth part is the cross-entropy loss of the Softmax layer:

$$L_s = k(y, \hat{y}) = -\sum_i y_i \log \hat{y}_i$$

where k(·) denotes the Softmax cross-entropy loss, y the true result, and ŷ the recognition result;
the overall network loss function is:

$$L = \omega_{adv} L_{adv} + \omega_{con} L_{con} + \omega_{p} L_{p} + \omega_{s} L_{s}$$

where ω_adv, ω_con, ω_p, and ω_s are weighting parameters that regulate the losses;
S152: Adam is selected as the optimizer, the learning rate is set to 0.0002, training samples are trained in batches of 16 pictures each, and the number of epochs is set to 100, 200, 300, and 400 respectively;
S153: in each training iteration, a batch of pictures of the epoch is fetched first, the loss value is then calculated, and the Adam optimizer continuously updates the network parameters to minimize the network loss.
In step (3), the picture is input into the trained IC-GAN network model for recognition, and the probability of each type of facial expression is finally output; the expression class with the highest output probability is the classification result. The probability calculation formula is:

$$S_i = \sum_j \omega_{ij} z_j + b, \qquad y_i = \frac{e^{S_i}}{\sum_k e^{S_k}}$$

where z_i denotes the i-th output of the network, ω_ij the j-th weight of the i-th neuron, b the bias, S_i the output of the i-th neuron, and y_i the i-th output value of the Softmax.
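As a quick numeric illustration of the Softmax formula above (the score values are made up), the class with the largest y_i is the classification result:

```python
import torch

s = torch.tensor([2.1, 0.3, -0.5, 0.8, 1.4])    # hypothetical S_i for 5 classes
y = torch.softmax(s, dim=0)                     # y_i = exp(S_i) / sum_k exp(S_k)
labels = ["happy", "disgust", "neutral", "anxious", "surprise-and-fear"]
print(labels[int(torch.argmax(y))], y)          # -> "happy", highest probability
```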
The advantages and effects are as follows:
The invention designs a facial expression recognition method based on a generative adversarial network, comprising a network training process and an offline recognition process for facial expression recognition with intra-class gaps; the offline recognition process should include the following steps:
S11: downloading through a network, parsing the video by frame skipping, and collecting an input image x;
S12: geometrically normalizing the input image x, and detecting the normalized image x';
S13: processing the detected and cropped image x' to a uniform size;
S14: constructing the network model for facial expression recognition based on the generative adversarial network;
S15: performing data enhancement and data expansion processing on the image x' and unifying the image size;
S16: training the network model and storing the trained network model;
for the identification process, the following steps should be included:
s21: downloading through a network, analyzing video by frame skipping, and collecting an input image I;
s22: then the input image I is input into the trained network model;
s23: and obtaining a recognition result.
The following steps should also be included for the step S12:
S121: performing geometric normalization processing on the input image; the geometric normalization includes scale normalization, head-tilt correction, and face-twist correction;
S122: performing face detection on the geometrically normalized image using a face detection method from the OpenCV open-source library, and then performing noise reduction on the detected image;
S123: obtaining the geometrically normalized image x'.
The step S13 should further include:
S131: determining the position of the image according to the coordinates of the face;
S132: obtaining the face image using OpenCV detection;
S133: adjusting the cropped face images to a uniform size of 256×256.
Still further, step S14 should further comprise: S141: constructing the IC-GAN neural network using the PyTorch deep learning framework, first inputting the picture into the conv_1 layer for convolution and convolving the input image with a 4×4 convolution kernel; the output is 128×128×64; a LeakyReLU activation function then applies a nonlinear operation to the convolution output, which remains 128×128×64; the LeakyReLU activation function is:

$$f(x_i) = \begin{cases} x_i, & x_i \geq 0 \\ x_i / a_i, & x_i < 0 \end{cases}$$

where $a_i$ is a fixed parameter in the interval $(1, +\infty)$;
S142: continuing to convolve the output of the previous layer with a 4×4 convolution kernel, the output being 64×64×128; a BatchNorm layer then normalizes the output, and a LeakyReLU activation function applies a nonlinear operation, the output being 64×64×128;
S143: continuing the convolution, BatchNorm, and LeakyReLU operations of S142 on the output of the previous layer, the output being 4×4×100;
S144: performing a 4×4-kernel deconvolution on the output of S143, giving an output of 29×1; applying a BatchNorm batch normalization, then a ReLU activation function, the output being 32×32×128;
S145: performing the deconvolution, BatchNorm, and ReLU operations of S144 again on the output of the previous layer, the output being 64×64×64;
S146: applying a ReLU activation to the output of the previous layer, then a 4×4-kernel deconvolution, then a Tanh activation, the output being 128×128;
S147: applying the operations of S141-S143 again to the output of the previous layer, the output being 1×1×5;
S148: inputting the original image together with the output of S147 into a 4×4 convolution layer, then applying the nonlinear activation function LeakyReLU, the output being 128×128×64;
S149: convolving the output of the previous layer with a 4×4 convolution kernel, applying a BatchNorm batch normalization, then a LeakyReLU nonlinear activation;
S1491: continuing to convolve the output of the previous layer following the process of S142, the output of the nonlinear operation being 4×4×1;
S1492: finally applying Softmax to the output of the previous layer and outputting the probability that the input is judged real.
S1493: applying a fully connected operation to the output of the S147 process and finally training the 5 expressions through a Softmax classifier, the 5 expressions being 1 = happy, 2 = disgust, 3 = neutral, 4 = anxious, 5 = surprise-and-fear, thereby realizing facial expression recognition;
Step S15 should also include: S151: according to the network structure and experimental characteristics, the network loss is divided into four parts; for the first part, the generator network, the difference between the original image and the reconstructed image is reduced at the pixel level, and the reconstruction error loss is:

$$L_{con} = \mathbb{E}_{x \sim pX} \lVert x - G(x) \rVert_1$$

to reduce the instability of training, the feature matching method proposed by Salimans et al. is used to optimize at the image feature level; the feature matching error of the second part, the discriminator network, is:

$$L_{adv} = \mathbb{E}_{x \sim pX} \lVert f(x) - f(G(x)) \rVert_2$$

where f(·) denotes the discriminator model transformation.
The third part is the encoding loss between the latent vector z and the reconstructed latent vector ẑ, which preserves the facial expression information of the picture and prevents interference from picture-independent information during network decoding:

$$L_p = \mathbb{E}_{x \sim pX} \lVert h(x) - h(G(x)) \rVert_2$$

where h(·) denotes the encoder transformation.
The fourth part of the network loss is the cross-entropy loss of the Softmax layer:

$$L_s = k(y, \hat{y}) = -\sum_i y_i \log \hat{y}_i$$

where k(·) denotes the Softmax cross-entropy loss, y the true result, and ŷ the recognition result.
The overall network loss function is:

$$L = \omega_{adv} L_{adv} + \omega_{con} L_{con} + \omega_{p} L_{p} + \omega_{s} L_{s}$$

where ω_adv, ω_con, ω_p, and ω_s are weighting parameters that regulate the losses.
S152: Adam is selected as the optimizer, the learning rate is set to 0.0002, training samples are trained in batches of 16 pictures each, and the number of epochs is set to 100, 200, 300, and 400 respectively.
S153: in each training iteration, a batch of pictures of the epoch is fetched first, the loss value is then calculated, and the Adam optimizer continuously updates the network parameters to minimize the network loss.
Still further, the step S16 should further include: S161: downloading through a network, frame-skipping and parsing the video, and collecting an input image;
S162: performing geometric normalization, face detection, OpenCV processing, and size unification on the input image;
S163: inputting the processed image into the trained IC-GAN network model for recognition and finally outputting the probability of each expression, the expression with the highest probability being taken as the expression recognized by the network.
Compared with the prior art, the invention has the advantages that:
according to the facial expression recognition method based on the generated countermeasure, the facial expression features are automatically extracted by comparing with the traditional method for manually extracting the expression features, and compared with the facial expression recognition of a neural network at a slightly early stage, the facial expression recognition method based on the generated countermeasure has the advantages that the recognition rate is improved, and therefore the expression recognition is accurately carried out.
Drawings
For a clearer description of the embodiments of the present invention or of the prior art, the drawings essential to describing the embodiments are briefly introduced below. The following drawings illustrate some embodiments of the present invention; other researchers in this field can derive further drawings from them without creative effort.
FIG. 1 is a flow chart of the overall process of the present invention.
FIG. 2 is a schematic diagram of the IC-GAN network model according to the present invention.
Detailed Description
A facial expression recognition method based on an Intra-Class Gap GAN.
Construction of the recognition model comprises the following steps:
(1) Collecting real-time images of faces from different sources and with different expressions;
(2) Inputting the images into the Intra-Class Gap GAN neural network model for recognition;
(3) Outputting the recognition result;
The method for constructing the Intra-Class Gap GAN neural network model in step (2) comprises the following steps:
(2.1) collecting historical images of faces from different sources and with different expressions;
(2.2) preprocessing the collected face images to construct a facial expression data set;
(2.3) constructing the Intra-Class Gap GAN neural network model for the facial expression recognition problem in which the data set of step (2.2) has intra-class gaps (differences within the same expression class are called intra-class gaps: expressions of the same class can take different forms, so the intra-class gap may be large, and captured images are further affected by external factors such as occlusions in the environment and shooting angles; for these reasons an expression such as laughing may be misrecognized as another class, and the feature differences caused by the complicated surrounding environment ultimately affect the recognition accuracy);
(2.4) training the generator and the discriminator of the network simultaneously, combining the pixel differences and the latent-vector differences between the input image (the training sample input during network training) and the reconstructed image (the image generated during training, used to match the original image), to ensure that the difference between the reconstructed image and the input image is minimal. (The network is trained by comparing the original input picture with the picture generated by the network; when the generated picture is consistent with the input picture, the network is considered trained, able to extract image features correctly, and strongest in recognition.)
The facial expression data set construction method in step (2.2) comprises the following steps:
S11: based on the Multi-PIE and JAFFE expression data sets, facial expression images downloaded from the network in step (2.1) are used to build the self-made facial expression data set required here (sample expansion); facial expressions of disgust, happiness, neutrality, anxiety, and surprise-and-fear from people of different countries, different age groups, different professions, and the like are selected for the experiments, and a large number of samples with intra-class gaps are included. (The forms one expression, such as a smile, takes for the same person under the same background environment belong to one class; whenever the background, the person, or the form of the expression differs, an intra-class gap, and possibly a large one, exists.) These samples increase the complexity of the data and serve as the input images for network training;
S12: geometrically normalizing the input image, and performing face detection on the normalized image (obtaining a suitable face image: processing as described in claim 3 yields sample data suitable for network training; for example, rotation may be required to ensure a consistent face orientation);
S13: scale-normalizing the image processed in step S12 to unify the image sizes (S12 and S13 are the preprocessing stages).
The specific steps in step (2.4) are as follows:
S14: using the images processed in step S13, training the facial expression recognition network model based on the generative adversarial IC-GAN (Intra-Class Gap GAN) neural network;
S15: carrying out data enhancement and data expansion processing on the images;
S16: training the network model and storing the trained network model.
The step S12 includes the following steps:
S121: for the collected images, calibrating the feature points [x, y] of the two eyes and the nose to obtain the coordinate values of the feature points;
S122: rotating the image according to the coordinates of the eyes on the face to ensure a consistent face orientation (reflecting, in the face-image preprocessing, the rotation invariance of the face in the image plane), where the distance between the person's eyes is d and the midpoint between the two eyes is O;
S123: determining a square box containing the face according to the calibrated feature points and the geometric model, cropping a distance d to the left and right of O, and 0.5d upward and 1.5d downward respectively.
The step S13 includes the following steps:
S131: scale-normalizing the picture cropped in step S123, unifying the images to 256×256 pixels and completing the geometric normalization of the images.
The step S14 includes the following steps:
S141: constructing the proposed IC-GAN (Intra-Class Gap GAN) neural network using the PyTorch deep learning framework, first inputting the picture processed in step S13 into the first convolution layer and convolving the input image with a 4×4 convolution kernel; the output is 128×128×64; a LeakyReLU activation function then applies a nonlinear operation to the convolution output, which remains 128×128×64; the LeakyReLU activation function is:

$$f(x_i) = \begin{cases} x_i, & x_i \geq 0 \\ x_i / a_i, & x_i < 0 \end{cases}$$

where $a_i$ is a fixed parameter in the interval $(1, +\infty)$;
S142: continuing to convolve the output of the previous layer (the first convolution layer) with a 4×4 convolution kernel, the output being 64×64×128; a BatchNorm layer then normalizes the output, and a LeakyReLU activation function applies a nonlinear operation, the output being 64×64×128;
S143: continuing the convolution, BatchNorm, and LeakyReLU operations of step S142 on the output of the previous layer, the output being 4×4×100;
S144: performing a 4×4-kernel deconvolution on the output of S143, giving an output of 29×1; applying a BatchNorm batch normalization, then a ReLU activation function, the output being 32×32×128; the ReLU activation function is:

$$f(x) = \max(0, x)$$

S145: performing the deconvolution, BatchNorm, and ReLU operations of step S144 again on the output of the previous layer, the output being 64×64×64;
S146: applying a ReLU activation to the output of the previous layer, then a 4×4-kernel deconvolution, then a Tanh activation, the output being 128×128; the Tanh activation function is:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

S147: applying the operations of S141-S143 again to the output of the previous layer, the output being 1×1×5;
S148: inputting the scale-normalized image from step S13 together with the output of step S147 into a 4×4 convolution layer, then applying the nonlinear activation function LeakyReLU, the output being 128×128×64;
S149: convolving the output of the previous layer with a 4×4 convolution kernel, applying a BatchNorm batch normalization, then a LeakyReLU nonlinear activation;
S1491: continuing to convolve the output of the previous layer following the process of S142, the output of the nonlinear operation being 4×4×1;
S1492: finally applying Softmax to the output of the previous layer and outputting the probability that the input is judged real.
S1493: applying a fully connected operation to the output of the S147 process and finally training the 5 expressions through a Softmax classifier, the 5 expressions being 1 = happy, 2 = disgust, 3 = neutral, 4 = anxious, 5 = surprise-and-fear, thereby realizing facial expression recognition.
Step S15 includes:
S151: according to the constructed IC-GAN network structure, the network loss function is divided into four parts; for the first part, the generator network, the difference between the original image and the reconstructed image is reduced at the pixel level, and the reconstruction error loss is:

$$L_{con} = \mathbb{E}_{x \sim pX} \lVert x - G(x) \rVert_1$$

where pX denotes the data distribution, x is the input image, and G(x) is the image generated by the generator in the network;
to reduce the instability of training, the feature matching method proposed by Salimans et al. is used to optimize at the image feature level; the feature matching error of the second part, the discriminator network, is:

$$L_{adv} = \mathbb{E}_{x \sim pX} \lVert f(x) - f(G(x)) \rVert_2$$

where f(·) denotes the discriminator model transformation.
The third part is the encoding loss between the latent vector z and the reconstructed latent vector ẑ, which preserves the facial expression information of the picture and prevents interference from picture-independent information during network decoding:

$$L_p = \mathbb{E}_{x \sim pX} \lVert h(x) - h(G(x)) \rVert_2$$

where h(·) denotes the encoder transformation.
The fourth part of the network loss is the cross-entropy loss of the Softmax layer:

$$L_s = k(y, \hat{y}) = -\sum_i y_i \log \hat{y}_i$$

where k(·) denotes the Softmax cross-entropy loss, y the true result, and ŷ the recognition result.
The overall network loss function is:

$$L = \omega_{adv} L_{adv} + \omega_{con} L_{con} + \omega_{p} L_{p} + \omega_{s} L_{s}$$

where ω_adv, ω_con, ω_p, and ω_s are weighting parameters that regulate the losses.
S152: Adam is selected as the optimizer, the learning rate is set to 0.0002, training samples are trained in batches of 16 pictures each, and the number of epochs is set to 100, 200, 300, and 400 respectively.
S153: in each training iteration, a batch of pictures of the epoch is fetched first, the loss value is then calculated, and the Adam optimizer continuously updates the network parameters to minimize the network loss.
In step (3), the picture is input into the trained IC-GAN network model for recognition, the probability of each type of facial expression is finally output, and the expression category with the highest output probability is the classification result.
In order to enable researchers in this field to understand the solution of the present invention more clearly, the solution is described below in detail and in full, by way of example only, with reference to the accompanying drawings of the embodiments of the present invention. All other embodiments obtained by researchers in this field without creative effort, being based on the embodiments of the present invention, are intended to fall within the scope of the invention.
It should be noted that the terms "first", "second", and the like in the specification and the claims of the present invention are used to distinguish similar objects and are not intended to describe a particular sequence or order. Data so used are interchangeable where appropriate, so that the implementations described here can also be carried out in sequences other than those illustrated. In addition, the terms "comprising" and "having" and their variants are intended to cover non-exclusive inclusion, so that the processes, methods, products, and apparatus described are not limited to the steps expressly listed and may include other steps inherent to them.
As shown in FIG. 1 and FIG. 2, the present invention provides a facial expression recognition method based on a generative adversarial network, which includes a network training process and an offline recognition process for facial expression recognition with intra-class gaps.
As an embodiment, the offline recognition process should include the following steps:
Step S11: downloading through a network, parsing the video by frame skipping, and collecting an input image x;
Step S12: geometrically normalizing the input image x, and detecting the normalized image x';
Step S13: processing the detected and cropped image x' to a uniform size;
Step S14: constructing the network model for facial expression recognition based on the generative adversarial network;
Step S15: performing data enhancement and data expansion processing on the image x' and unifying the image size;
Step S16: training the network model and storing the trained network model;
in a specific embodiment, step S12 should further comprise the steps of:
step S121: performing geometric normalization processing on the input image; the geometric normalization method comprises scale normalization, outer head correction and face twisting correction;
step S122: performing face detection on the geometrically normalized image by using a face detection method in an OpenCV open source library, and then performing noise reduction treatment on the detected image;
s23: the geometrically normalized image x' is obtained.
As a preferred embodiment, step S13 further comprises:
S131: determining the position of the image according to the coordinates of the face;
S132: obtaining the face image using OpenCV detection;
S133: adjusting the cropped face images to a uniform size of 256×256.
Still further, step S14 should further comprise: S141: constructing the IC-GAN neural network using the PyTorch deep learning framework, first inputting the picture into the conv_1 layer for convolution and convolving the input image with a 4×4 convolution kernel; the output is 128×128×64; a LeakyReLU activation function then applies a nonlinear operation to the convolution output, which remains 128×128×64; the LeakyReLU activation function is:

$$f(x_i) = \begin{cases} x_i, & x_i \geq 0 \\ x_i / a_i, & x_i < 0 \end{cases}$$

where $a_i$ is a fixed parameter in the interval $(1, +\infty)$;
S142: continuing to convolve the output of the previous layer with a 4×4 convolution kernel, the output being 64×64×128; a BatchNorm layer then normalizes the output, and a LeakyReLU activation function applies a nonlinear operation, the output being 64×64×128;
S143: continuing the convolution, BatchNorm, and LeakyReLU operations of S142 on the output of the previous layer, the output being 4×4×100;
S144: performing a 4×4-kernel deconvolution on the output of S143, giving an output of 29×1; applying a BatchNorm batch normalization, then a ReLU activation function, the output being 32×32×128;
S145: performing the deconvolution, BatchNorm, and ReLU operations of S144 again on the output of the previous layer, the output being 64×64×64;
S146: applying a ReLU activation to the output of the previous layer, then a 4×4-kernel deconvolution, then a Tanh activation, the output being 128×128;
S147: applying the operations of S141-S143 again to the output of the previous layer, the output being 1×1×5;
S148: inputting the original image together with the output of S147 into a 4×4 convolution layer, then applying the nonlinear activation function LeakyReLU, the output being 128×128×64;
S149: convolving the output of the previous layer with a 4×4 convolution kernel, applying a BatchNorm batch normalization, then a LeakyReLU nonlinear activation;
S1491: continuing to convolve the output of the previous layer following the process of S142, the output of the nonlinear operation being 4×4×1;
S1492: finally applying Softmax to the output of the previous layer and outputting the probability that the input is judged real.
S1493: applying a fully connected operation to the output of the S147 process and finally training the 5 expressions through a Softmax classifier, the 5 expressions being 1 = happy, 2 = disgust, 3 = neutral, 4 = anxious, 5 = surprise-and-fear, thereby realizing facial expression recognition;
as a preferred embodiment, the IC-GAN network uses a pytorch build network including an input layer, a convolution layer, an activation function, a pooling layer, a full connection layer, a BN layer, and an output layer.
As a preferred embodiment, the sizes before and after a convolution layer can be described by the following formulas:
The input size of the convolution layer is: $W_1 \times H_1 \times D_1$
The output size of the convolution layer is:

$$W_2 = \frac{W_1 - F + 2P}{S} + 1, \qquad H_2 = \frac{H_1 - F + 2P}{S} + 1, \qquad D_2 = K$$

In the above formulas, K is the number of convolution kernels, F is the size of the convolution kernels, S is the stride, and P is the boundary padding.
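For example, the first layer described in S141 maps a 256×256 input to 128×128×64 with F = 4 and K = 64 kernels; assuming a stride of S = 2 and a padding of P = 1 (values consistent with the halving, though not stated in the patent):

$$W_2 = \frac{256 - 4 + 2 \cdot 1}{2} + 1 = 128, \qquad D_2 = K = 64$$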
As a preferred embodiment, the mixed expression data set of the present application has a total of 4455 images and 5 expression labels, 1 = happy, 2 = disgust, 3 = neutral, 4 = anxious, 5 = surprise-and-fear; the data set suffers from an unbalanced sample distribution, so it is expanded by means of affine transformation, mirror transformation, contrast adjustment, brightness adjustment, and the like; the number of images in the expanded mixed expression data set is shown in Table 1:
Table 1. Number of each expression in the mixed data set after expansion
As a most preferred method of the present application, step S15 should further comprise: S151: according to the network structure and experimental characteristics, defining the network loss in four parts;
S152: Adam is selected as the optimizer, the learning rate is set to 0.0002, training samples are trained in batches of 16 pictures each, and the number of epochs is set to 100, 200, 300, and 400 respectively;
S153: in each training iteration, a batch of pictures of the epoch is fetched first, the loss value is then calculated, and the Adam optimizer continuously updates the network parameters to minimize the network loss.
Still further, step S16 should further include: S161: downloading through a network, frame-skipping and parsing the video, and collecting an input image;
S162: performing geometric normalization, face detection, OpenCV processing, and size unification on the input image;
S163: inputting the processed image into the trained IC-GAN network model for recognition and finally outputting the probability of each expression, the expression with the highest probability being the expression recognized by the network.
Compared with the prior art, the invention has the advantages that:
according to the facial expression recognition method based on the generated countermeasure, the facial expression features are automatically extracted by comparing with the traditional method for manually extracting the expression features, and compared with the facial expression recognition of a neural network at a slightly early stage, the facial expression recognition method based on the generated countermeasure has the advantages that the recognition rate is improved, and therefore the expression recognition is accurately carried out.
As an embodiment of the application, the number of samples after data enhancement is 4455 training samples and 411 test samples. The idea of model training is as follows: before the pictures are input into the network for training, the images are first cropped using OpenCV open-source code and then unified to a 256×256 size, and the preprocessed pictures are then used as network input to train the IC-GAN network model. The Softmax loss adopts the cross-entropy loss function, the optimizer adopts the Adam optimizer, the learning rate is set to 0.0002, training samples are trained in batches of 16 pictures each, and the number of epochs is set to 100, 200, 300, and 400 respectively.
As a preferred embodiment of the present application, the identification process should comprise the steps of:
s21: downloading through a network, analyzing video by frame skipping, and collecting an input image I;
s22: then the input image I is input into the trained network model;
s23: and obtaining a recognition result.
The above embodiment numbers of the present invention are for description only and do not represent the merits of the embodiments.
In the embodiments of the present invention, each embodiment is described with its own emphasis; for portions not described in detail in one embodiment, reference may be made to the corresponding descriptions in other embodiments.
In the several embodiments provided in this application, the described technical content may also be implemented in other manners. All of the above description is merely illustrative.

Claims (5)

1. A facial expression recognition method based on an Intra-Class Gap GAN, characterized by comprising the following steps:
the identification model construction comprises the following steps:
(1) Collecting real-time images of faces from different sources and with different expressions;
(2) Inputting the images into the Intra-Class Gap GAN neural network model for recognition;
(3) Outputting the recognition result;
The method for constructing the Intra-Class Gap GAN neural network model in step (2) comprises the following steps:
(2.1) collecting historical images of faces from different sources and with different expressions;
(2.2) preprocessing the collected face images to construct a facial expression data set;
(2.3) constructing the Intra-Class Gap GAN neural network model for the facial expression recognition problem in which the data set of step (2.2) has intra-class gaps;
(2.4) training the generator and the discriminator of the network simultaneously, combining the pixel differences and the latent-vector differences between the input image and the reconstructed image, ensuring that the difference between the reconstructed image and the input image is minimal;
the method comprises the following steps:
S14: using the images processed in step S13, training the facial expression recognition network model based on the generative adversarial IC-GAN neural network;
the method comprises the following steps:
S141: constructing the proposed IC-GAN (Intra-Class Gap GAN) neural network using the PyTorch deep learning framework, first inputting the picture processed in step S13 into the first convolution layer and convolving the input image with a 4×4 convolution kernel; the output is 128×128×64; a LeakyReLU activation function then applies a nonlinear operation to the convolution output, which remains 128×128×64; the LeakyReLU activation function is:

$$f(x_i) = \begin{cases} x_i, & x_i \geq 0 \\ x_i / a_i, & x_i < 0 \end{cases}$$

where $a_i$ is a fixed parameter in the interval $(1, +\infty)$;
S142: continuing to convolve the output of the previous layer with a 4×4 convolution kernel, the output being 64×64×128; a BatchNorm layer then normalizes the output, and a LeakyReLU activation function applies a nonlinear operation, the output being 64×64×128;
S143: continuing the convolution, BatchNorm, and LeakyReLU operations of step S142 on the output of the previous layer, the output being 4×4×100;
S144: performing a 4×4-kernel deconvolution on the output of S143, giving an output of 29×1; applying a BatchNorm batch normalization, then a ReLU activation function, the output being 32×32×128; the ReLU activation function is:

$$f(x) = \max(0, x)$$

S145: performing the deconvolution, BatchNorm, and ReLU operations of step S144 again on the output of the previous layer, the output being 64×64×64;
S146: applying a ReLU activation to the output of the previous layer, then a 4×4-kernel deconvolution, then a Tanh activation, the output being 128×128; the Tanh activation function is:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

S147: applying the operations of S141-S143 again to the output of the previous layer, the output being 1×1×5;
S148: inputting the scale-normalized image from step S13 together with the output of step S147 into a 4×4 convolution layer, then applying the nonlinear activation function LeakyReLU, the output being 128×128×64;
S149: convolving the output of the previous layer with a 4×4 convolution kernel, applying a BatchNorm batch normalization, then a LeakyReLU nonlinear activation;
S1491: continuing to convolve the output of the previous layer following the process of S142, the output of the nonlinear operation being 4×4×1;
S1492: finally applying Softmax to the output of the previous layer and outputting the probability that the input is judged real;
S1493: applying a fully connected operation to the output of the S147 process and finally training the 5 expressions through a Softmax classifier, the 5 expressions being 1 = happy, 2 = disgust, 3 = neutral, 4 = anxious, 5 = surprise-and-fear, thereby realizing facial expression recognition;
S15: carrying out data enhancement and data expansion processing on the image;
comprising:
S151: the network loss function is divided into four parts; for the first part, the generator network, the difference between the original image and the reconstructed image is reduced at the pixel level, and the reconstruction error loss is:

$$L_{con} = \mathbb{E}_{x \sim pX} \lVert x - G(x) \rVert_1$$

where pX denotes the data distribution, x is the input image, and G(x) is the image generated by the generator in the network;
to reduce the instability of training, the feature matching method proposed by Salimans et al. is used to optimize at the image feature level; the feature matching error of the second part, the discriminator network, is:

$$L_{adv} = \mathbb{E}_{x \sim pX} \lVert f(x) - f(G(x)) \rVert_2$$

where f(·) denotes the discriminator model transformation;
the third part is the encoding loss between the latent vector z and the reconstructed latent vector ẑ, which preserves the facial expression information of the picture and prevents interference from picture-independent information during network decoding:

$$L_p = \mathbb{E}_{x \sim pX} \lVert h(x) - h(G(x)) \rVert_2$$

where h(·) denotes the encoder transformation;
the network loss of the fourth part is the cross-entropy loss of the Softmax layer:

$$L_s = k(y, \hat{y}) = -\sum_i y_i \log \hat{y}_i$$

where k(·) denotes the Softmax cross-entropy loss, y the true result, and ŷ the recognition result;
the overall network loss function is:

$$L = \omega_{adv} L_{adv} + \omega_{con} L_{con} + \omega_{p} L_{p} + \omega_{s} L_{s}$$

where ω_adv, ω_con, ω_p, and ω_s are weighting parameters that regulate the losses;
S152: Adam is selected as the optimizer, the learning rate is set to 0.0002, training samples are trained in batches of 16 pictures each, and the number of epochs is set to 100, 200, 300, and 400 respectively;
S153: in each training iteration, a batch of pictures of the epoch is fetched first, the loss value is then calculated, and the Adam optimizer continuously updates the network parameters to minimize the network loss;
S16: training the network model and storing the trained network model.
2. The facial expression recognition method based on an Intra-Class Gap GAN according to claim 1, wherein:
the facial expression data set construction method in the step (2.2) comprises the following steps:
s11: based on Multi-PIE and JAFFE expression data sets, facial expression pictures are downloaded on the network through the step (2.1), the facial expression data sets required by homemade are carried out, abomination, happy, neutral, anxious and surprise and fear facial expressions of different countries, different age groups and different professional groups are selected for experiments, and a large number of facial expression characteristics with intra-class gaps are added as the complexity of the data sets to be used as input images x of network training;
s12: geometric normalization processing the input image, and performing face detection on the normalized image;
s13: the scale normalizes the image after the processing in step S12, unifying the size of the image.
3. The facial expression recognition method based on an Intra-Class Gap GAN according to claim 2, wherein: the step S12 includes the following steps:
S121: for the collected images, calibrating the feature points [x, y] of the two eyes and the nose to obtain the coordinate values of the feature points;
S122: rotating the image according to the coordinates of the eyes on the face to ensure a consistent face orientation, where the distance between the person's eyes is d and the midpoint between the two eyes is O;
S123: determining a square box containing the face according to the calibrated feature points and the geometric model, cropping a distance d to the left and right of O, and 0.5d upward and 1.5d downward respectively.
4. The facial expression recognition method based on an Intra-Class Gap GAN according to claim 2, wherein:
the step S13 includes the following steps:
S131: scale-normalizing the picture cropped in step S123, unifying the images to 256×256 pixels and completing the geometric normalization of the images.
5. The facial expression recognition method based on an Intra-Class Gap GAN according to claim 1, wherein:
in step (3), the picture is input into the trained IC-GAN network model for recognition, the probability of each type of facial expression is finally output, and the expression class with the highest output probability is the classification result.
CN202010905875.1A 2019-09-02 2020-09-01 Facial expression recognition method based on Intra-Class Gap GAN Active CN112183213B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910822252 2019-09-02
CN2019108222525 2019-09-02

Publications (2)

Publication Number Publication Date
CN112183213A CN112183213A (en) 2021-01-05
CN112183213B true CN112183213B (en) 2024-02-02

Family

ID=73924606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010905875.1A Active CN112183213B (en) 2019-09-02 2020-09-01 Facial expression recognition method based on Intra-Class Gap GAN

Country Status (1)

Country Link
CN (1) CN112183213B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688799B (en) * 2021-09-30 2022-10-04 合肥工业大学 Facial expression recognition method for generating confrontation network based on improved deep convolution

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778506A (en) * 2016-11-24 2017-05-31 重庆邮电大学 A kind of expression recognition method for merging depth image and multi-channel feature
WO2018054283A1 (en) * 2016-09-23 2018-03-29 北京眼神科技有限公司 Face model training method and device, and face authentication method and device
CN108304826A (en) * 2018-03-01 2018-07-20 河海大学 Facial expression recognizing method based on convolutional neural networks
CN108615010A (en) * 2018-04-24 2018-10-02 重庆邮电大学 Facial expression recognizing method based on the fusion of parallel convolutional neural networks characteristic pattern
CN109376625A (en) * 2018-10-10 2019-02-22 东北大学 A kind of human facial expression recognition method based on convolutional neural networks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10354159B2 (en) * 2016-09-06 2019-07-16 Carnegie Mellon University Methods and software for detecting objects in an image using a contextual multiscale fast region-based convolutional neural network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018054283A1 (en) * 2016-09-23 2018-03-29 北京眼神科技有限公司 Face model training method and device, and face authentication method and device
CN106778506A (en) * 2016-11-24 2017-05-31 重庆邮电大学 A kind of expression recognition method for merging depth image and multi-channel feature
CN108304826A (en) * 2018-03-01 2018-07-20 河海大学 Facial expression recognizing method based on convolutional neural networks
CN108615010A (en) * 2018-04-24 2018-10-02 重庆邮电大学 Facial expression recognizing method based on the fusion of parallel convolutional neural networks characteristic pattern
CN109376625A (en) * 2018-10-10 2019-02-22 东北大学 A kind of human facial expression recognition method based on convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Facial expression recognition based on CMAC neural network; Ye Fangfang; Xu Li; Computer Simulation (08); full text *
Facial expression recognition method based on constrained cycle-consistent generative adversarial network; Hu Min; Yu Shengnan; Wang Xiaohua; Journal of Electronic Measurement and Instrumentation (04); full text *

Also Published As

Publication number Publication date
CN112183213A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
CN108932536B (en) Face posture reconstruction method based on deep neural network
Christa et al. CNN-based mask detection system using openCV and MobileNetV2
KR20210025020A (en) Face image recognition using pseudo images
Kim et al. Kernel principal component analysis for texture classification
Kantarcı et al. Thermal to visible face recognition using deep autoencoders
CN108875645B (en) Face recognition method under complex illumination condition of underground coal mine
CN113486752A (en) Emotion identification method and system based on electrocardiosignals
CN112183213B (en) Facial expression recognition method based on Intril-Class Gap GAN
WO2014158345A1 (en) Methods and systems for vessel bifurcation detection
CN113221660B (en) Cross-age face recognition method based on feature fusion
EP4238073A1 (en) Human characteristic normalization with an autoencoder
Fahmy et al. Toward an automated dental identification system
Silva et al. POEM-based facial expression recognition, a new approach
Andayani et al. Identification of the tuberculosis (TB) disease based on XRay images using probabilistic neural network (PNN)
Zabihi et al. Vessel extraction of conjunctival images using LBPs and ANFIS
Nainwal et al. Convolution neural network based covid-19 screening model
Blackledge et al. Texture classification using fractal geometry for the diagnosis of skin cancers
CN115116117A (en) Learning input data acquisition method based on multi-mode fusion network
Jain et al. Brain Tumor Detection using MLops and Hybrid Multi-Cloud
Depuru et al. Hybrid CNNLBP using facial emotion recognition based on deep learning approach
CN114049668B (en) Face recognition method
Amelia Age Estimation on Human Face Image Using Support Vector Regression and Texture-Based Features
Mahmood et al. An investigational FW-MPM-LSTM approach for face recognition using defective data
Praneel et al. Malayalam Sign Language Character Recognition System
CN113269145B (en) Training method, device, equipment and storage medium of expression recognition model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant