CN112232116A - Facial expression recognition method and device and storage medium - Google Patents

Facial expression recognition method and device and storage medium

Info

Publication number
CN112232116A
Authority
CN
China
Prior art keywords
image
neural network
face detection
expression recognition
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010932427.0A
Other languages
Chinese (zh)
Inventor
黄建新
丁永波
刘超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Weibu Information Co Ltd
Original Assignee
Shenzhen Weibu Information Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Weibu Information Co Ltd filed Critical Shenzhen Weibu Information Co Ltd
Priority to CN202010932427.0A priority Critical patent/CN112232116A/en
Publication of CN112232116A publication Critical patent/CN112232116A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a facial expression recognition method, a device and a storage medium. The method comprises the following steps: constructing and training a convolutional neural network for expression recognition, wherein the depth of the convolutional neural network is increased by a preset amount; acquiring an image to be recognized, and performing face detection on the image to be recognized to obtain a face detection image; and preprocessing the face detection image, inputting the preprocessed face detection image into the convolutional neural network for expression recognition processing, and outputting the expression type in the face detection image. By adopting the deepened convolutional neural network to recognize the facial expression of the target face, the embodiment of the invention increases the nonlinear expressive power, improves the representational capacity of the model and thereby achieves more accurate facial expression recognition.

Description

Facial expression recognition method and device and storage medium
Technical Field
The invention relates to the technical field of facial expression recognition, in particular to a facial expression recognition method, a device and a storage medium.
Background
Facial expression recognition technology extracts expression features from a given facial expression image and classifies them into a specific expression category. Facial expression recognition has wide application value: fast and accurate recognition helps analyze the emotion of the recognized subject and enhances emotional feedback in fields such as human-computer interaction. Existing facial expression recognition methods are mainly based on random forest algorithms, expression-feature dimensionality reduction and similar techniques; in the big-data era the data volume is huge and the category rules of the expressions to be recognized are complex, so the recognition process for facial expressions is complicated and the recognition accuracy is low.
Accordingly, the prior art is yet to be improved and developed.
Disclosure of Invention
In view of the above disadvantages of the prior art, an object of the present invention is to provide a facial expression recognition method, apparatus and storage medium, intended to solve the problems of low efficiency and low accuracy in recognizing facial expressions in the prior art.
The technical scheme of the invention is as follows:
a facial expression recognition method comprises the following steps:
constructing and training a convolutional neural network for expression recognition, wherein the depth of the convolutional neural network is increased by a preset amount;
acquiring an image to be recognized, and performing face detection on the image to be recognized to obtain a face detection image;
and preprocessing the face detection image, inputting the preprocessed face detection image into the convolutional neural network for expression recognition to perform expression recognition processing, and outputting the expression type in the face detection image.
In the facial expression recognition method, a convolutional layer is newly added after the input layer of the convolutional neural network for expression recognition, and the convolution kernel of the newly added convolutional layer is 1 x 1.
In the facial expression recognition method, the building and training of the convolutional neural network for expression recognition includes:
obtaining a training sample of known expression categories, wherein the expression categories include happiness, anger, fear, surprise, disgust, sadness and neutrality;
constructing the convolutional neural network for expression recognition, and inputting the training samples of the known expression classes into the convolutional neural network;
and performing error evaluation on the output value of the convolutional neural network through a preset loss function, and performing back propagation and adjustment on the weight parameter of the convolutional neural network according to an error result until the output value of the convolutional neural network reaches an expected value to obtain the trained convolutional neural network for expression recognition.
In the facial expression recognition method, the training sample adopts an FER-2013 data set.
In the facial expression recognition method, the acquiring an image to be recognized, and performing face detection on the image to be recognized to obtain a face detection image includes:
acquiring an image to be recognized, and inputting the image to be recognized into a pre-trained MobileNet cascade neural network model, wherein the convolution layers of the MobileNet cascade neural network model are Depthwise Separable Convolutions;
positioning the face position in the image to be recognized through a first neural network in the pre-trained MobileNet cascade neural network model, and cutting the image to be recognized according to a positioning result to obtain a face region image;
and positioning the key points of the human face in the human face region image through a second neural network in the pre-trained MobileNet cascade neural network model, and outputting a human face detection image.
In the facial expression recognition method, the preprocessing the face detection image includes:
predefining key point positions and illumination conditions of a standard face;
aligning the key point position of the current face detection image to the key point position of a standard face through a preset image transformation algorithm to obtain an aligned face detection image;
performing light correction processing on the aligned face detection image according to the illumination condition of the standard face to obtain a face detection image with the illumination condition consistent with that of the standard face;
and sequentially carrying out image graying and image normalization processing on the face detection image subjected to the light ray correction processing.
The invention also provides a facial expression recognition device, which comprises at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above-described facial expression recognition method.
Another embodiment of the present invention also provides a non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the above-described facial expression recognition method.
Another embodiment of the present invention also provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions that, when executed by a processor, cause the processor to perform the above-mentioned facial expression recognition method.
Advantageous effects: compared with the prior art, the embodiment of the invention adopts a deepened convolutional neural network to recognize the facial expression of the target face, which increases the nonlinear expressive power, improves the representational capacity of the model and thereby achieves more accurate facial expression recognition.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flowchart illustrating a method for recognizing facial expressions according to a preferred embodiment of the present invention;
fig. 2 is a schematic diagram of a hardware structure of a facial expression recognition apparatus according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and more explicit, the present invention is described in further detail below. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it. Embodiments of the present invention will be described below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a flowchart illustrating a facial expression recognition method according to a preferred embodiment of the present invention. As shown in fig. 1, it includes the following steps:
s100, constructing and training a convolutional neural network for expression recognition, wherein the convolutional neural network is added with a preset network depth;
s200, acquiring an image to be recognized, and performing face detection on the image to be recognized to obtain a face detection image;
s300, preprocessing the face detection image, inputting the preprocessed face detection image into the convolutional neural network for expression recognition to perform expression recognition processing, and outputting the expression type in the face detection image.
In this embodiment, a convolutional neural network for expression recognition is first constructed and trained, and subsequent expression recognition is carried out by this network. An image to be recognized is then acquired and face detection is performed on it to obtain a face detection image. The face detection image is preprocessed, input into the convolutional neural network for expression recognition processing, and the expression category in the face detection image is output. In particular, in the facial expression recognition method provided by the invention, the depth of the convolutional neural network is increased by a preset amount; that is, a deepened convolutional neural network is used to extract and recognize expression features. As the network depth increases, the receptive field is effectively enlarged, so the extracted features contain more detail and expression recognition becomes more accurate. Meanwhile, after face detection yields a face detection image, the image is preprocessed before being input into the convolutional neural network, which removes interference from pose, illumination and similar information in the image to be recognized and further ensures accurate recognition of the expression.
Specifically, the convolutional neural network for expression recognition adds a convolutional layer after the input layer, and the convolution kernel of the newly added layer is 1 x 1. In other words, the network adopted in this embodiment inserts a 1 x 1 convolutional layer after the input layer. This 1 x 1 layer increases the nonlinear expression of the input and deepens the network, so that more detailed features can be extracted and the expressive capacity of the neural network is improved, while the amount of computation is not significantly increased and the original computation speed is not affected; the efficiency and the accuracy of expression recognition are thus ensured at the same time.
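For illustration only, the following PyTorch sketch shows one possible deepened network of this kind, assuming a 48 x 48 grayscale input (the FER-2013 format) and seven output categories; apart from the newly added 1 x 1 convolution after the input, the remaining layers are not specified by the invention and are chosen here arbitrarily.

```python
import torch
import torch.nn as nn

class ExpressionCNN(nn.Module):
    """Illustrative deepened CNN: a 1 x 1 convolution placed directly after the input."""
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            # Newly added layer: a 1 x 1 kernel adds depth and non-linearity
            # without significantly increasing the amount of computation.
            nn.Conv2d(1, 8, kernel_size=1), nn.ReLU(),
            nn.Conv2d(8, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 128), nn.ReLU(),
            nn.Linear(128, num_classes),  # seven expression categories
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```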
Further, the building and training of the convolutional neural network for expression recognition comprises:
obtaining a training sample of known expression categories, wherein the expression categories include happiness, anger, fear, surprise, disgust, sadness and neutrality;
constructing the convolutional neural network for expression recognition, and inputting the training samples of the known expression classes into the convolutional neural network;
and performing error evaluation on the output value of the convolutional neural network through a preset loss function, and performing back propagation and adjustment on the weight parameter of the convolutional neural network according to an error result until the output value of the convolutional neural network reaches an expected value to obtain the trained convolutional neural network for expression recognition.
In this embodiment, training data for training the convolutional neural network are obtained first. The training data are samples of known expression categories, where the categories include happiness, anger, fear, surprise, disgust, sadness and neutrality; that is, a number of expression samples of each of the seven categories are collected in advance, for example at least 100 samples per category. Specifically, the training samples adopt the FER-2013 data set; the large amount of training data helps avoid overfitting and improves recognition accuracy. The convolutional neural network for expression recognition is then constructed and the training samples of known expression categories are input into it for learning and training; in other words, the deepened convolutional neural network is trained on the training samples. The output value of the convolutional neural network is evaluated for error by a preset loss function, and the weight parameters of the network are adjusted by back propagation according to the error result until the output value reaches the expected value, yielding the trained convolutional neural network for expression recognition. Specifically, the preset loss function may be, for example, a cross-entropy loss function: the value of each node is computed by forward propagation of the input signal, the output error is computed and back-propagated, and the weight parameters of each layer are adjusted by gradient descent on the error. After repeated adjustment the final output of the network reaches the expected value, which completes the training and gives the model finally used for expression recognition.
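A minimal sketch of this training procedure is given below, assuming the ExpressionCNN above and a standard PyTorch DataLoader that yields (image, label) batches from the FER-2013 data set; the optimizer, learning rate and number of epochs are illustrative choices rather than values taken from the invention.

```python
import torch
import torch.nn as nn

def train(model: nn.Module, train_loader, epochs: int = 30, lr: float = 1e-3) -> nn.Module:
    """Illustrative training loop: cross-entropy error evaluation followed by
    back-propagation and gradient-descent adjustment of the weight parameters."""
    criterion = nn.CrossEntropyLoss()                      # preset loss function
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, labels in train_loader:                # samples of known expression category
            optimizer.zero_grad()
            outputs = model(images)                        # forward propagation
            loss = criterion(outputs, labels)              # error evaluation of the output value
            loss.backward()                                # back-propagate the error
            optimizer.step()                               # adjust the weight parameters
    return model
```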
Preferably, the last layer of the convolutional neural network for expression recognition in this embodiment uses a classifier to classify facial expressions, and supervised learning is performed on the classifier beforehand so that it acquires its classification capability. Specifically, the classifier may be a Softmax classifier. In the classification process, after the expression features are extracted, the classifier uses a number of neurons corresponding to the different expression categories; each neuron outputs a value between 0 and 1 representing the probability that the input sample belongs to that category, and the category corresponding to the neuron with the largest output value is selected as the expression classification result for the current input sample. Accurate expression classification is thus achieved after feature extraction by the deepened network.
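The classification step can be pictured as in the following sketch, which assumes the trained model above; the category names and their order are illustrative, since FER-2013 defines its own label indices.

```python
import torch

EXPRESSIONS = ["happiness", "anger", "fear", "surprise", "disgust", "sadness", "neutral"]

def classify_expression(model, face_tensor: torch.Tensor):
    """Softmax turns the seven output neurons into probabilities between 0 and 1;
    the category with the largest probability is returned as the recognition result."""
    model.eval()
    with torch.no_grad():
        logits = model(face_tensor.unsqueeze(0))           # add a batch dimension
        probs = torch.softmax(logits, dim=1).squeeze(0)
        idx = int(torch.argmax(probs))
    return EXPRESSIONS[idx], float(probs[idx])
```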
Further, the acquiring an image to be recognized, and performing face detection on the image to be recognized to obtain a face detection image includes:
acquiring an image to be recognized, and inputting the image to be recognized into a pre-trained MobileNet cascade neural network model, wherein the convolution layers of the MobileNet cascade neural network model are Depthwise Separable Convolutions;
positioning the face position in the image to be recognized through a first neural network in the pre-trained MobileNet cascade neural network model, and cutting the image to be recognized according to a positioning result to obtain a face region image;
and positioning the key points of the face in the face region image through a second neural network in the pre-trained MobileNet cascade neural network model, and outputting a face detection image.
In this embodiment, a cascade neural network model is trained in advance for face region detection and face key point detection. The cascade model comprises a first neural network and a second neural network, used for face region detection and face key point detection respectively; the two networks realize coarse-to-fine face detection in a cascaded manner and are each trained on a training data set until convergence, giving the final cascade neural network model for face detection. The convolution layers of this cascade model use the Depthwise Separable Convolution structure of the MobileNet series, that is, Depthwise and Pointwise Convolutions replace the standard (spatial) convolutions in PNet, RNet and ONet, which greatly increases the computation speed of the network. The acquired image to be recognized is input into the cascade neural network model. The face position in the image to be recognized is located by the first neural network, and the image is cropped according to the localization result to obtain a face region image; the face position refers to the information about where the face lies in the image, usually including the pixel coordinates of the upper-left corner or of the center of the face together with its length and width, and cropping according to this information removes the regions that do not contain the face. The face key points in the face region image are then located by the second neural network, and a face detection image is output. The face key points include the coordinates of the eyes, nose tip, mouth corners, eyebrows and facial contour; their positions indicate the pose of the face, and this information can be used to further correct the face image so that facial features can be extracted at a later stage, ensuring the accuracy of facial expression recognition.
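The building block that replaces the standard (spatial) convolution is sketched below in PyTorch; this shows only the depthwise separable convolution itself, not the full PNet/RNet/ONet cascade, and the channel counts and activation are illustrative assumptions.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise + pointwise convolution used in place of a standard (spatial)
    convolution, as in the MobileNet family."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise: 1x1 convolution that mixes the channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```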
Further, the preprocessing the face detection image includes:
predefining key point positions and illumination conditions of a standard face;
aligning the key point position of the current face detection image to the key point position of a standard face through a preset image transformation algorithm to obtain an aligned face detection image;
performing light correction processing on the aligned face detection image according to the illumination condition of the standard face to obtain a face detection image with the illumination condition consistent with that of the standard face;
and sequentially carrying out image graying and image normalization processing on the face detection image subjected to the light ray correction processing.
In this embodiment, before expression recognition is performed, the face detection image to be recognized is preprocessed to eliminate the influence of the external environment and of the subject's pose on recognition accuracy. Specifically, the key point positions and illumination conditions of a standard face are predefined, for example those of an identification photo. The key point positions of the current face detection image are then aligned to those of the standard face by a preset image transformation algorithm to obtain an aligned face detection image, thereby correcting the face pose. In this embodiment the image transformation algorithm may be a basic image transformation such as a similarity transformation or an affine transformation, or a combination of these. In other embodiments the pose correction may also be realized directly by landmark alignment: landmark detection is performed on the detected face to obtain a series of landmark points, an affine matrix H is computed from the detected landmark points and the landmark points of the standard template pose, and the aligned image is then obtained directly by applying H. In this way the face pose is corrected, and its influence on the accuracy of facial expression recognition is eliminated as far as possible.
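One way to realize the landmark-based alignment is sketched below with OpenCV; the five-point standard-face template coordinates are hypothetical values for a 112 x 112 crop, and estimateAffinePartial2D is used here as one possible way to compute the transform H from the detected landmarks.

```python
import cv2
import numpy as np

# Hypothetical standard-face template: five key points (eye centers, nose tip,
# mouth corners) for a 112 x 112 aligned crop. The coordinates are illustrative.
STANDARD_LANDMARKS = np.float32([
    [38.3, 51.7], [73.5, 51.5], [56.0, 71.7], [41.5, 92.4], [70.7, 92.2],
])

def align_face(image: np.ndarray, landmarks: np.ndarray,
               size: tuple = (112, 112)) -> np.ndarray:
    """Align detected landmarks to the standard-face key points with a
    similarity (partial affine) transform, correcting the head pose."""
    H, _ = cv2.estimateAffinePartial2D(np.float32(landmarks), STANDARD_LANDMARKS)
    return cv2.warpAffine(image, H, size)
```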
Meanwhile, light correction is also performed on the aligned face detection image according to the illumination condition of the standard face, yielding a face detection image whose illumination is consistent with that of the standard face; that is, the illumination of the aligned face image is converted into the illumination of the standard face. For example, a gamma value can be used to correct and adjust the image pixel values so that the processed image has suitable contrast and the details of the face are clear and visible. Image graying then converts the three-channel color image into a two-dimensional gray image represented by gray values; graying greatly reduces the amount of subsequent computation, further reduces the influence of illumination on the image and improves processing efficiency. Image normalization applies a linear normalization algorithm to the two-dimensional gray image to obtain an image of fixed pixel size; normalization unifies the statistical distribution of the image samples so that inputs with different physical meanings and dimensions can be used on an equal footing. Through these preprocessing operations, the influence of different acquisition environments, such as illumination and differences in device performance, on the face detection image to be recognized is compensated, interference information is removed, image contrast is improved, and the accuracy of expression recognition is further improved.
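The light correction, graying and normalization described above might look like the following OpenCV sketch; the gamma value and the 48 x 48 output size are illustrative assumptions rather than values fixed by the invention.

```python
import cv2
import numpy as np

def preprocess(aligned_face: np.ndarray, gamma: float = 1.5) -> np.ndarray:
    """Gamma-based light correction, graying and linear normalization of an
    aligned BGR face crop."""
    corrected = np.uint8(255.0 * (aligned_face / 255.0) ** (1.0 / gamma))  # light correction
    gray = cv2.cvtColor(corrected, cv2.COLOR_BGR2GRAY)     # color image -> gray values
    resized = cv2.resize(gray, (48, 48))                   # fixed pixel size
    return resized.astype(np.float32) / 255.0              # linear normalization to [0, 1]
```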
Another embodiment of the present invention provides a facial expression recognition apparatus, as shown in fig. 2, the apparatus 10 includes:
one or more processors 110 and a memory 120, where one processor 110 is illustrated in fig. 2, the processor 110 and the memory 120 may be connected by a bus or other means, and the connection by the bus is illustrated in fig. 2.
The processor 110 is used to implement various control logic for the device 10, which may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a single chip, an ARM (Acorn RISC machine) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. Also, the processor 110 may be any conventional processor, microprocessor, or state machine. Processor 110 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The memory 120 is a non-volatile computer-readable storage medium, and can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions corresponding to the facial expression recognition method in the embodiment of the present invention. The processor 110 executes various functional applications and data processing of the apparatus 10, i.e. implements the facial expression recognition method in the above-described method embodiments, by running non-volatile software programs, instructions and units stored in the memory 120.
The memory 120 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function; the storage data area may store data created according to the use of the device 10, and the like. Further, the memory 120 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 120 optionally includes memory located remotely from the processor 110, which may be connected to the device 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more units are stored in the memory 120, which when executed by the one or more processors 110, perform the facial expression recognition method in any of the above-described method embodiments, e.g., performing the above-described method steps S100-S300 in fig. 1.
Embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer-executable instructions for execution by one or more processors, for example, to perform method steps S100-S300 of fig. 1 described above.
By way of example, non-volatile storage media can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The disclosed memory components or memory of the operating environment described herein are intended to comprise one or more of these and/or any other suitable types of memory.
Another embodiment of the present invention provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions which, when executed by a processor, cause the processor to perform the facial expression recognition method of the above-described method embodiment. For example, the method steps S100 to S300 in fig. 1 described above are performed.
In summary, the facial expression recognition method, device and storage medium disclosed by the invention comprise: constructing and training a convolutional neural network for expression recognition, wherein the depth of the convolutional neural network is increased by a preset amount; acquiring an image to be recognized, and performing face detection on the image to be recognized to obtain a face detection image; and preprocessing the face detection image, inputting the preprocessed face detection image into the convolutional neural network for expression recognition processing, and outputting the expression type in the face detection image. By adopting the deepened convolutional neural network to recognize the facial expression of the target face, the embodiment of the invention increases the nonlinear expressive power, improves the representational capacity of the model and thereby achieves more accurate facial expression recognition.
The above-described embodiments are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a general hardware platform, and may also be implemented by hardware. With this in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer electronic device (which may be a personal computer, a server, or a network electronic device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Conditional language such as "can," "might," or "may" is generally intended to convey that a particular embodiment can include (while other embodiments do not include) particular features, elements, and/or operations, unless specifically stated otherwise or otherwise understood within the context as used. Thus, such conditional language is not generally intended to imply that features, elements, and/or operations are in any way required for one or more embodiments, or that one or more embodiments necessarily include logic for deciding, with or without user input or prompting, whether such features, elements, and/or operations are included or are to be performed in any particular embodiment.
What has been described herein in the specification and drawings includes examples capable of providing a facial expression recognition method, apparatus, and storage medium. It will, of course, not be possible to describe every conceivable combination of components and/or methodologies for purposes of describing the various features of the disclosure, but it can be appreciated that many further combinations and permutations of the disclosed features are possible. It is therefore evident that various modifications can be made to the disclosure without departing from the scope or spirit thereof. In addition, or in the alternative, other embodiments of the disclosure may be apparent from consideration of the specification and drawings and from practice of the disclosure as presented herein. It is intended that the examples set forth in this specification and the drawings be considered in all respects as illustrative and not restrictive. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims (9)

1. A facial expression recognition method is characterized by comprising the following steps:
constructing and training a convolutional neural network for expression recognition, wherein the depth of the convolutional neural network is increased by a preset amount;
acquiring an image to be recognized, and performing face detection on the image to be recognized to obtain a face detection image;
and preprocessing the face detection image, inputting the preprocessed face detection image into the convolutional neural network for expression recognition to perform expression recognition processing, and outputting the expression type in the face detection image.
2. The method according to claim 1, wherein the convolutional neural network for expression recognition adds a convolutional layer after the input layer, and the convolutional kernel of the newly added convolutional layer is 1 x 1.
3. The facial expression recognition method of claim 2, wherein the constructing and training of the convolutional neural network for expression recognition comprises:
obtaining a training sample of known expression categories, wherein the expression categories include happiness, anger, fear, surprise, disgust, sadness and neutrality;
constructing the convolutional neural network for expression recognition, and inputting the training samples of the known expression classes into the convolutional neural network;
and performing error evaluation on the output value of the convolutional neural network through a preset loss function, and performing back propagation and adjustment on the weight parameter of the convolutional neural network according to an error result until the output value of the convolutional neural network reaches an expected value to obtain the trained convolutional neural network for expression recognition.
4. The method according to claim 3, wherein the training sample employs FER-2013 data set.
5. The method for recognizing facial expressions according to claim 1, wherein the acquiring an image to be recognized and performing face detection on the image to be recognized to obtain a face detection image comprises:
acquiring an image to be recognized, and inputting the image to be recognized into a pre-trained MobileNet cascade neural network model, wherein the convolution layers of the MobileNet cascade neural network model are Depthwise Separable Convolutions;
positioning the face position in the image to be recognized through a first neural network in the pre-trained MobileNet cascade neural network model, and cutting the image to be recognized according to a positioning result to obtain a face region image;
and positioning the key points of the human face in the human face region image through a second neural network in the pre-trained MobileNet cascade neural network model, and outputting a human face detection image.
6. The method of claim 5, wherein the preprocessing the face detection image comprises:
predefining key point positions and illumination conditions of a standard face;
aligning the key point position of the current face detection image to the key point position of a standard face through a preset image transformation algorithm to obtain an aligned face detection image;
performing light correction processing on the aligned face detection image according to the illumination condition of the standard face to obtain a face detection image with the illumination condition consistent with that of the standard face;
and sequentially carrying out image graying and image normalization processing on the face detection image subjected to the light ray correction processing.
7. An apparatus for facial expression recognition, the apparatus comprising at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of facial expression recognition according to any one of claims 1-6.
8. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the method of facial expression recognition of any one of claims 1-6.
9. A computer program product, characterized in that the computer program product comprises a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method of facial expression recognition according to any one of claims 1-6.
CN202010932427.0A 2020-09-08 2020-09-08 Facial expression recognition method and device and storage medium Pending CN112232116A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010932427.0A CN112232116A (en) 2020-09-08 2020-09-08 Facial expression recognition method and device and storage medium

Publications (1)

Publication Number Publication Date
CN112232116A true CN112232116A (en) 2021-01-15

Family

ID=74116730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010932427.0A Pending CN112232116A (en) 2020-09-08 2020-09-08 Facial expression recognition method and device and storage medium

Country Status (1)

Country Link
CN (1) CN112232116A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018128996A1 (en) * 2017-01-03 2018-07-12 Clipo, Inc. System and method for facilitating dynamic avatar based on real-time facial expression detection
CN108304826A (en) * 2018-03-01 2018-07-20 河海大学 Facial expression recognizing method based on convolutional neural networks
CN109993100A (en) * 2019-03-27 2019-07-09 南京邮电大学 The implementation method of facial expression recognition based on further feature cluster
CN110414371A (en) * 2019-07-08 2019-11-05 西南科技大学 A kind of real-time face expression recognition method based on multiple dimensioned nuclear convolution neural network
CN110633669A (en) * 2019-09-12 2019-12-31 华北电力大学(保定) Mobile terminal face attribute identification method based on deep learning in home environment

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818772A (en) * 2021-01-19 2021-05-18 网易(杭州)网络有限公司 Facial parameter identification method and device, electronic equipment and storage medium
CN112766167A (en) * 2021-01-20 2021-05-07 宁夏计算机软件与技术服务有限公司 Face recognition method based on face feature vector
CN112733803A (en) * 2021-01-25 2021-04-30 中国科学院空天信息创新研究院 Emotion recognition method and system
CN112818838A (en) * 2021-01-29 2021-05-18 北京嘀嘀无限科技发展有限公司 Expression recognition method and device and electronic equipment
CN112766220A (en) * 2021-02-01 2021-05-07 西南大学 Dual-channel micro-expression recognition method and system, storage medium and computer equipment
CN112766220B (en) * 2021-02-01 2023-02-24 西南大学 Dual-channel micro-expression recognition method and system, storage medium and computer equipment
CN113505746A (en) * 2021-07-27 2021-10-15 陕西师范大学 Fine classification method, device and equipment for micro-expression image and readable storage medium
CN113505750A (en) * 2021-07-28 2021-10-15 阳光保险集团股份有限公司 Identification method, device, electronic equipment and computer readable storage medium
CN113688714A (en) * 2021-08-18 2021-11-23 华南师范大学 Method, device, equipment and storage medium for identifying multi-angle facial expressions
CN113688715A (en) * 2021-08-18 2021-11-23 山东海量信息技术研究院 Facial expression recognition method and system
CN113688714B (en) * 2021-08-18 2023-09-01 华南师范大学 Multi-angle facial expression recognition method, device, equipment and storage medium
CN114333024A (en) * 2021-12-31 2022-04-12 郑州工程技术学院 Method, device, equipment and storage medium for recognizing facial expressions of students based on confrontation training network
CN114333024B (en) * 2021-12-31 2024-01-26 郑州工程技术学院 Method, device, equipment and storage medium for student facial expression recognition based on countermeasure training network
CN115908999A (en) * 2022-11-25 2023-04-04 合肥中科类脑智能技术有限公司 Method for detecting corrosion of top hardware fitting of power distribution tower, medium and edge terminal equipment

Similar Documents

Publication Publication Date Title
CN112232116A (en) Facial expression recognition method and device and storage medium
CN110532900B (en) Facial expression recognition method based on U-Net and LS-CNN
CN112257503A (en) Sex age identification method, device and storage medium
Azizah et al. Deep learning implementation using convolutional neural network in mangosteen surface defect detection
CN106682616B (en) Method for recognizing neonatal pain expression based on two-channel feature deep learning
Singh et al. Facial expression recognition with convolutional neural networks
CN109409297B (en) Identity recognition method based on dual-channel convolutional neural network
CN111191583B (en) Space target recognition system and method based on convolutional neural network
CN112232117A (en) Face recognition method, face recognition device and storage medium
CN112257502A (en) Pedestrian identification and tracking method and device for surveillance video and storage medium
CN106980811A (en) Facial expression recognizing method and expression recognition device
EP4145345A1 (en) Data annotation method and device, and fine granularity identification method and device
CN106022273A (en) Handwritten form identification system of BP neural network based on dynamic sample selection strategy
CN108898138A (en) Scene text recognition methods based on deep learning
CN107563312A (en) Facial expression recognizing method
CN110532925B (en) Driver fatigue detection method based on space-time graph convolutional network
Cayamcela et al. Fine-tuning a pre-trained convolutional neural network model to translate American sign language in real-time
CN110674777A (en) Optical character recognition method in patent text scene
CN110956082A (en) Face key point detection method and detection system based on deep learning
CN105976397A (en) Target tracking method based on half nonnegative optimization integration learning
CN110852358A (en) Vehicle type distinguishing method based on deep learning
CN114519401B (en) Image classification method and device, electronic equipment and storage medium
CN114758399A (en) Expression control method, device, equipment and storage medium of bionic robot
CN109948569B (en) Three-dimensional mixed expression recognition method using particle filter framework
Feng et al. Facial expression recognition based on local features of transfer learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination