CN112836680A - Visual sense-based facial expression recognition method - Google Patents

Visual sense-based facial expression recognition method

Info

Publication number
CN112836680A
CN112836680A (application number CN202110234838.7A)
Authority
CN
China
Prior art keywords
sample set
expression recognition
expression
classifier
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110234838.7A
Other languages
Chinese (zh)
Inventor
赵雪专
裴利沈
李玲玲
赵中堂
薄树奎
马腾
杨勇
张湘熙
刘汉卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University of Aeronautics
Original Assignee
Zhengzhou University of Aeronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University of Aeronautics filed Critical Zhengzhou University of Aeronautics
Priority to CN202110234838.7A priority Critical patent/CN112836680A/en
Publication of CN112836680A publication Critical patent/CN112836680A/en
Priority to PCT/CN2022/079035 priority patent/WO2022184133A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a vision-based facial expression recognition method and relates to the field of information processing. The method comprises the following steps: acquiring a face image sample set and preprocessing the face image sample set; extracting features of the preprocessed face image sample set to obtain an expression feature sample set; introducing the expression feature sample set into an expression recognition classifier for learning and training to obtain a trained expression recognition classifier; and putting the test samples into the trained expression recognition classifier, which evaluates them. The invention can identify the degree of a facial expression.

Description

Visual sense-based facial expression recognition method
Technical Field
The invention relates to the field of information processing, in particular to a visual-based facial expression recognition method.
Background
In recent years, with the rise of artificial intelligence and the rapid advance of computer technology, ever higher requirements are placed on human-computer interaction tasks. Meanwhile, in communication between people, not only language symbols but also facial expressions and other body language are components that convey information.
At present, human expressions occur in different degrees; for example, smiling expressions can be divided into laughing, smiling and the like, and crying expressions into sobbing, crying and the like. Existing expression recognition systems cannot recognize different degrees of the same expression.
Disclosure of Invention
The invention aims to provide a vision-based facial expression recognition method that can recognize the degree of a facial expression.
The embodiment of the invention is realized by the following steps:
the embodiment of the application provides a vision-based facial expression recognition method, which comprises the following steps:
acquiring a face image sample set, and preprocessing the face image sample set;
extracting the characteristics of the preprocessed face image sample set to obtain an expression characteristic sample set;
introducing the expression feature sample set into an expression recognition classifier for learning training to obtain a trained expression recognition classifier;
and putting the test sample into the trained expression recognition classifier, and evaluating the test sample by the expression recognition classifier.
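Viewed end to end, the four steps form a single pipeline. The following sketch is an illustration only, not the claimed method: it assumes NumPy and scikit-learn, uses flattened gray pixels as a placeholder feature, and substitutes scikit-learn's MLPClassifier for the BP neural network classifier described below.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier  # stand-in for the BP classifier

def preprocess(img):
    # Step 1: convert an RGB image (H x W x 3, uint8) to gray scale.
    return img.astype(np.float64) @ np.array([0.299, 0.587, 0.114])

def extract_features(gray):
    # Step 2: placeholder feature extraction (flattened, normalized pixels).
    return gray.ravel() / 255.0

def recognize(train_imgs, train_labels, test_imgs):
    X_train = np.array([extract_features(preprocess(i)) for i in train_imgs])
    X_test = np.array([extract_features(preprocess(i)) for i in test_imgs])
    # Step 3: learning and training of the expression recognition classifier.
    clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500)
    clf.fit(X_train, train_labels)
    # Step 4: evaluate the test samples with the trained classifier.
    return clf.predict(X_test)
```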
In some embodiments of the present invention, obtaining the face image sample set and preprocessing it includes converting the color images of the face image sample set into gray-level images:
g = 0.299R + 0.587G + 0.114B
wherein R, G and B are the color components of each color-image pixel point, and g is the converted gray value.
In some embodiments of the present invention, the expression feature samples obtained by extracting the features of the preprocessed face image sample set comprise face geometric features.
In some embodiments of the present invention, the geometric features of the human face include the eyes, eyebrows, nose, and mouth.
In some embodiments of the present invention, the changes of the expression features include changes in the texture and shape of the geometric features of the human face.
In some embodiments of the present invention, the facial geometric feature extraction method of the expression feature samples of the expression feature sample set is based on the spatiotemporal features of the difference image.
In some embodiments of the present invention, the expression recognition classifier is a BP neural network classifier.
In some embodiments of the present invention, the BP neural network classifier takes the expression feature samples as input, forms weighted linear combinations within the BP neural network, and passes each neuron through a nonlinear activation function, so that each neuron obtains a calculation result.
In some embodiments of the present invention, the operation process of the BP neural network classifier includes two stages:
forward propagation: the expression feature sample set is first fed into the input layer, weighted calculations are performed through the hidden layer, and the result is finally produced by the output layer; in the processing of each layer, the output of the previous layer serves as the input of the next layer;
back propagation: when the information reaches the output layer, the result is compared with the given label to judge whether the convergence condition has been reached; if so, the training process ends; if not, the error is propagated backward layer by layer and the weights are adjusted in turn until the convergence condition is met.
In some embodiments of the invention, the activation function is the sigmoid function:

f(x) = 1 / (1 + e^(−x))
where x is the actual output.
Compared with the prior art, the embodiment of the invention has at least the following advantages or beneficial effects:
a vision-based facial expression recognition method comprises the following steps:
acquiring a face image sample set, and preprocessing the face image sample set;
extracting the characteristics of the preprocessed face image sample set to obtain an expression characteristic sample set;
introducing the expression feature sample set into an expression recognition classifier for learning training to obtain a trained expression recognition classifier;
and putting the test sample into the trained expression recognition classifier, and evaluating the test sample by the expression recognition classifier.
In the above embodiment, the vision-based facial expression recognition method includes four steps in total. First, a face image sample set is acquired and preprocessed; the preprocessing converts the color images from RGB space into gray-level images. Since the color of each pixel in a color image is determined by the three components R, G and B, each of which can take 256 values, a single pixel can take more than 16 million (256³) colors; such a data volume burdens storage and computation, whereas a pixel of a gray image varies over only 256 values. Converting the face image samples into gray-level images therefore greatly reduces the subsequent image-processing workload, while the gray-level image still reflects the overall and local distribution of chromaticity and brightness levels consistently with the color image. Second, features of the preprocessed face image sample set are extracted to obtain an expression feature sample set; expression feature samples are an obvious and effective means of identifying the type and the variation of a facial expression. Third, the expression feature sample set is fed into an expression recognition classifier for learning and training to obtain a trained expression recognition classifier. The classifier may be a BP neural network classifier, a learning model based on an artificial neural network structure that achieves learning by setting several hidden layers and continuously modifying the network weights through back propagation. In this application, 5000 face pictures are used as the training sample set and 5000 face pictures are selected as the test sample set. To guarantee the training effect, after each training round the result is compared with that of the previous round: if the error has increased, the weights are adjusted in the negative direction; if it has decreased, they are adjusted in the positive direction. The recognition accuracy thus improves continuously, and the training process completes when the preset number of training iterations or the preset convergence condition is reached. Finally, the test samples are put into the trained expression recognition classifier, which evaluates them.
In this embodiment, different degrees of the same expression are distinguished more accurately. Facial expression recognition is achieved in four steps: image preprocessing converts RGB-space images into gray-level images; feature extraction applies the texture-analysis technique of gray-level co-occurrence matrices and realizes spatiotemporal expression extraction based on difference images; and a three-layer BP neural network serves as the classifier, taking feature vectors as input and the finely classified expressions as coded output. A learning model is established through continuous learning and training, and building a convolutional neural network with deep-learning techniques further improves the accuracy of expression recognition.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flowchart of a visual sense-based facial expression recognition method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the individual features of the embodiments can be combined with one another without conflict.
Examples
Please refer to fig. 1. The vision-based facial expression recognition method comprises the following steps:
acquiring a face image sample set, and preprocessing the face image sample set;
extracting the characteristics of the preprocessed face image sample set to obtain an expression characteristic sample set;
introducing the expression feature sample set into an expression recognition classifier for learning training to obtain a trained expression recognition classifier;
and putting the test sample into the trained expression recognition classifier, and evaluating the test sample by the expression recognition classifier.
In the above embodiment, a visual-based facial expression recognition method includes four steps in total, first acquiring a face image sample set and preprocessing the face image sample set, wherein,
because other body parts such as the shoulders and neck are captured while the three-dimensional face is scanned, the face must be segmented to remove non-face redundant information and obtain the face data. Because factors such as pose and distance produce differences in size and inclination angle, the face must be normalized and aligned to reduce both kinds of difference. To determine a normalized coordinate value for every point of the face point cloud, this embodiment selects the nose tip as the characteristic reference point, so the raw three-dimensional point data are preprocessed in four steps: nose tip localization, face normalization, pose correction, and depth map acquisition. The nose tip, being a local highest point with a cap-like shape, is extracted as follows. For each point P, the neighboring points can be regarded as distributed on a sphere centered at that point, and the effective energy d_i is defined as the inner product of (P_i − P) and N_p:

d_i = (P_i − P) · N_p = |P_i − P| cos α

where N_p is the normalized normal at P, P is the candidate point whose three-dimensional coordinates and normal vector N_p are known, and P_i are the neighboring data points. For data points at the top of a peak, the angle α between the normal and the direction to a neighboring point is greater than 90°, so the effective energy d_i is negative. This constraint alone is not sufficient to detect the nose tip, since many points on the cheeks, chin and so on also satisfy it, but it greatly reduces the search range. Next, the mean μ and variance σ² of the energies d_i are computed as statistical features:
μ = (1/n) Σ_i d_i,  σ² = (1/n) Σ_i (d_i − μ)²
All points fall into two categories, nose-tip and non-nose-tip points. Let (x_i, y_i), i = 1, 2, …, l be the training samples, where x_i is the two-dimensional feature vector (μ, σ²) and y_i ∈ {−1, 1} denotes the class. A support vector machine (SVM) is used to distinguish the two classes by finding the minimum of the energy function

W(a) = (1/2) Σ_i Σ_j a_i a_j y_i y_j K(x_i, x_j) − Σ_i a_i,  subject to Σ_i a_i y_i = 0 and a_i ≥ 0,

wherein the kernel function K can be taken as a radial basis function,

K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²)).

For a test vector x, the class is determined by

f(x) = sgn( Σ_{i=1}^{L} a_i y_i K(x_i, x) + b* ),

where b* is a parameter obtained by optimizing the energy function and L is the number of support vectors obtained.
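By way of illustration, a minimal sketch of this nose-tip detection step, assuming NumPy and scikit-learn: the point normals are taken as given, the neighbourhood is a brute-force k-nearest search, and the RBF kernel mirrors the kernel function above.

```python
import numpy as np
from sklearn.svm import SVC

def effective_energy_features(points, normals, k=20):
    """For each point P, the mean and variance of the effective energies
    d_i = (P_i - P) . N_p over its k nearest neighbours."""
    feats = []
    for p, n in zip(points, normals):
        dists = np.linalg.norm(points - p, axis=1)
        neighbours = points[np.argsort(dists)[1:k + 1]]  # skip P itself
        d = (neighbours - p) @ n                         # effective energies d_i
        feats.append([d.mean(), d.var()])                # (mu, sigma^2)
    return np.array(feats)

def train_nose_tip_svm(X, y):
    """X: (mu, sigma^2) features; y: +1 for nose-tip points, -1 otherwise."""
    return SVC(kernel="rbf").fit(X, y)
```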
When the nose tip point has been detected, it is placed at the origin of coordinates, so that all three-dimensional face models in the database are aligned. At the same time, according to the "three sections, five eyes" rule of facial proportion, the size range of the face is determined with the nose tip as the center, and the face is normalized to the standard size of 130 × 100. Although the subject is required to keep as still as possible while the three-dimensional face data are acquired, the face has many degrees of freedom in three-dimensional space, and input poses at different angles increase the recognition difficulty. Pose correction is the step of placing all input three-dimensional faces under the same coordinate system.
The pose-correction coordinate system is constructed as follows. Suppose the face point cloud V is a set of N vertices, V = {v_i ∈ R³ | 1 ≤ i ≤ N}. First, the centroid of all points of the face point cloud V is calculated:
O_v = (1/N) Σ_{i=1}^{N} v_i
Then the coefficient matrix of the point-cloud vertex distribution is constructed:

C = (1/N) Σ_{i=1}^{N} (v_i − O_v)(v_i − O_v)^T
Because the face surface extends unequally in different directions, eigendecomposition of the coefficient matrix C yields three eigenvalues of different magnitude, λ1 ≥ λ2 ≥ λ3, with corresponding eigenvectors e1, e2 and e3. Taking O_v as the origin, e1 as the Y axis, e2 as the X axis and e3 as the Z axis defines a pose coordinate system (PCS) in which all faces have the same frontal pose, and the original point cloud is transformed into the new coordinate space:
v_i' = [e2, e1, e3]^T (v_i − O_v)
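A minimal NumPy sketch of this pose correction; it relies on numpy.linalg.eigh returning eigenvalues in ascending order, so the columns are reordered to match λ1 ≥ λ2 ≥ λ3.

```python
import numpy as np

def pose_normalize(points):
    """Map a face point cloud (N x 3) into the pose coordinate system (PCS):
    origin at the centroid O_v, Y axis e1, X axis e2, Z axis e3."""
    o_v = points.mean(axis=0)                        # centroid O_v
    centered = points - o_v
    C = centered.T @ centered / len(points)          # coefficient matrix C
    _, vecs = np.linalg.eigh(C)                      # eigenvalues ascending
    e1, e2, e3 = vecs[:, 2], vecs[:, 1], vecs[:, 0]  # lambda1 >= lambda2 >= lambda3
    R = np.stack([e2, e1, e3])                       # rows: X, Y, Z axes of the PCS
    return centered @ R.T                            # v' = R (v - O_v)
```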
the preprocessed three-dimensional point cloud can be mapped to a two-dimensional plane through orthogonal projection, the depth information of the point cloud is obtained, and corresponding color information is converted into a gray map. The gray scale image method is to convert the color image in the RGB space into a gray scale image, because the color of each pixel in the color image is determined by R, G, B components, and each component has 256 value conditions, so that one pixel point can have 1600 more than ten thousand color change ranges, the data volume is too large, which causes a certain burden to the calculation of a storage box, and one pixel point change range of the gray scale image has only 256 conditions, so that the human face image sample is converted into the gray scale image, the subsequent image processing calculation amount is greatly reduced, and simultaneously, the gray scale image can still reflect the distribution and the characteristics of the integral and local chromaticity and brightness levels of the whole image consistent with the color image. Extracting the characteristics of the preprocessed face image sample set to obtain an expression characteristic sample set, wherein the expression characteristic sample is used as an obvious and effective strategy in order to identify the type and the change condition of the face expression; introducing the expression characteristic sample set into an expression recognition classifier for learning training to obtain a trained expression recognition classifier, wherein the expression recognition classifier can adopt a BP neural network classifier, the BP neural network classifier is a learning model based on an artificial neural network structure, the network weight can be continuously modified by setting a plurality of layers of hidden layers and in a reverse back propagation way, therefore, the purpose of learning is achieved, 5000 human face pictures are adopted as a training sample level in the application, 5000 human face pictures are selected as a test sample set, in order to ensure the training effect, after each training, the training result is compared with the last training result, if the error is increased, the weight value is adjusted towards the negative direction, if the error is decreased, the weight value is adjusted towards the positive direction, therefore, the identification accuracy is continuously improved, and the training process is completed when the preset training times or the preset convergence condition is reached. And finally, putting the test sample into the trained expression recognition classifier, and evaluating the test sample by the expression recognition classifier.
In this embodiment, different degrees of the same expression are distinguished more accurately. Facial expression recognition is achieved in four steps: image preprocessing converts RGB-space images into gray-level images; feature extraction applies the texture-analysis technique of gray-level co-occurrence matrices and realizes spatiotemporal expression extraction based on difference images; and a three-layer BP neural network serves as the classifier, taking feature vectors as input and the finely classified expressions as coded output. A learning model is established through continuous learning and training, and building a convolutional neural network with deep-learning techniques further improves the accuracy of expression recognition.
In some embodiments of the present invention, a face image sample set is obtained, and the pre-processing of the face image sample set comprises converting color images of the face image sample set into grayscale images:
g = 0.299R + 0.587G + 0.114B
wherein R, G and B are the color components of each color-image pixel point, and g is the converted gray value.
In this embodiment, since the color of each pixel in a color image is determined by the three components R, G and B, each of which can take 256 values, a single pixel can take more than 16 million (256³) colors; such a data volume burdens storage and computation, whereas a pixel of a gray image varies over only 256 values. Converting the face image samples into gray-level images therefore greatly reduces the subsequent image-processing workload, while the gray-level image still reflects the overall and local distribution of chromaticity and brightness levels consistently with the color image.
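As a sketch, the conversion can be written in a few lines of NumPy, assuming the weighted formula above; each pixel shrinks from three bytes to one, which is the data reduction described in this embodiment.

```python
import numpy as np

def to_gray(rgb):
    """Convert an RGB image (H x W x 3, uint8) to a gray-level image using
    g = 0.299*R + 0.587*G + 0.114*B."""
    weights = np.array([0.299, 0.587, 0.114])
    return (rgb.astype(np.float64) @ weights).astype(np.uint8)
```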
In some embodiments of the present invention, the expression feature samples obtained by extracting the features of the preprocessed face image sample set are composed of facial geometric features.
In this embodiment, the geometric features of the human face refer to the position changes of the facial organs, such as the eyes, eyebrows, nose and mouth, and further include the position changes of the eye corners, eyebrow tips and mouth corners.
In some embodiments of the invention, the geometric features of the human face include eyes, eyebrows, nose, and mouth.
In this embodiment, the eyes, eyebrows, nose, and mouth can significantly and effectively express facial expressions.
In some embodiments of the invention, the changes in expression features include changes in the texture and shape of the geometric features of the face.
In this embodiment, texture analysis uses the spatial gray-level co-occurrence matrix method: a matrix is obtained by counting the number of times two gray levels occur at adjacent positions along a given direction in the image, the directions generally being horizontal (0°), 45°, 90° and 135°.
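A straightforward, unoptimized sketch of the co-occurrence counting; the offset pairs in the comment are one common convention for the four directions.

```python
import numpy as np

def glcm(gray, dx, dy, levels=256):
    """Gray-level co-occurrence matrix: count how often level i at (x, y) is
    adjacent to level j at (x + dx, y + dy). Typical offsets: (1, 0) for 0
    degrees, (1, 1) for 45, (0, 1) for 90, (-1, 1) for 135."""
    h, w = gray.shape
    m = np.zeros((levels, levels), dtype=np.int64)
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                m[gray[y, x], gray[y2, x2]] += 1
    return m
```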
In some embodiments of the present invention, the facial geometric feature extraction method for the expression feature samples of the expression feature sample set is based on the spatiotemporal features of the difference image.
In this embodiment, for a facial expression video data set, a feature vector can be extracted from the face image of each frame of the video. The geometric features used for expression recognition describe the changes of the several major organs on the face. Each frame image can be differenced directly against a neutral-expression frame of the video sequence using the facial-expression texture features extracted by the gray-level co-occurrence matrix method: the difference operation subtracts the gray-level co-occurrence matrices of the two frames as matrices, then unrolls the resulting matrix row by row into a vector, which yields the facial geometric feature vector.
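Building on the glcm() sketch above, the difference operation might look as follows; the quantization to 64 gray levels is an illustrative assumption, not taken from the patent.

```python
import numpy as np

def quantize(gray, levels=64):
    """Quantize 8-bit gray values into `levels` bins so the matrices stay small."""
    return (gray.astype(np.int64) * levels) // 256

def difference_feature(frame_gray, neutral_gray, dx=1, dy=0, levels=64):
    """Subtract the neutral frame's co-occurrence matrix from the current
    frame's, then unroll the result row by row into a feature vector
    (glcm() as sketched above)."""
    m_frame = glcm(quantize(frame_gray, levels), dx, dy, levels)
    m_neutral = glcm(quantize(neutral_gray, levels), dx, dy, levels)
    return (m_frame - m_neutral).ravel()
```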
In some embodiments of the invention, the expression recognition classifier is a BP neural network classifier.
In this embodiment, the BP neural network is a learning model based on an artificial neural network structure; by setting multiple hidden layers and continuously modifying the network weights through back propagation, the purpose of learning and training is achieved.
In some embodiments of the present invention, the BP neural network classifier takes the expression feature samples as input, forms weighted linear combinations within the BP neural network, and passes each neuron through a nonlinear activation function, so that each neuron obtains a calculation result.
In some embodiments of the present invention, the working process of the BP neural network classifier comprises two phases:
forward propagation: the expression feature sample set is first fed into the input layer, weighted calculations are performed through the hidden layer, and the result is finally produced by the output layer; in the processing of each layer, the output of the previous layer serves as the input of the next layer;
back propagation: when the information reaches the output layer, the result is compared with the given label to judge whether the convergence condition has been reached; if so, the training process ends; if not, the error is propagated backward layer by layer and the weights are adjusted in turn until the convergence condition is met.
In this embodiment, the BP neural network is a learning model based on an artificial neural network structure; by setting multiple hidden layers and continuously modifying the network weights through back propagation, the purpose of learning and training is achieved.
In some embodiments of the invention, the activation function is the sigmoid function:

f(x) = 1 / (1 + e^(−x))
where x is the actual output.
In this embodiment, in order to adjust the weights during back propagation, an error function is determined before learning and training; it is expressed as the sum of the squares of the differences between the actual output x and the expected output y:

E = (1/2) Σ (x − y)²
After a large amount of learning and training, the BP neural network reduces the error by continuously adjusting the network weights, thereby achieving the expected effect. To guarantee accuracy, the BP neural network adopts a three-layer structure: an input layer of 10 nodes, corresponding to a group of facial expression feature input vectors; a hidden layer of 10 nodes; and an output layer of 9 nodes in total, representing the output results of 9 expressions.
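A minimal NumPy sketch of such a 10-10-9 BP network with sigmoid activations and the squared-error function above; the learning rate, weight initialization, bias omission and stopping threshold are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BPNetwork:
    """Three-layer BP network: 10 input, 10 hidden, 9 output nodes, trained
    with E = 1/2 * sum((x - y)^2). Biases omitted for brevity."""

    def __init__(self, n_in=10, n_hidden=10, n_out=9, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.lr = lr

    def forward(self, x):
        self.h = sigmoid(x @ self.W1)       # hidden-layer output
        self.o = sigmoid(self.h @ self.W2)  # actual output x in the text
        return self.o

    def backward(self, x, y):
        # Propagate the error backward layer by layer and adjust the weights.
        delta_o = (self.o - y) * self.o * (1.0 - self.o)
        delta_h = (delta_o @ self.W2.T) * self.h * (1.0 - self.h)
        self.W2 -= self.lr * np.outer(self.h, delta_o)
        self.W1 -= self.lr * np.outer(x, delta_h)

    def train(self, X, Y, epochs=1000, tol=1e-3):
        for _ in range(epochs):
            err = 0.0
            for x, y in zip(X, Y):
                err += 0.5 * np.sum((self.forward(x) - y) ** 2)
                self.backward(x, y)
            if err < tol:                   # convergence condition reached
                break
```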
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application.
In summary, the vision-based facial expression recognition method provided by the embodiment of the application includes the following steps:
acquiring a face image sample set, and preprocessing the face image sample set;
extracting the characteristics of the preprocessed face image sample set to obtain an expression characteristic sample set;
introducing the expression feature sample set into an expression recognition classifier for learning training to obtain a trained expression recognition classifier;
and putting the test sample into the trained expression recognition classifier, and evaluating the test sample by the expression recognition classifier.
In the above embodiment, the vision-based facial expression recognition method includes four steps in total. First, a face image sample set is acquired and preprocessed; the preprocessing converts the color images from RGB space into gray-level images. Since the color of each pixel in a color image is determined by the three components R, G and B, each of which can take 256 values, a single pixel can take more than 16 million (256³) colors; such a data volume burdens storage and computation, whereas a pixel of a gray image varies over only 256 values. Converting the face image samples into gray-level images therefore greatly reduces the subsequent image-processing workload, while the gray-level image still reflects the overall and local distribution of chromaticity and brightness levels consistently with the color image. Second, features of the preprocessed face image sample set are extracted to obtain an expression feature sample set; expression feature samples are an obvious and effective means of identifying the type and the variation of a facial expression. Third, the expression feature sample set is fed into an expression recognition classifier for learning and training to obtain a trained expression recognition classifier. The classifier may be a BP neural network classifier, a learning model based on an artificial neural network structure that achieves learning by setting several hidden layers and continuously modifying the network weights through back propagation. In this application, 5000 face pictures are used as the training sample set and 5000 face pictures are selected as the test sample set. To guarantee the training effect, after each training round the result is compared with that of the previous round: if the error has increased, the weights are adjusted in the negative direction; if it has decreased, they are adjusted in the positive direction. The recognition accuracy thus improves continuously, and the training process completes when the preset number of training iterations or the preset convergence condition is reached. Finally, the test samples are put into the trained expression recognition classifier, which evaluates them.
In this embodiment, different degrees of the same expression are distinguished more accurately. Facial expression recognition is achieved in four steps: image preprocessing converts RGB-space images into gray-level images; feature extraction applies the texture-analysis technique of gray-level co-occurrence matrices and realizes spatiotemporal expression extraction based on difference images; and a three-layer BP neural network serves as the classifier, taking feature vectors as input and the finely classified expressions as coded output. A learning model is established through continuous learning and training, and building a convolutional neural network with deep-learning techniques further improves the accuracy of expression recognition.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (10)

1. A vision-based facial expression recognition method is characterized by comprising the following steps;
acquiring a face image sample set, and preprocessing the face image sample set;
extracting the characteristics of the preprocessed face image sample set to obtain an expression characteristic sample set;
introducing the expression feature sample set into an expression recognition classifier for learning training to obtain a trained expression recognition classifier;
and putting the test sample into the trained expression recognition classifier, and evaluating the test sample by the expression recognition classifier.
2. A vision-based facial expression recognition method according to claim 1, wherein the face image sample set is obtained, and the preprocessing of the face image sample set comprises converting color images of the face image sample set into grayscale images:
g = 0.299R + 0.587G + 0.114B
r, G, B is the color component of each color image pixel point, and g is the converted gray value.
3. The vision-based facial expression recognition method of claim 1, wherein the expression feature samples obtained by extracting the features of the preprocessed facial image sample set are composed of facial geometric features.
4. A vision-based facial expression recognition method as claimed in claim 3, wherein the geometric features of the human face include the eyes, eyebrows, nose and mouth.
5. A visual-based facial expression recognition method as claimed in claim 3 or 4, wherein the changes in the expressive features include changes in the texture and shape of geometric features of the face.
6. The vision-based facial expression recognition method of claim 3, wherein the facial geometric feature extraction method of the expression feature samples of the expression feature sample set is based on the spatiotemporal features of the difference image.
7. The vision-based facial expression recognition method of claim 1, wherein the expression recognition classifier is a BP neural network classifier.
8. The vision-based facial expression recognition method of claim 7, wherein the BP neural network classifier takes expression feature samples as input, linear combination is performed in the BP neural network, each neuron outputs through a nonlinear activation function, and each neuron obtains a calculation result.
9. The vision-based facial expression recognition method of claim 8, wherein the working process of the BP neural network classifier comprises two stages:
forward propagation: firstly, inputting an expression feature sample set into an input layer, performing weighting calculation through a hidden layer, and finally outputting the expression feature sample set by an output layer, wherein the former layer is equivalent to the input layer of the next layer in the processing of each layer;
and (3) back propagation: when the information is transmitted to an output layer, comparing the result with a given label, judging whether a convergence condition is reached, and if so, ending the training process; if not, the layer-by-layer backward propagation is carried out, and the weights are sequentially adjusted until the convergence condition is met.
10. A method for visual-based facial expression recognition as recited in claim 8, wherein the activation function:
f(x) = 1 / (1 + e^(−x))
where x is the actual output.
CN202110234838.7A 2021-03-03 2021-03-03 Visual sense-based facial expression recognition method Pending CN112836680A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110234838.7A CN112836680A (en) 2021-03-03 2021-03-03 Visual sense-based facial expression recognition method
PCT/CN2022/079035 WO2022184133A1 (en) 2021-03-03 2022-03-03 Vision-based facial expression recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110234838.7A CN112836680A (en) 2021-03-03 2021-03-03 Visual sense-based facial expression recognition method

Publications (1)

Publication Number Publication Date
CN112836680A true CN112836680A (en) 2021-05-25

Family

ID=75934489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110234838.7A Pending CN112836680A (en) 2021-03-03 2021-03-03 Visual sense-based facial expression recognition method

Country Status (2)

Country Link
CN (1) CN112836680A (en)
WO (1) WO2022184133A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022184133A1 (en) * 2021-03-03 2022-09-09 郑州航空工业管理学院 Vision-based facial expression recognition method
CN115331292A (en) * 2022-08-17 2022-11-11 武汉元紫东科技有限公司 Facial image-based emotion recognition method and device and computer storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117038055B (en) * 2023-07-05 2024-04-02 广州市妇女儿童医疗中心 Pain assessment method, system, device and medium based on multi-expert model
CN116825365B (en) * 2023-08-30 2023-11-28 安徽爱学堂教育科技有限公司 Mental health analysis method based on multi-angle micro-expression
CN117275070A (en) * 2023-10-11 2023-12-22 中邮消费金融有限公司 Video facial mask processing method and system based on micro-expressions

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304826A (en) * 2018-03-01 2018-07-20 河海大学 Facial expression recognizing method based on convolutional neural networks

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10417483B2 (en) * 2017-01-25 2019-09-17 Imam Abdulrahman Bin Faisal University Facial expression recognition
CN110110653A (en) * 2019-04-30 2019-08-09 上海迥灵信息技术有限公司 The Emotion identification method, apparatus and storage medium of multiple features fusion
CN112836680A (en) * 2021-03-03 2021-05-25 郑州航空工业管理学院 Visual sense-based facial expression recognition method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304826A (en) * 2018-03-01 2018-07-20 河海大学 Facial expression recognizing method based on convolutional neural networks

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIANG Ruiqi (梁瑞奇), "Facial Expression Recognition Based on Neural Networks", Electronic Production (电子制作), no. 20.
DONG Ruixia (董瑞霞), "Preprocessing of Three-Dimensional Face Data and Depth Map Acquisition", Fujian Computer (福建电脑), vol. 33, no. 2.
DONG Wei (董薇), "Emotion Data Acquisition and Analysis and Psychological Early-Warning System Based on 3D Facial Data and Artificial Intelligence Algorithms", Electronic Design Engineering (电子设计工程), vol. 28, no. 5, pp. 152-156.

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022184133A1 (en) * 2021-03-03 2022-09-09 郑州航空工业管理学院 Vision-based facial expression recognition method
CN115331292A (en) * 2022-08-17 2022-11-11 武汉元紫东科技有限公司 Facial image-based emotion recognition method and device and computer storage medium

Also Published As

Publication number Publication date
WO2022184133A1 (en) 2022-09-09

Similar Documents

Publication Publication Date Title
CN112836680A (en) Visual sense-based facial expression recognition method
US20210174072A1 (en) Microexpression-based image recognition method and apparatus, and related device
CN106960202B (en) Smiling face identification method based on visible light and infrared image fusion
CN110348330B (en) Face pose virtual view generation method based on VAE-ACGAN
Ramanathan et al. Face verification across age progression
CN107169455B (en) Face attribute recognition method based on depth local features
CN108629336B (en) Face characteristic point identification-based color value calculation method
US20230044644A1 (en) Large-scale generation of photorealistic 3d models
CN107832740B (en) Teaching quality assessment method and system for remote teaching
CN110909680A (en) Facial expression recognition method and device, electronic equipment and storage medium
CN111783748A (en) Face recognition method and device, electronic equipment and storage medium
CN112784736A (en) Multi-mode feature fusion character interaction behavior recognition method
CN110991258B (en) Face fusion feature extraction method and system
CN113239839B (en) Expression recognition method based on DCA face feature fusion
CN111028319A (en) Three-dimensional non-photorealistic expression generation method based on facial motion unit
CN107194364B (en) Huffman-L BP multi-pose face recognition method based on divide and conquer strategy
CN111652798A (en) Human face pose migration method and computer storage medium
CN115205933A (en) Facial expression recognition method, device, equipment and readable storage medium
CN107895154B (en) Method and system for forming facial expression intensity calculation model
CN117333604A (en) Character face replay method based on semantic perception nerve radiation field
CN114219920B (en) Method and device for constructing three-dimensional face model, storage medium and terminal
CN113468923B (en) Human-object interaction behavior detection method based on fine-grained multi-modal common representation
Lv et al. A 3D face recognition method using region-based extended local binary pattern
KR102472110B1 (en) Face image generation system based on emotion and method thereof
CN114862716A (en) Image enhancement method, device and equipment for face image and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination