CN116631042A - Expression image generation, expression recognition model, method, system and memory - Google Patents


Info

Publication number
CN116631042A
CN116631042A (application CN202310911547.6A)
Authority
CN
China
Prior art keywords
expression
facial
image
expression image
generator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310911547.6A
Other languages
Chinese (zh)
Other versions
CN116631042B (en)
Inventor
范联伟
高景银
孙仁浩
刘升
王佐成
洪日昌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Data Space Research Institute
Original Assignee
Data Space Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Data Space Research Institute
Priority to CN202310911547.6A
Publication of CN116631042A
Application granted
Publication of CN116631042B
Legal status: Active

Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING
    • G06V40/174 Facial expression recognition (G06V40/16 Human faces; G06V40/00 Recognition of biometric, human-related or animal-related patterns)
    • G06N3/0464 Convolutional networks [CNN, ConvNet] (G06N3/04 Architecture; G06N3/02 Neural networks)
    • G06N3/08 Learning methods (G06N3/02 Neural networks)
    • G06T9/002 Image coding using neural networks (G06T9/00 Image coding)
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting (G06V10/70 Pattern recognition or machine learning)
    • G06V10/82 Image or video recognition or understanding using neural networks (G06V10/70 Pattern recognition or machine learning)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image data generation and computer models, and in particular to an expression image generation model, an expression recognition model, the corresponding methods and systems, and a memory. In the expression image generation method, a noise vector is randomly generated and input into a pre-trained image coding model, and the generator maps it to a latent code of a facial image. With the facial expression adjustment vectors known, a selection strategy based on an adjustment stride is designed to generate high-quality facial expression image data; at the same time, the stride is randomly sampled within its value interval, ensuring the diversity of the generated facial expression image data. Finally, the generated data are stored to form an expression image expansion dataset. The invention thus realizes an efficient, high-precision method for expanding annotated expression image data without manual annotation cost.

Description

Expression image generation, expression recognition model, method, system and memory
Technical Field
The invention relates to the technical field of image data generation and computer models, and in particular to an expression image generation model, an expression recognition model, the corresponding methods, an expression recognition system, and a memory.
Background
Most conventional methods use manually designed features such as the Histogram of Oriented Gradients (HOG), the Local Binary Pattern (LBP), and the Scale-Invariant Feature Transform (SIFT) for facial expression recognition; however, these features often have difficulty coping with changes in facial appearance, so the resulting models are not robust or accurate enough. Currently, deep-learning-based methods dominate many computer vision tasks, since large-scale training data and strong representational capacity yield robust algorithm models; nevertheless, several problems remain in facial expression recognition research: 1) standard datasets contain too little data, making it difficult to train an effective model and apply it widely; 2) expression labels are uncertain: although a large number of facial images exist on the web, expressions themselves carry a degree of ambiguity, so manual annotation is quite expensive; 3) facial expression image data are imbalanced across categories, which biases the model during training toward the categories with more data and harms recognition.
Disclosure of Invention
To overcome the prior-art defect that the accuracy of an expression recognition model is limited by the quantity and balance of annotated data, the invention provides an expression image generation method that can supply a large quantity of annotated facial expression data.
The invention provides a method for generating an expression image, which comprises the following steps:
SA1, acquiring a pre-trained image coding model, wherein the image coding model comprises a generator and a synthesizer; the input of the generator is an n2-dimensional noise vector, and its output is an n1×n2-dimensional matrix vector corresponding to different facial expressions; the input of the synthesizer is an n1×n2-dimensional matrix vector, and its output is an expression image;
SA2, obtaining, in combination with the image coding model, the n1×n2 matrices associated with the different facial expressions as the facial expression vectors; giving an adjustment stride interval for each facial expression, with the kth facial expression vector denoted f(k) and the adjustment stride interval of the kth facial expression denoted [a(k), b(k)]; f(k) is an n1×n2-dimensional matrix vector; setting the iteration counter n to an initial value of 1;
SA3, judging whether n is smaller than N; if yes, executing step SA4; if not, executing step SA7; N is the set number of images to generate for each facial expression category;
SA4, randomly generating an n2-dimensional noise vector, inputting it into the generator, and obtaining the n1×n2-dimensional matrix vector output by the generator as the latent code f;
SA5, updating the latent code f in combination with each facial expression vector; the code obtained by updating f with the facial expression vector f(k) is denoted fnew(k); fnew(k) has the same data structure as f, and fnew(k) and f(k) correspond to the same facial expression category;
SA6, inputting each updated fnew(k) into the synthesizer, obtaining the expression image output by the synthesizer, and adding it to the expression image dataset associated with the kth facial expression; then updating n to n+1 and returning to step SA3;
SA7, outputting the expression image expansion dataset, which is the collection of the expression image datasets of all facial expressions; the initial value of the expression image dataset associated with each facial expression is the empty set.
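The SA1-SA7 loop can be sketched in Python. This is a minimal illustration only: the pre-trained StyleGAN2 generator and synthesizer are replaced by stub functions, and the expression vectors f(k) and stride intervals are hypothetical random placeholders, since the trained weights and vectors are not part of this text.

```python
import numpy as np

rng = np.random.default_rng(0)

N1, N2, N3 = 18, 512, 8          # latent rows, latent width, fixed rows (values from the text)
N = 4                            # images to generate per expression (small for the sketch)

# Stand-ins for the pre-trained StyleGAN2 mapping network ("generator") and
# synthesis network ("synthesizer"); real pretrained weights are assumed here.
def generator(z):                # z: (N2,) noise -> (N1, N2) latent matrix vector
    return np.tile(np.tanh(z), (N1, 1))

def synthesizer(w):              # w: (N1, N2) latent -> stand-in "image" array
    return rng.standard_normal((64, 64, 3)) * w.mean()

# Hypothetical expression vectors f(k) and stride intervals [a(k), b(k)]
expressions = {"happy": (6.0, 9.0), "sad": (18.0, 19.0)}
f_vec = {k: rng.standard_normal((N1, N2)) * 0.01 for k in expressions}

dataset = {k: [] for k in expressions}         # SA7: per-expression sets start empty
for n in range(N):                             # SA3: loop N times
    z = rng.standard_normal(N2)                # SA4: random n2-dimensional noise
    f = generator(z)                           #      latent code f
    for k, (a, b) in expressions.items():      # SA5: update f per expression
        s = rng.uniform(a, b)                  # random stride in [a(k), b(k)]
        f_new = f + s * f_vec[k]               # intermediate matrix
        f_new[:N3] = f[:N3]                    # keep the first n3 rows of f unchanged
        dataset[k].append(synthesizer(f_new))  # SA6: synthesize and store
```

After the loop, `dataset` plays the role of the expression image expansion dataset, one list of images per facial expression.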
Preferably, SA5 is specifically: the n1×n2-dimensional matrix vector is regarded as n1 vectors of dimension n2; fnew(k) is updated as follows: first, a random number is taken on [a(k), b(k)] and multiplied by f(k); the product is added element-wise to the latent code f to obtain an intermediate matrix; then n3 of the n2-dimensional vectors in the intermediate matrix are replaced with the corresponding n2-dimensional vectors in f, where n3 is smaller than n1.
Preferably, the image coding model adopts a StyleGAN2 network, with n1 = 18, n2 = 512, and n3 = 8.
The expression image generation system provides a carrier for the expression image generation method and storage space for the expression image expansion dataset, facilitating direct application of the dataset. The system comprises a generator, a synthesizer, a storage unit, a data expansion module, and an expression image expansion dataset in which each facial expression has a corresponding expression image dataset. The generator and the synthesizer are extracted from the pre-trained image coding model; the storage unit stores the facial expression vectors and the adjustment stride intervals corresponding to the various facial expressions; and the data expansion module is connected to the generator, the synthesizer, the storage unit, and the expression image expansion dataset.
The data expansion module combines the generator and the synthesizer to generate N batches of expression images and stores the different expression images of each batch into the corresponding expression image datasets. When generating the nth batch, the data expansion module updates the latent code f that the generator produces from random noise, combining it with the different facial expression vectors to obtain updated codes; the synthesizer then generates an expression image for the updated code under each facial expression and stores it into the corresponding expression image dataset; 1 ≤ n ≤ N.
In the training method of the expression recognition model provided by the invention, the annotated data for training the model are expanded with the expression image expansion dataset, which greatly improves model training precision and reduces data annotation cost. First, the expression image generation method above is used to obtain the expression image expansion dataset, which is then combined with a facial expression standard dataset to construct a training image set. A basic model to be trained performs machine learning on the training image set; the input of the basic model is an expression image and its output is a facial expression category. The learned basic model serves as the expression recognition model.
Preferably, the facial expression categories associated with the expression image expansion dataset are consistent with those associated with the facial expression standard dataset; alternatively, the facial expression categories associated with the expansion dataset are all contained within the categories associated with the standard dataset.
Preferably, the facial expression categories associated with the expression image expansion dataset include: neutral, happy, sad, surprise, fear, anger, and disgust; the stride adjustment interval is [9, 10] for neutral, [6, 9] for happy, [18, 19] for sad, [5, 6] for surprise, [7, 9] for fear, and [15, 17] for both anger and disgust.
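The preferred stride intervals above can be captured as a small lookup table; the snippet below is a sketch of how the per-expression stride would be sampled, with the dictionary name and function name being illustrative choices rather than anything defined by the patent.

```python
import random

# Stride intervals from the preferred embodiment (anger and disgust share [15, 17])
STRIDE = {
    "neutral": (9, 10), "happy": (6, 9), "sad": (18, 19),
    "surprise": (5, 6), "fear": (7, 9), "anger": (15, 17), "disgust": (15, 17),
}

def sample_stride(expr, rng=random):
    """Draw a random stride from the expression's interval [a(k), b(k)]."""
    a, b = STRIDE[expr]
    return rng.uniform(a, b)
```

For example, `sample_stride("happy")` always falls in [6, 9], so happy-face latents receive a smaller shift than sad-face latents, whose interval is [18, 19].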
The invention further provides an expression recognition method: first, an expression recognition model is obtained with the training method above; then the expression image to be recognized is input into the model, and the model's output is taken as the facial expression prediction result.
The invention also provides an expression recognition system and a memory, which provide a carrier for the expression recognition model and the expression recognition method and thereby facilitate their popularization and application.
The expression recognition system provided by the invention comprises a storage module and a processing module; the storage module stores a computer program and a facial expression standard dataset, and the processing module, connected to the storage module, executes the computer program to carry out the training method of the expression recognition model and obtain the expression recognition model.
The memory stores a computer program which, when executed, implements the training method of the expression recognition model to obtain the expression recognition model.
The invention has the advantages that:
(1) In the expression image generation method, noise vectors are randomly generated and input into a pre-trained image coding model, and the generator maps them to latent codes of facial images under different facial expressions. With the facial expression adjustment vectors known, a selection strategy based on an adjustment stride is designed to generate high-quality facial expression image data; at the same time, the stride is randomly sampled within its value interval, ensuring the diversity of the generated data; finally, the data are stored to form an expression image expansion dataset. The invention thus realizes an efficient, high-precision method for expanding annotated expression image data without manual annotation cost.
(2) In standard datasets, the span of each facial expression category is often quite large and some noisy data exist, so an expression recognition model does not converge easily and often performs unsatisfactorily in practice. With the expression image generation method of the invention, the stride adjustment strategy yields high-quality images of different facial expressions, while adjustment within a certain range yields discriminative and diverse facial expression images; using them to train the expression recognition model makes the model easier to converge and, in real scenes, meets practical requirements with better recognition accuracy.
(3) In the invention, the latent code f is updated through the known facial expression vectors, with constrained replacement of matrix vector elements during the update, so the updated matrix vector can generate expression images of different facial expression categories; a large number of expression images accurately labeled with facial expressions can thus be obtained without manual annotation.
(4) In the embodiment, a StyleGAN2 network is adopted as the image coding model; when the latent code f is updated with each facial expression vector f(k), only the elements outside the first 8 rows are updated, and a strategy for adjusting the expression amplitude is designed, ensuring the accuracy of the generated expression images across categories and yielding high-quality facial expression images. At the same time, constraining which elements of the matrix vector are replaced leaves the facial identity features unchanged, which helps the model capture expression category and expression amplitude features during training.
(5) The expression image generation system provided by the invention provides a carrier for the expression image generation method, so that the expression image generation method is convenient to popularize and apply.
(6) In the training method of the expression recognition model, the training dataset is expanded with the expression image expansion dataset. Because the generated images adjust the amplitude of the expression change of the latent code f through the known matrix vectors f(k), the model can capture expressions of different amplitudes during training, improving its recognition precision. Meanwhile, expanding the dataset across expression amplitudes helps the model fully learn the subtle features that distinguish different expressions with similar appearance, improving its ability to capture similar-looking faces and slight expression movements and greatly improving training precision and convergence speed.
(7) During model training, expression images can be selected from the expression image expansion dataset as required, solving the problem of imbalanced annotation data in the training process.
(8) The expression recognition method provided by the invention can realize high-precision expression recognition based on the expression recognition model provided by the invention.
Drawings
FIG. 1 is a flowchart of a method for generating an expression image;
FIG. 2 is a graph showing experimental model convergence in the examples.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment provides a method for generating an expression image based on a StyleGAN2 network, which comprises the following steps:
SA1, acquiring a pre-trained StyleGAN2 network and extracting its generator and synthesizer; the input of the generator is an n2-dimensional noise vector, and its output is an n1×n2-dimensional matrix vector corresponding to each of the set facial expressions; the input of the synthesizer is an n1×n2-dimensional matrix vector, and its output is an expression image. Specifically, the generator in the StyleGAN2 network outputs, in combination with the input noise, the n1×n2-dimensional matrix vectors of the 7 set facial expressions. The 7 facial expressions are: neutral (NE), happy (HA), sad (SA), surprise (SU), fear (FE), anger (AN), and disgust (DI).
SA2, setting a plurality of facial expressions and corresponding facial expression vectors, and giving an adjustment stride interval for each facial expression; the kth facial expression vector is denoted f(k), and the adjustment stride interval of the kth facial expression is [a(k), b(k)]; f(k) is an n1×n2-dimensional matrix vector; setting the iteration counter n to an initial value of 1;
specifically, n1=18, n2=512.
The facial expression vector is obtained by combining the trained image coding model, and specifically comprises the following steps of:
An n2-dimensional random noise vector is input into the generator, and the n1×n2-dimensional matrix vector that the generator outputs for each facial expression is taken as that expression's facial expression vector.
SA3, judging whether n is smaller than N; if yes, executing step SA4; if not, executing step SA7. N is the set number of images generated for each facial expression, which is also the number of loop iterations.
SA4, randomly generating an n2-dimensional noise vector, inputting it into the generator, and obtaining the n1×n2-dimensional matrix vector output by the generator as the latent code f;
SA5, combining each facial expression vector f(k) with the latent code f to obtain a differently updated fnew(k); the n1×n2-dimensional matrix vector is regarded as n1 vectors of dimension n2; fnew(k) is obtained as follows: first, a random number is taken on [a(k), b(k)] and multiplied by f(k); the product is added element-wise to the latent code f to obtain an intermediate matrix; n3 of the n2-dimensional vectors in the intermediate matrix are then replaced with the corresponding n2-dimensional vectors in f, and the intermediate matrix after replacement is taken as fnew(k); n3 < n1.
It can be seen that obtaining fnew(k) is equivalent to fixing the first n3 of the n2-dimensional vectors of the latent code f and replacing the remaining vector elements with the corresponding elements of the intermediate matrix, i.e.:
fnew(k)[:n3] = f[:n3]
fnew(k)[n3:] = (f + sample(a(k), b(k)) × f(k))[n3:]
where sample(a(k), b(k)) denotes taking a random number on [a(k), b(k)]; n3 indicates that the first n3 of the n1 row vectors lie outside the update range; and fnew(k) denotes the matrix vector after f is updated with the kth facial expression.
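The SA5 update rule can be expressed as a short numpy function. This is a sketch under the stated embodiment values (n1 = 18, n2 = 512, n3 = 8); the function name and the random test inputs are illustrative, not part of the patent.

```python
import numpy as np

def update_latent(f, f_k, a_k, b_k, n3=8, rng=None):
    """SA5 update: form the intermediate matrix f + s * f_k with the stride s
    sampled from [a_k, b_k], then restore the first n3 row vectors from f so
    that the identity-related rows of the latent code are left unchanged."""
    rng = rng or np.random.default_rng()
    s = rng.uniform(a_k, b_k)     # random stride on [a(k), b(k)]
    f_new = f + s * f_k           # intermediate matrix
    f_new[:n3] = f[:n3]           # first n3 row vectors kept from f
    return f_new

rng = np.random.default_rng(1)
f = rng.standard_normal((18, 512))      # latent code f (n1 x n2)
f_k = rng.standard_normal((18, 512))    # hypothetical expression vector f(k)
f_new = update_latent(f, f_k, 15.0, 17.0, rng=rng)
```

With random nonzero inputs, `f_new[:8]` matches `f[:8]` exactly while the remaining rows are shifted toward the expression direction.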
SA6, inputting each updated fnew(k) into the synthesizer, obtaining the expression image output by the synthesizer, and adding it to the expression image dataset associated with the kth facial expression; then updating n to n+1 and returning to step SA3.
In this step, for each updated fnew (k), an expression image is generated and added to the corresponding expression image dataset.
SA7, outputting an expression image expansion data set, wherein the expression image expansion data set is a set of expression image data sets of all facial expressions, and the initial value of the expression image data set of the facial expressions is an empty set.
The embodiment also provides an expression image generating system based on the StyleGAN2 network, which is used for executing the expression image generating method based on the StyleGAN2 network; the system comprises a generator, a synthesizer, a storage unit, a data expansion module and an expression image expansion data set, wherein the expression image expansion data set is provided with an expression image data set corresponding to each facial expression; the generator and the synthesizer are extracted from the pre-trained StyleGAN2 network; the storage unit stores facial expression vectors and adjustment stride intervals corresponding to various facial expressions; the data expansion module is respectively connected with the generator, the synthesizer, the storage unit and the expression image expansion data set.
The data expansion module performs N iterations over the facial expression vectors; in each iteration the synthesizer generates an expression image from each updated facial expression code and stores it into the corresponding expression image dataset. In the nth iteration, the generator produces an n1×n2-dimensional matrix vector from random noise as the latent code f, and the fnew(k) update formula above is then applied for each facial expression vector to obtain the updated codes; 1 ≤ n ≤ N.
The embodiment also provides a training method of the expression recognition model, which comprises the following steps:
SB1, acquiring an expression image dataset already annotated in the art as the facial expression standard dataset, and obtaining the expression image expansion dataset with the StyleGAN2-based expression image generation method above;
SB2, constructing a training image set in which each facial expression is associated with a plurality of expression images serving as training samples; for each facial expression, part of the training samples come from the facial expression standard dataset and the rest come from the expression image expansion dataset;
SB3, constructing a basic model whose input is an expression image and whose output is a facial expression category, having the basic model perform machine learning on the training image set, and taking the learned basic model as the expression recognition model.
SB3 specifically comprises the following steps:
SB31, selecting n4 training samples from the training image set, and enabling the basic model to learn the n4 training samples so as to update parameters of the basic model; n4 is a set value;
SB32, selecting n5 samples from the facial expression standard dataset as test samples and having the basic model label them to obtain their predicted categories; the mean square error loss is computed from the predicted and true categories of the test samples; n5 is a set value;
SB33, judging whether the basic model is converged; if not, updating the basic model by combining the mean square error loss, and returning to the step SB31; and if yes, outputting the basic model as an expression recognition model.
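The SB31-SB33 loop can be sketched with a deterministic toy stand-in. The linear scorer, synthetic features, dataset sizes, learning rate, and convergence threshold below are all illustrative assumptions; the patent's embodiment uses image tensors and the SCN convolutional network instead. Only the structure of the loop (train on n4 samples, measure mean square error on n5 test samples, stop on convergence) follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)
C, D = 7, 16                         # 7 expression classes, toy feature dimension

# Synthetic stand-in data: features with one-hot labels from a hidden scorer
W_true = rng.standard_normal((D, C))
def make_batch(n):
    x = rng.standard_normal((n, D))
    y = np.eye(C)[np.argmax(x @ W_true, axis=1)]   # one-hot "true categories"
    return x, y

x, y = make_batch(256)               # SB31: n4 = 256 training samples
xt, yt = make_batch(128)             # SB32: n5 = 128 test samples
W = np.zeros((D, C))                 # the "basic model": a linear scorer
lr, losses = 0.05, []
for step in range(500):
    grad = x.T @ (x @ W - y) / len(x)              # gradient of the MSE objective
    W -= lr * grad                                 # SB31: update model parameters
    losses.append(np.mean((xt @ W - yt) ** 2))     # SB32: mean square error loss
    if step > 0 and abs(losses[-1] - losses[-2]) < 1e-6:
        break                                      # SB33: crude convergence check
```

When the loop exits, `W` plays the role of the learned expression recognition model, and `losses` traces the test loss used for the convergence decision.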
The expression recognition model provided by the invention is verified below in combination with a specific embodiment.
In this embodiment, the following 7 facial expressions and their corresponding adjustment stride intervals are defined, as given in Table 1 below.
Table 1: Adjustment stride intervals of the facial expressions

Facial expression    Adjustment stride interval
Neutral (NE)         [9, 10]
Happy (HA)           [6, 9]
Sad (SA)             [18, 19]
Surprise (SU)        [5, 6]
Fear (FE)            [7, 9]
Anger (AN)           [15, 17]
Disgust (DI)         [15, 17]
In this embodiment, n1=18, n2=512, and n3=8 are defined.
In this embodiment, corresponding facial expression vectors are first obtained for the 7 facial expressions in Table 1, and the expression image generation method above is then executed to obtain an expression image expansion dataset, denoted GCEF. The GCEF contains 5000 generated expression images for each of the 7 facial expressions in Table 1, 35000 expression images in total. The GCEF is divided into a training set and a test set: the training set contains 31500 expression images, 4500 per facial expression; the test set contains 3500 expression images, 500 per facial expression.
In this embodiment, the selected facial expression standard dataset is RAFDB, which contains the 7 expressions of Table 1. RAFDB is divided into a training set containing 12271 expression images and a test set containing 3068 expression images.
In this embodiment, a mixed dataset is also constructed and divided into a training set and a test set: the training set of the mixed dataset is the union of the GCEF training set and the RAFDB training set, 43771 training samples in total; the test set of the mixed dataset is the union of the GCEF test set and the RAFDB test set, 6568 test samples in total.
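The dataset sizes reported in the embodiment are internally consistent, as a quick arithmetic check confirms:

```python
# Reported dataset sizes from the embodiment
gcef_total = 7 * 5000                      # 35000 generated images, 5000 per class
gcef_train, gcef_test = 7 * 4500, 7 * 500  # 31500 / 3500 train-test split
rafdb_train, rafdb_test = 12271, 3068      # RAFDB split

assert gcef_train + gcef_test == gcef_total
assert gcef_train + rafdb_train == 43771   # mixed training set
assert gcef_test + rafdb_test == 6568      # mixed test set
```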
In this embodiment, the basic model is selected as the convolutional neural network model SCN.
In this embodiment, the training set of the mixed dataset is used as the training image set, the training method of the expression recognition model above is executed, and the resulting expression recognition model is recorded as the experimental model.
In this embodiment, two comparison models are additionally provided: comparison model 1 is obtained by having the basic model perform machine learning on the training set of the expression image expansion dataset GCEF; comparison model 2 is obtained by having the basic model perform machine learning on the training set of the facial expression standard dataset RAFDB.
In this embodiment, the experimental model, comparison model 1, and comparison model 2 all use a learning rate of 0.0001 during learning.
In this embodiment, Table 2 shows the accuracy of the experimental model on the training and test sets of the mixed dataset, of comparison model 1 on the training and test sets of the expression image expansion dataset GCEF, and of comparison model 2 on the training and test sets of the facial expression standard dataset RAFDB.
Table 2: precision comparison of three models
As can be seen from Table 2, the expression recognition model provided by the invention has higher accuracy and a better effect on the given datasets.
In this embodiment, the convergence curve of the experimental model is also recorded, as shown in FIG. 2. As can be seen from FIG. 2, the experimental model converges in about 10 iterations; compared with the tens or hundreds of iterations of the prior art, this greatly improves the convergence speed of the expression recognition model, saves model training time, and preserves model accuracy.
It will be understood by those skilled in the art that the present invention is not limited to the details of the foregoing exemplary embodiments, and may be embodied in other specific forms without departing from its spirit or essential characteristics. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description; all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present description is organized by embodiments, each embodiment does not necessarily contain only a single independent technical solution; this manner of description is adopted for clarity only, and the description should be taken as a whole, as the technical solutions of the embodiments may be combined as appropriate to form other embodiments that will be apparent to those skilled in the art.
Technologies, shapes, and structural parts of the present invention that are not described in detail are known in the art.

Claims (10)

1. An expression image generation method, characterized by comprising the following steps:
SA1, acquiring a pre-trained image coding model, wherein the image coding model comprises a generator and a synthesizer; the input of the generator is n2-dimensional noise, and the output is an n1×n2-dimensional matrix vector corresponding to different facial expressions; the input of the synthesizer is an n1×n2-dimensional matrix vector, and the output is an expression image;
SA2, obtaining, in combination with the image coding model, the n1×n2-dimensional matrices associated with different facial expressions as the facial expression vectors of those facial expressions; giving an adjustment stride interval for each facial expression, the k-th facial expression vector being denoted f(k) and the adjustment stride interval of the k-th facial expression being [a(k), b(k)]; f(k) is an n1×n2-dimensional matrix vector; setting the iteration count n to an initial value of 1;
SA3, judging whether n is smaller than or equal to N; if yes, executing step SA4; if not, executing step SA7; N is the set number of expression images to be generated for each facial expression category;
SA4, randomly generating an n2-dimensional noise vector, inputting the noise vector into the generator, and taking the n1×n2-dimensional matrix vector output by the generator as the latent code f;
SA5, updating the latent code f in combination with each facial expression vector; the code obtained by updating f in combination with the facial expression vector f(k) is denoted fnew(k), wherein fnew(k) has the same data structure as f, and the facial expressions corresponding to fnew(k) and f(k) belong to the same category;
SA6, inputting the updated fnew(k) into the synthesizer, obtaining the expression image output by the synthesizer, and adding it to the expression image dataset associated with the k-th facial expression; then updating n to n+1 and returning to step SA3;
SA7, outputting the expression image expansion dataset, the expression image expansion dataset being the union of the expression image datasets of all facial expressions, wherein the initial value of the expression image dataset associated with each facial expression is the empty set.
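The generation loop of steps SA1 to SA7 can be sketched as follows. This is a minimal numpy sketch, not the claimed implementation: the `generator` and `synthesizer` functions are stand-ins for a real pre-trained image coding model (e.g. StyleGAN2's mapping and synthesis networks), the update rule follows claim 2, and the choice of which n3 rows are retained is an assumption.

```python
import numpy as np

N1, N2, N3 = 18, 512, 8            # dimensions taken from claim 3
rng = np.random.default_rng(0)

def generator(z):
    # Stand-in for the pre-trained mapping network: n2-dim noise ->
    # n1 x n2 latent code (here the same vector tiled over n1 rows).
    return np.tile(np.tanh(z), (N1, 1))

def synthesizer(code):
    # Stand-in for the synthesis network: latent code -> "image".
    return np.full((64, 64, 3), code.mean())

# SA2: facial expression vectors f(k) and stride intervals [a(k), b(k)]
# (vectors and intervals here are illustrative placeholders)
expressions = {
    "happy": (rng.standard_normal((N1, N2)), (6.0, 9.0)),
    "sad":   (rng.standard_normal((N1, N2)), (18.0, 19.0)),
}

def update_code(f, f_k, a, b):
    # SA5 (per claim 2): add a random multiple of f(k), then restore
    # n3 of the n1 rows from f; which rows are kept is an assumption.
    fnew = f + rng.uniform(a, b) * f_k
    fnew[:N3] = f[:N3]
    return fnew

datasets = {k: [] for k in expressions}  # SA7: initially empty sets
N = 5                                    # images per expression category
for _ in range(N):                       # SA3 loop
    f = generator(rng.standard_normal(N2))                # SA4
    for k, (f_k, (a, b)) in expressions.items():
        datasets[k].append(synthesizer(update_code(f, f_k, a, b)))  # SA5-SA6
```

After the loop, each per-expression dataset holds N generated images, and their union is the expression image expansion dataset of SA7.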
2. The expression image generation method of claim 1, wherein SA5 specifically comprises: the n1×n2-dimensional matrix vector is equivalent to n1 vectors of dimension n2; the update method of fnew(k) is as follows: first, a random number is taken in [a(k), b(k)] and multiplied by f(k); the product is added element-wise to the latent code f to obtain an intermediate matrix; then n3 of the n2-dimensional vectors in the intermediate matrix are replaced with the corresponding n2-dimensional vectors in f, where n3 is smaller than n1.
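The update rule of claim 2 amounts to a scaled matrix addition followed by a partial row restore. A minimal numpy sketch, assuming the dimensions of claim 3 and assuming (for illustration) that the first n3 row vectors are the ones replaced:

```python
import numpy as np

n1, n2, n3 = 18, 512, 8                   # values from claim 3
rng = np.random.default_rng(1)

f = rng.standard_normal((n1, n2))         # latent code from the generator
f_k = rng.standard_normal((n1, n2))       # facial expression vector f(k)
a_k, b_k = 6.0, 9.0                       # adjustment stride interval [a(k), b(k)]

r = rng.uniform(a_k, b_k)                 # random number in [a(k), b(k)]
intermediate = f + r * f_k                # element-wise sum with the product
fnew = intermediate.copy()
fnew[:n3] = f[:n3]                        # restore n3 of the n1 row vectors
                                          # (which n3 rows is an assumption)
```

Restoring n3 rows of the original code keeps part of the identity-bearing latent unchanged while the remaining rows are shifted toward the target expression.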
3. The expression image generation method of claim 2, wherein the image coding model adopts a StyleGAN2 network; n1=18, n2=512, n3=8.
4. An expression image generation system adopting the expression image generation method according to any one of claims 1 to 3, characterized by comprising a generator, a synthesizer, a storage unit, a data expansion module, and an expression image expansion dataset, wherein the expression image expansion dataset contains an expression image dataset corresponding to each facial expression; the generator and the synthesizer are extracted from the pre-trained image coding model; the storage unit stores the facial expression vectors and adjustment stride intervals corresponding to the various facial expressions; the data expansion module is connected to the generator, the synthesizer, the storage unit, and the expression image expansion dataset, respectively;
the data expansion module is used to generate N batches of expression images in combination with the generator and the synthesizer, and to store the different expression images of each batch into the corresponding expression image datasets; when generating the n-th batch of expression images, the data expansion module updates the latent code f, generated by the generator from random noise, in combination with the different facial expression vectors to obtain the updated codes; the synthesizer generates an expression image for the updated code under each facial expression and stores it into the corresponding expression image dataset; 1≤n≤N.
5. A training method of an expression recognition model using the expression image generation method according to any one of claims 1 to 3, characterized in that: firstly, the expression image expansion dataset is obtained using the expression image generation method, and the expression image expansion dataset and a facial expression standard dataset are then combined to construct a training image set; a basic model to be trained performs machine learning on the training image set, wherein the input of the basic model is an expression image and the output is a facial expression category; the learned basic model is taken as the expression recognition model.
6. The training method of an expression recognition model of claim 5, wherein the facial expression categories associated with the expression image expansion dataset are consistent with the facial expression categories associated with the facial expression standard dataset; alternatively, the facial expression categories associated with the expression image expansion dataset are all contained within the facial expression categories associated with the facial expression standard dataset.
7. The training method of claim 5, wherein the facial expression categories associated with the expression image expansion dataset comprise: neutral, happy, sad, surprise, fear, anger, and disgust; the adjustment stride interval corresponding to neutral is [9, 10], to happy is [6, 9], to sad is [18, 19], to surprise is [5, 6], to fear is [7, 9], and to anger and disgust is [15, 17].
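The per-expression intervals of claim 7 can be written as a small lookup table from which a random stride is sampled during generation. A brief sketch; the key "disgust" corresponds to "aversion" in the translated text, and `sample_stride` is an illustrative helper, not a name from the patent:

```python
import random

# Adjustment stride intervals from claim 7
# (anger and disgust share one interval in the claim)
stride_intervals = {
    "neutral":  (9, 10),
    "happy":    (6, 9),
    "sad":      (18, 19),
    "surprise": (5, 6),
    "fear":     (7, 9),
    "anger":    (15, 17),
    "disgust":  (15, 17),
}

def sample_stride(expression):
    # Draw the random number r in [a(k), b(k)] used by the claim-2 update.
    a, b = stride_intervals[expression]
    return random.uniform(a, b)
```

Larger intervals (e.g. sad at [18, 19]) push the latent code further along the corresponding expression vector than smaller ones (e.g. surprise at [5, 6]).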
8. An expression recognition method using the expression image generation method according to any one of claims 1 to 3, characterized in that an expression recognition model is first obtained using the training method of an expression recognition model according to claim 5; the expression image to be recognized is then input into the expression recognition model, and the output of the expression recognition model is obtained as the facial expression prediction result.
9. An expression recognition system adopting the expression image generation method according to any one of claims 1 to 3, characterized by comprising a storage module and a processing module, wherein a computer program and a facial expression standard dataset are stored in the storage module; the processing module is connected to the storage module and is used to execute the computer program so as to implement the training method of the expression recognition model according to any one of claims 5 to 7 and obtain the expression recognition model.
10. A memory, characterized in that a computer program is stored therein, which, when executed, implements the training method of the expression recognition model according to claim 5 and obtains the expression recognition model.
CN202310911547.6A 2023-07-25 2023-07-25 Expression image generation, expression recognition model, method, system and memory Active CN116631042B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310911547.6A CN116631042B (en) 2023-07-25 2023-07-25 Expression image generation, expression recognition model, method, system and memory


Publications (2)

Publication Number Publication Date
CN116631042A (en) 2023-08-22
CN116631042B CN116631042B (en) 2023-10-13

Family

ID=87603054


Country Status (1)

Country Link
CN (1) CN116631042B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580458A (en) * 2020-12-10 2021-03-30 中国地质大学(武汉) Facial expression recognition method, device, equipment and storage medium
US20210097730A1 (en) * 2019-09-26 2021-04-01 Apple Inc. Face Image Generation With Pose And Expression Control
WO2021073417A1 (en) * 2019-10-18 2021-04-22 平安科技(深圳)有限公司 Expression generation method and apparatus, device and storage medium
CN112907725A (en) * 2021-01-22 2021-06-04 北京达佳互联信息技术有限公司 Image generation method, image processing model training method, image processing device, and image processing program
CN113688715A (en) * 2021-08-18 2021-11-23 山东海量信息技术研究院 Facial expression recognition method and system
CN113822953A (en) * 2021-06-24 2021-12-21 华南理工大学 Processing method of image generator, image generation method and device
US20230066716A1 (en) * 2020-07-03 2023-03-02 Tencent Technology (Shenzhen) Company Limited Video generation method and apparatus, storage medium, and computer device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RENÉ HAAS ET AL: "Tensor-based Emotion Editing in the StyleGAN Latent Space", arXiv:2205.06102v1 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant