CN112381019B - Compound expression recognition method and device, terminal equipment and storage medium - Google Patents

Info

Publication number
CN112381019B
CN112381019B (application CN202011304521.8A)
Authority
CN
China
Prior art keywords
expression, composite, compound, probability, recognized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011304521.8A
Other languages
Chinese (zh)
Other versions
CN112381019A (en)
Inventor
易苗 (Yi Miao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011304521.8A priority Critical patent/CN112381019B/en
Publication of CN112381019A publication Critical patent/CN112381019A/en
Priority to PCT/CN2021/091094 priority patent/WO2022105130A1/en
Application granted granted Critical
Publication of CN112381019B publication Critical patent/CN112381019B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Abstract

The application is applicable to the technical field of artificial intelligence, and provides a compound expression recognition method, a device, terminal equipment and a storage medium, wherein the method comprises the following steps: recognizing an image to be recognized by using a first expression recognition model, a first target model and a second target model, to respectively obtain a first probability value, a first composite probability value and a second composite probability value of each composite expression; acquiring a first misclassification probability of the first expression recognition model, a first composite misclassification probability of each first target model and a second composite misclassification probability of each second target model; and obtaining a target classification result according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value and the second composite misclassification probability. With this composite expression recognition method, when the image to be recognized contains a composite expression, the master expression and the slave expression in the image can be accurately predicted by combining the prediction results and misclassification probabilities of a plurality of expression recognition models.

Description

Compound expression recognition method and device, terminal equipment and storage medium
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to a composite expression recognition method and device, terminal equipment and a storage medium.
Background
Expression recognition, an important branch of human-computer interaction that has been developed for decades, is widely used in many fields. However, because of the diversity of facial expression features and the differences between individuals, expression recognition remains a difficult problem in computer vision. Composite expression recognition requires the primary expression and the secondary expression to be recognized at the same time, and the diversity of expression combinations, together with the subtle distinction between primary and secondary expressions, makes composite expression recognition difficult.
In existing methods, expression recognition is mostly treated as a classification task over face pictures: feature extraction and expression classification are performed on each picture. On single-picture expression recognition tasks this works well for some expressions with distinctive features, such as happiness and surprise. However, it is difficult to distinguish composite expressions with similar characteristics, such as sadness and disgust, and it is difficult to accurately identify the primary expression and the secondary expression within a composite expression.
Disclosure of Invention
The embodiment of the application provides a composite expression recognition method and device, terminal equipment and a storage medium, and can solve the problem that the main expression and the secondary expression in the composite expression are difficult to accurately recognize in the prior art.
In a first aspect, an embodiment of the present application provides a compound expression recognition method, including:
recognizing the compound expressions in an image to be recognized by using a first expression recognition model, so as to obtain first probability values in one-to-one correspondence with the compound expression categories;
determining a predicted composite expression according to the maximum value of the first probability value, determining a corresponding first target model in a second expression recognition model set based on the predicted composite expression, and inputting the image to be recognized into the first target model to obtain a first composite probability value for predicting the image to be recognized as a first composite expression; each first target model respectively corresponds to two predicted compound expressions;
inputting the image to be recognized into a third expression recognition model, and predicting a plurality of target single expressions contained in the image to be recognized;
determining a corresponding second target model in the second expression recognition model set according to the target single expressions, and inputting the image to be recognized into the second target model to obtain a second composite probability value for predicting the image to be recognized as a second composite expression; in the two predicted compound expressions respectively corresponding to each second target model, the predicted compound expressions can be respectively obtained by correspondingly combining the plurality of target single expressions one by one;
acquiring a first misclassification probability corresponding to the first expression recognition model, a first composite misclassification probability corresponding to each first target model, and a second composite misclassification probability corresponding to each second target model;
and obtaining a target classification result of the image to be recognized according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value and the second composite misclassification probability.
In an embodiment, the obtaining a first misclassification probability corresponding to the first expression recognition model includes:
acquiring a plurality of training images corresponding to a plurality of compound expressions in training data, and inputting the training images into the first expression recognition model to obtain a prediction result of each training image;
counting the error number of the error of the prediction result in each compound expression;
and calculating a first misclassification probability of the first expression recognition model when the first expression recognition model predicts each compound expression based on the total number of the training images corresponding to each compound expression and the error number.
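The three steps above can be sketched in a few lines; this is a minimal illustration (function and variable names are my own, not from the patent): run the trained model over the training images, count the wrong predictions per compound-expression class, and divide by that class's image total.

```python
from collections import defaultdict

def misclassification_probabilities(predictions, labels):
    """Estimate, for each compound-expression class, the probability that
    the model misclassifies a training image of that class.

    predictions: predicted class id per training image
    labels:      ground-truth class id per training image
    """
    totals = defaultdict(int)   # number of training images per true class
    errors = defaultdict(int)   # number of wrong predictions per true class
    for pred, true in zip(predictions, labels):
        totals[true] += 1
        if pred != true:
            errors[true] += 1
    return {cls: errors[cls] / totals[cls] for cls in totals}

# Toy run: class 0 is misclassified once out of four images.
probs = misclassification_probabilities(
    predictions=[0, 0, 1, 0, 1, 1],
    labels=     [0, 0, 0, 0, 1, 1],
)
print(probs)  # {0: 0.25, 1: 0.0}
```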
In an embodiment, the obtaining a target classification result of the image to be recognized according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value and the second composite misclassification probability includes:
adjusting a first composite probability value of the image to be recognized, predicted by the first target model, as a non-first composite expression and a second composite probability value of the image to be recognized, predicted by the second target model, as a non-second composite expression to preset values;
calculating a classification value corresponding to each compound expression in the image to be recognized according to a first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value, the second composite misclassification probability and the preset value corresponding to the compound expression of the same category;
and determining the maximum value of the plurality of classification values, and taking the composite expression corresponding to the maximum value as a target classification result of the image to be recognized.
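The patent does not spell out the exact fusion formula, so the sketch below makes one plausible assumption, purely for illustration: each model's probability for a class is weighted by one minus that model's misclassification probability for the class, classes a target model did not score fall back to the preset value, and the class with the largest fused value becomes the target classification result. All names and numbers are invented.

```python
def classification_values(first_probs, first_err, target_outputs, preset=0.0):
    """Fuse the first model's probabilities with those of the target
    models, weighting each probability by (1 - misclassification
    probability); classes a target model did not score use `preset`."""
    scores = {}
    for cls, p in first_probs.items():
        score = p * (1.0 - first_err.get(cls, 0.0))
        for probs, errs in target_outputs:
            score += probs.get(cls, preset) * (1.0 - errs.get(cls, 0.0))
        scores[cls] = score
    best = max(scores, key=scores.get)  # target classification result
    return best, scores

best, scores = classification_values(
    first_probs={"happy>surprised": 0.5, "surprised>happy": 0.3},
    first_err={"happy>surprised": 0.2, "surprised>happy": 0.1},
    target_outputs=[({"happy>surprised": 0.6, "surprised>happy": 0.4},
                     {"happy>surprised": 0.1, "surprised>happy": 0.1})],
)
print(best)  # happy>surprised
```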
In one embodiment, the images to be recognized comprise a plurality of images, and the images to be recognized all belong to the same compound expression category; the obtaining a target classification result of the image to be recognized according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value, and the second composite misclassification probability includes:
obtaining a target classification result of each image to be recognized in the plurality of images to be recognized with the same compound expression;
obtaining the classification number of the same target classification result from a plurality of target classification results;
and determining the target classification result with the maximum classification quantity as the final target classification result of the plurality of images to be recognized.
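The steps above amount to a majority vote over the per-image target classification results; a minimal sketch (names illustrative):

```python
from collections import Counter

def final_result(per_image_results):
    """Majority vote over per-image target classification results for a
    group of images assumed to share one compound expression."""
    counts = Counter(per_image_results)
    label, _ = counts.most_common(1)[0]
    return label

# Two of three images vote for the same master-slave result.
print(final_result(["sad>angry", "sad>angry", "angry>sad"]))  # sad>angry
```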
In an embodiment, before obtaining a classification result of each image to be recognized in the plurality of images to be recognized of the same compound expression, the method further includes:
performing key point clustering processing on a plurality of images to be identified including multi-class compound expressions to obtain key point characteristic information of each image to be identified;
and taking the plurality of images to be identified with the same key point characteristic information as a plurality of images to be identified with the same compound expression to obtain a plurality of images to be identified with each type of the same compound expression.
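The patent names key-point clustering but does not fix a specific algorithm. The sketch below uses a simple greedy distance-threshold clustering over key-point feature vectors purely as an illustration of grouping images with matching key-point characteristics; the threshold and toy 2-D features are made up.

```python
def group_by_keypoints(features, threshold=0.5):
    """Greedy clustering of key-point feature vectors: an image joins the
    first existing group whose representative is within `threshold`
    (Euclidean distance); otherwise it starts a new group."""
    groups = []  # list of (representative_vector, [image indices])
    for idx, vec in enumerate(features):
        for rep, members in groups:
            dist = sum((a - b) ** 2 for a, b in zip(rep, vec)) ** 0.5
            if dist <= threshold:
                members.append(idx)
                break
        else:
            groups.append((vec, [idx]))
    return [members for _, members in groups]

# Two tight clusters of toy 2-D "key-point" features.
clusters = group_by_keypoints([(0, 0), (0.1, 0), (5, 5), (5, 5.1)])
print(clusters)  # [[0, 1], [2, 3]]
```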
In an embodiment, before obtaining a classification result of each image to be recognized in the plurality of images to be recognized of the same compound expression, the method further includes:
continuously acquiring multiple frames of adjacent video images from a preset video;
and determining the multi-frame video images as a plurality of images to be identified with the same compound expression.
In an embodiment, after obtaining the target classification result of the image to be recognized according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value, and the second composite misclassification probability, the method further includes:
and uploading the target classification result to a block chain.
In a second aspect, an embodiment of the present application provides a compound expression recognition apparatus, including:
the first prediction module is used for recognizing the compound expressions in the image to be recognized by using the first expression recognition model, so as to obtain first probability values in one-to-one correspondence with the compound expression categories;
the first compound prediction module is used for determining a predicted compound expression according to the maximum value of the first probability value, determining a corresponding first target model in a second expression recognition model set based on the predicted compound expression, and inputting the image to be recognized into the first target model to obtain a first compound probability value for predicting the image to be recognized as the first compound expression; each first target model respectively corresponds to two predicted compound expressions;
the single expression prediction module is used for inputting the image to be recognized into a third expression recognition model and predicting a plurality of target single expressions contained in the image to be recognized;
the second composite prediction module is used for determining a corresponding second target model in the second expression recognition model set according to the target single expressions, and inputting the image to be recognized into the second target model to obtain a second composite probability value for predicting the image to be recognized as a second composite expression; in the two predicted compound expressions respectively corresponding to each second target model, the predicted compound expressions can be respectively obtained by correspondingly combining the plurality of target single expressions one by one;
the obtaining module is used for obtaining a first misclassification probability corresponding to the first expression recognition model, a first composite misclassification probability corresponding to each first target model, and a second composite misclassification probability corresponding to each second target model;
and the identification module is used for obtaining a target classification result of the image to be identified according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value and the second composite misclassification probability.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor, when executing the computer program, implements the method according to any one of the above first aspects.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the method according to any one of the above first aspects.
In a fifth aspect, the present application provides a computer program product, which when run on a terminal device, causes the terminal device to execute the method of any one of the above first aspects.
In the embodiment of the application, a first probability value of each compound expression is predicted through a first expression recognition model and is used as a classification result; then, based on the prediction result of the first expression recognition model, determining a first target model, predicting the image to be recognized again, and taking the obtained first composite probability value as another classification result; and then, performing single expression recognition through a third expression recognition model which only predicts the target single expression in the image to be recognized, determining a second target model to predict the image to be recognized again based on the prediction result of the third expression recognition model, and taking the obtained second composite probability value as a further classification result. And finally, integrating the three classification results and the misclassification probability corresponding to each compound expression in the three classification results, calculating the prediction probability value of each compound expression, and determining a target classification result from the multiple compound expressions according to the prediction probability values. Therefore, when a composite expression recognition task is aimed at, the prediction results of the multiple expression recognition models can be integrated to serve as a basis for preliminarily recognizing the composite facial expression, and on the basis, the misclassification probability corresponding to each prediction result is combined to serve as correction information to correct the prediction results to obtain target classification results, so that the accuracy rate of recognizing the composite facial expression is further improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a flowchart illustrating an implementation of a compound expression recognition method according to an embodiment of the present application;
fig. 2 is a schematic diagram illustrating an implementation manner of S105 of a compound expression recognition method according to an embodiment of the present application;
fig. 3 is a schematic diagram illustrating an implementation manner of S106 of a compound expression recognition method according to an embodiment of the present application;
fig. 4 is a schematic diagram illustrating another implementation manner of S106 of a compound expression recognition method according to an embodiment of the present application;
fig. 5 is a schematic diagram illustrating another implementation manner of S106 of a compound expression recognition method according to an embodiment of the present application;
fig. 6 is a block diagram illustrating a composite expression recognition apparatus according to an embodiment of the present disclosure;
fig. 7 is a block diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
The compound expression recognition method provided by the embodiment of the application can be applied to terminal devices such as a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), and the like, and the embodiment of the application does not limit the specific types of the terminal devices.
Referring to fig. 1, fig. 1 shows a flowchart of an implementation of a method for recognizing a compound expression provided in an embodiment of the present application, where the method includes the following steps:
s101, recognizing the compound expressions in the image to be recognized by using the first expression recognition model to obtain first probability values of various compound expressions which are respectively in one-to-one correspondence.
The image to be recognized is a face image containing a face, so that the facial expression can be recognized from it. A compound expression in a face image means that the facial expression contains at least two expressions at the same time; a compound expression can be regarded as a pair of master and slave expressions having a master-slave relationship. The master-slave relationship specifies, among the plurality of expressions contained in the face image, which expression is the primary expression and which is the secondary expression. For two expressions, one serves as the primary expression and the other as the secondary expression. For more than two expressions, the main expression in the face image can be considered clearly distinguishable from the rest, and the remaining expressions can all be regarded as secondary expressions. For convenience of explanation, this embodiment is described with an image to be recognized containing two expressions.
It should be added that the facial expressions include, but are not limited to, happiness, surprise, disgust, fear, sadness, anger, contempt, and neutral, and any combination of two facial expressions can be considered a compound expression. Note, however, that the compound expression whose main expression is happiness and whose slave expression is surprise, and the compound expression whose main expression is surprise and whose slave expression is happiness, are different master-slave expressions (different compound expressions).
In application, the first expression recognition model is a model obtained by training on first training data. The first training data can be regarded as face images with composite expressions composed of the eight single expressions above. For the 7 non-neutral expressions (happiness, surprise, disgust, fear, sadness, anger, and contempt), 42 composite expressions can be obtained by combination. In addition, the first training data may further include face images of the 8 single expressions, giving face images covering 50 compound expression categories in total. In this case, a single expression may be regarded as a composite expression whose master expression and slave expression coincide, and this is not limited.
In a specific application, the first training data may be obtained from a composite expression competition data set, where the composite expression competition data set includes 31250 expression pictures from 125 individuals, with 5 pictures for each of the 50 composite expressions of each individual. The data set may be divided into a training set (the first training data) of 20650 pictures from 83 individuals, a verification set of 2250 pictures from 9 individuals, and a test set from the remaining 33 individuals. Then, a residual network model is used as the base network, and the first training data are input into the residual network model for training. During training, the residual network extracts features of the face area to obtain 512-dimensional face features, and the face features are concatenated with the coordinate normalization values of 136-dimensional key points (key points such as the eyes and nose of the face) to form the composite expression feature of the composite expression picture. The composite expression feature is then fed into a classification layer, a classification result is output, a classification loss is calculated against the actual composite expression result, and the residual network model is iteratively updated according to the classification loss to obtain the first expression recognition model. When calculating the classification loss, cross entropy can be adopted as the loss function. Cross entropy represents the distance between the actual output (probability) and the expected output (probability): the smaller the cross entropy, the closer the two probability distributions are, so the iteratively updated first expression recognition model achieves high accuracy in recognizing compound expressions.
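Two details of this training paragraph can be illustrated in a self-contained sketch: the composite feature is the 512-dimensional face feature concatenated with 136 normalized key-point coordinates (68 points times 2), and cross entropy is the classification loss. The tiny probability vectors below are invented for demonstration.

```python
import math

def cross_entropy(predicted, expected):
    """Cross entropy between the expected (one-hot ground truth) and the
    predicted class-probability distribution; smaller means closer."""
    eps = 1e-12  # avoid log(0)
    return -sum(e * math.log(p + eps) for e, p in zip(expected, predicted))

# 512-dim face features + 136-dim key-point coordinates = 648-dim feature.
face_features = [0.0] * 512
keypoints     = [0.0] * 136
composite_feature = face_features + keypoints
print(len(composite_feature))  # 648

# A confident correct prediction yields a lower loss than a hesitant one.
one_hot = [1.0, 0.0, 0.0]
print(cross_entropy([0.9, 0.05, 0.05], one_hot) <
      cross_entropy([0.4, 0.3, 0.3], one_hot))  # True
```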
For the first training data, the real label of each composite expression picture can be labeled in advance, that is, the probability (expected output) of which type of composite expression each composite expression belongs to specifically.
In application, after the first expression recognition model is obtained, the image to be recognized is input into the first expression recognition model, and multiple first probability values are obtained. That is, for the 50 compound expressions, the first expression recognition model outputs 50 first probability values, each being the value with which the first expression recognition model predicts that the image to be recognized belongs to that type of compound expression.
S102, determining a predicted composite expression according to the maximum value of the first probability value, determining a corresponding first target model in a second expression recognition model set based on the predicted composite expression, and inputting the image to be recognized into the first target model to obtain a first composite probability value for predicting the image to be recognized as the first composite expression; each first target model corresponds to two kinds of predicted compound expressions respectively.
In application, each second expression recognition model is obtained by training on second training data, where the second training data can be derived from the first training data. Specifically, for the master expression and the slave expression in each compound expression, if they are the reverse of the master expression and the slave expression in another compound expression, a second expression recognition model can be trained on the training data corresponding to these two opposite compound expressions. A second expression recognition model set comprising a plurality of second expression recognition models is thus obtained. Illustratively, for the compound expression whose primary expression is happiness and whose secondary expression is surprise, and the compound expression whose primary expression is surprise and whose secondary expression is happiness, the expression pictures corresponding to these two compound expressions can be acquired from the compound expression competition data set, and the specific primary and secondary expression labels (the happiness and surprise labels) of each picture determined, as second training data; a binary classification model (a second expression recognition model) for the compound expressions of happiness and surprise can then be trained. The resulting second expression recognition model is only used for predicting the master-slave relationship between happiness and surprise in the compound expression of the image to be recognized.
Specifically, for the 42 composite expressions obtained by combination, according to the above description, the 42 composite expressions can be grouped into 21 sets of second training data, from which 21 second expression recognition models can be trained. Each second expression recognition model is a binary classification model used for predicting the master expression and the slave expression in an expression picture. For example, composite expression pictures composed of master expression A and slave expression B, and composite expression pictures composed of master expression B and slave expression A, may together serve as one set of second training data for training a second expression recognition model for AB master-slave expression classification. When that second expression recognition model is used to recognize the image to be recognized, a first composite probability value that the composite expression in the image has master expression A and slave expression B, and/or a first composite probability value that it has master expression B and slave expression A, can be obtained. In addition, each of the 8 single expressions can be considered a composite expression whose primary and secondary expressions coincide, giving 29 second expression recognition models in total.
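The counting in this paragraph (42 ordered compound expressions collapse into 21 binary models, and adding the 8 single expressions gives 29) can be checked in a few lines. The seven basic expression names below are an assumption on my part; the arithmetic, however, matches the patent's figures regardless of the names chosen.

```python
from itertools import permutations

# Assumed 7 non-neutral basic expressions (names are illustrative).
BASIC = ["happiness", "surprise", "disgust", "fear",
         "sadness", "anger", "contempt"]

ordered = list(permutations(BASIC, 2))   # ordered (master, slave) pairs
print(len(ordered))                      # 42 compound expressions

# Each unordered pair is handled by one binary second model.
pair_models = {frozenset(p) for p in ordered}
print(len(pair_models))                  # 21 binary models

# One extra model per single expression (master == slave), incl. neutral.
all_models = pair_models | {frozenset([s]) for s in BASIC + ["neutral"]}
print(len(all_models))                   # 29 models in total
```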
In application, for the obtained first probability values corresponding to the multiple compound expressions, the compound expression corresponding to the largest first probability value can be used as the predicted compound expression according to the size of the first probability value. In addition, each second expression recognition model is respectively used for recognizing one type of compound expressions in the image to be recognized. Therefore, the predicted compound expression also belongs to a category of compound expression recognition performed by a second expression recognition model, and the second expression recognition model can be used as the first target model. The first compound expression predicted by the first target model may be the same as or opposite to the predicted compound expression of the first expression recognition model, and this is not limited. It should be noted that the first target model predicts the image to be recognized, the obtained prediction result is the first compound expression, and the numerical value for predicting the image to be recognized as the first compound expression is the first compound probability value.
S103, inputting the image to be recognized into a third expression recognition model, and predicting a plurality of target single expressions contained in the image to be recognized.
In application, the third expression recognition model is obtained by training with third training data, and the third training data can be obtained based on the first training data. Specifically, if the master expression and the slave expression in one compound expression are the reverse of those in another compound expression, the two compound expressions are merged into a single new compound expression category, so that a plurality of new compound expression categories can be obtained. Illustratively, the compound expression whose master expression is happy and whose slave expression is surprised, and the compound expression whose master expression is surprised and whose slave expression is happy, are merged into one category and used as third training data. When the third expression recognition model is trained, the master-slave relationship of the compound expressions in the third training data is not considered; that is, only the single expressions contained in each expression picture are labeled. In this way, third training data covering 29 categories that ignore the master-slave relationship in the compound expressions can be obtained, namely the 21 categories of training data in S102 combined with the 8 categories whose master and slave expressions are identical. Then, the expression pictures corresponding to each new compound expression are obtained from the compound expression competition data set as the third training data, and the third expression recognition model is trained. There is only one third expression recognition model, and it is only used for predicting the target single expressions contained in the image to be recognized.
It can be understood that when the third expression recognition model performs expression recognition on the image to be recognized, it predicts a third probability value for each composite expression formed by a pair of single expressions. That is, the third expression recognition model only outputs the probability that the image to be recognized is a composite expression composed of two particular single expressions, without considering the master-slave relationship between them; this differs from the first probability values, each of which represents a composite expression with a specific master-slave relationship between the master expression and the slave expression. The third expression recognition model therefore outputs 29 third probability values, and the single expressions contained in the composite expression corresponding to the maximum of these 29 values can be taken as the target single expressions. For example, if the third expression recognition model predicts that the third probability value of the composite expression containing happy and surprised is the maximum, the happy and surprised expressions are respectively taken as the target single expressions.
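This selection step can be sketched by indexing the third model's outputs with unordered label pairs (all category names and probability values below are hypothetical toy data):

```python
# Hypothetical mapping from unordered composite categories to the single
# expressions they contain; a pure single-expression category repeats one name.
pair_labels = {
    "happy+surprised": ("happy", "surprised"),
    "happy+sad": ("happy", "sad"),
    "sad": ("sad", "sad"),
}
third_probs = {"happy+surprised": 0.7, "happy+sad": 0.2, "sad": 0.1}  # toy outputs

best = max(third_probs, key=third_probs.get)     # category with max third probability
target_singles = sorted(set(pair_labels[best]))  # the target single expressions
print(target_singles)  # ['happy', 'surprised']
```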
S104, determining a corresponding second target model in the second expression recognition model set according to the target single expressions, and inputting the image to be recognized into the second target model to obtain a second composite probability value for predicting the image to be recognized as a second composite expression; the two predicted compound expressions corresponding to each second target model are the two master-slave combinations obtained from the plurality of target single expressions.
In application, it has been described in S102 that each second expression recognition model is used for recognizing one category of compound expression in the image to be recognized. Therefore, according to the compound expression composed of the target single expressions, the matching compound expression can be determined among the compound expressions recognized by the second expression recognition models, and the corresponding model serves as the second target model. That is, the two predicted compound expressions with which each second target model identifies the image to be recognized are the two master-slave combinations of the plurality of target single expressions.
For example, when the plurality of target single expressions are determined to be surprised and happy, the second expression recognition model that performs binary classification on the happy and surprised master-slave expressions in the compound expression can be determined as the second target model. The second target model then recognizes the image to be recognized again and outputs either the probability value that the master expression is happy and the slave expression is surprised, or the probability value that the master expression is surprised and the slave expression is happy; this output is the second composite probability value.
S105, acquiring a first misclassification probability corresponding to the first expression recognition model, acquiring a first composite misclassification probability corresponding to each second expression recognition model, and acquiring a second composite misclassification probability corresponding to each second expression recognition model.
In application, the first misclassification probability is the probability that the first expression recognition model makes a wrong prediction for the master expression and the slave expression in each compound expression. Specifically, after the first expression recognition model is obtained through training, the first misclassification probability can be determined using the data in a test set. For example, for 50 compound expressions with 5 expression pictures each, the 5 expression pictures corresponding to each compound expression are predicted by the first expression recognition model, and the number of correctly predicted pictures for each compound expression is counted. The misclassification probability of the first expression recognition model for each compound expression is then calculated from the number of correctly predicted pictures and the total number of pictures (5). Thus, after the first expression recognition model is trained, its misclassification probability for each compound expression can be determined in the above manner. The specific calculation formula may be: y = 1 - a_ij, where i = 1, 2, 3 and j is a number between 1 and 50. When i = 1, a_1j represents the classification accuracy of the first expression recognition model in predicting the j-th compound expression; when i = 2, a_2j represents the classification accuracy of the second expression recognition model in predicting the j-th compound expression on the basis of the prediction result of the first expression recognition model; and when i = 3, a_3j represents the classification accuracy of the second expression recognition model in predicting the j-th compound expression on the basis of the prediction result of the third expression recognition model.
Wherein, the classification accuracy is the number of samples predicted to be correct/the total number of samples.
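A minimal sketch of this computation (the function name and sample counts are illustrative):

```python
def misclassification_probability(num_correct: int, num_total: int) -> float:
    """y = 1 - a, where the classification accuracy a is the number of
    correctly predicted samples divided by the total number of samples."""
    return 1.0 - num_correct / num_total

# 5 test pictures per compound expression, 4 of them predicted correctly:
y = misclassification_probability(4, 5)
print(round(y, 6))  # 0.2
```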
It can be understood that the first composite misclassification probability corresponds to the case where the first expression recognition model first performs expression prediction on the image to be recognized, and then, on the basis of its prediction result, the second expression recognition model corresponding to the predicted compound expression performs prediction again; the first composite misclassification probability of each second expression recognition model in recognizing its corresponding compound expression is calculated accordingly. Similarly, the second composite misclassification probability, with which each second expression recognition model predicts the corresponding compound expression on the basis of the prediction result of the third expression recognition model, is obtained in the same way and is not described in detail. In addition, the first misclassification probability, the first composite misclassification probability and the second composite misclassification probability can all be obtained after the expression recognition models are trained and stored in the terminal device, so that the terminal device can call them at any time.
S106, obtaining a target classification result of the image to be recognized according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value and the second composite misclassification probability.
In application, the target classification result is the final prediction result for the image to be recognized, namely the finally predicted master-slave compound expression. Specifically, the formula for calculating the target classification result is: l_j = (1 - a_1j)·l_1j + (1 - a_2j)·l_2j + (1 - a_3j)·l_3j, for j = 1, 2, ..., 50. For the explanation of a_ij (i = 1, 2, 3), reference may be made to S105 above. The index i in l_ij (i = 1, 2, 3) is interpreted in the same way as in a_ij; illustratively, for i = 1, l_1j represents the first probability value with which the first expression recognition model predicts the image to be recognized as the j-th compound expression. The meanings of l_2j and l_3j follow accordingly and are not described in detail. It should be noted that when the binary classification model corresponding to the compound expression (the first target model) recognizes the image to be recognized, the obtained prediction result is only the first compound probability value for predicting that the image to be recognized belongs to the corresponding pair of master-slave compound expressions. For example, if the first target model predicts that the first composite probability value of compound expression AB is 1, the first composite probability value of compound expression BA is 0. Since the first target model does not output first composite probability values for the remaining 48 compound expressions (AC, CA, AD, DA, ...), these are all set to 0 when participating in the above calculation. For the same reason, the second composite probability values l_3j are processed in the same way as the first composite probability values l_2j, and the details are omitted.
It can be understood that, according to the above formula, 50 prediction probability values corresponding to 50 types of compound expressions of the predicted image to be recognized can be obtained through the three expression recognition models. Then, the maximum value of the 50 predicted probability values can be used as a target probability value, and the composite expression (composite expression with master-slave relationship) corresponding to the target probability value is used as a final target classification result of the image to be recognized.
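The weighted fusion over the 50 categories can be sketched as follows; random vectors stand in for the three stages' real probability and accuracy values, which this sketch makes no claim about:

```python
import random

J = 50  # total compound expression categories
random.seed(0)

# Toy stand-ins for the three stages' outputs: l[i][j] is the probability from
# stage i for category j, a[i][j] is that stage's classification accuracy, so
# (1 - a[i][j]) is its misclassification probability used as a correction weight.
l = [[random.random() for _ in range(J)] for _ in range(3)]
a = [[random.random() for _ in range(J)] for _ in range(3)]

# l_j = (1 - a_1j)*l_1j + (1 - a_2j)*l_2j + (1 - a_3j)*l_3j
scores = [sum((1.0 - a[i][j]) * l[i][j] for i in range(3)) for j in range(J)]
target = max(range(J), key=scores.__getitem__)  # index of the target classification result
```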
In the embodiment, a first probability value of each compound expression is predicted through the first expression recognition model as a classification result; then, based on the prediction result of the first expression recognition model, determining a first target model, predicting the image to be recognized again, and taking the obtained first composite probability value as another classification result; and then, performing single expression recognition through a third expression recognition model which only predicts the target single expression in the image to be recognized, determining a second target model to predict the image to be recognized again based on the prediction result of the third expression recognition model, and taking the obtained second composite probability value as a further classification result. And finally, integrating the three classification results and the misclassification probability corresponding to each compound expression in the three classification results, calculating the prediction probability value of each compound expression, and determining a target classification result from the multiple compound expressions according to the prediction probability values. Therefore, when a composite expression recognition task is aimed at, the prediction results of the multiple expression recognition models can be integrated to serve as a basis for preliminarily recognizing the composite facial expression, and on the basis, the misclassification probability corresponding to each prediction result is combined to serve as correction information to correct the prediction results to obtain target classification results, so that the accuracy rate of recognizing the composite facial expression is further improved.
Referring to fig. 2, in an embodiment, the step S105 of obtaining the first misclassification probability of the first expression recognition model for predicting each type of compound expression further includes the following substeps S1051-S1053, which are detailed as follows:
S1051, acquiring a plurality of training images corresponding to a plurality of compound expressions in training data, and inputting the training images into the first expression recognition model to obtain a prediction result of each training image.
In application, the first expression recognition model is obtained by training with the first training data. To ensure the accuracy of the first misclassification probability when the first expression recognition model predicts each compound expression, the training images used here must not be contained in the first training data. That is, the multiple training images of each compound expression in the test set of the data set in S101 can be used to perform compound expression recognition.
S1052, counting the error number of the prediction result error in each compound expression.
In application, a prediction result is wrong when the first expression recognition model predicts a training image of a compound expression in the test set and the prediction result (the predicted compound expression) is inconsistent with the actual compound expression of the training image. It should be noted that the number of training images for each compound expression may be the same or different. In this embodiment, in order to make the first misclassification probability of the first expression recognition model fairer across the compound expressions, the number of training images for each compound expression may be kept the same.
S1053, calculating a first misclassification probability of the first expression recognition model when predicting each compound expression based on the total number of the training images corresponding to each compound expression and the error number.
In application, the first misclassification probability of the first expression recognition model for predicting each compound expression is calculated, and specific reference may be made to the formula and the explanation of the calculation of the first misclassification probability in S105, which will not be described in detail.
Referring to fig. 3, in an embodiment, the step S106 of obtaining the target classification result of the image to be recognized according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value, and the second composite misclassification probability further includes the following sub-steps S1061-S1063, which are detailed as follows:
S1061, adjusting a first composite probability value of the image to be recognized, predicted by the first target model, as a non-first composite expression and a second composite probability value of the image to be recognized, predicted by the second target model, as a non-second composite expression to preset values.
In application, it has been described above that the first target model and the second target model are both binary classification models and can only output probability values for two compound expressions. The first target model therefore does not output first composite probability values for the remaining 48 compound expressions (the non-first compound expressions). In order to calculate the predicted probability value of every compound expression, the first composite probability values of the remaining 48 compound expressions can be set to 0 (the preset value). The preset value may be set by the user according to actual conditions; refer to the example description of the first composite probability value in S106. Similarly, the second composite probability values with which the second target model predicts the image to be recognized as a non-second compound expression can be adjusted to the preset value. On this basis, there will be 50 first composite probability values predicted by the first target model (the first composite probability value corresponding to the first compound expression, plus those corresponding to the non-first compound expressions), each corresponding to one category of compound expression. Similarly, there will be 50 second composite probability values predicted by the second target model (the second composite probability value corresponding to the second compound expression, plus those corresponding to the non-second compound expressions), each corresponding to one category of compound expression.
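This padding step can be sketched as follows (the category indices and probability values are illustrative):

```python
def expand_binary_output(prob_ab, prob_ba, idx_ab, idx_ba,
                         num_classes=50, preset=0.0):
    """Place a binary target model's two outputs into a full-length score
    vector, filling every other compound category with the preset value."""
    scores = [preset] * num_classes
    scores[idx_ab] = prob_ab  # e.g. master A / slave B
    scores[idx_ba] = prob_ba  # e.g. master B / slave A
    return scores

v = expand_binary_output(0.9, 0.1, idx_ab=3, idx_ba=17)
print(len(v), sum(1 for x in v if x == 0.0))  # 50 48
```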
S1062, calculating a classification value corresponding to each compound expression in the image to be recognized according to a first probability value, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, the second compound misclassification probability and the preset value corresponding to the same class of compound expressions.
In application, the calculation of the classification value corresponding to each compound expression in the image to be recognized may specifically refer to the calculation formula and the corresponding explanation in S106, which will not be described in detail. It can be understood that the classification value corresponding to each compound expression is the value l_j in the calculation formula of S106.
S1063, determining the maximum value of the classification values, and taking the composite expression corresponding to the maximum value as a target classification result of the image to be recognized.
In application, the classification value is a numerical value comprehensively predicted based on the three-type expression recognition model, the maximum value of the classification value corresponding to each predicted compound expression can be determined from the multiple classification values, and the compound expression corresponding to the maximum value is used as a target classification result closest to the real compound expression of the image to be recognized, so that the accuracy of recognizing the compound expression of the image to be recognized is improved.
Referring to fig. 4, in an embodiment, the image to be recognized includes a plurality of images, and the plurality of images to be recognized all belong to the same compound expression category; s106 obtains a target classification result of the image to be recognized according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value, and the second composite misclassification probability, and further includes the following sub-steps S1064-S1066, which are detailed as follows:
S1064, obtaining a target classification result of each image to be recognized in the plurality of images to be recognized with the same compound expression.
In application, the multiple images to be recognized with the same compound expression can be video images of multiple continuous frames in a video clip, or continuously shot character pictures. In practical situations, for a piece of video containing a person, the change of the expression of the person in the video images of a plurality of continuous frames of the video is very small. Therefore, the expression of a person in video images of consecutive frames can be generally regarded as the same kind of compound expression. Furthermore, the target classification result of each image to be recognized in a plurality of images to be recognized with the same compound expression can be obtained through the compound expression recognition method.
In application, the number of consecutive frames may be set by the user according to actual conditions, for example, 5 images to be recognized with the same compound expression. Based on the invariance of a person's expression across consecutive video frames, together with the misclassification probabilities comprehensively predicted by the multiple expression recognition models, the compound expression recognition method can further improve the accuracy of recognizing a person's compound expression when multiple images of the same compound expression category are recognized.
And S1065, acquiring the classification number of the same target classification result from the multiple target classification results.
S1066, determining the target classification result with the maximum classification quantity as the final target classification result of the plurality of images to be recognized.
In application, although the change of the expression of the person in the video images of the consecutive frames is very small, different compound expressions may be predicted for each frame of video image after the processing by the compound expression recognition method. Therefore, for a plurality of images to be recognized with the same type of compound expression, the classification number of the same target classification result can be counted, and the target classification result with the largest classification number is determined as the final target classification result (the final predicted compound expression) of the plurality of images to be recognized. Therefore, the accuracy of predicting a plurality of images to be recognized of the same compound expression category can be improved.
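The majority vote described in S1065-S1066 can be sketched with a simple counter (the result labels are hypothetical):

```python
from collections import Counter

# Target classification results for 5 consecutive frames of the same face:
frame_results = ["happy-surprised", "happy-surprised", "surprised-happy",
                 "happy-surprised", "happy-surprised"]

# The result with the largest classification number becomes the final result.
final_result, votes = Counter(frame_results).most_common(1)[0]
print(final_result, votes)  # happy-surprised 4
```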
Referring to fig. 5, in an embodiment, before the step S1064 obtains the classification result of each image to be recognized in the plurality of images to be recognized with the same compound expression, the following steps S1064a-S1064b are further included, which are detailed as follows:
S1064a, performing key point clustering processing on a plurality of images to be identified including multi-class compound expressions to obtain key point feature information of each image to be identified.
S1064b, taking the multiple images to be recognized with the same key point feature information as multiple images to be recognized with the same compound expression, and obtaining multiple images to be recognized with the same compound expression in each category.
In application, the images to be recognized are images containing multiple categories of compound expressions, and each category of compound expression may correspond to multiple images. The key points can be understood as facial landmarks such as the eyes, nose and mouth of the person in each image to be recognized; they can be detected by applying a face detection technique to each image to be recognized, which determines the coordinate information and feature information of each key point. Clustering can be understood as follows: after the coordinate information and feature information of the key points in each image to be recognized are obtained, whether two images to be recognized belong to the same category of compound expression is determined according to whether the difference between their coordinate information and feature information exceeds a preset value.
In practical situations, if the two images to be recognized both belong to the same type of compound expression, the difference between the feature information and the coordinate information of the same key point of the two images to be recognized is very small. Therefore, a plurality of images to be identified which belong to the same compound expression category can be determined in the plurality of images to be identified according to the key point clustering mode. Thereafter, a plurality of images to be recognized of each same compound expression category may be subjected to the above-described steps S1064-S1066, which will not be explained again.
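A heavily simplified sketch of the clustering criterion follows; the threshold, keypoint layout and coordinates are all hypothetical, and a real implementation would also compare feature descriptors, not only coordinates:

```python
import math

def same_composite_group(kps_a, kps_b, threshold=5.0):
    """Treat two images as the same compound-expression group when the mean
    displacement of corresponding key points stays under a preset threshold."""
    dists = [math.dist(p, q) for p, q in zip(kps_a, kps_b)]
    return sum(dists) / len(dists) < threshold

face1 = [(30, 40), (70, 40), (50, 60), (50, 80)]   # eyes, nose, mouth
face2 = [(31, 41), (69, 40), (50, 61), (51, 79)]   # nearly identical layout
face3 = [(10, 10), (90, 10), (50, 90), (50, 120)]  # very different layout
print(same_composite_group(face1, face2), same_composite_group(face1, face3))
# True False
```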
In an embodiment, in the step S1064, before the obtaining of the classification result of each image to be recognized in the plurality of images to be recognized of the same compound expression, the method further includes:
continuously acquiring multiple frames of adjacent video images from a preset video;
and determining the multi-frame video images as a plurality of images to be identified with the same compound expression.
In application, the reason why the expression of a person in consecutive video frames is regarded as the same category of compound expression has been described above and is not repeated. The preset video may be a video cached in advance in a designated storage path of the terminal device, or a video uploaded to the terminal device by the user, which is not limited herein. For a preset video, the terminal device can play the video and monitor the initial video image in which a face first appears. Normally, when the video frame rate is not lower than 24 frames per second (fps), the human eye perceives the video as continuous, so the frame rate of video playback is typically 24 fps. Therefore, the 4 subsequently played consecutive video frames can be regarded, together with the initial video image, as a plurality of images to be recognized with the same compound expression.
In an embodiment, after S106 according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value, and the second composite misclassification probability, further comprising:
and uploading the target classification result to a block chain.
Specifically, in all embodiments of the present application, the corresponding target classification result is obtained by processing performed on the terminal device. Uploading the target classification result to the blockchain can ensure its security and its fairness and transparency to the user. The user equipment may download the target classification result from the blockchain to verify whether the target classification result has been tampered with. The blockchain referred to in this example is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated by cryptographic methods, each data block containing information of a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Referring to fig. 6, fig. 6 is a block diagram of a composite expression recognition apparatus according to an embodiment of the present disclosure. The units included in the apparatus in this embodiment are used to execute the steps in the embodiments corresponding to fig. 1 to 5; for related descriptions, please refer to those embodiments. For convenience of explanation, only the portions related to the present embodiment are shown. Referring to fig. 6, the composite expression recognition apparatus 600 includes: a first prediction module 610, a first composite prediction module 620, a single expression prediction module 630, a second composite prediction module 640, an acquisition module 650, and an identification module 660, wherein:
the first prediction module 610 is configured to recognize a compound expression in an image to be recognized by using a first expression recognition model, and obtain first probability values corresponding to multiple compound expressions one to one.
A first composite prediction module 620, configured to determine a predicted composite expression according to a maximum value of the first probability value, determine a corresponding first target model in a second expression recognition model set based on the predicted composite expression, and input the image to be recognized to the first target model to obtain a first composite probability value for predicting the image to be recognized as the first composite expression; each first target model corresponds to two kinds of predicted compound expressions respectively.
The single expression prediction module 630 is configured to input the image to be recognized into a third expression recognition model, and predict a plurality of target single expressions included in the image to be recognized.
A second composite prediction module 640, configured to determine a corresponding second target model in the second expression recognition model set according to the multiple target single expressions, and input the image to be recognized into the second target model to obtain a second composite probability value for predicting the image to be recognized as a second composite expression; and in the two predicted compound expressions corresponding to each second target model, the two predicted compound expressions can be obtained by respectively combining the plurality of target single expressions in a one-to-one correspondence manner.
The obtaining module 650 is configured to obtain a first misclassification probability corresponding to the first expression recognition model, obtain a first composite misclassification probability corresponding to each second expression recognition model, and obtain a second composite misclassification probability corresponding to each second expression recognition model.
The identifying module 660 is configured to obtain a target classification result of the image to be identified according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value, and the second composite misclassification probability.
In one embodiment, the obtaining module 650 is further configured to:
acquiring a plurality of training images corresponding to a plurality of compound expressions in training data, and inputting the training images into the first expression recognition model to obtain a prediction result of each training image;
counting, for each compound expression, the number of prediction results that are incorrect;
and calculating the first misclassification probability of the first expression recognition model when predicting each compound expression, based on the total number of training images corresponding to that compound expression and the corresponding error count.
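The count-and-divide procedure above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `model.predict(image)` interface returning a class label, and string labels for compound expressions, are assumptions.

```python
from collections import defaultdict

def estimate_misclassification_probabilities(model, training_images, labels):
    """Estimate, per compound expression, the probability that the model's
    prediction is wrong: incorrect predictions / total training images."""
    totals = defaultdict(int)   # training images per compound expression
    errors = defaultdict(int)   # incorrect predictions per compound expression
    for image, label in zip(training_images, labels):
        totals[label] += 1
        if model.predict(image) != label:
            errors[label] += 1
    # misclassification probability = error count / total count, per expression
    return {label: errors[label] / totals[label] for label in totals}
```

The resulting per-expression probabilities are then reused at inference time to discount the model's raw probability values.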
In an embodiment, the identification module 660 is further configured to:
adjusting to preset values the first composite probability value with which the first target model predicts the image to be recognized as an expression other than the first composite expression, and the second composite probability value with which the second target model predicts the image to be recognized as an expression other than the second composite expression;
calculating a classification value corresponding to each compound expression in the image to be recognized according to a first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value, the second composite misclassification probability and the preset value corresponding to the compound expression of the same category;
and determining the maximum of the plurality of classification values, and taking the compound expression corresponding to that maximum as the target classification result of the image to be recognized.
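One way to picture the combine-and-argmax step is sketched below. The patent does not disclose its exact weighting formula; the additive scheme here, where each probability value is discounted by its model's misclassification probability and expressions without a composite prediction fall back to the preset value, is an assumption for illustration only.

```python
def pick_compound_expression(first_probs, first_mis,
                             composite_probs, composite_mis,
                             preset=0.0):
    """Hedged sketch: compute a classification value per compound expression
    from the first probability value and the composite probability value,
    each discounted by its model's misclassification probability, then take
    the expression with the maximum classification value."""
    values = {}
    for expr, p in first_probs.items():
        value = p * (1.0 - first_mis.get(expr, 0.0))
        # expressions with no composite prediction contribute the preset value
        cp = composite_probs.get(expr, preset)
        cm = composite_mis.get(expr, 0.0)
        value += cp * (1.0 - cm)
        values[expr] = value
    best = max(values, key=values.get)  # maximum classification value wins
    return best, values
```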
In one embodiment, the images to be recognized comprise a plurality of images, and the images to be recognized all belong to the same compound expression category; the identification module 660 is further configured to:
obtaining a target classification result of each image to be recognized in the plurality of images to be recognized with the same compound expression;
obtaining the classification number of the same target classification result from a plurality of target classification results;
and determining the target classification result with the maximum classification quantity as the final target classification result of the plurality of images to be recognized.
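The three steps above amount to a plurality vote over the per-image results, which can be sketched as:

```python
from collections import Counter

def majority_vote(per_image_results):
    """Return the target classification that occurs most often among the
    per-image target classification results (simple plurality vote)."""
    return Counter(per_image_results).most_common(1)[0][0]
```

Note that ties are resolved arbitrarily in this sketch; the patent text does not specify a tie-breaking rule.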
In an embodiment, the identification module 660 is further configured to:
performing key-point clustering on a plurality of images to be recognized containing multiple classes of compound expressions, to obtain key-point feature information for each image to be recognized;
and grouping the images to be recognized that share the same key-point feature information as images of the same compound expression, thereby obtaining, for each class, a plurality of images to be recognized with the same compound expression.
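The grouping step can be sketched as below. The clustering itself (for example, k-means over facial landmark coordinate vectors) is assumed to have been run upstream and is not shown; only the cluster-label-to-group assignment is illustrated.

```python
from collections import defaultdict

def group_images_by_cluster(images, cluster_labels):
    """Group images whose key-point feature vectors fell into the same
    cluster; each group is then treated as a set of images sharing one
    compound expression."""
    groups = defaultdict(list)
    for image, label in zip(images, cluster_labels):
        groups[label].append(image)
    return dict(groups)
```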
In an embodiment, the identification module 660 is further configured to:
continuously acquiring multiple frames of adjacent video images from a preset video;
and determining the multi-frame video images as a plurality of images to be identified with the same compound expression.
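Selecting the adjacent frames can be sketched as a simple window over a decoded frame sequence. Decoding the preset video into `frames` (for example, via an OpenCV `VideoCapture` read loop) is assumed to have happened upstream; the rationale is that adjacent frames rarely change expression, so they can be treated as images sharing the same compound expression.

```python
def adjacent_frame_window(frames, start, count):
    """Select `count` consecutive frames beginning at index `start`; these
    are treated as a plurality of images to be recognized that share the
    same compound expression."""
    if start < 0 or start + count > len(frames):
        raise ValueError("frame window out of range")
    return frames[start:start + count]
```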
In one embodiment, the composite expression recognition apparatus 600 further includes an uploading module configured to upload the target classification result to a blockchain.
It should be understood that, in the structural block diagram of the compound expression recognition apparatus shown in fig. 6, each unit/module is used to execute the corresponding steps in the embodiments of figs. 1 to 5. Those steps have been explained in detail above, so reference is made to the relevant descriptions of the embodiments corresponding to figs. 1 to 5, and the details are not repeated here.
Fig. 7 is a block diagram of a terminal device according to another embodiment of the present application. As shown in fig. 7, the terminal device 700 of this embodiment includes: a processor 701, a memory 702, and a computer program 703, such as a program of a compound expression recognition method, stored in the memory 702 and executable on the processor 701. The processor 701 implements the steps in each embodiment of the compound expression recognition method described above, such as S101 to S106 shown in fig. 1, when executing the computer program 703. Alternatively, when the processor 701 executes the computer program 703, the functions of the units in the embodiment corresponding to fig. 6, for example, the functions of the modules 610 to 660 shown in fig. 6, are implemented, and refer to the related description in the embodiment corresponding to fig. 6 specifically.
Illustratively, the computer program 703 may be divided into one or more units, which are stored in the memory 702 and executed by the processor 701 to implement the present application. One or more of the units may be a series of computer program instruction segments capable of performing specific functions, used to describe the execution of the computer program 703 in the terminal device 700. For example, the computer program 703 may be divided into a first prediction module, a first composite prediction module, a single expression prediction module, a second composite prediction module, an acquisition module, and an identification module, each functioning as described above.
The terminal device may include, but is not limited to, the processor 701 and the memory 702. Those skilled in the art will appreciate that fig. 7 is merely an example of the terminal device 700 and does not constitute a limitation on it; the terminal device 700 may include more or fewer components than shown, may combine certain components, or may use different components. For example, the terminal device may also include input-output devices, network access devices, buses, and the like.
The processor 701 may be a central processing unit, but may also be another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 702 may be an internal storage unit of the terminal device 700, such as a hard disk or memory of the terminal device 700. The memory 702 may also be an external storage device of the terminal device 700, such as a plug-in hard disk, a smart memory card, or a flash memory card provided on the terminal device 700. Further, the memory 702 may include both an internal storage unit and an external storage device of the terminal device 700.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A compound expression recognition method is characterized by comprising the following steps:
identifying the compound expressions in the image to be recognized by using a first expression recognition model, to obtain first probability values in one-to-one correspondence with a plurality of compound expressions;
determining a predicted composite expression according to the maximum value of the first probability value, determining a corresponding first target model in a second expression recognition model set based on the predicted composite expression, and inputting the image to be recognized into the first target model to obtain a first composite probability value for predicting the image to be recognized as a first composite expression; each first target model respectively corresponds to two predicted compound expressions;
inputting the image to be recognized into a third expression recognition model, and predicting a plurality of target single expressions contained in the image to be recognized;
determining a corresponding second target model in the second expression recognition model set according to the target single expressions, and inputting the image to be recognized into the second target model to obtain a second composite probability value for predicting the image to be recognized as a second composite expression; wherein each second target model corresponds to two predicted compound expressions, each obtained by combining the plurality of target single expressions in a one-to-one manner;
acquiring a first misclassification probability corresponding to the first expression recognition model, a first composite misclassification probability corresponding to each second expression recognition model and a second composite misclassification probability corresponding to each second expression recognition model;
and obtaining a target classification result of the image to be recognized according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value and the second composite misclassification probability.
2. The method for recognizing compound expressions according to claim 1, wherein the obtaining of the first misclassification probability corresponding to the first expression recognition model comprises:
acquiring a plurality of training images corresponding to a plurality of compound expressions in training data, and inputting the training images into the first expression recognition model to obtain a prediction result of each training image;
counting, for each compound expression, the number of prediction results that are incorrect;
and calculating the first misclassification probability of the first expression recognition model when predicting each compound expression, based on the total number of training images corresponding to that compound expression and the corresponding error count.
3. The method for recognizing compound expressions according to claim 1, wherein the obtaining of the target classification result of the image to be recognized according to the first probability value, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value and the second compound misclassification probability comprises:
adjusting to preset values the first composite probability value with which the first target model predicts the image to be recognized as an expression other than the first composite expression, and the second composite probability value with which the second target model predicts the image to be recognized as an expression other than the second composite expression;
calculating a classification value corresponding to each compound expression in the image to be recognized according to a first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value, the second composite misclassification probability and the preset value corresponding to the compound expression of the same category;
and determining the maximum value of the plurality of classification values, and taking the composite expression corresponding to the maximum value as a target classification result of the image to be recognized.
4. The compound expression recognition method according to claim 3, wherein the image to be recognized includes a plurality of images, and the plurality of images to be recognized all belong to the same compound expression category;
the obtaining a target classification result of the image to be recognized according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value, and the second composite misclassification probability includes:
obtaining a target classification result of each image to be recognized in the plurality of images to be recognized with the same compound expression;
obtaining the classification number of the same target classification result from a plurality of target classification results;
and determining the target classification result with the maximum classification quantity as the final target classification result of the plurality of images to be recognized.
5. The compound expression recognition method according to claim 4, wherein before the obtaining of the classification result of each image to be recognized in the plurality of images to be recognized of the same compound expression, the method further comprises:
performing key point clustering processing on a plurality of images to be identified including multi-class compound expressions to obtain key point characteristic information of each image to be identified;
and taking the plurality of images to be identified with the same key point characteristic information as a plurality of images to be identified with the same compound expression to obtain a plurality of images to be identified with each type of the same compound expression.
6. The compound expression recognition method according to claim 4, wherein before the obtaining of the classification result of each image to be recognized in the plurality of images to be recognized of the same compound expression, the method further comprises:
continuously acquiring multiple frames of adjacent video images from a preset video;
and determining the multi-frame video images as a plurality of images to be identified with the same compound expression.
7. The compound expression recognition method of any one of claims 1-6, further comprising, after the obtaining of the target classification result of the image to be recognized according to the first probability value, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, and the second compound misclassification probability:
and uploading the target classification result to a block chain.
8. A composite expression recognition apparatus, comprising:
the first prediction module is used for identifying the compound expressions in the image to be recognized by using the first expression recognition model, to obtain first probability values in one-to-one correspondence with a plurality of compound expressions;
the first compound prediction module is used for determining a predicted compound expression according to the maximum value of the first probability value, determining a corresponding first target model in a second expression recognition model set based on the predicted compound expression, and inputting the image to be recognized into the first target model to obtain a first compound probability value for predicting the image to be recognized as the first compound expression; each first target model respectively corresponds to two predicted compound expressions;
the single expression prediction module is used for inputting the image to be recognized into a third expression recognition model and predicting a plurality of target single expressions contained in the image to be recognized;
the second composite prediction module is used for determining a corresponding second target model in the second expression recognition model set according to the target single expressions, and inputting the image to be recognized into the second target model to obtain a second composite probability value for predicting the image to be recognized as a second composite expression; wherein each second target model corresponds to two predicted compound expressions, each obtained by combining the plurality of target single expressions in a one-to-one manner;
the obtaining module is used for obtaining a first misclassification probability corresponding to the first expression recognition model, obtaining a first composite misclassification probability corresponding to each second expression recognition model and obtaining a second composite misclassification probability corresponding to each second expression recognition model;
and the identification module is used for obtaining a target classification result of the image to be identified according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value and the second composite misclassification probability.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202011304521.8A 2020-11-19 2020-11-19 Compound expression recognition method and device, terminal equipment and storage medium Active CN112381019B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011304521.8A CN112381019B (en) 2020-11-19 2020-11-19 Compound expression recognition method and device, terminal equipment and storage medium
PCT/CN2021/091094 WO2022105130A1 (en) 2020-11-19 2021-04-29 Compound expression recognition method, device, terminal apparatus, and storage medium


Publications (2)

Publication Number Publication Date
CN112381019A CN112381019A (en) 2021-02-19
CN112381019B true CN112381019B (en) 2021-11-09

Family

ID=74584463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011304521.8A Active CN112381019B (en) 2020-11-19 2020-11-19 Compound expression recognition method and device, terminal equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112381019B (en)
WO (1) WO2022105130A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381019B (en) * 2020-11-19 2021-11-09 平安科技(深圳)有限公司 Compound expression recognition method and device, terminal equipment and storage medium
CN113158788B (en) * 2021-03-12 2024-03-08 中国平安人寿保险股份有限公司 Facial expression recognition method and device, terminal equipment and storage medium
CN113920575A (en) * 2021-12-15 2022-01-11 深圳佑驾创新科技有限公司 Facial expression recognition method and device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010133661A1 (en) * 2009-05-20 2010-11-25 Tessera Technologies Ireland Limited Identifying facial expressions in acquired digital images
CN108921061A (en) * 2018-06-20 2018-11-30 腾讯科技(深圳)有限公司 A kind of expression recognition method, device and equipment
CN109325422A (en) * 2018-08-28 2019-02-12 深圳壹账通智能科技有限公司 Expression recognition method, device, terminal and computer readable storage medium
CN110363079A (en) * 2019-06-05 2019-10-22 平安科技(深圳)有限公司 Expression exchange method, device, computer installation and computer readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381019B (en) * 2020-11-19 2021-11-09 平安科技(深圳)有限公司 Compound expression recognition method and device, terminal equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Facial Expression Recognition Method Based on Data Augmentation; Yang Lanlan et al.; Computer Products and Circulation; 2020-09-21 (No. 11); full text *

Also Published As

Publication number Publication date
CN112381019A (en) 2021-02-19
WO2022105130A1 (en) 2022-05-27

Similar Documents

Publication Publication Date Title
CN112381019B (en) Compound expression recognition method and device, terminal equipment and storage medium
CN109522818B (en) Expression recognition method and device, terminal equipment and storage medium
WO2021077984A1 (en) Object recognition method and apparatus, electronic device, and readable storage medium
Zhang et al. Zero-shot kernel learning
Singh et al. Image classification: a survey
CN108428132B (en) Fraud transaction identification method, device, server and storage medium
CN109583332B (en) Face recognition method, face recognition system, medium, and electronic device
WO2020215915A1 (en) Identity verification method and apparatus, computer device and storage medium
TW201946013A (en) Credit risk prediction method and device based on LSTM (Long Short Term Memory) model
CN111931795B (en) Multi-modal emotion recognition method and system based on subspace sparse feature fusion
CN110363081B (en) Face recognition method, device, equipment and computer readable storage medium
CN111523421B (en) Multi-person behavior detection method and system based on deep learning fusion of various interaction information
CN110598019B (en) Repeated image identification method and device
Hu et al. Bin ratio-based histogram distances and their application to image classification
CN104915673A (en) Object classification method and system based on bag of visual word model
WO2019119396A1 (en) Facial expression recognition method and device
CN112990294B (en) Training method and device of behavior discrimination model, electronic equipment and storage medium
CN110738102A (en) face recognition method and system
CN115050064A (en) Face living body detection method, device, equipment and medium
Meneses-Claudio et al. Organization, Extraction, Classification and Prediction of Age in Facial Images using Convolutional Neuronal Network
CN115222443A (en) Client group division method, device, equipment and storage medium
CN114419378B (en) Image classification method and device, electronic equipment and medium
CN115237802A (en) Artificial intelligence based simulation test method and related equipment
Dong et al. A supervised dictionary learning and discriminative weighting model for action recognition
CN116205723A (en) Artificial intelligence-based face tag risk detection method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant