CN114170651A - Expression recognition method, device, equipment and computer storage medium - Google Patents

Info

Publication number
CN114170651A
CN114170651A (application CN202111359971.1A)
Authority
CN
China
Prior art keywords
expression
feature
features
preset
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111359971.1A
Other languages
Chinese (zh)
Inventor
姜明伟 (Jiang Mingwei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zijing Photoelectric Equipment Co., Ltd.
Original Assignee
Beijing Zijing Photoelectric Equipment Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zijing Photoelectric Equipment Co., Ltd.
Priority to CN202111359971.1A
Publication of CN114170651A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/60 Rotation of whole images or parts thereof
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the application provides an expression recognition method, apparatus, device, and computer storage medium. The expression recognition method includes: recognizing a plurality of features of the face of a target object in target image data; comparing the plurality of features with features of a preset reference expression of the target object to obtain a feature vector corresponding to each of the plurality of features; comparing the feature vector corresponding to each feature with a preset feature vector corresponding to a preset micro expression to obtain a target micro expression corresponding to each feature; and combining the target micro expressions corresponding to the plurality of features to obtain the target facial expression of the target object. In this way, complex expression changes of the target object can be captured, so that the complex expression changes of an avatar are driven accurately and the accuracy of the expression recognition result is improved.

Description

Expression recognition method, device, equipment and computer storage medium
Technical Field
The present application belongs to the field of image recognition, and in particular, to a method, an apparatus, a device, and a computer storage medium for facial expression recognition.
Background
With the development of image recognition technology, face recognition has been widely applied in fields such as virtual anchoring, social interaction, advertising, and entertainment. Especially in scenarios where a real person needs to be replaced by an avatar, accurately recognizing a person's expression so that the avatar can imitate it has become an urgent problem to be solved.
In the prior art, the simple expressions of a person are mainly recognized by analyzing the person's video stream data, and the obtained information is then used to control a virtual three-dimensional character to make the same expression. However, this approach struggles with complex and varied expression changes, so the accuracy of the expression recognition result is low.
Disclosure of Invention
Embodiments of the application provide an expression recognition method, apparatus, device, and computer storage medium, which can solve the prior-art problem that the accuracy of expression recognition results is low when facing complex and varied expression changes.
In a first aspect, an embodiment of the present application provides an expression recognition method, where the method includes:
identifying a plurality of features of a face of a target object in target image data;
comparing the plurality of features with features of a preset reference expression of the target object to obtain a feature vector corresponding to each feature in the plurality of features;
comparing a feature vector corresponding to each feature in the plurality of features with a preset feature vector corresponding to a preset micro expression to obtain a target micro expression corresponding to each feature;
and combining the target micro expressions corresponding to the plurality of characteristics to obtain the target facial expression of the target object.
In an alternative embodiment, the target image data includes a face size of the target object; prior to said identifying a plurality of features of a target object's face in the target image data, the method further comprises:
collecting video stream data;
identifying a plurality of features of a face of a target object in the video stream data;
preprocessing image data corresponding to the video stream data to obtain the target image data, wherein the target image data comprises each feature of the plurality of features, and the size of the face of the target object in the target image data is consistent with the size of the preset reference expression.
In an optional implementation manner, the comparing each of the plurality of features with a feature of a preset reference expression of the target object to obtain a feature vector corresponding to each feature includes:
comparing the plurality of features with features of a preset reference expression of the target object in a preset facial basic feature library to obtain a displacement vector of each feature in the plurality of features in a three-dimensional space and a change vector of a curve formed by displacement;
and determining a characteristic vector corresponding to each characteristic according to the displacement vector of each characteristic in the plurality of characteristics in the three-dimensional space and the change vector of the curve formed by displacement.
In an optional implementation manner, before the comparing the plurality of features with the features of the preset reference expression of the target object in the preset facial basic feature library to obtain the displacement vector of the plurality of features in the three-dimensional space and the variation vector of the curve formed by the displacement, the method further includes:
acquiring first image data of the target object;
identifying features of the preset reference expression of the target object in the first image data, wherein the features of the preset reference expression comprise a whole facial structure feature, a five sense organs structure feature and a chin structure feature, and the five sense organs structure feature comprises position information of five sense organs on the face;
and storing the features of the preset reference expression of the target object into the preset facial basic feature library.
In an optional implementation manner, the comparing the feature vector corresponding to each of the plurality of features with a preset feature vector corresponding to a preset micro expression to obtain a target micro expression corresponding to each of the features includes:
matching a feature vector corresponding to each feature in the plurality of features with a preset feature vector corresponding to a preset micro-expression in a preset expression feature vector library based on a preset expression classification model to obtain the matching degree of each feature and the preset micro-expression;
determining the preset micro expression with the maximum matching degree with the features as a first micro expression corresponding to the features;
and when the matching degree of the features and the first micro expression is larger than a preset threshold value, determining the first micro expression as a target micro expression corresponding to the features.
In an optional implementation manner, before the matching, based on a preset expression classification model, a feature vector corresponding to each feature in the plurality of features with a preset feature vector corresponding to a preset micro expression in a preset expression feature vector library to obtain a matching degree between each feature and the preset micro expression, the method further includes:
acquiring a plurality of second image data of the target object;
identifying features of the preset micro-expression of the target object in each of the plurality of second image data, the features of the preset micro-expression including the facial overall structural feature, the facial feature of the five sense organs and the chin structural feature;
comparing the feature of the preset micro expression of the target object in each second image data with the feature of a preset reference expression of the target object, and determining a feature vector corresponding to each preset micro expression;
and storing the feature vector corresponding to each preset micro expression into the preset expression feature vector library.
In an optional embodiment, the pre-processing the image data corresponding to the video stream data includes performing image rotation rectification, image stretching, and image alignment on the image data corresponding to the video stream data.
In an optional implementation manner, after the combining the target micro expressions corresponding to the plurality of features and performing integration processing to obtain the target facial expression of the target object, the method further includes:
based on the target facial expression, an expression of the avatar corresponding to the target facial expression is reproduced.
In a second aspect, an embodiment of the present application provides an expression recognition apparatus, including:
a recognition module for recognizing a plurality of features of a face of a target object in target image data;
the comparison module is used for comparing the plurality of features with the features of the preset reference expression of the target object to obtain a feature vector corresponding to each feature in the plurality of features;
the comparison module is further configured to compare a feature vector corresponding to each feature of the plurality of features with a preset feature vector corresponding to a preset micro expression to obtain a target micro expression corresponding to each feature;
and the combination module is used for combining the target micro-expressions corresponding to the plurality of characteristics to obtain the target facial expression of the target object.
In an alternative embodiment, the target image data includes a face size of the target object; the device also comprises an acquisition module and a pretreatment module;
the acquisition module is used for acquiring video stream data before identifying a plurality of characteristics of the face of a target object in the target image data;
the identification module is further used for identifying a plurality of characteristics of the face of the target object in the video stream data;
the preprocessing module is configured to preprocess image data corresponding to the video stream data to obtain the target image data, where the target image data includes each of the plurality of features, and a size of a face of the target object in the target image data is consistent with a size of the preset reference expression.
In an optional embodiment, the apparatus further comprises a determination module;
the comparison module is further configured to compare the plurality of features with features of a preset reference expression of the target object in a preset facial basic feature library to obtain a displacement vector of each feature in the plurality of features in a three-dimensional space and a change vector of a curve formed by displacement;
and the determining module is used for determining the characteristic vector corresponding to each characteristic according to the displacement vector of each characteristic in the plurality of characteristics in the three-dimensional space and the change vector of the curve formed by displacement.
In an optional embodiment, the apparatus further comprises a storage module;
the comparison module is further configured to acquire first image data of the target object before comparing the plurality of features with features of a preset reference expression of the target object in a preset facial basic feature library to obtain a displacement vector of the plurality of features in a three-dimensional space and a change vector of a curve formed by displacement;
the recognition module is further configured to recognize features of the preset reference expression of the target object in the first image data, where the features of the preset reference expression include a whole facial structure feature, a structural feature of five sense organs and a structural feature of a chin, and the structural feature of five sense organs includes position information of five sense organs on a face;
the storage module is used for storing the features of the preset reference expression of the target object to the preset facial basic feature library.
In an optional embodiment, the apparatus further comprises a matching module;
the matching module is used for matching a feature vector corresponding to each feature in the plurality of features with a preset feature vector corresponding to a preset micro-expression in a preset expression feature vector library based on a preset expression classification model to obtain the matching degree of each feature and the preset micro-expression;
the determining module is further configured to determine the preset micro expression with the largest matching degree with the feature as a first micro expression corresponding to the feature;
the determining module is further configured to determine the first micro expression as a target micro expression corresponding to the feature when the matching degree of the feature and the first micro expression is greater than a preset threshold.
In an optional implementation manner, the acquisition module is further configured to acquire a plurality of second image data of the target object before the feature vector corresponding to each feature in the plurality of features is matched with a preset feature vector corresponding to a preset micro expression in a preset expression feature vector library based on a preset expression classification model to obtain a matching degree between each feature and the preset micro expression;
the recognition module is further configured to recognize a feature of the preset micro expression of the target object in each of the plurality of second image data, where the feature of the preset micro expression includes the whole facial structure feature, the five sense organs structure feature, and the chin structure feature;
the comparison module is further configured to compare the feature of the preset micro expression of the target object in each second image data with the feature of a preset reference expression of the target object, and determine a feature vector corresponding to each preset micro expression;
the storage module is further configured to store the feature vector corresponding to each preset micro expression into the preset expression feature vector library.
In an optional embodiment, the pre-processing the image data corresponding to the video stream data includes performing image rotation rectification, image stretching, and image alignment on the image data corresponding to the video stream data.
In an optional embodiment, the apparatus further comprises a reproduction module;
the reproduction module is used for reproducing the expression of the virtual image corresponding to the target facial expression based on the target facial expression after the target micro-expression corresponding to the plurality of characteristics is combined and integrated to obtain the target facial expression of the target object.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory storing computer program instructions;
the processor, when executing the computer program instructions, implements an expression recognition method as described in any embodiment of the first aspect.
In a fourth aspect, the present application provides a computer storage medium having computer program instructions stored thereon, where the computer program instructions, when executed by a processor, implement the expression recognition method as described in any one of the embodiments of the first aspect.
According to the expression recognition method, apparatus, device, and computer storage medium of the embodiments of the application, a plurality of features of the face of a target object in target image data are recognized and compared with the features of the preset reference expression of the target object to obtain a feature vector corresponding to each of the plurality of features. The feature vector corresponding to each feature is then compared with the preset feature vector corresponding to a preset micro expression to obtain the target micro expression corresponding to each feature, and the target micro expressions corresponding to the plurality of features are combined to obtain the target facial expression of the target object. Complex expression changes of the target object can thus be captured, so that the complex expression changes of the avatar are driven accurately and the accuracy of the expression recognition result is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required by the embodiments are briefly described below; those skilled in the art can derive other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of an expression recognition method according to an embodiment of the present application;
FIG. 2 is a system architecture diagram of expression recognition software provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of an expression classifier provided in one embodiment of the present application;
FIG. 4 is a schematic diagram of a facial feature training phase provided by one embodiment of the present application;
FIG. 5 is a schematic flow chart illustrating operation of a motion capture system provided by one embodiment of the present application;
fig. 6 is a schematic structural diagram of an expression recognition apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are intended to be illustrative only and are not intended to be limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
As described in the background, the prior art suffers from low accuracy of expression recognition results when facing complex and varied expression changes. To solve this problem, embodiments of the present application provide an expression recognition method, apparatus, device, and storage medium. The expression recognition method includes: recognizing a plurality of features of the face of a target object in target image data; comparing the plurality of features with features of a preset reference expression of the target object to obtain a feature vector corresponding to each of the plurality of features; comparing the feature vector corresponding to each feature with a preset feature vector corresponding to a preset micro expression to obtain a target micro expression corresponding to each feature; and combining the target micro expressions corresponding to the plurality of features to obtain the target facial expression of the target object. In this way, complex expression changes of the target object can be captured, so that the complex expression changes of the avatar are driven accurately and the accuracy of the expression recognition result is improved. The expression recognition method provided by the embodiments of the present application is described first below.
Fig. 1 shows a flowchart of an expression recognition method according to an embodiment of the present application.
As shown in fig. 1, the expression recognition method may specifically include the following steps:
s110, a plurality of characteristics of the face of the target object in the target image data are identified.
The target image data may be image data containing facial features of a target object, and the target object may be a human body. The plurality of features of the target object may be overall facial features, such as the height, width, and height-to-width ratio of the face and the positions of the left eyebrow, right eyebrow, left eye, right eye, nose, and mouth on the face; they may also be features of the five sense organs, such as the width, height, and shape of each eyebrow and each eye, the width and height of the nose, the width and height of the mouth, and the height, thickness, and shape of the upper and lower lips. One possible organization of this data is sketched below.
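For concreteness, the following Python sketch shows one possible way to organize the per-feature facial data described above; the feature names and vertex counts are illustrative assumptions, not part of the patent.

```python
# Illustrative layout: each named facial feature holds the 3D coordinates of
# its landmark vertices. Vertex counts here are placeholders.
import numpy as np

face_features = {
    "face_outline": np.zeros((17, 3)),
    "left_eyebrow": np.zeros((5, 3)),
    "right_eyebrow": np.zeros((5, 3)),
    "left_eye": np.zeros((6, 3)),
    "right_eye": np.zeros((6, 3)),
    "nose": np.zeros((9, 3)),
    "mouth": np.zeros((20, 3)),
    "chin": np.zeros((5, 3)),
}
```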
And S120, comparing the plurality of features with the features of the preset reference expression of the target object to obtain a feature vector corresponding to each feature in the plurality of features.
The preset reference expression may be an expression of the target object in a neutral state, that is, an expression in which the face is still and makes no motion; the preset reference expression may also be set arbitrarily according to the user's needs, which is not limited herein. The preset reference expression can be confirmed through an expression recognition algorithm; the plurality of features of the target object's face are compared with the features of the preset reference expression, and the feature vector corresponding to each of the plurality of features in the current state is recognized.
S130, comparing the feature vector corresponding to each feature in the plurality of features with a preset feature vector corresponding to a preset micro expression to obtain a target micro expression corresponding to each feature.
The preset micro expressions may be facial micro expressions set by the user as needed. For example, they may include: frowning with the left or right brow; raising the left or right brow; raising the left or right corner of the upper lip; closing the left or right eye; looking down, inward, outward, or up with the left or right eye; opening the left or right eye wide; squinting the left or right eye; chewing with the jaw; thrusting the jaw forward; shifting the jaw to the left or right; puffing the cheeks; rolling the lower lip inward or turning it downward; closing the mouth; pulling the mouth to the left or right; rolling the upper lip inward or turning it upward; dimpling at the left or right corner of the mouth; pulling down the left or right corner of the mouth; smiling with the left or right corner of the mouth; puckering the lips; holding breath in the cheeks; and so on.
And S140, combining the target micro expressions corresponding to the multiple features to obtain the target facial expression of the target object.
Illustratively, the target object's face is divided by organ into the left eyebrow, right eyebrow, left eye, right eye, nose, mouth, and chin, and the target micro expressions of these organs are combined to obtain the complete expression of the target face, namely the target facial expression; a minimal sketch of this combination step is given below.
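The patent does not prescribe a data structure for this step; the following sketch assumes the per-organ recognition results arrive as a dictionary of labels, with all names chosen here for illustration.

```python
# Hypothetical sketch of S140: merge per-organ target micro expressions into
# one composite facial expression. Organ names and labels are assumptions.
def combine_micro_expressions(per_organ):
    """per_organ maps an organ name to its recognized micro expression label."""
    organs = ["left_eyebrow", "right_eyebrow", "left_eye",
              "right_eye", "nose", "mouth", "chin"]
    # Organs with no recognized micro expression fall back to neutral
    return {organ: per_organ.get(organ, "neutral") for organ in organs}

target_expression = combine_micro_expressions({
    "left_eyebrow": "brow_down_left",
    "mouth": "smile_left",
})
```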
In this way, a plurality of features of the face of the target object in the target image data are recognized and compared with the features of the preset reference expression of the target object to obtain a feature vector corresponding to each feature; the feature vector corresponding to each feature is then compared with the preset feature vector corresponding to a preset micro expression to obtain the target micro expression corresponding to each feature; and the target micro expressions corresponding to the plurality of features are combined to obtain the target facial expression of the target object. Complex expression changes of the target object can thus be captured, so that the complex expression changes of the avatar are driven accurately and the accuracy of the expression recognition result is improved.
In one embodiment, at S140 above: after combining the target micro expressions corresponding to the multiple features and performing integration processing to obtain the target facial expression of the target object, the method further comprises the following steps:
based on the target facial expression, an expression of the avatar corresponding to the target facial expression is reproduced.
In a specific example, as shown in fig. 2, the user may drive the avatar expression to change according to the target facial expression of the user in real time at the user interface layer 21, so as to realize the expression reproduction of the avatar corresponding to the target facial expression.
Therefore, various complex expressions of the user can be accurately simulated by reproducing the expression of the avatar corresponding to the target facial expression based on the target facial expression.
Based on this, in one embodiment, the target image data includes a face size of the target object; in the above S110: before recognizing a plurality of features of the face of the target object in the target image data, the expression recognition method may further include:
collecting video stream data;
identifying a plurality of features of a face of a target object in video stream data;
and preprocessing image data corresponding to the video stream data to obtain target image data, wherein the target image data comprises each feature of the multiple features, and the size of the face of a target object in the target image data is consistent with the size of a preset reference expression.
The identification can be implemented by a recognition Software Development Kit (SDK) built into the camera or by a third-party video data recognition SDK; the image data corresponding to the video stream data is then preprocessed.
In a specific example, as shown in fig. 2, the video stream collecting module 22 may collect real-time video stream data from the camera product 23, track and locate the human body and face through the video stream preprocessing layer 24, identify feature points of the facial features and contours, and transmit the identified feature point data to the image preprocessing layer 25 for further processing. Image data collected by the camera is affected by various factors: the head may not face the camera, the distance from the camera may change, and the size of the face in the video may be inconsistent. The facial image data collected by the camera therefore needs to be preprocessed first. The image preprocessing layer 25 preprocesses the facial image data so that the face is oriented toward the camera, has the same size as the preset reference expression, and so on, finally producing standardized facial image data, that is, the target image data.
In this way, by identifying a plurality of features of the face of the target object in the video stream data and preprocessing the image data corresponding to the video stream data, target image data is obtained that includes each of the plurality of features and in which the size of the target object's face is consistent with the size of the preset reference expression; that is, more standardized target image data can be obtained.
In one embodiment, pre-processing image data corresponding to the video stream data includes performing image rotation correction, image stretching, and image alignment on the image data corresponding to the video stream data.
In a specific example, as shown in fig. 2, when the facial image data collected by the camera is preprocessed, the image preprocessing layer 25 performs calculations such as image rotation correction, image stretching and alignment to make the facial image face the camera, and make the facial image have the same size as the preset reference expression, so as to obtain the standardized facial image data, that is, the target image data.
Therefore, the image data corresponding to the video stream data is subjected to image rotation correction, image stretching, image alignment and other processing, so that the face identification of the target object in the target image data can be more accurate and complete.
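The patent names the preprocessing operations but not their algorithms; the sketch below uses a common similarity-transform face alignment (rotation correction plus scaling so the eyes land on fixed positions), assuming 2D eye landmarks from the tracking step. All parameter values are illustrative.

```python
# Hedged sketch of the preprocessing step: rotate, stretch, and align the face
# so it appears frontal and matches the size of the reference expression image.
import cv2
import numpy as np

def align_face(image, left_eye, right_eye, out_size=256):
    # Angle of the eye line relative to horizontal (rotation correction)
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))
    # Scale so the eye distance matches an assumed canonical distance (stretching)
    scale = (0.4 * out_size) / np.hypot(dx, dy)
    center = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, scale)
    # Translate so the eye midpoint lands at a fixed point (alignment)
    M[0, 2] += out_size / 2.0 - center[0]
    M[1, 2] += 0.35 * out_size - center[1]
    return cv2.warpAffine(image, M, (out_size, out_size))
```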
In one embodiment, the step S120: comparing the plurality of features with the features of the preset reference expression of the target object to obtain a feature vector corresponding to each feature of the plurality of features, which may specifically include:
comparing the plurality of features with the features of the preset reference expression of the target object in a preset facial basic feature library to obtain a displacement vector of each feature in the plurality of features in a three-dimensional space and a change vector of a curve formed by displacement;
and determining a characteristic vector corresponding to each characteristic according to the displacement vector of each characteristic in the plurality of characteristics in the three-dimensional space and the change vector of the curve formed by displacement.
The preset facial basic feature library may include features of a preset reference expression of the target object, for example, the features may include a facial overall structural feature, a left eyebrow structural feature, a right eyebrow structural feature, a left eye structural feature, a right eye structural feature, a nose structural feature, a mouth structural feature, and a chin structural feature.
In a specific example, as shown in fig. 3, after the feature data of the target object is preprocessed by the image preprocessing engine, the micro expression feature extraction engine 31 extracts the features of the reference neutral expression of the target object from the facial basic feature library 32 and compares the plurality of facial features of the target object with the features of the reference neutral expression to obtain the displacement vector of each facial feature in three-dimensional space and the change vector of the curve formed by the displacement, for example, a facial variation feature vector, a five-sense-organ variation feature vector, or a chin variation feature vector, thereby determining the feature vector corresponding to each feature.
Therefore, the plurality of features of the face of the target object are compared with the features of the preset reference expression of the target object in the preset face basic feature library to obtain the displacement vector of each feature in the plurality of features in the three-dimensional space and the change vector of the curve formed by the displacement, and therefore the feature vector corresponding to each feature can be determined more accurately.
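Assuming each feature is represented by the 3D landmark vertices sketched earlier, this comparison might be computed as below; treating the "change vector of the curve formed by displacement" as differences of successive displacements is an assumption, since the patent does not define it precisely.

```python
# Minimal sketch of S120: displacement of each vertex relative to the neutral
# reference, plus an approximation of the curve-change term.
import numpy as np

def feature_vector(current, neutral):
    """current, neutral: (n, 3) arrays of one feature's vertex coordinates."""
    displacement = current - neutral              # per-vertex 3D displacement
    curve_change = np.diff(displacement, axis=0)  # assumed curve-shape term
    return np.concatenate([displacement.ravel(), curve_change.ravel()])
```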
In one embodiment, before comparing the plurality of features with features of a preset reference expression of the target object in a preset facial basic feature library to obtain a displacement vector of the plurality of features in a three-dimensional space and a variation vector of a curve formed by displacement, the expression recognition method may further include:
acquiring first image data of a target object;
identifying features of a preset reference expression of the target object in the first image data, wherein the features of the preset reference expression comprise a whole facial structure feature, a five sense organs structure feature and a chin structure feature, and the five sense organs structure feature comprises position information of the five sense organs on the face;
and storing the characteristics of the preset reference expression of the target object into a preset facial basic characteristic library.
The first image data may be image data in which the target object is in a neutral state, that is, with no expression change, and may include the preset reference expression of the target object. The features of the preset reference expression of the target object in the first image data are identified and stored in the preset facial basic feature library, finally yielding a preset facial basic feature library containing all features of the preset reference expression of the target object.
In a specific example, as shown in fig. 4, after the image data of the target object in the neutral state is preprocessed, the neutral expression feature extraction engine 41 extracts the facial structure features, the five sense organs structure features, and the chin structure feature of the target object and stores them in the facial basic feature library 42. The facial basic feature library 42 may specifically include an overall facial structure feature, a left eyebrow structure feature, a right eyebrow structure feature, a left eye structure feature, a right eye structure feature, a nose structure feature, a mouth structure feature, and a chin structure feature.
Therefore, by identifying the features of the preset reference expression of the target object in the first image data and storing the features of the preset reference expression of the target object in the preset facial basic feature library, a complete preset facial basic feature library can be obtained, and the feature extraction of the preset reference expression is more accurate.
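One possible persistence scheme for the preset facial basic feature library is sketched below; the file format and helper names are assumptions for illustration.

```python
# Hypothetical storage of the neutral (reference) expression features as the
# preset facial basic feature library.
import numpy as np

def save_basic_feature_library(neutral_features, path="face_basic_features.npz"):
    """neutral_features: dict mapping feature name -> (n, 3) vertex array."""
    np.savez(path, **neutral_features)

def load_basic_feature_library(path="face_basic_features.npz"):
    data = np.load(path)
    return {name: data[name] for name in data.files}
```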
In one embodiment, the step S130: comparing the feature vector corresponding to each feature in the plurality of features with a preset feature vector corresponding to a preset micro expression to obtain a target micro expression corresponding to each feature, which may specifically include:
matching a feature vector corresponding to each feature in the plurality of features with a preset feature vector corresponding to a preset micro-expression in a preset expression feature vector library based on a preset expression classification model to obtain the matching degree of each feature and the preset micro-expression;
determining a preset micro expression with the maximum matching degree with the features as a first micro expression corresponding to the features;
and when the matching degree of the features and the first micro expressions is larger than a preset threshold value, determining the first micro expressions as target micro expressions corresponding to the features.
The preset expression classification model can be an engine capable of accurately classifying expression features, and the preset expression feature vector library can include the preset feature vectors corresponding to the preset micro expressions. The feature vector corresponding to each feature of the target object's face is matched against the preset feature vectors corresponding to the preset micro expressions in the preset expression feature vector library to obtain the matching degree of each feature with each preset micro expression, and the preset micro expression with the maximum matching degree for each feature is found. If the matching degree between a feature and that preset micro expression is the maximum and exceeds the preset threshold, the match succeeds, and the preset micro expression is determined as the target micro expression corresponding to the feature.
In a specific example, as shown in fig. 3, after the micro expression feature extraction engine 31 extracts the feature vector corresponding to each feature of the target object's face, the expression feature classification engine 33 extracts the feature vector corresponding to each micro expression from the micro expression feature vector library 34 and matches the feature vector of each facial feature against it, for example, matching the feature vectors of the left and right eyebrows, of the left and right eyes, of the nose, of the mouth, and of the chin, to obtain the matching degree of each feature with each preset micro expression. If the matching degree between the feature vector corresponding to the left eyebrow and the preset feature vector of a left-eyebrow micro expression is the largest and exceeds the preset threshold, the match succeeds, and that micro expression is determined as the target micro expression corresponding to the left eyebrow.
Therefore, the feature vector corresponding to each feature is matched with the preset feature vector corresponding to the preset micro-expression in the preset expression feature vector library through the preset expression classification model, the matching degree of each feature and the preset micro-expression is obtained, the preset micro-expression with the maximum matching degree of the features and exceeding a preset threshold is determined as the target micro-expression corresponding to the feature, and the matching of the expressions can be more accurate.
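The patent does not name a similarity measure for the "matching degree"; the sketch below substitutes cosine similarity and a dictionary-based feature vector library, so all names and the threshold value are assumptions.

```python
# Sketch of S130: score each preset micro expression, keep the best match,
# and accept it only if it clears the preset threshold.
import numpy as np

def match_micro_expression(feat_vec, preset_library, threshold=0.8):
    def cosine(a, b):
        return float(np.dot(a, b) /
                     (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    scores = {name: cosine(feat_vec, vec)
              for name, vec in preset_library.items()}
    best = max(scores, key=scores.get)  # the "first micro expression"
    return best if scores[best] > threshold else None
```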
In an embodiment, before matching a feature vector corresponding to each feature in the plurality of features with a preset feature vector corresponding to a preset micro-expression in a preset expression feature vector library based on a preset expression classification model to obtain a matching degree between each feature and the preset micro-expression, the expression recognition method may further include:
acquiring a plurality of second image data of the target object;
identifying a feature of a preset micro expression of the target object in each of the plurality of second image data, wherein the feature of the preset micro expression comprises a whole facial structure feature, a structural feature of five sense organs and a structural feature of a chin;
comparing the characteristics of the preset micro expression of the target object in each second image data with the characteristics of the preset reference expression of the target object, and determining a characteristic vector corresponding to each preset micro expression;
and storing the feature vector corresponding to each preset micro expression into a preset expression feature vector library.
The second image data may be a plurality of image data each containing a preset micro expression of the target object. The features of the preset micro expression of the target object in each second image data are identified and compared with the features of the preset reference expression of the target object: the features of the preset micro expression are displaced in three-dimensional space, and the paths formed by the displacement are regular, so the feature displacements and the curves they form are calculated to generate feature vectors. Each preset micro expression corresponds to one group of feature vectors, and the feature vectors of all the preset micro expressions are stored in the preset expression feature vector library.
Illustratively, the feature vector e of the preset micro expression can be represented in the following vector form:
e = (X1, Y1, Z1, X2, Y2, Z2, ..., Xn, Yn, Zn)
where (Xn, Yn, Zn) are the vertex coordinates of the facial features of the target object, which may include the vertex coordinates of the overall facial structure feature, of the five sense organs structure features, and of the chin structure feature; n is the number of vertices of the facial features, and the vertex coordinates are arranged in the feature vector e in a fixed vertex order. The neutral expression and each micro expression of the target object can be expressed as such a vector; the feature vectors corresponding to all the micro expressions are combined into an expression matrix, and Principal Component Analysis (PCA) is applied to the expression matrix to obtain the micro expression feature vector library.
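As a worked illustration of the expression matrix and PCA step just described, the sketch below stacks one vector e per micro expression and reduces the matrix with scikit-learn's PCA; the vertex count and number of retained components are assumptions (the 46 expressions match the training flow described later).

```python
# Building the micro expression feature vector library: one vector e per
# micro expression, stacked into an expression matrix and reduced with PCA.
import numpy as np
from sklearn.decomposition import PCA

n_vertices = 468                                      # assumed landmark count
rng = np.random.default_rng(0)
expression_matrix = rng.random((46, n_vertices * 3))  # placeholder vectors e

pca = PCA(n_components=32)                            # assumed component count
feature_library = pca.fit_transform(expression_matrix)
print(feature_library.shape)                          # (46, 32)
```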
In a specific example, as shown in fig. 4, after the micro expression image data of the target object is preprocessed, the micro expression feature extraction engine 43 extracts the facial variation feature vector, the five sense organs variation feature vectors, and the chin variation feature vector of the target object and stores these variation feature vectors in the micro expression variation feature vector library 44. The micro expression variation feature vector library 44 may specifically include a facial variation feature vector, a left eyebrow variation feature vector, a right eyebrow variation feature vector, a left eye variation feature vector, a right eye variation feature vector, a nose variation feature vector, a mouth variation feature vector, and a chin variation feature vector.
Therefore, by identifying the characteristics of the preset micro expression of the target object in each second image data of the plurality of second image data, comparing the characteristics of the preset micro expression of the target object in each second image data with the characteristics of the preset reference expression of the target object, determining the characteristic vector corresponding to each preset micro expression, and storing the characteristic vector into the preset expression characteristic vector library, a complete preset expression characteristic vector library can be obtained, so that the expression identification result is more accurate.
To better describe the overall scheme, a specific example is given based on the above embodiments.
For example, fig. 5 shows a flowchart of the operation of a motion capture system based on the expression recognition method. First, the user starts the capture system and enters the training phase, where the neutral expression (with no expression change) and 46 micro expressions are trained in sequence to obtain feature data of facial expressions. Second, the motion capture function is started: the system collects the user's video data stream in real time and obtains the user's real-time expression by matching and classifying it against the facial expression features obtained through training. The avatar's expression changes are then driven according to the user's real-time expression. Finally, the motion capture task ends and the user exits the motion capture system.
By improving the accuracy of expression recognition and driving the avatar to imitate the user's complex expressions, the motion capture system can support the following service scenarios, in which an avatar needs to replace a real person and the requirements on the avatar's expressions and actions are high:
(1) in the anchor field, e-commerce, media, games, and the like place high demands on anchors; a virtual anchor can present a uniform image to an enterprise's users, avoiding the impact of changes in a human anchor's state;
(2) in the social field, in scenarios such as virtual character interaction, communication, and meetings, users who do not want to appear as themselves can communicate through an avatar whose rich and delicate expressions feel more realistic;
(3) in the advertising field, enterprises can use avatars with rich expressions as advertising images, meeting their requirements for such images;
(4) in the customer service field, virtual customer service can present a unified image to an enterprise's users, unaffected by factors such as employee turnover;
(5) virtual characters offer great advantages in multimedia video production, military sand table production, and next-generation game production;
(6) in the entertainment field, virtual characters can be used in scenarios such as virtual variety shows and virtual concerts.
In this way, a plurality of features of the face of the target object in the target image data are recognized and compared with the features of the preset reference expression of the target object to obtain a feature vector corresponding to each feature; the feature vector corresponding to each feature is then compared with the preset feature vector corresponding to a preset micro expression to obtain the target micro expression corresponding to each feature; and the target micro expressions corresponding to the plurality of features are combined to obtain the target facial expression of the target object. Complex expression changes of the target object can thus be captured, so that the complex expression changes of the avatar are driven accurately and the accuracy of the expression recognition result is improved.
Fig. 6 is a schematic structural diagram illustrating an expression recognition apparatus according to an exemplary embodiment.
As shown in fig. 6, the expression recognition apparatus 600 may include:
an identifying module 601 for identifying a plurality of features of a face of a target object in target image data;
a comparison module 602, configured to compare the multiple features with features of a preset reference expression of the target object, to obtain a feature vector corresponding to each feature in the multiple features;
the comparison module 602 is further configured to compare the feature vector corresponding to each feature in the plurality of features with a preset feature vector corresponding to a preset micro expression to obtain a target micro expression corresponding to each feature;
and the combining module 603 is configured to combine the target micro expressions corresponding to the multiple features to obtain a target facial expression of the target object.
In an alternative embodiment, the target image data includes a face size of the target object; the expression recognition device 600 may further include an acquisition module and a preprocessing module;
a capture module to capture video stream data prior to identifying a plurality of features of a face of a target object in target image data;
an identifying module 601, further configured to identify a plurality of features of a face of a target object in the video stream data;
the preprocessing module is used for preprocessing image data corresponding to the video stream data to obtain target image data, the target image data comprises each feature of the multiple features, and the face size of a target object in the target image data is consistent with the size of a preset reference expression.
In an optional embodiment, the expression recognition apparatus 600 may further include a determination module;
the comparison module 602 is further configured to compare the plurality of features with features of a preset reference expression of the target object in a preset facial basic feature library, so as to obtain a displacement vector of each feature in the plurality of features in a three-dimensional space and a change vector of a curve formed by displacement;
and the determining module is used for determining a characteristic vector corresponding to each characteristic according to the displacement vector of each characteristic in the plurality of characteristics in the three-dimensional space and the change vector of the curve formed by the displacement.
In an optional embodiment, the expression recognition apparatus 600 may further include a storage module;
the comparison module 602 is further configured to acquire first image data of the target object before comparing the plurality of features with features of a preset reference expression of the target object in a preset facial basic feature library to obtain a displacement vector of the plurality of features in a three-dimensional space and a change vector of a curve formed by displacement;
the recognition module 601 is further configured to recognize features of a preset reference expression of the target object in the first image data, where the features of the preset reference expression include a whole facial structure feature, a structural feature of five sense organs and a structural feature of a chin, and the structural feature of the five sense organs includes position information of the five sense organs on the face;
and the storage module is used for storing the characteristics of the preset reference expression of the target object into a preset facial basic characteristic library.
In an optional embodiment, the expression recognition apparatus 600 may further include a matching module;
the matching module is used for matching a feature vector corresponding to each feature in the plurality of features with a preset feature vector corresponding to a preset micro-expression in a preset expression feature vector library based on a preset expression classification model to obtain the matching degree of each feature and the preset micro-expression;
the determining module is further used for determining the preset micro expression with the maximum matching degree with the features as a first micro expression corresponding to the features;
the determining module is further used for determining the first micro expression as a target micro expression corresponding to the feature when the matching degree of the feature and the first micro expression is larger than a preset threshold.
In an optional implementation manner, the acquisition module is further configured to acquire a plurality of second image data of the target object before matching, based on a preset expression classification model, a feature vector corresponding to each feature in the plurality of features with a preset feature vector corresponding to a preset micro expression in a preset expression feature vector library to obtain a matching degree of each feature with the preset micro expression;
the recognition module 601 is further configured to recognize a feature of a preset micro expression of the target object in each of the plurality of second image data, where the feature of the preset micro expression includes a whole facial structure feature, a five sense organ structure feature, and a chin structure feature;
the comparison module 602 is further configured to compare the feature of the preset micro expression of the target object in each second image data with the feature of the preset reference expression of the target object, and determine a feature vector corresponding to each preset micro expression;
the storage module is further used for storing the feature vector corresponding to each preset micro expression into a preset expression feature vector library.
In an optional embodiment, the pre-processing the image data corresponding to the video stream data includes performing image rotation correction, image stretching, and image alignment on the image data corresponding to the video stream data.
In an optional embodiment, the expression recognition apparatus 600 may further include a recurrence module;
and the reproduction module is used for reproducing the expression of the virtual image corresponding to the target facial expression based on the target facial expression after the target micro-expression corresponding to the multiple characteristics is combined and integrated to obtain the target facial expression of the target object.
In this way, a plurality of features of the face of the target object in the target image data are recognized and compared with the features of the preset reference expression of the target object to obtain a feature vector corresponding to each feature; the feature vector corresponding to each feature is then compared with the preset feature vector corresponding to a preset micro expression to obtain the target micro expression corresponding to each feature; and the target micro expressions corresponding to the plurality of features are combined to obtain the target facial expression of the target object. Complex expression changes of the target object can thus be captured, so that the complex expression changes of the avatar are driven accurately and the accuracy of the expression recognition result is improved.
Fig. 7 shows a hardware schematic diagram of an electronic device provided in an embodiment of the present application.
The electronic device may include a processor 701 and a memory 702 that stores computer program instructions.
Specifically, the processor 701 may include a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or may be configured to implement one or more Integrated circuits of the embodiments of the present Application.
Memory 702 may include a mass storage for data or instructions. By way of example, and not limitation, memory 702 may include a Hard Disk Drive (HDD), a floppy Disk Drive, flash memory, an optical Disk, a magneto-optical Disk, tape, or a Universal Serial Bus (USB) Drive or a combination of two or more of these. Memory 702 may include removable or non-removable (or fixed) media, where appropriate. The memory 702 may be internal or external to the integrated gateway disaster recovery device, where appropriate. In a particular embodiment, the memory 702 is non-volatile solid-state memory.
The memory may include Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, and electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions, and when the software is executed (e.g., by one or more processors), it is operable to perform the operations described with reference to the methods according to an aspect of the present disclosure.
The processor 701 may read and execute the computer program instructions stored in the memory 702 to implement any one of the expression recognition methods in the above embodiments.
In one example, the electronic device may also include a communication interface 703 and a bus 710. As shown in fig. 7, the processor 701, the memory 702, and the communication interface 703 are connected by a bus 710 to complete mutual communication.
The communication interface 703 is mainly used for implementing communication between modules, apparatuses, units and/or devices in this embodiment of the application.
Bus 710 includes hardware, software, or both, coupling the components of the electronic device to each other. By way of example, and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), another suitable bus, or a combination of two or more of these. Bus 710 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
Based on the plurality of features of the face of the target object in the target image data and the features of the preset reference expression, the electronic device can execute the expression recognition method of the embodiments of the present application, thereby implementing the expression recognition method described in conjunction with Fig. 1.
In addition, in combination with the expression recognition method in the foregoing embodiments, an embodiment of the present application may provide a computer storage medium for implementation. Computer program instructions are stored on the computer storage medium; when executed by a processor, the computer program instructions implement any one of the expression recognition methods in the above embodiments.
It is to be understood that the present application is not limited to the particular arrangements and instrumentalities described above and shown in the accompanying drawings. A detailed description of known methods is omitted here for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated; those skilled in the art can make various changes, modifications, and additions, or change the order of the steps, after comprehending the spirit of the present application.
The functional blocks shown in the above structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they may be, for example, electronic circuits, Application-Specific Integrated Circuits (ASICs), suitable firmware, plug-ins, function cards, and so on. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so on. The code segments may be downloaded via computer networks such as the Internet or an intranet.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems on the basis of a series of steps or devices. However, the present application is not limited to the order of the steps described above; that is, the steps may be performed in the order mentioned in the embodiments, in an order different from that in the embodiments, or simultaneously.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The above are only specific embodiments of the present application. Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, modules, and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated here. It should be understood that the protection scope of the present application is not limited thereto; any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present application, and these modifications or substitutions shall fall within the protection scope of the present application.

Claims (11)

1. An expression recognition method, comprising:
identifying a plurality of features of a face of a target object in target image data;
comparing the plurality of features with features of a preset reference expression of the target object to obtain a feature vector corresponding to each feature in the plurality of features;
comparing a feature vector corresponding to each feature in the plurality of features with a preset feature vector corresponding to a preset micro expression to obtain a target micro expression corresponding to each feature;
and combining the target micro expressions corresponding to the plurality of characteristics to obtain the target facial expression of the target object.
2. The method of claim 1, wherein the target image data includes a face size of a target object; prior to said identifying a plurality of features of a target object's face in the target image data, the method further comprises:
collecting video stream data;
identifying a plurality of features of a face of a target object in the video stream data;
preprocessing image data corresponding to the video stream data to obtain the target image data, wherein the target image data comprises each feature of the plurality of features, and the size of the face of the target object in the target image data is consistent with the size of the preset reference expression.
3. The method according to claim 1, wherein the comparing each of the plurality of features with a feature of a preset reference expression of the target object to obtain a feature vector corresponding to each feature comprises:
comparing the plurality of features with features of a preset reference expression of the target object in a preset facial basic feature library to obtain a displacement vector of each feature in the plurality of features in a three-dimensional space and a change vector of a curve formed by displacement;
and determining a characteristic vector corresponding to each characteristic according to the displacement vector of each characteristic in the plurality of characteristics in the three-dimensional space and the change vector of the curve formed by displacement.
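Illustrative note (not part of the claims): under assumptions, the two quantities of claim 3 could be computed as below for one feature tracked over k frames; the claim does not fix a descriptor for the curve formed by the displacement, so the finite-difference term here is an assumption.

import numpy as np

def claim3_feature_vector(track, reference_point):
    # track: (k, 3) array of one facial feature's 3-D positions over k frames (k >= 3);
    # reference_point: (3,) position of the same feature in the preset reference expression.
    displacement = track[-1] - reference_point              # displacement vector in three-dimensional space
    velocity = np.diff(track, axis=0)                       # per-frame motion along the displacement curve
    curve_change = np.diff(velocity, axis=0).mean(axis=0)   # assumed change vector of the displacement curve
    return np.concatenate([displacement, curve_change])     # feature vector for this feature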
4. The method according to claim 3, wherein before the comparing the plurality of features with the features of the preset reference expression of the target object in a preset facial basic feature library to obtain the displacement vector of the plurality of features in the three-dimensional space and the variation vector of the curve formed by the displacement, the method further comprises:
acquiring first image data of the target object;
identifying features of the preset reference expression of the target object in the first image data, wherein the features of the preset reference expression comprise a whole facial structure feature, a five sense organs structure feature and a chin structure feature, and the five sense organs structure feature comprises position information of five sense organs on the face;
and storing the features of the preset reference expression of the target object into the preset facial basic feature library.
5. The method of claim 1, wherein comparing the feature vector corresponding to each of the plurality of features with a preset feature vector corresponding to a preset micro expression to obtain a target micro expression corresponding to each of the features comprises:
matching a feature vector corresponding to each feature in the plurality of features with a preset feature vector corresponding to a preset micro expression in a preset expression feature vector library based on a preset expression classification model to obtain the matching degree of each feature with the preset micro expression;
determining the preset micro expression with the maximum matching degree with the features as a first micro expression corresponding to the features;
and when the matching degree of the features and the first micro expression is larger than a preset threshold value, determining the first micro expression as a target micro expression corresponding to the features.
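Illustrative note (not part of the claims): claim 5's matching logic could be sketched as below, using cosine similarity as an assumed matching degree; the similarity measure and the threshold value are assumptions, since the claim only requires a matching degree and a preset threshold.

import numpy as np

THRESHOLD = 0.8  # illustrative stand-in for the preset threshold

def match_feature(feature_vec, library):
    # library: {preset micro expression label: preset feature vector}.
    def matching_degree(a, b):  # cosine similarity as an assumed matching degree
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    degrees = {label: matching_degree(feature_vec, ref) for label, ref in library.items()}
    first = max(degrees, key=degrees.get)  # first micro expression: maximum matching degree
    # Only accept the first micro expression as the target micro expression
    # when its matching degree exceeds the preset threshold.
    return first if degrees[first] > THRESHOLD else None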
6. The method according to claim 5, wherein before the matching, based on a preset expression classification model, a feature vector corresponding to each feature in the plurality of features with a preset feature vector corresponding to a preset micro expression in a preset expression feature vector library to obtain a matching degree of each feature with the preset micro expression, the method further comprises:
acquiring a plurality of second image data of the target object;
identifying features of the preset micro expression of the target object in each of the plurality of second image data, the features of the preset micro expression including the whole facial structure feature, the five sense organs structure feature, and the chin structure feature;
comparing the feature of the preset micro expression of the target object in each second image data with the feature of a preset reference expression of the target object, and determining a feature vector corresponding to each preset micro expression;
and storing the feature vector corresponding to each preset micro expression into the preset expression feature vector library.
7. The method of claim 2, wherein the pre-processing image data corresponding to the video stream data comprises performing image rotation correction, image stretching, and image alignment on the image data corresponding to the video stream data.
8. The method according to any one of claims 1 to 7, wherein after the combining and integrating of the target micro expressions corresponding to the plurality of features to obtain the target facial expression of the target object, the method further comprises:
based on the target facial expression, an expression of the avatar corresponding to the target facial expression is reproduced.
9. An expression recognition apparatus, characterized in that the apparatus comprises:
a recognition module for recognizing a plurality of features of a face of a target object in target image data;
the comparison module is used for comparing the plurality of features with the features of the preset reference expression of the target object to obtain a feature vector corresponding to each feature in the plurality of features;
the comparison module is further configured to compare a feature vector corresponding to each feature of the plurality of features with a preset feature vector corresponding to a preset micro expression to obtain a target micro expression corresponding to each feature;
and the combination module is used for combining the target micro-expressions corresponding to the plurality of characteristics to obtain the target facial expression of the target object.
10. An electronic device, characterized in that the device comprises: a processor, and a memory storing computer program instructions; the processor reads and executes the computer program instructions to implement the expression recognition method according to any one of claims 1 to 8.
11. A computer storage medium having computer program instructions stored thereon, which when executed by a processor implement the expression recognition method of any one of claims 1-8.
CN202111359971.1A 2021-11-17 2021-11-17 Expression recognition method, device, equipment and computer storage medium Pending CN114170651A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111359971.1A CN114170651A (en) 2021-11-17 2021-11-17 Expression recognition method, device, equipment and computer storage medium


Publications (1)

Publication Number Publication Date
CN114170651A (en) 2022-03-11

Family

ID=80479823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111359971.1A Pending CN114170651A (en) 2021-11-17 2021-11-17 Expression recognition method, device, equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN114170651A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114879877A (en) * 2022-05-23 2022-08-09 北京新唐思创教育科技有限公司 State data synchronization method, device, equipment and storage medium
CN114879877B (en) * 2022-05-23 2023-03-28 北京新唐思创教育科技有限公司 State data synchronization method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination