CN115294624A - Facial expression capturing method and device, storage medium and terminal - Google Patents

Facial expression capturing method and device, storage medium and terminal

Info

Publication number
CN115294624A
CN115294624A
Authority
CN
China
Prior art keywords
recognized
expression
facial
facial expression
face image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210744443.6A
Other languages
Chinese (zh)
Inventor
赵天奇
段盼
巴君
渠源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Juli Dimension Technology Co ltd
Original Assignee
Beijing Juli Dimension Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Juli Dimension Technology Co ltd filed Critical Beijing Juli Dimension Technology Co ltd
Priority to CN202210744443.6A
Publication of CN115294624A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a facial expression capturing method and device, a storage medium and a terminal. The method comprises the following steps: acquiring a face image to be recognized and extracting a target implicit code of the face image; inputting the face image and the target implicit code into a pre-trained facial expression capture model, and outputting facial expression information corresponding to the face image. The facial expression information is generated from facial expression features, and the facial expression features are generated by fusing multiple extracted features. Because multiple features are extracted and fused into new features for model training, the model's precision in capturing facial expression information can be effectively improved.

Description

Facial expression capturing method and device, storage medium and terminal
Technical Field
The invention relates to the technical field of machine vision, and in particular to a facial expression capturing method and device, a storage medium and a terminal.
Background
Expressions are the external manifestation of emotion and can be divided into six types according to the basic emotion model: anger, disgust, fear, joy, sadness and surprise. Facial expression recognition has long been of important research significance and has great market value in many fields such as human-computer interaction, public safety, and intelligent film and television. With the continuous development of machine learning, researchers seek to improve the precision with which facial expression information is captured.
In the prior art, 2D/3D facial key points are used for driving, but the face occupies only a small area of the whole image, and motion blur, occlusion and similar problems are common, so the accuracy and stability of key point detection are limited. Other approaches model the face and then regress expression parameters through deep learning, but because the modeling precision is insufficient, the coverage of the expression feature space is limited, leading to inaccurate and insufficiently expressive results; moreover, stability in various natural scenes cannot be guaranteed, which reduces the precision of capturing facial expression information.
Disclosure of Invention
The embodiment of the application provides a facial expression capturing method and device, a storage medium and a terminal. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended neither to identify key/critical elements nor to delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description presented later.
In a first aspect, an embodiment of the present application provides a facial expression capturing method, where the method includes:
acquiring a face image to be recognized and extracting a target implicit code of the face image to be recognized;
inputting a facial image to be recognized and a target implicit code into a pre-trained facial expression capturing model, and outputting facial expression information corresponding to the facial image to be recognized; wherein,
the facial expression information is generated according to facial expression characteristics, and the facial expression characteristics are generated by fusing the extracted multiple characteristics.
Optionally, the obtaining of the face image to be recognized and the extracting of the target implicit code of the face image to be recognized include:
acquiring a face image to be recognized;
and extracting a target implicit code of the face image to be recognized from a pre-generated random implicit code matrix by adopting a meta-learner.
Optionally, the pre-generated random implicit coding matrix is obtained according to the following steps:
collecting facial expression data;
calculating prior distribution of the facial expression data;
and constructing a random implicit coding matrix according to the prior distribution.
Optionally, the pre-trained facial expression capturing model includes a general feature extraction module, an identity ID extraction module, an expression extraction module, an expression optimization module, a head pose extraction module and a fusion module;
inputting a face image to be recognized and a target implicit code into a pre-trained facial expression capturing model, and outputting facial expression information corresponding to the face image to be recognized, wherein the method comprises the following steps:
inputting a face image to be recognized into a general feature extraction module, and outputting a feature space of the face image to be recognized;
extracting the identity ID features corresponding to the face image to be recognized from the feature space by adopting a target implicit coding and identity ID extraction module;
extracting final expression features corresponding to the facial image to be recognized from the feature space by adopting an expression extraction module and an expression optimization module;
extracting head pose features corresponding to the face image to be recognized from the feature space by adopting a head pose extraction module;
inputting the identity ID characteristic, the final expression characteristic and the head posture characteristic into a fusion module for characteristic fusion, and outputting a facial expression characteristic corresponding to the facial image to be recognized;
and restoring the facial expression information corresponding to the facial image to be recognized according to the facial expression characteristics.
Optionally, the extracting, by the expression extraction module and the expression optimization module, of the final expression features corresponding to the facial image to be recognized from the feature space includes:
extracting the regional characteristics of different face regions corresponding to the face image to be recognized from the characteristic space by adopting an expression extraction module;
acquiring a plurality of natural expression images under different illumination conditions associated with a face image to be recognized;
extracting natural expression characteristics of a plurality of natural expression images under different illumination conditions by adopting an expression optimization module;
and determining the final expression characteristics corresponding to the face image to be recognized according to the area characteristics of different face areas and the extracted natural expression characteristics.
Optionally, determining a final expression feature corresponding to the face image to be recognized according to the region features of different face regions and the extracted natural expression features, including:
carrying out feature fusion on the regional features of different face regions and the extracted natural expression features to obtain a first feature;
performing expression coding operation on a plurality of natural expression images under different illumination conditions to obtain a second characteristic;
and calculating the difference value of the first characteristic and the second characteristic to obtain the final expression characteristic corresponding to the face image to be recognized.
Optionally, the pre-trained facial expression capture model is generated according to the following steps:
collecting and labeling a plurality of facial expression data to obtain labeled facial expression data;
performing data enhancement on the labeled facial expression data to obtain enhanced data;
creating a facial expression capturing model;
acquiring implicit codes of each datum in the enhanced data;
sequentially inputting each data in the enhanced data and the implicit codes corresponding to the data into a facial expression capturing model for model training, and outputting a model loss value;
and when the model loss value reaches the minimum value, generating a pre-trained facial expression capturing model.
In a second aspect, an embodiment of the present application provides a facial expression capturing apparatus, including:
the implicit code acquisition module is used for acquiring a face image to be recognized and extracting a target implicit code of the face image to be recognized;
the facial expression information output module is used for inputting the facial image to be recognized and the target implicit code into a pre-trained facial expression capturing model and outputting facial expression information corresponding to the facial image to be recognized; wherein,
the facial expression information is generated according to facial expression characteristics, and the facial expression characteristics are generated by fusing the extracted multiple characteristics.
In a third aspect, embodiments of the present application provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned method steps.
In a fourth aspect, an embodiment of the present application provides a terminal, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
in the embodiment of the application, the facial expression capturing device first acquires a face image to be recognized and extracts the target implicit code of the face image; it then inputs the face image and the target implicit code into a pre-trained facial expression capture model and outputs the facial expression information corresponding to the face image. The facial expression information is generated from facial expression features, which are generated by fusing multiple extracted features. Because multiple features are extracted and fused into new features for model training, the model's precision in capturing facial expression information is effectively improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic flowchart of a facial expression capturing method according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of a process of capturing facial expressions according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a method for training a facial expression capture model according to an embodiment of the present application;
FIG. 4 is a block diagram of an overall flow of a facial expression capturing method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a facial expression capture device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The application provides a facial expression capturing method and device, a storage medium and a terminal, which are used to solve the above problems in the related art. In the technical scheme provided by the application, because multiple features are extracted and fused into new features for model training, the model's precision in capturing facial expression information can be effectively improved; this is described in detail below with exemplary embodiments.
The facial expression capturing method provided by the embodiment of the present application will be described in detail below with reference to fig. 1 to 4. The method may be implemented by a computer program executable on a von Neumann architecture-based facial expression capture device. The computer program may be integrated into an application or may run as a separate tool-like application.
Please refer to fig. 1, which provides a schematic flow chart of a facial expression capturing method according to an embodiment of the present application.
As shown in fig. 1, the method of the embodiment of the present application may include the steps of:
s101, acquiring a face image to be recognized and extracting a target implicit code of the face image to be recognized;
the facial image to be recognized is acquired through the image acquisition equipment, the image acquisition equipment can be a single common camera, and the single common camera is combined with the facial expression capturing device to simplify the facial expression capturing process.
In a possible implementation manner, when extracting the target implicit code of the face image to be recognized, the face image is first obtained through a single ordinary camera, and the target implicit code of the face image is then extracted from a pre-generated random implicit coding matrix by a meta-learner.
Further, the pre-generated random implicit coding matrix is obtained according to the following steps: first collect facial expression data, then calculate the prior distribution of the facial expression data, and finally construct the random implicit coding matrix according to the prior distribution.
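As a concrete illustration, the following Python sketch fits a diagonal Gaussian prior to codes derived from collected expression data and samples the random implicit coding matrix from it; the Gaussian form of the prior, all dimensions, and the function name are assumptions for illustration, since the patent does not specify them.

```python
import numpy as np

# Hypothetical sketch: estimate a prior from collected facial expression
# codes and sample a random implicit coding matrix from it. The diagonal
# Gaussian prior, dimensions, and names are illustrative assumptions.
def build_implicit_coding_matrix(expression_codes: np.ndarray,
                                 num_codes: int = 1024,
                                 seed: int = 0) -> np.ndarray:
    """expression_codes: (N, D) codes derived from collected expression data."""
    mu = expression_codes.mean(axis=0)           # prior mean per dimension
    sigma = expression_codes.std(axis=0) + 1e-6  # prior std per dimension
    rng = np.random.default_rng(seed)
    # Sample the random implicit coding matrix from the estimated prior.
    return rng.normal(mu, sigma, size=(num_codes, mu.shape[0]))
```

A meta-learner would then map a face image to one row, or a learned combination of rows, of this matrix to obtain its target implicit code.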
S102, inputting a face image to be recognized and a target implicit code into a pre-trained facial expression capturing model, and outputting facial expression information corresponding to the face image to be recognized;
the facial expression information is generated according to facial expression characteristics, and the facial expression characteristics are generated by fusing the extracted multiple characteristics.
Generally, a pre-trained facial expression capture model comprises a general feature extraction module, an identity ID extraction module, an expression extraction module, an expression optimization module, a head pose extraction module and a fusion module.
In the embodiment of the application, when outputting the facial expression information corresponding to a face image to be recognized, the face image is first input into the general feature extraction module, which outputs the feature space of the image. The target implicit code and the identity ID extraction module are then used to extract the identity ID feature corresponding to the image from the feature space; the expression extraction module and the expression optimization module extract the final expression feature from the feature space; and the head pose extraction module extracts the head pose feature from the feature space. The identity ID feature, the final expression feature and the head pose feature are input into the fusion module for feature fusion, which outputs the facial expression feature corresponding to the face image; finally, the facial expression information corresponding to the face image is recovered from the facial expression feature.
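The modular pipeline just described can be summarized in a minimal PyTorch sketch; the module names follow the text, while every architecture choice, layer size, and dimension below is an assumption rather than the patent's actual design.

```python
import torch
import torch.nn as nn

# Minimal sketch of the described pipeline. Module names follow the text;
# the layer choices and dimensions are illustrative assumptions.
class FacialExpressionCaptureModel(nn.Module):
    def __init__(self, feat_dim: int = 256, code_dim: int = 64):
        super().__init__()
        # General feature extraction module -> feature space
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.id_head = nn.Linear(feat_dim + code_dim, code_dim)  # identity ID extraction
        self.expr_head = nn.Linear(feat_dim, 128)                # expression extraction/optimization
        self.pose_head = nn.Linear(feat_dim, 6)                  # head pose features
        self.fusion = nn.Linear(code_dim + 128 + 6, 256)         # fusion module

    def forward(self, image: torch.Tensor, implicit_code: torch.Tensor):
        feats = self.backbone(image)                             # feature space
        id_feat = self.id_head(torch.cat([feats, implicit_code], dim=-1))
        expr_feat = self.expr_head(feats)
        pose_feat = self.pose_head(feats)
        fused = self.fusion(torch.cat([id_feat, expr_feat, pose_feat], dim=-1))
        return fused, id_feat       # facial expression feature + ID feature
```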
Specifically, when the expression extraction module and the expression optimization module are used to extract the final expression feature from the feature space, the expression extraction module first extracts the region features of the different face regions corresponding to the face image to be recognized. Several natural expression images under different illumination conditions associated with the face image are then obtained, and the expression optimization module extracts natural expression features from them. Finally, the final expression feature corresponding to the face image is determined from the region features of the different face regions and the extracted natural expression features.
Specifically, when determining the final expression feature from the region features of the different face regions and the extracted natural expression features, the region features are first fused with the extracted natural expression features to obtain a first feature; an expression coding operation is then performed on the natural expression images under different illumination conditions to obtain a second feature; finally, the difference between the first feature and the second feature is calculated to obtain the final expression feature corresponding to the face image to be recognized.
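A hedged sketch of this difference computation might look as follows; fuse and expression_encoder stand in for sub-networks the patent does not specify.

```python
import torch

# Sketch of the first/second feature difference described above; fuse() and
# expression_encoder() are placeholders for unspecified sub-networks.
def final_expression_feature(region_feats: torch.Tensor,
                             natural_feats: torch.Tensor,
                             natural_images: torch.Tensor,
                             expression_encoder, fuse) -> torch.Tensor:
    first = fuse(region_feats, natural_feats)            # first feature: fusion
    second = expression_encoder(natural_images).mean(0)  # second: coding of neutral images
    return first - second                                # difference = final expression feature
```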
In one possible implementation, facial expressions are a technical difficulty of facial capture because of their richness, complexity and dynamics. Most techniques trade off among the three: they are either rich enough but not complex enough, or complex enough but not dynamically stable. In this application, the expression extraction module encodes expression positions region by region, i.e., the face is divided into different regions for feature extraction, which guarantees local accuracy and the correct extraction of up to 600 micro-expressions.
Furthermore, because different regions can be combined, the characteristics of the human face are fully exploited, and a spatial association concept and a time-series mechanism are introduced into the subsequent network design to further process the features extracted by the expression extraction module. The spatial association concept lets the network learn adaptive selection and combination, achieving overall richness and complexity; the time-series mechanism lets the network use preceding and following frames to finally achieve dynamic stability of the expression.
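The patent does not detail the time-series mechanism; as one plausible stand-in, a simple moving average over adjacent frames illustrates how preceding and following frames could stabilize per-frame expression features.

```python
import torch

# Illustrative temporal smoothing over adjacent frames; the patent's exact
# time-series mechanism is unspecified, so a moving average stands in here.
def temporally_stabilize(expr_feats: torch.Tensor, window: int = 3) -> torch.Tensor:
    """expr_feats: (T, D) per-frame expression features."""
    pad = window // 2
    padded = torch.cat([expr_feats[:1].repeat(pad, 1),
                        expr_feats,
                        expr_feats[-1:].repeat(pad, 1)])
    return torch.stack([padded[t:t + window].mean(dim=0)
                        for t in range(expr_feats.shape[0])])
```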
On this basis, expressions are both similar across people and strongly personalized. When laughing, for example, one person may grin widely while another barely grins at all. To address the problem that the semantics are consistent but the degree of expression differs from person to person, an expression optimization model is introduced as a fine-tuning module that participates in training the whole network. For each person, 5-10 natural expression images under different illumination conditions are collected in advance; the expression optimization module extracts natural expression features from these images, which are first fused with the features from the expression extraction module to obtain the first feature, while expression coding is performed on the person's 5-10 natural expression images to obtain the second feature. After the two features are obtained, their difference is calculated, and the resulting feature is taken as the expression feature finally output by the expression optimization module. This final expression feature effectively extracts the personalized representation and, combined with the other networks, completes personalized expression control.
Furthermore, after the expression feature finally output by the expression optimization module is obtained, it must be considered that the head pose moves when a person makes an expression; only by combining facial expression with head movement does the expression appear natural and expressive. A head pose extraction module is therefore designed. Its task is to accurately capture head movement, including left-right rotation, up-down pitch and combined poses, and to generate pose features from the captured pose information.
Further, when recovering the facial expression information from the facial expression features, extreme conditions can arise during expression capture: for example, the face may be heavily motion-blurred, or may not be fully visible because of an excessively large angle. These ill-conditioned cases affect the final capture quality. To solve them, spatio-temporal information is integrated while preserving the generalization ability of the original network: by the network's design, the expression information of the current frame is recovered by finding a similar frame, or the normal frames before and after the ill-conditioned frame, giving the network better stability and robustness.
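As an illustrative stand-in for this recovery step, an ill-conditioned frame could be reconstructed from its normal neighbors; the interpolation below is an assumption, since the patent leaves the exact recovery rule unspecified.

```python
import torch

# Sketch of recovering an ill-conditioned frame from its neighbors; the
# simple interpolation rule is an assumption for illustration.
def recover_frame(expr_feats: torch.Tensor, bad: int) -> torch.Tensor:
    """expr_feats: (T, D); bad: index of the ill-conditioned frame."""
    prev_i = max(bad - 1, 0)
    next_i = min(bad + 1, expr_feats.shape[0] - 1)
    return 0.5 * (expr_feats[prev_i] + expr_feats[next_i])  # interpolate normal neighbors
```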
For example, as shown in fig. 2, which is a schematic diagram of the facial expression capturing process of the present application: first a face image to be recognized is obtained and its target implicit code is extracted; the face image is then input into the general feature extraction module to extract the feature space; the target implicit code and the identity ID extraction module extract the identity ID feature from the feature space; the expression extraction module and the expression optimization module extract the final expression feature; the head pose extraction module extracts the head pose feature; finally, the identity ID feature, the final expression feature and the head pose feature are input into the fusion module for feature fusion, the facial expression feature corresponding to the face image is output, and the facial expression information is recovered from it.
Further, after the facial expression information is recovered, the facial expression information can be transferred to a virtual character for practical application.
In the embodiment of the application, the facial expression capturing device first acquires a face image to be recognized and extracts the target implicit code of the face image; it then inputs the face image and the target implicit code into a pre-trained facial expression capture model and outputs the facial expression information corresponding to the face image. The facial expression information is generated from facial expression features, which are generated by fusing multiple extracted features. Because multiple features are extracted and fused into new features for model training, the model's precision in capturing facial expression information is effectively improved.
Please refer to fig. 3, which provides a schematic flow chart of a facial expression capturing model training method according to an embodiment of the present application. As shown in fig. 3, the method of the embodiment of the present application may include the following steps:
s201, collecting and labeling a plurality of facial expression data to obtain labeled facial expression data;
s202, performing data enhancement on the labeled facial expression data to obtain enhanced data;
in a possible implementation manner, firstly, a plurality of facial expression data are collected, then, the collected facial expression data of a plurality of persons are manually labeled, and finally, data enhancement operation is performed on the labeled data so as to perform data expansion on the labeled data. Wherein, the data enhancement at least comprises data translation and rotation operations.
S203, establishing a facial expression capturing model;
s204, acquiring implicit codes of each datum in the enhanced data;
s205, sequentially inputting each data in the enhanced data and the corresponding implicit codes into a facial expression capturing model for model training, and outputting a model loss value;
in a possible implementation manner, in a model training process, random implicit codes of each training data are introduced at input, a coding consistency loss function is adopted at output to carry out consistency constraint on the identity ID feature of each training data, namely, the similarity between the identity ID feature and the corresponding implicit codes is calculated, the similarity is determined as a loss value of the model to be output, and when the loss value reaches a minimum value, the model can be determined to finish convergence.
And S206, when the model loss value reaches the minimum value, generating a pre-trained facial expression capturing model.
In one possible implementation, the pre-trained facial expression capture model is generated when the model loss value reaches its minimum; otherwise, the loss value is back-propagated to update the model parameters, and training continues until the loss value reaches its minimum.
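Putting steps S204-S206 together, a minimal training-loop sketch reusing the hypothetical model and loss from the earlier sketches might look like this; `loader` is an assumed data loader yielding (image, implicit_code) pairs.

```python
import torch

# Minimal training-loop sketch for S204-S206; the model and loss are the
# hypothetical pieces sketched earlier, and `loader` is assumed to exist.
model = FacialExpressionCaptureModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
for epoch in range(100):                            # epoch count is an assumption
    for image, implicit_code in loader:             # enhanced data + its codes
        fused, id_feat = model(image, implicit_code)
        loss = coding_consistency_loss(id_feat, implicit_code)
        optimizer.zero_grad()
        loss.backward()                             # back-propagate to update parameters
        optimizer.step()
```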
For example, as shown in fig. 4, which is an overall flow diagram of the facial expression capturing method provided by the present application: the basic model in fig. 4 is the facial expression capture model. n face pictures are collected and input into the basic model together with the implicit code of each face picture for model training; after training of the basic model completes, the pre-trained facial expression capture model is obtained. In an actual application scenario, after a new face picture is obtained, its implicit code is extracted by the meta-learner, and the new face picture and its implicit code are input together into the trained basic model, yielding the ID feature and other features of the new face picture for subsequent feature fusion.
Based on the prior characteristics of the implicit coding, the application introduces a meta-learning mechanism, and at the same time introduces a coding consistency loss function through the ID coding network, so that the network can extract a unique ID code for any person.
In the embodiment of the application, the facial expression capturing device first acquires a face image to be recognized and extracts the target implicit code of the face image; it then inputs the face image and the target implicit code into a pre-trained facial expression capture model and outputs the facial expression information corresponding to the face image. The facial expression information is generated from facial expression features, which are generated by fusing multiple extracted features. Because multiple features are extracted and fused into new features for model training, the model's precision in capturing facial expression information is effectively improved.
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.
Referring to fig. 5, a schematic structural diagram of a facial expression capture device according to an exemplary embodiment of the present invention is shown. The facial expression capturing device can be realized by software, hardware or a combination of the software and the hardware to form all or part of the terminal. The device 1 comprises an implicit code acquisition module 10 and a facial expression information output module 20.
The implicit code acquisition module 10 is configured to acquire a face image to be recognized and extract a target implicit code of the face image to be recognized;
the facial expression information output module 20 is configured to input the facial image to be recognized and the target implicit code into a pre-trained facial expression capture model, and output facial expression information corresponding to the facial image to be recognized; wherein,
the facial expression information is generated according to facial expression features, and the facial expression features are generated by fusing the extracted multiple features.
It should be noted that, when the facial expression capturing apparatus provided in the foregoing embodiment executes the facial expression capturing method, the division into the above functional modules is only an example; in practical applications, the functions may be assigned to different functional modules as needed, i.e., the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the facial expression capturing apparatus and the facial expression capturing method provided by the above embodiments belong to the same concept; for the detailed implementation process, refer to the method embodiments, which are not repeated here.
The above-mentioned serial numbers of the embodiments of the present application are merely for description, and do not represent the advantages and disadvantages of the embodiments.
In the embodiment of the application, the facial expression capturing device first acquires a face image to be recognized and extracts the target implicit code of the face image; it then inputs the face image and the target implicit code into a pre-trained facial expression capture model and outputs the facial expression information corresponding to the face image. The facial expression information is generated from facial expression features, which are generated by fusing multiple extracted features. Because multiple features are extracted and fused into new features for model training, the model's precision in capturing facial expression information is effectively improved.
The present invention also provides a computer readable medium, on which program instructions are stored, which program instructions, when executed by a processor, implement the facial expression capturing method provided by the above-mentioned method embodiments.
The present invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the facial expression capturing method of the various method embodiments described above.
Please refer to fig. 6, which provides a schematic structural diagram of a terminal according to an embodiment of the present application. As shown in fig. 6, terminal 1000 can include: at least one processor 1001, at least one network interface 1004, a user interface 1003, memory 1005, at least one communication bus 1002.
Wherein a communication bus 1002 is used to enable connective communication between these components.
The user interface 1003 may include a Display screen (Display) and a Camera (Camera), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
Processor 1001 may include one or more processing cores. The processor 1001, which is connected to various parts throughout the electronic device 1000 using various interfaces and lines, performs various functions of the electronic device 1000 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1005 and calling data stored in the memory 1005. Alternatively, the processor 1001 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 1001 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs and the like; the GPU is responsible for rendering and drawing the content to be displayed on the display screen; the modem handles wireless communications. It is understood that the modem may not be integrated into the processor 1001 but may instead be implemented by a single chip.
The Memory 1005 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 1005 includes a non-transitory computer-readable medium. The memory 1005 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1005 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like; the stored data area may store data and the like referred to in the above respective method embodiments. The memory 1005 may alternatively be at least one storage device located remotely from the processor 1001. As shown in fig. 6, the memory 1005, which is one type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a facial expression capture application.
In the terminal 1000 shown in fig. 6, the user interface 1003 is mainly used as an interface for providing input for a user, and acquiring data input by the user; and the processor 1001 may be configured to invoke the facial expression capture application stored in the memory 1005, and specifically perform the following operations:
acquiring a face image to be recognized and extracting a target implicit code of the face image to be recognized;
inputting a facial image to be recognized and a target implicit code into a pre-trained facial expression capturing model, and outputting facial expression information corresponding to the facial image to be recognized; wherein,
the facial expression information is generated according to facial expression characteristics, and the facial expression characteristics are generated by fusing the extracted multiple characteristics.
In one embodiment, when acquiring a face image to be recognized and extracting the target implicit code of the face image to be recognized, the processor 1001 specifically performs the following operations:
acquiring a face image to be recognized;
and extracting a target implicit code of the face image to be recognized from a pre-generated random implicit code matrix by adopting a meta-learner.
In one embodiment, the processor 1001 further performs the following operations in generating the random implicit encoding matrix:
collecting facial expression data;
calculating prior distribution of the facial expression data;
and constructing a random implicit coding matrix according to the prior distribution.
In one embodiment, when the processor 1001 performs the operations of inputting the facial image to be recognized and the target implicit code into the pre-trained facial expression capture model and outputting facial expression information corresponding to the facial image to be recognized, the following operations are specifically performed:
inputting a face image to be recognized into a general feature extraction module, and outputting a feature space of the face image to be recognized;
extracting the identity ID characteristics corresponding to the face image to be recognized from the characteristic space by adopting a target implicit coding and identity ID extraction module;
extracting final expression features corresponding to the facial image to be recognized from the feature space by adopting an expression extraction module and an expression optimization module;
extracting head posture characteristics corresponding to the face image to be recognized from the characteristic space by adopting a head posture extraction module;
inputting the identity ID characteristic, the final expression characteristic and the head posture characteristic into a fusion module for characteristic fusion, and outputting a facial expression characteristic corresponding to the facial image to be recognized;
and restoring the facial expression information corresponding to the facial image to be recognized according to the facial expression characteristics.
In an embodiment, when the processor 1001 extracts the final expression feature corresponding to the facial image to be recognized from the feature space by using the expression extraction module and the expression optimization module, the following operations are specifically performed:
extracting the regional characteristics of different face regions corresponding to the face image to be recognized from the characteristic space by adopting an expression extraction module;
acquiring a plurality of natural expression images under different illumination conditions associated with the face image to be recognized;
extracting natural expression characteristics of a plurality of natural expression images under different illumination conditions by adopting an expression optimization module;
and determining the final expression characteristics corresponding to the face image to be recognized according to the area characteristics of different face areas and the extracted natural expression characteristics.
In one embodiment, when the processor 1001 determines the final expression feature corresponding to the face image to be recognized according to the region features of different face regions and the extracted natural expression feature, the following operations are specifically performed:
carrying out feature fusion on the regional features of different face regions and the extracted natural expression features to obtain a first feature;
performing expression coding operation on a plurality of natural expression images under different illumination conditions to obtain a second characteristic;
and calculating the difference value of the first characteristic and the second characteristic to obtain the final expression characteristic corresponding to the face image to be recognized.
In one embodiment, the processor 1001, when generating the pre-trained facial expression capture model, specifically performs the following operations:
collecting and labeling a plurality of facial expression data to obtain labeled facial expression data;
performing data enhancement on the labeled facial expression data to obtain enhanced data;
creating a facial expression capturing model;
acquiring implicit codes of each datum in the enhanced data;
sequentially inputting each data in the enhanced data and the implicit codes corresponding to the data into a facial expression capturing model for model training, and outputting a model loss value;
and when the model loss value reaches the minimum value, generating a pre-trained facial expression capturing model.
In the embodiment of the application, the facial expression capturing device first acquires a face image to be recognized and extracts the target implicit code of the face image; it then inputs the face image and the target implicit code into a pre-trained facial expression capture model and outputs the facial expression information corresponding to the face image. The facial expression information is generated from facial expression features, which are generated by fusing multiple extracted features. Because multiple features are extracted and fused into new features for model training, the model's precision in capturing facial expression information is effectively improved.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program to instruct associated hardware, and the program for facial expression capture may be stored in a computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory or a random access memory.
The above disclosure describes only preferred embodiments of the present application and certainly cannot be taken to limit the scope of its claims; equivalent variations and modifications made according to the claims of the present application therefore still fall within its scope.

Claims (10)

1. A method for capturing facial expressions, the method comprising:
acquiring a face image to be recognized and extracting a target implicit code of the face image to be recognized;
inputting the facial image to be recognized and the target implicit code into a pre-trained facial expression capturing model, and outputting facial expression information corresponding to the facial image to be recognized; wherein,
the facial expression information is generated according to facial expression features, and the facial expression features are generated by fusing the extracted multiple features.
2. The method of claim 1, wherein the obtaining of the face image to be recognized and the extracting of the target implicit code of the face image to be recognized comprise:
acquiring a face image to be recognized;
and extracting a target implicit code of the face image to be recognized from a pre-generated random implicit coding matrix by adopting a meta-learner.
3. The method of claim 2, wherein obtaining the pre-generated random implicit coding matrix comprises:
collecting facial expression data;
calculating prior distribution of the facial expression data;
and constructing a random implicit coding matrix according to the prior distribution.
4. The method of claim 1, wherein the pre-trained facial expression capture model comprises a generic feature extraction module, an identity ID extraction module, an expression extraction module, an expression optimization module, a head pose extraction module, and a fusion module;
the step of inputting the facial image to be recognized and the target implicit code into a pre-trained facial expression capture model and outputting facial expression information corresponding to the facial image to be recognized comprises the following steps:
inputting the face image to be recognized into the general feature extraction module, and outputting a feature space of the face image to be recognized;
extracting the identity ID features corresponding to the face image to be recognized from the feature space by adopting the target implicit code and the identity ID extraction module;
extracting final expression features corresponding to the facial image to be recognized from the feature space by adopting the expression extraction module and the expression optimization module;
extracting head pose features corresponding to the face image to be recognized from the feature space by adopting the head pose extraction module;
inputting the identity ID feature, the final expression feature and the head posture feature into the fusion module for feature fusion, and outputting the facial expression feature corresponding to the facial image to be recognized;
and restoring the facial expression information corresponding to the facial image to be recognized according to the facial expression characteristics.
5. The method of claim 4, wherein the extracting, by using the expression extraction module and the expression optimization module, the final expression feature corresponding to the facial image to be recognized from the feature space comprises:
extracting the regional characteristics of different face regions corresponding to the face image to be recognized from a characteristic space by adopting the expression extraction module;
acquiring a plurality of natural expression images under different illumination conditions associated with the face image to be recognized;
extracting natural expression characteristics of a plurality of natural expression images under different illumination conditions by adopting the expression optimization module;
and determining the final expression characteristics corresponding to the face image to be recognized according to the regional characteristics of the different face regions and the extracted natural expression characteristics.
6. The method of claim 5, wherein determining the final expression features corresponding to the face image to be recognized according to the region features of the different face regions and the extracted natural expression features comprises:
carrying out feature fusion on the regional features of the different face regions and the extracted natural expression features to obtain first features;
performing expression coding operation on the plurality of natural expression images under different illumination conditions to obtain a second characteristic;
and calculating the difference value of the first characteristic and the second characteristic to obtain the final expression characteristic corresponding to the face image to be recognized.
7. The method of claim 1, wherein generating a pre-trained facial expression capture model comprises:
collecting and labeling a plurality of facial expression data to obtain labeled facial expression data;
performing data enhancement on the labeled facial expression data to obtain enhanced data;
creating a facial expression capturing model;
obtaining an implicit code of each data in the enhanced data;
sequentially inputting each data in the enhanced data and the implicit codes corresponding to the data into the facial expression capturing model for model training, and outputting a model loss value;
and when the model loss value reaches the minimum value, generating a pre-trained facial expression capturing model.
8. A facial expression capture apparatus, the apparatus comprising:
the implicit code acquisition module is used for acquiring a face image to be recognized and extracting a target implicit code of the face image to be recognized;
the facial expression information output module is used for inputting the facial image to be recognized and the target implicit code into a pre-trained facial expression capturing model and outputting facial expression information corresponding to the facial image to be recognized; wherein,
the facial expression information is generated according to facial expression features, and the facial expression features are generated by fusing the extracted multiple features.
9. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the method according to any one of claims 1 to 7.
10. A terminal, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method according to any of claims 1-7.
CN202210744443.6A 2022-06-28 2022-06-28 Facial expression capturing method and device, storage medium and terminal Pending CN115294624A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210744443.6A CN115294624A (en) 2022-06-28 2022-06-28 Facial expression capturing method and device, storage medium and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210744443.6A CN115294624A (en) 2022-06-28 2022-06-28 Facial expression capturing method and device, storage medium and terminal

Publications (1)

Publication Number Publication Date
CN115294624A 2022-11-04

Family

ID=83819498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210744443.6A Pending CN115294624A (en) 2022-06-28 2022-06-28 Facial expression capturing method and device, storage medium and terminal

Country Status (1)

Country Link
CN (1) CN115294624A (en)

Similar Documents

Publication Publication Date Title
US11748934B2 (en) Three-dimensional expression base generation method and apparatus, speech interaction method and apparatus, and medium
EP3992919B1 (en) Three-dimensional facial model generation method and apparatus, device, and medium
EP3872766A2 (en) Method and device for processing image, related electronic device and storage medium
CN108876886B (en) Image processing method and device and computer equipment
JP2024522287A (en) 3D human body reconstruction method, apparatus, device and storage medium
CN113507627B (en) Video generation method and device, electronic equipment and storage medium
CN111291674B (en) Method, system, device and medium for extracting expression actions of virtual figures
CN110796593A (en) Image processing method, device, medium and electronic equipment based on artificial intelligence
CN109035415B (en) Virtual model processing method, device, equipment and computer readable storage medium
CN114821675B (en) Object processing method and system and processor
CN113705295A (en) Object posture migration method, device, equipment and storage medium
CN111667588A (en) Person image processing method, person image processing device, AR device and storage medium
WO2024174422A1 (en) Model generation method and apparatus, electronic device, and storage medium
CN114187165A (en) Image processing method and device
CN112308977A (en) Video processing method, video processing apparatus, and storage medium
CN115497149A (en) Music interaction method for automobile cabin
CN111028318A (en) Virtual face synthesis method, system, device and storage medium
WO2024104144A1 (en) Image synthesis method and apparatus, storage medium, and electrical device
WO2024066549A1 (en) Data processing method and related device
CN111597926A (en) Image processing method and device, electronic device and storage medium
CN114677476B (en) Face processing method, device, computer equipment and storage medium
CN115294624A (en) Facial expression capturing method and device, storage medium and terminal
CN115035219A (en) Expression generation method and device and expression generation model training method and device
CN114862716A (en) Image enhancement method, device and equipment for face image and storage medium
CN112132107A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination