CN116109476A - Expression migration method, device, equipment and medium

Expression migration method, device, equipment and medium

Info

Publication number: CN116109476A
Application number: CN202310117466.9A
Authority: CN (China)
Prior art keywords: face image, feature, expression, model, face
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)
Other languages: Chinese (zh)
Inventor: 李冰川
Current and original assignee: Beijing Zitiao Network Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed by Beijing Zitiao Network Technology Co Ltd
Priority to CN202310117466.9A
Publication of CN116109476A

Classifications

    • G06T3/04
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; proximity measures in feature spaces
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 - Facial expression recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an expression migration method, apparatus, device, and medium. A first face image and a second face image are acquired, a first facial deformation coefficient is obtained from the first face image, and a second facial deformation coefficient is obtained from the second face image. Graph rendering is performed using the expression coefficient in the second facial deformation coefficient and the non-expression coefficients in the first facial deformation coefficient to obtain a first model map. The first face image, the second face image, and the first model map are input into an expression migration model to obtain a third face image, where the expression of the third face image matches the expression of the second face image and the identity of the third face image matches the identity of the first face image. That is, when performing expression migration, the deformation coefficients of both the first and second face images are extracted, which increases the information referenced by the migration and improves its accuracy.

Description

Expression migration method, device, equipment and medium
Technical Field
The present disclosure relates to the field of computer technology, and in particular to an expression migration method, apparatus, device, and medium.
Background
Processing and analyzing facial expressions has become a research hotspot in the fields of computer vision and graphics, and facial expression migration is widely applied. Facial expression migration refers to mapping a captured user expression onto another target image, so as to transfer the facial expression to the target image.
At present, realizing facial expression migration requires training an expression migration model, and training such a model consumes a large amount of computing resources, so the cost of expression migration is high.
Disclosure of Invention
In view of this, the present disclosure provides an expression migration method, apparatus, device, and medium, so as to improve expression migration efficiency and reduce migration cost.
To achieve the above purpose, the technical solutions provided by the present disclosure are as follows:
in a first aspect of the present disclosure, there is provided an expression migration method, including:
acquiring a first face image and a second face image;
acquiring a first face deformation coefficient according to the first face image, and acquiring a second face deformation coefficient according to the second face image;
obtaining a first model diagram by using the expression coefficient in the second facial deformation coefficient and the non-expression coefficient in the first facial deformation coefficient;
Inputting the first face image, the second face image and the first model image into an expression migration model to obtain a third face image, wherein the expression of the third face image is matched with the expression of the second face image, and the identity of the third face image is matched with the identity of the first face image.
In a second aspect of the present disclosure, there is provided an expression migration apparatus, including:
the first acquisition unit is used for acquiring a first face image and a second face image;
the second acquisition unit is used for acquiring a first face deformation coefficient according to the first face image and acquiring a second face deformation coefficient according to the second face image;
the second obtaining unit is further configured to obtain a first model diagram by using an expression coefficient in the second facial deformation coefficient and a non-expression coefficient in the first facial deformation coefficient;
the third obtaining unit is used for inputting the first face image, the second face image and the first model image into an expression migration model to obtain a third face image, the expression of the third face image is matched with the expression of the second face image, and the identity of the third face image is matched with the identity of the first face image.
In a third aspect of the present disclosure, there is provided an electronic device, the device comprising: a processor and a memory;
the memory is used for storing instructions or computer programs;
the processor is configured to execute the instructions or the computer program in the memory, so that the electronic device performs the method according to the first aspect.
In a fourth aspect of the present disclosure, there is provided a computer readable storage medium having instructions stored therein which, when run on a device, cause the device to perform the method of the first aspect.
In a fifth aspect of the present disclosure, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the method of the first aspect.
From this, the present disclosure has the following beneficial effects:
In the present disclosure, a first face image and a second face image are first acquired, where the second face image is the face image providing the expression. A first facial deformation coefficient is obtained from the first face image and a second facial deformation coefficient from the second face image. Graph rendering is performed using the expression coefficient in the second facial deformation coefficient and the non-expression coefficients in the first facial deformation coefficient to obtain a first model map. The first face image, the second face image, and the first model map are input into an expression migration model to obtain a third face image, where the expression of the third face image matches the expression of the second face image and the identity of the third face image matches the identity of the first face image. That is, with the scheme provided by the present disclosure, the deformation coefficients of both the first and second face images are extracted when performing expression migration, which increases the information referenced by the migration; expression migration can thus be realized with a single reference image (the second face image), and the added reference information improves the accuracy of the migration.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present disclosure; other drawings may be obtained from them by a person of ordinary skill in the art without inventive effort.
Fig. 1 is a flowchart of a method for generating an expression migration model according to an embodiment of the present disclosure;
Fig. 2a is a schematic diagram of obtaining a training 3D model map according to an embodiment of the present disclosure;
Fig. 2b is a schematic structural diagram of an expression migration model according to an embodiment of the present disclosure;
Fig. 2c is a schematic structural diagram of an alignment module according to an embodiment of the present disclosure;
Fig. 2d is a schematic diagram of a feature deformation process according to an embodiment of the present disclosure;
Fig. 3 is a flowchart of an expression migration method provided in an embodiment of the present disclosure;
Fig. 4 is a block diagram of an expression migration apparatus according to an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order that those skilled in the art may better understand the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. Based on the embodiments in this disclosure, all other embodiments obtained by a person of ordinary skill in the art without inventive effort fall within the scope of protection of this disclosure.
At present, realizing expression migration requires at least two reference images, and because the reference information available when generating the migrated image is limited, the expression of the generated image is often inaccurate, which affects both migration efficiency and migration quality.
Based on this, the present disclosure provides an expression migration method. Specifically, when performing expression migration, a source image (first face image) and an expression reference image (second face image) are acquired. A first facial deformation coefficient is obtained from the first face image and a second facial deformation coefficient from the second face image; graph rendering is performed with the expression coefficient in the second facial deformation coefficient and the non-expression coefficients in the first facial deformation coefficient to obtain a first model map; and a third face image is then generated from the first face image, the second face image, and the first model map. That is, the facial deformation coefficients are referenced during expression migration, which increases the reference information, reduces the number of required expression reference images, and improves the accuracy of the migration.
It will be appreciated that prior to using the technical solutions of the various embodiments in the present disclosure, the user should be informed of the type of personal information involved, the scope of use, the use scenario, etc. and obtain the authorization of the user in an appropriate manner according to the relevant laws and regulations.
For example, in response to receiving an active request from a user, a prompt is sent to the user to explicitly inform the user that the requested operation will require acquiring and using the user's personal information. The user can thus decide, according to the prompt, whether to provide personal information to the software or hardware (such as an electronic device, application, server, or storage medium) that performs the operations of this technical solution. Since a synthesis service may cause public confusion or misidentification, generated or edited content should be marked in a reasonable position and area.
As an alternative but non-limiting implementation, in response to receiving an active request from a user, the manner in which the prompt information is sent to the user may be, for example, a popup, in which the prompt information may be presented in a text manner. In addition, a selection control for the user to select to provide personal information to the electronic device in a 'consent' or 'disagreement' manner can be carried in the popup window.
It will be appreciated that the above-described notification and user authorization process is merely illustrative, and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.
In order to facilitate understanding of the technical solution provided in the present disclosure, the following will describe the expression migration model training process in the present disclosure.
Referring to fig. 1, which is a flowchart of a method for generating an expression migration model according to an embodiment of the present disclosure, the method may be performed by an expression migration client deployed in an electronic device. The electronic device may include a mobile phone, a tablet computer, a notebook computer, a desktop computer, a vehicle-mounted terminal, a wearable electronic device, an all-in-one machine, or a smart home device, or may be a virtual machine or a simulator. As shown in fig. 1, the method may include the following steps:
s101: a training set is obtained, wherein the training set comprises an original face image sample and a target face image sample.
In this embodiment, to train and generate the expression migration model, a training set containing a large number of face image samples with rich expressions may be acquired. Specifically, one face image sample may be selected from the training set as the original face image sample, and another face image sample as the target face image sample that provides the expression to be migrated. For example, face image sample X is selected from the training set as the original face image sample and face image sample Y as the target face image sample. Alternatively, a single face image sample may be selected from the training set and used as both the original face image sample and the target face image sample. For example, face image sample X is selected from the training set as the original face image sample and also as the target face image sample.
S102: and obtaining an original face deformation coefficient according to the original face image sample, and obtaining a target face deformation coefficient according to the target face image sample.
After the original face image sample and the target face image sample are obtained, the original facial deformation coefficient and the target facial deformation coefficient are obtained from the original face image sample and the target face image sample, respectively. In particular, the face 3D deformation coefficients may be extracted using an Emotion Driven Monocular Face Capture and Animation (EMOCA) model. The facial deformation coefficients may include an expression coefficient and non-expression coefficients, where the non-expression coefficients may include facial shape, head pose information, illumination information, and the like.
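To make the coefficient grouping concrete, the following is a minimal sketch of splitting a flat 3DMM-style coefficient vector into expression and non-expression groups; the wrapper, group names, and dimensionalities are illustrative assumptions for an EMOCA-style extractor, not the actual EMOCA API.

```python
# Minimal sketch: split a flat coefficient vector into named groups.
# Group names and sizes are assumptions; a real capture model defines
# its own layout.
from dataclasses import dataclass
import numpy as np

@dataclass
class FaceCoefficients:
    expression: np.ndarray  # expression coefficients
    shape: np.ndarray       # identity / face-shape coefficients (non-expression)
    pose: np.ndarray        # head pose (non-expression)
    lighting: np.ndarray    # illumination (non-expression)

def split_coefficients(coeffs: np.ndarray,
                       dims=(50, 100, 6, 27)) -> FaceCoefficients:
    """Split `coeffs` into expression / shape / pose / lighting groups."""
    e, s, p = np.cumsum(dims[:3])
    return FaceCoefficients(expression=coeffs[:e],
                            shape=coeffs[e:s],
                            pose=coeffs[s:p],
                            lighting=coeffs[p:p + dims[3]])
```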
In some embodiments, since the original face image sample is used to provide the face of the migrated image rather than expression information, the original face image sample may be subjected to de-expression processing before the original facial deformation coefficient is obtained: a processed original face image sample is obtained, and the original facial deformation coefficient is extracted from it. Specifically, a pre-trained de-expression model may be used to process the original face image sample, converting any expression in it into a neutral expression.
S103: and performing graph rendering based on the expression coefficient in the target facial deformation coefficient and the non-expression coefficient in the original facial deformation coefficient to obtain a first training model graph.
In this embodiment, after the target facial deformation coefficient and the original facial deformation coefficient are obtained, the first training model map may be rendered by combining the expression coefficient in the target facial deformation coefficient with the non-expression coefficients in the original facial deformation coefficient. The first training model map is an expression model map used to provide the geometric information of the expression; it is a 3D model map.
For example, in the processing framework shown in fig. 2a, the target face 3D deformation coefficient is extracted from the target face image sample through the EMOCA model, while the original face image sample is first passed through the de-expression model and then input into the EMOCA model to obtain the original face 3D deformation coefficient. The expression coefficient in the target face 3D deformation coefficient and the non-expression coefficients in the original face 3D deformation coefficient are input into a renderer to obtain the first training 3D model map.
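The recombination step of S103 can be sketched as below, reusing the FaceCoefficients sketch above; `render_mesh` is a hypothetical renderer callable, since the patent does not specify a renderer interface.

```python
# Sketch of S103: expression coefficient from the target sample, all
# non-expression coefficients from the (de-expressed) original sample.
# `render_mesh` is a placeholder, not a real API.
def build_training_model_map(target: "FaceCoefficients",
                             original: "FaceCoefficients",
                             render_mesh):
    return render_mesh(expression=target.expression,  # migrated expression
                       shape=original.shape,          # original identity
                       pose=original.pose,
                       lighting=original.lighting)
```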
S104: and inputting the original face image sample, the target face image sample and the first training model image into an initial expression migration model to obtain an expression migration image.
In this example, the obtained original face image sample, target face image sample, and first training model map are input into the initial expression migration model to obtain an expression migration image. The expression in the expression migration image comes from the target face image sample, and the face in the expression migration image is the face provided by the original face image sample.
In some embodiments, if the original face image sample is subjected to the de-expression processing, the processed original face image sample, the target face image sample and the first training model image are input into an initial expression migration model, and an expression migration image is obtained.
S105: the initial expression migration model is trained by judging a first judgment loss corresponding to the original face image sample and the expression migration image through a first discriminator, and the initial expression migration model is trained by judging a second judgment loss corresponding to the expression migration image and the target face image sample through a second discriminator.
After the expression migration image is obtained from the initial expression migration model, the first discriminator compares the face in the original face image sample with the face in the expression migration image to obtain a first discrimination loss, and the initial expression migration model is trained with this loss. Meanwhile, the second discriminator compares the expression in the expression migration image with the expression in the target face image sample to obtain a second discrimination loss, which is also used to train the initial expression migration model. Specifically, the second comparison may be performed by extracting a migrated facial deformation coefficient from the expression migration image and having the second discriminator judge the similarity between its expression coefficient and the expression coefficient in the target facial deformation coefficient, yielding the second discrimination loss.
The initial expression migration model may be trained with the first and second discrimination losses in a serial mode: for example, the model is first trained with the first discrimination loss, and the trained model is then trained again with the second discrimination loss. The two discrimination losses may also be applied in a parallel mode.
In a specific implementation, a generative adversarial network may be constructed to train the expression migration model; it includes a generator, a first discriminator, and a second discriminator. The generator corresponds to the generating network, while the two discriminators form the discriminating network. The inputs of the discriminating network are real samples and the reconstructed images output by the generating network; the discriminating network aims to distinguish the reconstructed images from the real samples as far as possible, while the generating network tries to generate reconstructed images that deceive the discriminating network. The two networks oppose each other and continuously adjust their parameters until the discriminating network can no longer judge the authenticity of the generating network's reconstructed images.
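As an illustration of the parallel strategy of S105, the sketch below combines both adversarial losses in a single generator update; all module names and signatures are assumptions, and network definitions are omitted.

```python
import torch
import torch.nn.functional as F

def generator_step(gen, d_identity, d_expression, opt,
                   original, target, model_map):
    """One parallel generator update (sketch). d_identity scores
    (original, migrated) pairs for identity consistency; d_expression
    scores the migrated image against the target. Both return logits."""
    migrated = gen(original, target, model_map)
    logits_id = d_identity(original, migrated)
    logits_expr = d_expression(migrated, target)
    # Non-saturating GAN loss: the generator tries to make both
    # discriminators output "real".
    loss = (F.binary_cross_entropy_with_logits(
                logits_id, torch.ones_like(logits_id))
            + F.binary_cross_entropy_with_logits(
                logits_expr, torch.ones_like(logits_expr)))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```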
In some embodiments, the initial expression migration model includes an encoder and a generator, and the method includes inputting an original face image sample, a target face image sample, and a first training model map into the initial expression migration model to obtain an expression migration image, including: extracting original features from the original face image sample, model features from the first training model diagram and target features from the target face image sample by using an encoder; and inputting the original features, the model features and the target features into a generator to obtain the expression migration image. After the original face image sample, the target face image sample and the first training model image are input into an initial expression migration model, the corresponding features are extracted through an encoder, and the extracted features are input into a generator to obtain an expression migration image. And if the original face image sample is subjected to expression removal processing, extracting original features from the processed original face image sample.
The extracted original, model, and target features may each comprise both global and local features, only global features, or only local features. Alternatively, some of the original, model, and target features may be extracted as local features only, while the others are extracted as both local and global features; for example, the original features and target features are local features, while the model features include global and local features. A global feature is extracted from the global information of the entire image and carries no spatial information. A local feature is extracted from a certain region of the image, and the multiple resulting features are finally fused together as the final feature. Specifically, local features may be extracted by horizontal segmentation, pose information, segmentation information (e.g., segmenting a person), or a mesh.
When extracting the local features, features of different resolution levels may be extracted, which may specifically include features of a high resolution level, a medium resolution level, and a low resolution level.
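A toy multi-resolution encoder in this spirit might look as follows; the channel counts and depths are assumptions, not the architecture used by the patent.

```python
import torch.nn as nn

class MultiResolutionEncoder(nn.Module):
    """Toy encoder returning local features at high, medium, and low
    resolution levels. All sizes are illustrative."""
    def __init__(self, in_ch=3, base=32):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, 2, 1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, 2, 1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.Conv2d(base * 2, base * 4, 3, 2, 1), nn.ReLU())

    def forward(self, x):
        f_high = self.stage1(x)    # high-resolution local features
        f_mid = self.stage2(f_high)
        f_low = self.stage3(f_mid)
        return f_high, f_mid, f_low
```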
In some implementations, to increase the efficiency of feature extraction, the encoder may include a first encoder, a second encoder, and a third encoder. Specifically, the original features are extracted from the original face image sample by a first encoder, the model features are extracted from the first training model map by a second encoder, and the target features are extracted from the target face image sample by a third encoder.
In some embodiments, although both the first training model map and the target face image sample can provide expression information, their face images are inconsistent: the face orientation, size, and shape may differ. To keep the face information of the generated expression migration image consistent with the face in the target face image sample, feature alignment is performed on the model feature and the target feature. Specifically, the expression migration model further includes an alignment module; before the original, model, and target features are input into the generator, the model feature and the target feature are input into the alignment module to obtain the aligned (fourth) feature. That is, the model feature is aligned by means of the target feature, and the original feature and the aligned feature are then input into the generator to obtain the expression migration image.
In some embodiments, if the model feature includes a global feature and a local feature and the target feature is a local feature, the local feature in the model feature and the target feature are input into the alignment module to obtain the aligned feature. When performing expression migration, the original feature, the aligned feature, and the global feature in the model feature are input into the generator to obtain the expression migration image.
To facilitate understanding of the structure of the initial expression migration model, the model structure shown in fig. 2b includes a first encoder, a second encoder, a third encoder, an alignment module MFAT, and a generator G. The first encoder takes the original face image sample (or the de-expressed original face image sample) as input, extracts local features, and inputs them into the generator G. The second encoder takes the first training 3D model map as input and extracts local and global features; the local features are input into the alignment module MFAT and the global features into the generator G. The third encoder takes the target face image sample as input, extracts local features, and inputs them into the alignment module MFAT, whose aligned output is then input into the generator G. After all features are acquired, the generator G generates the expression migration image based on them.
The structure of the alignment module MFAT is shown in fig. 2c; it includes Transformer encoders (TEs) and a multi-head cross-attention module, where each TE is a self-attention module. The local features extracted by the second and third encoders are first processed by the TEs, the resulting features are input into the multi-head cross-attention module for feature fusion, and the fused feature is output. The target features may include features of different resolution levels, while the model features may include only features of the original resolution level.
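A minimal MFAT-style block could be sketched as follows, assuming the model features act as queries and the target features as keys/values in the cross-attention; the dimensions and the query/key assignment are assumptions, as fig. 2c does not fix them.

```python
import torch.nn as nn

class AlignmentModule(nn.Module):
    """Sketch of an MFAT-style alignment block: self-attention (TEs) over
    each feature stream, then multi-head cross-attention fusing the two.
    Inputs are token sequences of shape (B, N, dim)."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.te_model = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                       batch_first=True), num_layers=2)
        self.te_target = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                       batch_first=True), num_layers=2)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, model_feat, target_feat):
        q = self.te_model(model_feat)     # queries from model features
        kv = self.te_target(target_feat)  # keys/values from target features
        fused, _ = self.cross(q, kv, kv)  # aligned (fourth) feature
        return fused
```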
In some embodiments, a second training model map may also be obtained using the non-expression coefficients in the original facial deformation coefficient; optical flow information between the first training model map and the second training model map is calculated, and the original features are deformed with the optical flow information to obtain fifth features. In that case, the original features, model features, target features, and fifth features are input into the generator to obtain the expression migration image. Optical flow is the instantaneous velocity of the pixel motion of a moving object on the observation imaging plane; using the temporal change of pixels in an image sequence and the correlation between adjacent frames, the correspondence between the previous frame and the current frame is found, and the motion information of the object between adjacent frames is computed. Specifically, the optical flow information may be obtained from the coordinate difference between the first and second training model maps. Since optical flow reflects the change between images, the original features can be deformed with it, and the deformed features are then input into the generator. The generation of the second training model map can be seen in fig. 2a.
For example, in the processing framework shown in fig. 2d, local features of different resolution levels are extracted from the original face image sample (or the de-expressed original face image sample), deformed using the extracted optical flow information, and input into the generator G.
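The feature deformation step can be sketched with a standard grid-sampling warp; the convention that the flow holds per-pixel (dx, dy) offsets is an assumption.

```python
import torch
import torch.nn.functional as F

def warp_features(feat, flow):
    """Warp a (B, C, H, W) feature map with dense optical flow given as
    per-pixel (dx, dy) offsets of shape (B, 2, H, W) -- a sketch of the
    feature deformation step."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feat.device)  # (2, H, W)
    coords = grid.unsqueeze(0) + flow                            # displaced coords
    # Normalize to [-1, 1] as required by grid_sample.
    cx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    cy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    sample_grid = torch.stack((cx, cy), dim=-1)                  # (B, H, W, 2)
    return F.grid_sample(feat, sample_grid, align_corners=True)
```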
During training, in order to enable the expression migration model to migrate expressions for any type of image, two different training strategies may be set: in one, the original face image sample and the target face image sample are the same face image; in the other, they are different face images. Training may alternate between the two strategies, so that the expression migration model can perform expression migration both within the same identity and across different identities.
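The alternation between the two strategies can be as simple as the following sketch; the even/odd schedule and the dataset interface are assumptions.

```python
import random

def sample_training_pair(dataset, step):
    """Alternate the two strategies: even steps reuse one sample as both
    original and target (same identity); odd steps draw two different
    samples (cross identity). `dataset` is assumed to be a sequence."""
    if step % 2 == 0:
        img = random.choice(dataset)
        return img, img
    return tuple(random.sample(dataset, 2))
```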
The foregoing embodiment describes a training process of the expression migration model, and a description will be given below of a use process of the expression migration model with reference to the accompanying drawings.
Referring to fig. 3, which is a flowchart of an expression migration method provided in an embodiment of the present application, as shown in fig. 3, the method may include:
S301: and acquiring a first face image and a second face image, wherein the second face image is a face image providing expressions.
The first face image and the second face image may be the same face image, i.e. the same face identity, or may be different face images, i.e. different face identities.
S302: and acquiring a first face deformation coefficient according to the first face image, and acquiring a second face deformation coefficient according to the second face image.
Specifically, the first face image is input into the EMOCA model to obtain the first face 3D deformation coefficient, and the second face image is input into the EMOCA model to obtain the second face 3D deformation coefficient. The deformation coefficients may include facial expression coefficients and other (non-expression) coefficients. For the specific implementation of extracting the deformation coefficients, reference may be made to the above description of S102, which is not repeated here.
In some embodiments, to simplify the processing, the first face image may be first subjected to a de-expression processing to obtain a fourth face image; and extracting the first face deformation coefficient from the fourth face image. Specifically, the first face image is input into a pre-trained de-expression model, and a fourth face image with neutral expression is obtained.
S303: obtaining a first model diagram by using the expression coefficient in the second facial deformation coefficient and the non-expression coefficient in the first facial deformation coefficient;
For the specific implementation of performing graph rendering with the expression coefficient in the second facial deformation coefficient and the non-expression coefficients in the first facial deformation coefficient to obtain the first model map, reference may be made to the above description of S103, which is not repeated here.
S304: and inputting the first face image, the second face image and the first model image into an expression migration model to obtain a third face image.
The expression of the third face image is matched with the expression of the second face image, and the identity of the third face image is matched with the identity of the first face image. The identity matching means that the face in the third face image and the face in the first face image are the same face.
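Putting S301 through S304 together, inference can be sketched as below; every callable here is a placeholder for the components described above, not a real API.

```python
def migrate_expression(first_img, second_img, extract_coefficients,
                       render_model_map, migration_model):
    """End-to-end inference sketch (all callables are placeholders)."""
    c1 = extract_coefficients(first_img)   # S302: first facial deformation coeffs
    c2 = extract_coefficients(second_img)  #        second facial deformation coeffs
    model_map = render_model_map(          # S303: expression from the second image,
        expression=c2.expression,          # non-expression from the first image
        shape=c1.shape, pose=c1.pose, lighting=c1.lighting)
    # S304: the expression migration model produces the third face image
    return migration_model(first_img, second_img, model_map)
```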
In some embodiments, if the first face image is subjected to the de-expression processing, when the third face image is obtained, the fourth face image, the second face image and the first model image are input into the expression migration model, and the third face image is obtained.
In some embodiments, the expression migration model includes an encoder and a generator, and the inputting the first face image or the fourth face image, the second face image, and the first model map into the expression migration model to obtain a third face image includes: extracting first features from the first face image or the fourth face image, second features from the first model image, and third features from the second face image by using an encoder; and inputting the first feature, the second feature and the third feature into a generator to obtain a third face image.
In some embodiments, to improve migration efficiency, the encoder may include a first encoder, a second encoder, and a third encoder, and then the first encoder is used to extract the first feature from the first face image or the fourth face image; extracting a second feature from the first model map using a second encoder; and extracting a third feature from the second face image by using a third encoder. The first feature may be a local feature, the second feature may include a global feature and a local feature, and the third feature may be a local feature. For the specific embodiments of the first feature, the second feature, and the third feature, reference may be made to the relevant descriptions in the example of the method shown in fig. 1.
In some implementations, the expression migration model further includes an alignment module, the method further including, prior to inputting the first feature, the second feature, and the third feature into the generator: inputting the second feature and the third feature into the alignment module to obtain a fourth feature; and inputting the first feature and the fourth feature into a generator to obtain a third face image. For a specific implementation of the alignment module, reference may be made to the relevant description of fig. 2b and 2 c.
In some embodiments, if the third feature is a local feature and the second feature includes a global feature and a local feature, inputting the second feature and the third feature into the alignment module to obtain the fourth feature includes: inputting the local feature in the second feature and the third feature into the alignment module to obtain the fourth feature; the first feature, the global feature in the second feature, and the fourth feature are then input into the generator to obtain the third face image.
In some embodiments, the method further includes: obtaining a second model map using the non-expression coefficients in the first facial deformation coefficient; calculating optical flow information between the first model map and the second model map; and performing deformation processing on the first feature with the optical flow information to obtain a fifth feature. The first feature, the fifth feature, the second feature, and the third feature are then input into the generator to obtain the third face image; alternatively, the first feature, the fifth feature, the global feature in the second feature, and the fourth feature are input into the generator to obtain the third face image.
For a specific implementation of calculating optical flow information and performing the morphing processing on the first feature using the optical flow information, reference may be made to the description of the above embodiment.
In some embodiments, the expression migration model may also be corrected to improve the accuracy of the expression migration model. Specifically, the expression migration model is corrected using the first discrimination loss and the second discrimination loss. The first discrimination loss is a similarity loss between the first face image and the third face image, and the second discrimination loss is a similarity loss between the expression coefficient of the third face image and the expression coefficient of the second face image.
When performing expression migration, a first face image and a second face image are acquired, where the second face image is the face image providing the expression. A first facial deformation coefficient is obtained from the first face image and a second facial deformation coefficient from the second face image. Graph rendering is performed using the expression coefficient in the second facial deformation coefficient and the non-expression coefficients in the first facial deformation coefficient to obtain a first 3D model map. The first face image, the second face image, and the first 3D model map are input into the expression migration model to obtain a third face image, where the expression of the third face image matches the expression of the second face image and the identity of the third face image matches the identity of the first face image. That is, with the scheme provided by the present disclosure, the deformation coefficients of both the first and second face images are extracted when performing expression migration, which increases the information referenced by the migration; expression migration can thus be realized with a single reference image (the second face image), and the added reference information improves the accuracy of the migration.
Based on the above method embodiments, the embodiments of the present application provide an expression migration device and an electronic device, and the description will be made with reference to the accompanying drawings.
Referring to fig. 4, the diagram is a structural diagram of an expression migration apparatus provided in an embodiment of the present application, and as shown in fig. 4, the apparatus may include: a first acquisition unit 401, a second acquisition unit 402, and a third acquisition unit 403.
Wherein, the first acquiring unit 401 is configured to acquire a first face image and a second face image;
a second obtaining unit 402, configured to obtain a first face deformation coefficient according to the first face image, and obtain a second face deformation coefficient according to the second face image;
the second obtaining unit 402 is further configured to obtain a first model map by using an expression coefficient in the second facial deformation coefficient and a non-expression coefficient in the first facial deformation coefficient;
a third obtaining unit 403, configured to input the first face image, the second face image, and the first model image into an expression migration model, obtain a third face image, where an expression of the third face image matches an expression of the second face image, and an identity of the third face image matches an identity of the first face image.
In some embodiments, the second obtaining unit 402 is specifically configured to perform de-expression processing on the first face image to obtain a fourth face image; extracting a first face deformation coefficient from the fourth face image;
The third obtaining unit 403 is specifically configured to input the fourth face image, the second face image, and the first model map into an expression migration model, and obtain a third face image.
In some embodiments, the third obtaining unit 403 is specifically configured to extract a first feature from the first face image, extract a second feature from the first model map, and extract a third feature from the second face image by using an encoder in an expression migration model; and inputting the first feature, the second feature and the third feature into a generator in an expression migration model to obtain a third face image.
In some embodiments, the apparatus further comprises: an alignment unit;
the alignment unit is configured to, before the first feature, the second feature, and the third feature are input into the generator, input the second feature and the third feature into an alignment module in the expression migration model to obtain a fourth feature;
the third obtaining unit 403 is specifically configured to input the first feature and the fourth feature into a generator in an expression migration model, and obtain a third face image.
In some embodiments, the alignment unit is specifically configured to input the local feature of the second feature and the third feature into an alignment module in an expression migration model to obtain a fourth feature;
the third obtaining unit 403 is specifically configured to input the first feature, the global feature in the second feature, and the fourth feature into the generator to obtain a third face image.
In some embodiments, the second obtaining unit 402 is further configured to obtain a second model map using non-expression coefficients in the first facial deformation coefficients; calculating optical flow information between the first model map and the second model map; performing deformation processing on the first feature by utilizing the optical flow information to obtain a fifth feature;
the third obtaining unit 403 is specifically configured to input the first feature, the fifth feature, the second feature, and the third feature into the generator, and obtain a third face image.
In some embodiments, the apparatus further comprises: a correction unit;
the correction unit is configured to correct the expression migration model by using a first discrimination loss and a second discrimination loss, where the first discrimination loss is a similarity loss between the first face image and the third face image, and the second discrimination loss is a similarity loss between an expression coefficient of the third face image and an expression coefficient of the second face image.
It should be noted that, for specific implementation of each unit in this embodiment, reference may be made to the related description in the above method embodiment. The division of the units in the embodiments of the disclosure is illustrative, and is merely a logic function division, and there may be another division manner when actually implemented. Each functional unit in the embodiments of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. For example, in the above embodiment, the processing unit and the transmitting unit may be the same unit or may be different units. The integrated units may be implemented in hardware or in software functional units.
Referring to fig. 5, a schematic structural diagram of an electronic device 500 suitable for use in implementing embodiments of the present disclosure is shown. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 5 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 5, the electronic device 500 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 501, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage means 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing device 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
In general, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 508 including, for example, magnetic tape, hard disk, etc.; and communication means 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 shows an electronic device 500 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or from the storage means 508, or from the ROM 502. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 501.
The electronic device provided by the embodiment of the present disclosure belongs to the same inventive concept as the method provided by the above embodiment, and technical details not described in detail in the present embodiment can be seen in the above embodiment, and the present embodiment has the same beneficial effects as the above embodiment.
The present disclosure provides a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the method provided by the above embodiments.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, the clients, servers may communicate using any currently known or future developed network protocol, such as HTTP (Hyper Text Transfer Protocol ), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), the internet (e.g., the internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the method described above.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by software or by hardware. In some cases, the name of a unit/module does not constitute a limitation of the unit itself.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It should be noted that, in the present description, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different manner from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the system or device disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively simple, and the relevant points refer to the description of the method section.
It should be understood that in this disclosure, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of" and similar expressions refer to any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a; b; c; "a and b"; "a and c"; "b and c"; or "a and b and c", where a, b, and c may each be single or plural.
It is further noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), flash memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An expression migration method, characterized in that the method comprises the following steps:
acquiring a first face image and a second face image;
acquiring a first facial deformation coefficient from the first face image, and acquiring a second facial deformation coefficient from the second face image;
obtaining a first model map by using the expression coefficient in the second facial deformation coefficient and the non-expression coefficient in the first facial deformation coefficient;
inputting the first face image, the second face image, and the first model map into an expression migration model to obtain a third face image, wherein the expression of the third face image matches the expression of the second face image, and the identity of the third face image matches the identity of the first face image.
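For illustration only, the following Python/PyTorch sketch walks through the flow recited in claim 1. The patent discloses no source code, so every name here (CoeffExtractor, render_model_map, migration_model) is a hypothetical stand-in, and the toy modules only mimic the shapes of the real components.

```python
# Hypothetical sketch of claim 1's flow; all modules are invented
# placeholders for the coefficient regressor, renderer, and migration model.
import torch
import torch.nn as nn

class CoeffExtractor(nn.Module):
    """Toy 3DMM-style regressor that splits its output into an expression
    coefficient and a non-expression (identity/pose) coefficient."""
    def __init__(self, exp_dim=64, other_dim=144):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(exp_dim + other_dim))
        self.exp_dim = exp_dim

    def forward(self, img):
        coeff = self.backbone(img)
        return coeff[:, :self.exp_dim], coeff[:, self.exp_dim:]

def render_model_map(exp_coeff, non_exp_coeff, size=256):
    """Placeholder for graph rendering: a real system would rasterize a
    3D face mesh reconstructed from the combined coefficients."""
    return torch.zeros(exp_coeff.shape[0], 3, size, size)

def migration_model(first_face, second_face, model_map):
    """Placeholder for the expression migration model of claims 3-6."""
    return first_face  # a real model fuses all three inputs

first_face = torch.rand(1, 3, 256, 256)   # identity source
second_face = torch.rand(1, 3, 256, 256)  # expression (driving) source

extractor = CoeffExtractor()
exp1, non_exp1 = extractor(first_face)    # first facial deformation coefficient
exp2, non_exp2 = extractor(second_face)   # second facial deformation coefficient

# Render the first model map from the driving expression coefficient and
# the identity source's non-expression coefficient, then migrate.
first_model_map = render_model_map(exp2, non_exp1)
third_face = migration_model(first_face, second_face, first_model_map)
```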
2. The method of claim 1, wherein the acquiring a first facial deformation coefficient from the first face image comprises:
performing expression removal processing on the first face image to obtain a fourth face image;
extracting the first facial deformation coefficient from the fourth face image;
and wherein the inputting the first face image, the second face image, and the first model map into an expression migration model to obtain a third face image comprises:
inputting the fourth face image, the second face image, and the first model map into the expression migration model to obtain the third face image.
3. The method of claim 1, wherein the inputting the first face image, the second face image, and the first model map into an expression migration model to obtain a third face image comprises:
extracting a first feature from the first face image, a second feature from the first model map, and a third feature from the second face image by using an encoder in the expression migration model;
and inputting the first feature, the second feature and the third feature into a generator in the expression migration model to obtain a third face image.
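Claim 3 splits the migration model into an encoder shared across the three inputs and a generator that fuses the resulting features. The sketch below shows that wiring under assumed toy architectures; the real encoder and generator are not specified in the claims.

```python
# Assumed encoder/generator wiring for claim 3; architectures are
# illustrative only and carry no weight from the patent.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)  # (B, ch, H/4, W/4)

class Generator(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * ch, ch, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(ch, 3, 3, padding=1), nn.Tanh(),
        )
    def forward(self, f1, f2, f3):
        return self.net(torch.cat([f1, f2, f3], dim=1))

enc, gen = Encoder(), Generator()
first_face = torch.rand(1, 3, 256, 256)
first_model_map = torch.rand(1, 3, 256, 256)
second_face = torch.rand(1, 3, 256, 256)
f1, f2, f3 = enc(first_face), enc(first_model_map), enc(second_face)
third_face = gen(f1, f2, f3)  # (1, 3, 256, 256)
```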
4. The method of claim 3, wherein prior to inputting the first feature, the second feature, and the third feature into a generator in the expression migration model, the method further comprises:
inputting the second feature and the third feature into an alignment module in the expression migration model to obtain a fourth feature;
and wherein the inputting the first feature, the second feature, and the third feature into a generator in the expression migration model to obtain a third face image comprises:
inputting the first feature and the fourth feature into the generator in the expression migration model to obtain the third face image.
5. The method of claim 4, wherein if the third feature is a local feature and the second feature includes a global feature and a local feature, the inputting the second feature and the third feature into the alignment module in the expression migration model to obtain a fourth feature comprises:
inputting the local feature in the second feature and the third feature into the alignment module in the expression migration model to obtain the fourth feature;
and the inputting the first feature and the fourth feature into the generator to obtain a third face image comprises:
inputting the first feature, the global feature in the second feature, and the fourth feature into the generator to obtain the third face image.
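Claims 4 and 5 do not define the alignment operator; a cross-attention-style alignment of the model map's local feature against the driving-face feature is one common realization and is shown below purely as an assumption (AlignmentModule is invented).

```python
# Assumed cross-attention alignment: model-map tokens (queries) attend
# to driving-face tokens (keys/values) to produce the fourth feature.
import torch
import torch.nn as nn

class AlignmentModule(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, local_f2, f3):
        # Flatten spatial maps (B, C, H, W) -> token sequences (B, H*W, C).
        b, c, h, w = local_f2.shape
        q = local_f2.flatten(2).transpose(1, 2)
        kv = f3.flatten(2).transpose(1, 2)
        f4, _ = self.attn(q, kv, kv)  # aligned fourth feature
        return f4.transpose(1, 2).reshape(b, c, h, w)

align = AlignmentModule(dim=32)
local_f2 = torch.rand(1, 32, 64, 64)  # local part of the second feature
f3 = torch.rand(1, 32, 64, 64)        # third (driving-face) feature
f4 = align(local_f2, f3)              # fed to the generator together with
                                      # the first feature and f2's global part
```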
6. The method of claim 3, wherein the method further comprises:
obtaining a second model map by using the non-expression coefficient in the first facial deformation coefficient;
calculating optical flow information between the first model map and the second model map;
performing deformation processing on the first feature by using the optical flow information to obtain a fifth feature;
and wherein the inputting the first feature, the second feature, and the third feature into a generator in the expression migration model to obtain a third face image comprises:
inputting the first feature, the fifth feature, the second feature, and the third feature into the generator in the expression migration model to obtain the third face image.
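Claim 6's deformation processing can be pictured as a standard backward warp of the first feature by the optical flow between the two model maps. The grid_sample-based warp below is a sketch under that assumption; the flow estimator itself is left as a zero-flow placeholder.

```python
# Assumed flow-based deformation: backward-warp the first feature by a
# pixel-displacement flow field using torch.nn.functional.grid_sample.
import torch
import torch.nn.functional as F

def warp_with_flow(feature: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp `feature` (B, C, H, W) by `flow` (B, 2, H, W) given in pixels."""
    b, _, h, w = feature.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32),
        torch.arange(w, dtype=torch.float32),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0) + flow  # sample locations
    # Normalize to [-1, 1] as grid_sample expects, then reorder to (B, H, W, 2).
    grid_x = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    norm_grid = torch.stack((grid_x, grid_y), dim=-1)
    return F.grid_sample(feature, norm_grid, align_corners=True)

first_feature = torch.rand(1, 32, 64, 64)
flow = torch.zeros(1, 2, 64, 64)  # placeholder flow between the two model maps
fifth_feature = warp_with_flow(first_feature, flow)  # claim 6's fifth feature
```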
7. The method of claim 1, wherein the method further comprises:
correcting the expression migration model by using a first discrimination loss and a second discrimination loss, wherein the first discrimination loss is a similarity loss between the first face image and the third face image, and the second discrimination loss is a similarity loss between the expression coefficient of the third face image and the expression coefficient of the second face image.
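As a hedged sketch of claim 7's correction step: the first discrimination loss is computed here as an identity-similarity loss between the first and third face images, and the second as a distance between expression coefficients. The embedding and coefficient extractors below (id_embed, extract_exp) are hypothetical placeholders.

```python
# Assumed training losses for claim 7; both feature extractors are toys
# standing in for a face-ID network and an expression-coefficient regressor.
import torch
import torch.nn.functional as F

def id_embed(img: torch.Tensor) -> torch.Tensor:
    return img.mean(dim=(2, 3))  # placeholder identity embedding

def extract_exp(img: torch.Tensor) -> torch.Tensor:
    return img.flatten(1)[:, :64]  # placeholder expression coefficients

first_face = torch.rand(1, 3, 256, 256)
second_face = torch.rand(1, 3, 256, 256)
third_face = torch.rand(1, 3, 256, 256, requires_grad=True)

# First discrimination loss: identity similarity between first and third face.
loss_id = 1.0 - F.cosine_similarity(id_embed(first_face), id_embed(third_face)).mean()
# Second discrimination loss: expression-coefficient similarity to the second face.
loss_exp = F.l1_loss(extract_exp(third_face), extract_exp(second_face))
total_loss = loss_id + loss_exp  # backpropagated to correct the migration model
```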
8. An expression migration apparatus, characterized in that the apparatus comprises:
a first acquisition unit, configured to acquire a first face image and a second face image;
a second acquisition unit, configured to acquire a first facial deformation coefficient from the first face image and a second facial deformation coefficient from the second face image;
the second acquisition unit is further configured to obtain a first model map by using the expression coefficient in the second facial deformation coefficient and the non-expression coefficient in the first facial deformation coefficient;
and a third obtaining unit, configured to input the first face image, the second face image, and the first model map into an expression migration model to obtain a third face image, wherein the expression of the third face image matches the expression of the second face image, and the identity of the third face image matches the identity of the first face image.
9. An electronic device, the device comprising: a processor and a memory;
the memory is used for storing instructions or computer programs;
the processor is configured to execute the instructions or computer program in the memory, so as to cause the electronic device to perform the method according to any one of claims 1-7.
10. A computer-readable storage medium having instructions stored therein which, when executed on a device, cause the device to perform the method according to any one of claims 1-7.
CN202310117466.9A 2023-02-09 2023-02-09 Expression migration method, device, equipment and medium Pending CN116109476A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310117466.9A CN116109476A (en) 2023-02-09 2023-02-09 Expression migration method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116109476A (en) 2023-05-12

Family

ID=86261425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310117466.9A Pending CN116109476A (en) 2023-02-09 2023-02-09 Expression migration method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116109476A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination